OpenVINO Model Server

Welcome to the OpenVINO Model Server API. The server provides OpenAI-compatible API endpoints under the /api/v1 path.

API Endpoints

Base URL: https://ovms.wmcloud.org/api/v1

Authentication: Required for all /api endpoints

Header: Authorization: Bearer YOUR_API_KEY
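For clients that do not use the openai package, the header can be set by hand. The following is a minimal sketch using only the Python standard library. The helper name build_chat_request is hypothetical, and the /chat/completions path is assumed from what the OpenAI client appends to the base URL. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://ovms.wmcloud.org/api/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated chat completion request.

    build_chat_request is a hypothetical helper, not part of any library.
    """
    body = {
        "model": "OpenVINO/Qwen3-8B-int4-ov",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # Assumed path: the one the OpenAI client appends to the base URL.
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Every /api endpoint expects the key as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("sk-yourkeygoeshere", "Say this is a test")
print(request.get_method(), request.full_url)
print("Authorization:", request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; that step needs network access and a valid key.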

Example: Chat

from openai import OpenAI

client = OpenAI(
    base_url="https://ovms.wmcloud.org/api/v1",
    api_key="sk-yourkeygoeshere"
)

stream = client.chat.completions.create(
    model="OpenVINO/Qwen3-8B-int4-ov",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Example: Embedding

from openai import OpenAI
import numpy as np

client = OpenAI(
    base_url="https://ovms.wmcloud.org/api/v1",
    api_key="sk-yourkeygoeshere"
)
model = "OpenVINO/bge-base-en-v1.5-fp16-ov"
embedding_responses = client.embeddings.create(
    input=[
        "That is a happy person",
        "That is a very happy person"
    ],
    model=model,
)
embedding_from_string1 = np.array(embedding_responses.data[0].embedding)
embedding_from_string2 = np.array(embedding_responses.data[1].embedding)
# Cosine similarity: dot product divided by the product of the vector norms.
cos_sim = np.dot(embedding_from_string1, embedding_from_string2) / (
    np.linalg.norm(embedding_from_string1) * np.linalg.norm(embedding_from_string2)
)
print("Similarity score as cos_sim", cos_sim)

Available models

Model listing API: https://ovms.wmcloud.org/api/v1/models
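The listing endpoint can also be queried from a script. The sketch below assumes the response follows the OpenAI model list shape (a "data" array of objects with an "id" field); this has not been checked against this server. The helper names extract_model_ids and fetch_model_ids are hypothetical, and the sample payload is hand-written for illustration, not actual server output.

```python
import json
import urllib.request

MODELS_URL = "https://ovms.wmcloud.org/api/v1/models"


def extract_model_ids(payload: dict) -> list:
    """Return the model ids from an OpenAI-style model list payload."""
    # Assumption: the payload looks like {"object": "list", "data": [{"id": ...}]}.
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_key: str) -> list:
    """Query the listing endpoint; needs network access and a valid key."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_model_ids(json.load(response))


# Offline illustration with a hand-written payload in the assumed shape.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "OpenVINO/Qwen3-8B-int4-ov", "object": "model"},
        {"id": "OpenVINO/bge-base-en-v1.5-fp16-ov", "object": "model"},
    ],
}
print(extract_model_ids(sample_payload))
```

Calling fetch_model_ids("sk-yourkeygoeshere") with a real key would return the ids actually served.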

To request an API key, contact sthottingal at wikimedia.org.