Chat completions
The chat completions endpoint is the main way to interact with Mira models. It accepts a conversation (a list of messages) and returns the model's next reply. Request and response shapes follow the OpenAI Chat Completions API, so any OpenAI SDK works as a drop-in — only the base URL and API key change.
Endpoint
POST /v1/chat/completions
Create a chat completion
Request parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Mira model identifier: `mira` or `mira-thinking`. List available models via GET /v1/models. |
| messages | array | Yes | Conversation array. Each message has `role` (system / user / assistant) and `content`. |
| temperature | float | No | Sampling temperature, from 0 to 2. Lower is more deterministic; default ~0.7. |
| max_tokens | integer | No | Cap on completion tokens. Defaults to the model max. |
| stream | boolean | No | Stream the response as Server-Sent Events. Defaults to false. |
| tools | array | No | Function-calling schema. See Tool use. |
| tool_choice | string or object | No | Force a specific tool, auto, or none. Mirrors OpenAI's shape. |
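As a rough illustration of the table above, the sketch below assembles a request body and checks it against the documented constraints before sending. `build_chat_request` and `ALLOWED_ROLES` are hypothetical names, not part of the Mira API or any SDK, and the role check covers only the three roles listed in the table, so tool-use conversations may need a looser check.

```python
from typing import Any, Optional

# Roles listed in the parameter table; an assumption that no others are needed here.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_request(
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
) -> dict:
    """Hypothetical helper: build a request body and validate it client-side."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must contain at least one message")
    for msg in messages:
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError("each message needs a content field")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    body: dict[str, Any] = {"model": model, "messages": messages}
    # Optional fields are left out unless set, so the server applies its defaults.
    if temperature is not None:
        body["temperature"] = temperature
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stream:
        body["stream"] = True
    return body


body = build_chat_request(
    "mira",
    [{"role": "user", "content": "Explain recursion in simple terms"}],
    temperature=0.7,
)
print(sorted(body))  # prints ['messages', 'model', 'temperature']
```

Validating locally is optional; the server remains the authority on what it accepts, and this sketch only catches the mistakes the table makes obvious.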
Message format
- system: sets the model's behavior. Optional; conventionally the first message.
- user: the prompt the model should respond to.
- assistant: prior model responses. Include them when continuing a conversation.
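To make the role conventions concrete, here is a minimal sketch of carrying a conversation into a second turn: the previous reply goes back in as an assistant message, followed by the new user message. `continue_conversation` is a hypothetical helper, and the reply text is taken from the sample response on this page rather than from a live call.

```python
def continue_conversation(history, assistant_reply, next_user_message):
    """Hypothetical helper: return a new list with the reply and next prompt appended."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain recursion in simple terms"},
]
history = continue_conversation(
    history,
    "Recursion is when a function calls itself...",
    "Can you show a short example?",
)
print([m["role"] for m in history])  # prints ['system', 'user', 'assistant', 'user']
```

The resulting list is what would be passed as `messages` on the next request, so the model sees the whole exchange so far.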
Example request
bash
curl https://api.vmira.ai/v1/chat/completions \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mira",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain recursion in simple terms"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
python
from openai import OpenAI

# Point the OpenAI SDK at the Mira API by overriding the base URL.
client = OpenAI(
    api_key="sk-mira-...",
    base_url="https://api.vmira.ai/v1",
)

resp = client.chat.completions.create(
    model="mira",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain recursion in simple terms"},
    ],
    max_tokens=1024,
    temperature=0.7,
)
# The reply text lives on the first choice's message.
print(resp.choices[0].message.content)
typescript
import OpenAI from "openai";

// Point the OpenAI SDK at the Mira API by overriding the base URL.
const client = new OpenAI({
  apiKey: process.env.MIRA_API_KEY,
  baseURL: "https://api.vmira.ai/v1",
});

const resp = await client.chat.completions.create({
  model: "mira",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain recursion in simple terms" },
  ],
  max_tokens: 1024,
  temperature: 0.7,
});
// The reply text lives on the first choice's message.
console.log(resp.choices[0].message.content);
Response shape
json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1779800000,
  "model": "mira",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Recursion is when a function calls itself..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
Streaming
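When `stream` is true, the response arrives as a sequence of Server-Sent Events. Each event's data is a JSON chunk whose `delta` carries the newly generated content; the final chunk has an empty `delta` and a non-null `finish_reason`. The stream then terminates with a literal `data: [DONE]` sentinel.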
sse
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1779800000,"model":"mira","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1779800000,"model":"mira","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1779800000,"model":"mira","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1779800000,"model":"mira","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
python
from openai import OpenAI

client = OpenAI(api_key="sk-mira-...", base_url="https://api.vmira.ai/v1")

# With stream=True the SDK returns an iterator of chunks instead of one response.
stream = client.chat.completions.create(
    model="mira",
    messages=[{"role": "user", "content": "Stream me a haiku about TLS."}],
    stream=True,
)
for chunk in stream:
    # Role-only and final chunks carry no content, so skip empty deltas.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
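For clients that read the event stream without an SDK, the following sketch shows one way to turn the raw `data:` lines from the Streaming section into text. `iter_content_deltas` is a hypothetical helper, and it assumes each event fits on a single `data:` line, as in the sample stream; the SSE format itself allows multi-line data fields, which this simplification does not handle.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield content fragments until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only and final chunks have no content
                yield content


# Trimmed copies of the sample events shown in the Streaming section.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # prints "Hello world"
```

In practice the lines would come from an HTTP response body read incrementally; the generator form keeps that source interchangeable with the in-memory list used here.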