Streaming
Streaming lets you receive the model's response as it is generated, rather than waiting for the full completion. This dramatically improves the user experience, especially for long responses, since text starts appearing almost instantly.
How to enable streaming
Add the stream: true parameter to your request body for /v1/chat/completions. Instead of a single JSON response, the server will send a series of events in Server-Sent Events (SSE) format.
{
  "model": "mira",
  "stream": true,
  "messages": [
    { "role": "user", "content": "Tell me a story about space" }
  ]
}

Server-Sent Events format
When streaming, the server sends the response as a sequence of lines, each prefixed with data: and containing a JSON object. The one exception is the final event, the literal data: [DONE], which marks the end of the stream.
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711000000,"model":"mira","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711000000,"model":"mira","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711000000,"model":"mira","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711000000,"model":"mira","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711000000,"model":"mira","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Stream event types
Each chunk in the stream contains a delta object inside choices[0]. Here is what delta can contain (a short accumulation sketch follows the list):
- delta.role — Appears in the first chunk, indicates the role ("assistant").
- delta.content — A fragment of the response text. Concatenate all fragments to get the full response.
- finish_reason — null during generation, "stop" on natural completion, "length" when token limit is reached.
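To see how these fields fit together, here is a minimal Python sketch that replays the example events above (abridged to the fields used here) and accumulates the fragments into the full response. Reading the events from a real HTTP response is covered in the implementation examples below:

import json

# Example events copied from the stream above; in practice these come
# from the HTTP response, as shown in the implementation examples.
sse_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

parts = []
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":  # end-of-stream marker, not JSON
        break
    choice = json.loads(payload)["choices"][0]
    # delta.role appears only in the first chunk; delta.content carries text.
    if choice["delta"].get("content"):
        parts.append(choice["delta"]["content"])
    if choice["finish_reason"] == "length":
        print("warning: response truncated at the token limit")

print("".join(parts))  # -> Hello there!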
Implementation examples
cURL
curl https://api.vmira.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-mira-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "mira",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Write a poem about programming" }
    ]
  }'

The -N flag disables output buffering in cURL, letting you see chunks as they arrive.
Python
import json

import requests

response = requests.post(
    "https://api.vmira.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-mira-YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "mira",
        "stream": True,
        "messages": [
            {"role": "user", "content": "Explain the theory of relativity"}
        ],
    },
    stream=True,  # enable response streaming in requests
)

for line in response.iter_lines():
    if line:
        decoded = line.decode("utf-8")
        if decoded.startswith("data: ") and decoded != "data: [DONE]":
            chunk = json.loads(decoded[6:])
            content = chunk["choices"][0]["delta"].get("content", "")
            if content:
                print(content, end="", flush=True)

print()  # final newline

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)

stream = client.chat.completions.create(
    model="mira",
    messages=[
        {"role": "user", "content": "Tell me about quantum computers"}
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()

JavaScript (fetch)
const response = await fetch("https://api.vmira.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-mira-YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "mira",
    stream: true,
    messages: [
      { role: "user", content: "Hello! Tell me about yourself." },
    ],
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";

  for (const line of lines) {
    if (line.startsWith("data: ") && line !== "data: [DONE]") {
      const chunk = JSON.parse(line.slice(6));
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        process.stdout.write(content); // Node.js
        // Or append to a DOM element in the browser
      }
    }
  }
}

console.log(); // final newline

JavaScript (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-mira-YOUR_API_KEY",
  baseURL: "https://api.vmira.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "mira",
  messages: [
    { role: "user", content: "Explain recursion" },
  ],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

console.log();

Error handling in streams
If an error occurs during streaming (e.g., rate limit exceeded), the server will send an event with an error field instead of a normal chunk. Always wrap stream processing in try/catch and check each parsed chunk for an error field.
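For example, here is a sketch of the requests-based Python example above with error handling added. It assumes the mid-stream error event carries a top-level error field, as described; the exact payload shape may vary by error type:

import json

import requests

try:
    response = requests.post(
        "https://api.vmira.ai/v1/chat/completions",
        headers={
            "Authorization": "Bearer sk-mira-YOUR_API_KEY",
            "Content-Type": "application/json",
        },
        json={
            "model": "mira",
            "stream": True,
            "messages": [{"role": "user", "content": "Hello"}],
        },
        stream=True,
        timeout=60,
    )
    response.raise_for_status()  # surfaces non-2xx errors before the stream starts
    for line in response.iter_lines():
        if not line:
            continue
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: ") or decoded == "data: [DONE]":
            continue
        chunk = json.loads(decoded[6:])
        if "error" in chunk:  # error event sent instead of a normal chunk
            print(f"\nStream error: {chunk['error']}")
            break
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            print(content, end="", flush=True)
except requests.RequestException as exc:  # network failures, timeouts, HTTP errors
    print(f"\nRequest failed: {exc}")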
When to use streaming
- Chat interfaces — Displaying text as it generates creates a natural conversational feel.
- Long responses — The user sees the start of the response without waiting for the end.
- Progress indicator — Streaming serves as a natural indicator that the model is working.