Thinking mode

Thinking mode is a feature of the mira-thinking model that performs an internal "chain of thought" before generating the final response. This approach significantly improves accuracy on tasks requiring logical reasoning, mathematical computation, and multi-step analysis.

How it works

When you send a request to mira-thinking, the model goes through two stages:

Thinking stage: the model generates an internal reasoning trace, breaking the problem into sub-tasks, verifying intermediate results, and exploring alternative approaches
Response stage: drawing on that reasoning, the model produces a polished final response that incorporates the conclusions it reached

Thinking tokens are included in usage.completion_tokens and billed as output tokens. You can see the reasoning trace in the API response via the thinking_content field.

When to use thinking mode

Task type           | Thinking benefit | Example
--------------------|------------------|----------------------------------------------
Mathematical proofs | High             | Prove that sqrt(2) is irrational
Code debugging      | High             | Find the race condition in multithreaded code
Logic puzzles       | High             | Solve Einstein's riddle
Multi-step planning | High             | Design a microservices architecture
Data analysis       | Medium           | Identify trends in a dataset
Simple chat         | Low              | Use mira instead
Translation         | Low              | Use mira instead
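
The guidance above can be encoded as a small lookup when an application routes requests between the two models. The following is a minimal sketch, not part of the API: the pickModel helper and its task-type keys are hypothetical, and sending "Medium" benefit tasks to mira-thinking is an assumption you may want to reverse if cost or latency matters more.

```javascript
// Hypothetical routing table derived from the task types listed above.
const THINKING_BENEFIT = new Map([
  ["mathematical proofs", "high"],
  ["code debugging", "high"],
  ["logic puzzles", "high"],
  ["multi-step planning", "high"],
  ["data analysis", "medium"],
  ["simple chat", "low"],
  ["translation", "low"],
]);

// Returns the model name to use for a given task type.
// Assumption: "medium" benefit tasks use mira-thinking, and unknown
// task types fall back to the standard mira model.
function pickModel(taskType) {
  const benefit = THINKING_BENEFIT.get(String(taskType).trim().toLowerCase());
  return benefit === "high" || benefit === "medium" ? "mira-thinking" : "mira";
}
```

For example, pickModel("Code debugging") selects mira-thinking, while pickModel("Translation") selects mira.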

Using via API

Thinking mode is automatically available when using the mira-thinking model. No additional parameters are required.

Request with thinking

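// Send a chat completion request to the mira-thinking model.
// Replace sk-mira-YOUR_KEY with your own API key.
const response = await fetch("https://api.vmira.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-mira-YOUR_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "mira-thinking",
    messages: [{
      role: "user",
      content: "Prove that the sum of angles in a triangle is 180 degrees"
    }],
  }),
});

// Stop early on HTTP errors, then parse the JSON body.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}
const data = await response.json();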

Reading thinking output

The API response contains both the final answer (content) and the reasoning chain (thinking_content):

Response structure

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The sum of angles in a triangle is 180°. Here is a formal proof...",
      "thinking_content": "I need to prove this rigorously. Let me consider the parallel lines approach..."
    }
  }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 1450,
    "total_tokens": 1478
  }
}
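
To work with this structure in application code, it can help to pull the answer, the reasoning trace, and the token counts into one flat object. The following is a minimal sketch based only on the fields shown above. The readThinkingResponse helper is hypothetical, and treating a missing thinking_content as an empty string is an assumption rather than documented behavior.

```javascript
// Extracts the final answer, reasoning trace, and token usage from a
// parsed chat completion response (field names as in the structure above).
function readThinkingResponse(data) {
  const message = data?.choices?.[0]?.message ?? {};
  const usage = data?.usage ?? {};
  return {
    answer: message.content ?? "",
    // Assumption: a missing thinking_content is treated as an empty trace.
    thinking: message.thinking_content ?? "",
    promptTokens: usage.prompt_tokens ?? 0,
    // completion_tokens includes the thinking tokens.
    completionTokens: usage.completion_tokens ?? 0,
  };
}
```

Pass it the parsed JSON body of the response, then log or store the thinking field separately from the answer field.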

Comparison: with and without thinking

Without thinking (mira)

Prompt

User: What is 17 × 23 + 156 ÷ 12 - 89?

mira response: 17 × 23 = 391, 156 ÷ 12 = 13, 391 + 13 - 89 = 315

With thinking (mira-thinking)

Thinking trace + response

[Thinking]
Let me break this expression into parts following order of operations (PEMDAS):
1. Multiplication: 17 × 23 = 391
2. Division: 156 ÷ 12 = 13
3. Addition: 391 + 13 = 404
4. Subtraction: 404 - 89 = 315
Let me verify: 17×23 = 17×20 + 17×3 = 340 + 51 = 391 ✓
156÷12 = 13 ✓ (since 12×13 = 156)
391 + 13 = 404, 404 - 89 = 315 ✓

[Final answer]
17 × 23 + 156 ÷ 12 - 89 = 315

Computation steps:
• 17 × 23 = 391
• 156 ÷ 12 = 13
• 391 + 13 - 89 = 315
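
The arithmetic in the trace above can be checked independently. This short sketch recomputes each step; the function name is hypothetical and exists only for illustration.

```javascript
// Recomputes 17 × 23 + 156 ÷ 12 - 89 step by step, following the
// order of operations used in the thinking trace.
function evaluateExample() {
  const product = 17 * 23;      // multiplication first
  const quotient = 156 / 12;    // then division
  const sum = product + quotient;
  const result = sum - 89;
  return { product, quotient, sum, result };
}

console.log(evaluateExample());
// { product: 391, quotient: 13, sum: 404, result: 315 }
```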

Thinking token costs

Thinking tokens are generated only by the mira-thinking model and are billed at the output token rate. A typical thinking request uses between 200 and 2,000 additional tokens for the reasoning chain. For current rates, see platform.vmira.ai/pricing.
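
Because thinking tokens are billed as output tokens, the extra cost of the reasoning chain can be estimated from the typical range of 200 to 2,000 tokens. The following is a rough sketch. Both function names are hypothetical, and the rate is a parameter you must supply from the pricing page, since no actual price is assumed here.

```javascript
// Cost of a number of output tokens at a given rate per million tokens.
// outputRatePerMillion is a placeholder: take the real value from the pricing page.
function estimateOutputCost(tokens, outputRatePerMillion) {
  return (tokens * outputRatePerMillion) / 1_000_000;
}

// Estimated extra cost per request for the reasoning chain, using the
// typical range of 200 to 2,000 additional tokens.
function thinkingOverheadRange(outputRatePerMillion) {
  return {
    min: estimateOutputCost(200, outputRatePerMillion),
    max: estimateOutputCost(2000, outputRatePerMillion),
  };
}
```

The result is expressed in whatever currency the supplied rate uses. For the actual cost of a specific request, apply estimateOutputCost to the usage.completion_tokens value returned by the API.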

For simple tasks (translation, summarization, chat), use the mira model. Thinking mode in mira-thinking will increase cost and latency without noticeable quality improvement for such tasks.

Limitations

Latency: response time is 2 to 5 times longer than with the standard mira model because of the additional reasoning stage
Context: the context window is 1M tokens, which is sufficient for most documents
Streaming: when streaming, thinking_content is sent before the main response
Not for every task: thinking mode doesn't improve knowledge-recall tasks (facts, translation)
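
Since thinking_content arrives before the main response when streaming, a client can accumulate the two parts separately. The following is a sketch under a significant assumption: the chunk format is not described here, so the code assumes each parsed chunk carries its text under choices[0].delta, with thinking_content and content as the field names. Verify the actual chunk shape before relying on it. The collectStream helper is hypothetical.

```javascript
// Accumulates an already-parsed sequence of stream chunks into a reasoning
// trace and a final answer.
// Assumption: each chunk looks like { choices: [{ delta: { ... } }] } and
// the delta holds either thinking_content or content.
function collectStream(chunks) {
  let thinking = "";
  let answer = "";
  for (const chunk of chunks) {
    const delta = chunk?.choices?.[0]?.delta ?? {};
    if (delta.thinking_content) thinking += delta.thinking_content;
    if (delta.content) answer += delta.content;
  }
  return { thinking, answer };
}
```

A UI could use the same split to show the reasoning trace in a collapsible panel while the final answer streams into the main view.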
