Thinking mode

Thinking mode is a feature of the mira-pro and mira-max models in which the model works through an internal "chain of thought" before generating the final response. This approach significantly improves accuracy on tasks requiring logical reasoning, mathematical computation, and multi-step analysis.

How it works

When you send a request to mira-pro or mira-max, the model goes through two stages:

  • Thinking stage: the model generates an internal reasoning trace, breaking the problem into sub-tasks, verifying intermediate results, and exploring alternative approaches
  • Response stage: based on the reasoning, the model produces a final, polished response that draws on the conclusions reached during the thinking stage

Thinking tokens are included in usage.completion_tokens and billed as output tokens. You can see the reasoning trace in the API response via the thinking_content field.

When to use thinking mode

Task type            | Thinking benefit | Example
Mathematical proofs  | High             | Prove that sqrt(2) is irrational
Code debugging       | High             | Find the race condition in multithreaded code
Logic puzzles        | High             | Solve Einstein's riddle
Multi-step planning  | High             | Design a microservices architecture
Data analysis        | Medium           | Identify trends in a dataset
Simple chat          | Low              | Use mira instead
Translation          | Low              | Use mira instead
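
The table above can be turned into a simple routing rule. The following is a minimal sketch, not part of the API: the task keys and the pickModel helper are hypothetical names, and sending medium-benefit tasks to mira-pro is an assumption.

```javascript
// Benefit levels copied from the table above; the task keys are illustrative names.
const THINKING_BENEFIT = {
  "math-proof": "high",
  "code-debugging": "high",
  "logic-puzzle": "high",
  "multi-step-planning": "high",
  "data-analysis": "medium",
  "simple-chat": "low",
  "translation": "low",
};

// Hypothetical helper: pick a model name for a task type.
// Assumption: low-benefit (and unknown) tasks go to "mira", all others to "mira-pro".
function pickModel(taskType) {
  const benefit = THINKING_BENEFIT[taskType] ?? "low";
  return benefit === "low" ? "mira" : "mira-pro";
}

console.log(pickModel("code-debugging")); // mira-pro
console.log(pickModel("translation"));    // mira
```

Defaulting unknown task types to mira keeps cost and latency low unless a task is explicitly marked as benefiting from thinking.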

Using via API

Thinking mode is automatically available when using the mira-pro and mira-max models. No additional parameters are required.

Request with thinking
// Send a chat completion request; thinking is enabled by the choice of model.
const response = await fetch("https://api.vmira.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    // Replace sk-mira-YOUR_KEY with your own API key.
    "Authorization": "Bearer sk-mira-YOUR_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "mira-pro", // or "mira-max"
    messages: [{
      role: "user",
      content: "Prove that the sum of angles in a triangle is 180 degrees"
    }],
  }),
});

Reading thinking output

The API response contains both the final answer (content) and the reasoning chain (thinking_content):

Response structure
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The sum of angles in a triangle is 180°. Here is a formal proof...",
      "thinking_content": "I need to prove this rigorously. Let me consider the parallel lines approach..."
    }
  }],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 1450,
    "total_tokens": 1478
  }
}
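
As a rough sketch of how the two fields might be separated on the client, the helper below reads a parsed response body shaped like the structure above. The function name splitThinking is hypothetical, and treating a missing thinking_content as null is an assumption.

```javascript
// Hypothetical helper: split a parsed chat completion body into its parts.
function splitThinking(responseBody) {
  const message = responseBody?.choices?.[0]?.message ?? {};
  return {
    answer: message.content ?? "",
    // Assumption: the field may be absent, in which case we report null.
    thinking: message.thinking_content ?? null,
    completionTokens: responseBody?.usage?.completion_tokens ?? 0,
  };
}

// Sample body shaped like the response structure above.
const sampleBody = {
  choices: [{
    message: {
      role: "assistant",
      content: "The sum of angles in a triangle is 180°. Here is a formal proof...",
      thinking_content: "I need to prove this rigorously. Let me consider the parallel lines approach...",
    },
  }],
  usage: { prompt_tokens: 28, completion_tokens: 1450, total_tokens: 1478 },
};

const parts = splitThinking(sampleBody);
console.log(parts.thinking);
console.log(parts.answer);
console.log(parts.completionTokens); // 1450
```

With a live request, the same helper would be applied to the result of `await response.json()`.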

Comparison: with and without thinking

Without thinking (mira)

Prompt
User: What is 17 × 23 + 156 ÷ 12 - 89?

mira response: 17 × 23 = 391, 156 ÷ 12 = 13, 391 + 13 - 89 = 315

With thinking (mira-pro)

Thinking trace + response
[Thinking]
Let me break this expression into parts following order of operations (PEMDAS):
1. Multiplication: 17 × 23 = 391
2. Division: 156 ÷ 12 = 13
3. Addition: 391 + 13 = 404
4. Subtraction: 404 - 89 = 315
Let me verify: 17×23 = 17×20 + 17×3 = 340 + 51 = 391 ✓
156÷12 = 13 ✓ (since 12×13 = 156)
391 + 13 = 404, 404 - 89 = 315 ✓

[Final answer]
17 × 23 + 156 ÷ 12 - 89 = 315

Computation steps:
• 17 × 23 = 391
• 156 ÷ 12 = 13
• 391 + 13 - 89 = 315
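
The arithmetic in both responses can be checked independently. This short sketch simply re-evaluates the expression step by step; JavaScript applies the same operator precedence as the trace above.

```javascript
// Re-evaluate the expression from the prompt, one step at a time.
const product = 17 * 23;                 // multiplication first
const quotient = 156 / 12;               // then division
const total = product + quotient - 89;   // then addition and subtraction, left to right

console.log(product);  // 391
console.log(quotient); // 13
console.log(total);    // 315

// The single-expression form gives the same value.
console.log(17 * 23 + 156 / 12 - 89 === total); // true
```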

Thinking token costs

Thinking tokens are available in mira-pro and mira-max and are billed at the output token rate of the respective model (300 ₽ / 1M for mira-pro, 750 ₽ / 1M for mira-max). A typical thinking request uses between 200 and 2,000 additional tokens for the reasoning chain.
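
To make the rates concrete, here is a small cost estimate using the figures quoted above. The helper name thinkingCostRub is hypothetical, and the calculation covers only the thinking tokens, not the prompt or the final answer.

```javascript
// Output-token rates in rubles per 1M tokens, taken from the paragraph above.
const OUTPUT_RATE_RUB_PER_MILLION = { "mira-pro": 300, "mira-max": 750 };

// Hypothetical helper: cost in rubles of a given number of thinking tokens.
function thinkingCostRub(model, thinkingTokens) {
  const rate = OUTPUT_RATE_RUB_PER_MILLION[model];
  if (rate === undefined) {
    throw new Error(`No thinking rate known for model: ${model}`);
  }
  return (thinkingTokens * rate) / 1_000_000;
}

console.log(thinkingCostRub("mira-pro", 200));  // 0.06
console.log(thinkingCostRub("mira-pro", 2000)); // 0.6
console.log(thinkingCostRub("mira-max", 2000)); // 1.5
```

At the typical range of 200 to 2,000 thinking tokens, the added cost per request on mira-pro is therefore between 0.06 ₽ and 0.6 ₽.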

For simple tasks (translation, summarization, chat), use the mira model. Thinking mode in mira-pro and mira-max will increase cost and latency without noticeable quality improvement for such tasks.

Limitations

  • Latency: response time is 2-5x longer than the standard mira model due to the additional reasoning stage
  • Context: 32K context window; for large documents use mira-pro or mira-max
  • Streaming: when streaming, thinking_content is sent before the main response
  • Not for every task: thinking mode doesn't improve knowledge-recall tasks (facts, translation)
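
Because thinking_content arrives before the main response when streaming, a client can collect the two parts separately. The sketch below works on already-parsed chunks and is only an illustration: the chunk shape (choices[0].delta carrying optional thinking_content and content fragments) is an assumption and is not confirmed by this page, and accumulateStream is a hypothetical name.

```javascript
// Hypothetical accumulator for already-parsed stream chunks.
// Assumption: each chunk has choices[0].delta with optional
// thinking_content and content string fragments.
function accumulateStream(chunks) {
  let thinking = "";
  let answer = "";
  for (const chunk of chunks) {
    const delta = chunk?.choices?.[0]?.delta ?? {};
    if (typeof delta.thinking_content === "string") thinking += delta.thinking_content;
    if (typeof delta.content === "string") answer += delta.content;
  }
  return { thinking, answer };
}

// Illustrative chunks: reasoning fragments first, then the answer.
const demoChunks = [
  { choices: [{ delta: { thinking_content: "Order of operations: " } }] },
  { choices: [{ delta: { thinking_content: "multiply, divide, then add." } }] },
  { choices: [{ delta: { content: "The result " } }] },
  { choices: [{ delta: { content: "is 315." } }] },
];

const streamed = accumulateStream(demoChunks);
console.log(streamed.thinking); // Order of operations: multiply, divide, then add.
console.log(streamed.answer);   // The result is 315.
```

Keeping the two buffers separate lets a UI show the reasoning in a collapsible area while the answer is still being generated.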