Vision (Image Input)

Vision through the public API (/v1/chat/completions) is still under development. For now, the feature is available through the Mira Chat interface at platform.vmira.ai.

Mira models can analyze images included in your request. You can send images as Base64-encoded strings or as URL references, using the OpenAI-compatible content-block format.

Vision is available for all models: mira, mira-pro, and mira-max. With thinking mode enabled, mira-pro and mira-max may respond more slowly because of the reasoning step.

Supported Formats

Format   MIME Type     Max Size
JPEG     image/jpeg    20 MB
PNG      image/png     20 MB
GIF      image/gif     20 MB
WebP     image/webp    20 MB
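
A client-side check against the table above can reject unsupported files before a request is made. The sketch below is only an illustration and is not part of the Mira API: the helper name validate_image is hypothetical, and it assumes that "20 MB" means 20 * 1024 * 1024 bytes.

```python
# MIME types listed in the Supported Formats table.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def validate_image(mime_type: str, size_bytes: int) -> None:
    """Raise ValueError if the image type or size falls outside the table (hypothetical helper)."""
    if mime_type.lower() not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
    if size_bytes > MAX_IMAGE_BYTES:
        raise ValueError(
            f"Image is {size_bytes} bytes; the assumed limit is {MAX_IMAGE_BYTES}"
        )


# Usage: a 1 MB PNG passes silently.
validate_image("image/png", 1_000_000)
```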

Sending an Image via URL

The simplest approach is to pass a publicly accessible image URL inside a content block with type image_url.

cURL
curl https://api.vmira.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-mira-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mira",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/photo.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'
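
The same request body can be assembled in Python. The following is a minimal sketch that only builds the payload and does not send it. The helper name build_image_url_payload is hypothetical, and treating only http(s) URLs as "publicly accessible" is an assumption, not a documented rule.

```python
import json
from urllib.parse import urlparse


def build_image_url_payload(prompt: str, image_url: str,
                            model: str = "mira", max_tokens: int = 1024) -> dict:
    """Build the JSON body for a single-image request (hypothetical helper)."""
    parsed = urlparse(image_url)
    # Assumption: only absolute http(s) URLs are usable as image references.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {image_url!r}")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


payload = build_image_url_payload(
    "What is in this image?", "https://example.com/photo.jpg"
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary mirrors the cURL body above and can be passed as the json argument of requests.post.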

Sending a Base64 Image

If the image is on disk or generated dynamically, encode it to Base64 and pass it as a data URI in the url field.

Python
import base64

import requests

# Read the image from disk and Base64-encode it.
with open("photo.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    "https://api.vmira.ai/v1/chat/completions",
    headers={"Authorization": "Bearer sk-mira-YOUR_KEY"},
    json={
        "model": "mira",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image"},
                    {
                        "type": "image_url",
                        "image_url": {
                            # The MIME type in the data URI must match the file type.
                            "url": f"data:image/png;base64,{img_b64}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 1024
    },
    timeout=60,
)
response.raise_for_status()  # Fail early on HTTP errors.

print(response.json()["choices"][0]["message"]["content"])

JavaScript
import fs from "fs";

const imgBuffer = fs.readFileSync("photo.png");
const imgB64 = imgBuffer.toString("base64");

const response = await fetch("https://api.vmira.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer sk-mira-YOUR_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "mira",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What is in this image?" },
          {
            type: "image_url",
            image_url: {
              url: `data:image/png;base64,${imgB64}`,
            },
          },
        ],
      },
    ],
    max_tokens: 1024,
  }),
});

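// Stop on HTTP errors before trying to read the completion.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.choices[0].message.content);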
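
Both examples above hardcode image/png in the data URI, which only suits PNG files. A small helper can pick the MIME type from the file extension instead. This is a sketch under stated assumptions: the name image_to_data_uri is hypothetical, and it trusts the extension rather than inspecting the file contents.

```python
import base64
import os

# Extension-to-MIME mapping for the formats in the Supported Formats table.
EXTENSION_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_to_data_uri(path: str) -> str:
    """Return a data URI for the image at `path` (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    # Assumption: the file extension reflects the actual image format.
    if ext not in EXTENSION_TO_MIME:
        raise ValueError(f"Unsupported image extension: {ext!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{EXTENSION_TO_MIME[ext]};base64,{encoded}"
```

The returned string can be used directly as the url value of an image_url block.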

Multiple Images

You can send multiple images in a single request by adding multiple image_url blocks in the content array. The model will analyze all images together.

JSON body
{
  "model": "mira-pro",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Compare these two images" },
        { "type": "image_url", "image_url": { "url": "https://example.com/image1.jpg" } },
        { "type": "image_url", "image_url": { "url": "https://example.com/image2.jpg" } }
      ]
    }
  ],
  "max_tokens": 2048
}

You can send up to 10 images per request. Note that each image consumes tokens: approximately 85 tokens per 512x512 pixel tile.
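
The image limit and the per-tile cost can be checked before sending a request. The sketch below is a rough illustration, not the platform's billing formula: it assumes that partial tiles are rounded up and that images are not resized server-side, and both helper names are hypothetical.

```python
import math

MAX_IMAGES_PER_REQUEST = 10   # limit stated above
TOKENS_PER_TILE = 85          # approximate cost stated above
TILE_SIZE = 512               # tile edge length in pixels


def estimate_image_tokens(width: int, height: int) -> int:
    """Roughly estimate the tokens consumed by one image (hypothetical helper)."""
    # Assumption: partial tiles count as full tiles and no resizing is applied.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE


def build_multi_image_content(prompt: str, image_urls: list) -> list:
    """Build a content array with one text block and several image_url blocks."""
    if len(image_urls) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"{len(image_urls)} images given; at most {MAX_IMAGES_PER_REQUEST} are allowed"
        )
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return content


# A 1024x768 image covers 2 x 2 tiles under these assumptions.
print(estimate_image_tokens(1024, 768))  # 340
```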

Image Quality Best Practices

  • Resolution: For fine details, use images at least 768px on the long edge. Very small images reduce accuracy.
  • Clarity: Avoid blurry, heavily compressed, or very dark photos.
  • Cropping: Crop the image to the region of interest so the model focuses on the relevant content.
  • Text in images: The model reads printed text well. Handwritten text is recognized less reliably.
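
The resolution guideline above can be checked locally before uploading. The following sketch covers PNG files only and uses just the standard library: it reads the width and height from the IHDR chunk at the start of the file. The helper names are hypothetical, and 768 is the long-edge value from the list above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_LONG_EDGE = 768  # recommended minimum for fine details, in pixels


def png_dimensions(data: bytes) -> tuple:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("Not a PNG file")
    return struct.unpack(">II", data[16:24])


def long_edge_ok(data: bytes, min_long_edge: int = MIN_LONG_EDGE) -> bool:
    """Return True if the longer side of the PNG is at least `min_long_edge` pixels."""
    width, height = png_dimensions(data)
    return max(width, height) >= min_long_edge
```

Only the first 24 bytes of the file are needed, so the check stays cheap even for large images.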

Limitations

  • People identification: The model does not identify specific people by face. It can describe appearance but will not name individuals.
  • Spatial reasoning: Precise measurements, counting small objects, and determining exact spatial positions may be inaccurate.
  • Medical / specialized images: The model is not a diagnostic tool. Do not use it for medical diagnosis.

Do not send images containing sensitive information (documents, passwords, personal data) unless your application is appropriately secured.

Next Steps