Mira Voice

Mira Voice bundles speech-to-text (STT) and text-to-speech (TTS) into one product. Use it to transcribe audio files, support real-time dictation, and generate natural-sounding voice-over in dozens of languages.

Capabilities

Speech-to-text — accurate transcription with auto language detection, timestamps and punctuation
Text-to-speech — multiple voices, rate and emotion controls
Multi-language — 50+ languages with native-grade quality for Russian and English
Streaming — TTS returns audio using chunked transfer encoding; STT supports streaming uploads
Audio formats — wav, mp3, ogg, opus, webm, flac, m4a

Speech-to-text (STT)

POST a multipart/form-data request to /v1/audio/transcribe with the audio file in the audio field. Optionally add a language field with an ISO code (ru, en, …) as a hint.

cURL

curl https://api.vmira.ai/v1/audio/transcribe \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -F "audio=@meeting.mp3" \
  -F "language=en"

Python

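from openai import OpenAI

# Point the OpenAI SDK at the Mira API.
client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)

# Open the audio file in binary mode and upload it for transcription.
with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="mira-voice",
        file=f,
        language="en",  # optional language hint
    )

print(transcript.text)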
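
Where the SDK is not available, the multipart request from the cURL example can also be assembled with the Python standard library. The following is a minimal sketch rather than an official client: build_transcribe_request is a hypothetical helper, and it assumes the endpoint accepts a standard multipart body with the audio and language fields shown in the cURL example.

```python
import mimetypes
import urllib.request
import uuid

# Endpoint and field names are taken from the cURL example on this page.
TRANSCRIBE_URL = "https://api.vmira.ai/v1/audio/transcribe"


def build_transcribe_request(filename, audio_bytes, api_key, language=None):
    """Build (without sending) a multipart/form-data POST for transcription.

    Hypothetical helper for illustration; not part of any Mira SDK.
    Assumes `filename` contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    head = ""
    if language is not None:
        # Optional language hint, sent as a plain form field.
        head += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="language"\r\n'
            "\r\n"
            f"{language}\r\n"
        )
    # The audio file itself goes in the "audio" field.
    head += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n"
        "\r\n"
    )
    closing = f"\r\n--{boundary}--\r\n"
    body = head.encode("utf-8") + audio_bytes + closing.encode("ascii")

    return urllib.request.Request(
        TRANSCRIBE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage (performs a real network call, so it is left commented out):
# import json, os
# with open("meeting.mp3", "rb") as f:
#     req = build_transcribe_request(
#         "meeting.mp3", f.read(), os.environ["MIRA_API_KEY"], language="en"
#     )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text"])
```

Building the request separately from sending it keeps the body easy to inspect when debugging a rejected upload.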

STT response

JSON

{
  "text": "This is a sample transcription from the audio file.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 4.82, "text": "This is a sample…" }
  ]
}
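
The segments array pairs each piece of text with its start and end time in seconds, which makes it straightforward to print a timestamped transcript. The sketch below works on a response shaped like the sample above; format_timestamp and format_segments are hypothetical helper names, and the MM:SS.ss layout is only one possible choice.

```python
import json


def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.ss (rounded to hundredths)."""
    total_hundredths = round(seconds * 100)
    minutes, remainder = divmod(total_hundredths, 6000)
    return f"{minutes:02d}:{remainder // 100:02d}.{remainder % 100:02d}"


def format_segments(response):
    """Return one '[start - end] text' line per segment in the response."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] {seg['text']}"
        for seg in response.get("segments", [])
    ]


# The sample response from this page, parsed from JSON.
sample = json.loads("""
{
  "text": "This is a sample transcription from the audio file.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 4.82, "text": "This is a sample…" }
  ]
}
""")

for line in format_segments(sample):
    print(line)
```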

Text-to-speech (TTS)

cURL

curl https://api.vmira.ai/v1/audio/speech \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to Mira.",
    "voice": "aria",
    "format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Python

from openai import OpenAI

# Point the OpenAI SDK at the Mira API.
client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)

# Stream the synthesized audio to disk as it arrives instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="mira-voice",
    voice="aria",
    input="Hello, world!",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")

Available voices

Voice   Type                Languages
aria    female, neutral     All
nova    female, warm        All
onyx    male, deep          All
echo    male, calm          All
sage    neutral, friendly   All

Parameters

STT — /v1/audio/transcribe

audio — required multipart field; up to 25 MB
language — optional ISO language code for better accuracy
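
The size limit and the supported formats can be checked on the client before uploading, which avoids sending a request that is likely to be rejected. This is a minimal sketch under two assumptions: that "25 MB" means 25 × 1024 × 1024 bytes (the page does not say whether the limit is binary or decimal), and that the file extension reflects the actual format. validate_upload is a hypothetical helper; the extension list is copied from the Audio formats entry under Capabilities.

```python
import os

# Formats listed under "Audio formats" on this page.
SUPPORTED_EXTENSIONS = {"wav", "mp3", "ogg", "opus", "webm", "flac", "m4a"}

# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def validate_upload(filename, size_bytes):
    """Raise ValueError if the file looks unsuitable for /v1/audio/transcribe."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"file is {size_bytes} bytes; the limit is {MAX_UPLOAD_BYTES} bytes"
        )


def validate_upload_path(path):
    """Convenience wrapper that reads the size from disk."""
    validate_upload(os.path.basename(path), os.path.getsize(path))
```

Calling validate_upload_path("meeting.mp3") before the upload surfaces both problems as a local ValueError.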

TTS — /v1/audio/speech

input — required text to synthesize
voice — voice identifier (see table)
format — mp3 | wav | ogg | opus | flac
speed — 0.5–2.0 (default 1.0)
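
The constraints above can be enforced when building the JSON body for /v1/audio/speech. The sketch below is illustrative only: build_tts_payload is a hypothetical helper, it treats voice as required even though the page does not say so, and it omits format and speed when they are not given so that the server defaults apply.

```python
import json

# Values copied from the parameter list and the voices table on this page.
ALLOWED_FORMATS = {"mp3", "wav", "ogg", "opus", "flac"}
KNOWN_VOICES = {"aria", "nova", "onyx", "echo", "sage"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_tts_payload(text, voice, audio_format=None, speed=None):
    """Validate TTS parameters and return the request body as a dict."""
    if not text:
        raise ValueError("input text is required")
    if voice not in KNOWN_VOICES:
        raise ValueError(f"unknown voice: {voice}")

    payload = {"input": text, "voice": voice}
    if audio_format is not None:
        if audio_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {audio_format}")
        payload["format"] = audio_format
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        payload["speed"] = speed
    return payload


# Same body as the cURL example on this page.
body = build_tts_payload("Welcome to Mira.", "aria", audio_format="mp3", speed=1.0)
print(json.dumps(body))
```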

See /pricing for current per-minute STT and per-1000-character TTS rates, and /docs/api/reference for the full endpoint list.