Mira Voice

Mira Voice bundles speech-to-text (STT) and text-to-speech (TTS) into one product. Use it to transcribe audio files, support real-time dictation, and generate natural-sounding voice-over in dozens of languages.

Capabilities

  • Speech-to-text: accurate transcription with automatic language detection, timestamps, and punctuation
  • Text-to-speech: multiple voices, with rate and emotion controls
  • Multi-language: 50+ languages, with native-grade quality for Russian and English
  • Streaming: TTS returns audio over chunked transfer; STT supports streaming uploads
  • Audio formats: wav, mp3, ogg, opus, webm, flac, m4a

Speech-to-text (STT)

Send a multipart/form-data POST request to /v1/audio/transcribe with the audio file in the audio field. Optionally pass an ISO language code (ru, en, …) in the language field as a hint.

cURL
curl https://api.vmira.ai/v1/audio/transcribe \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -F "audio=@meeting.mp3" \
  -F "language=en"

Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)

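# Open the file in binary mode so the SDK can upload the raw bytes.
with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="mira-voice",
        file=f,
        language="en",  # optional ISO language hint
    )

# Print the transcribed text.
print(transcript.text)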

STT response

JSON
{
  "text": "This is a sample transcription from the audio file.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 4.82, "text": "This is a sample…" }
  ]
}
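
The segments array is what you would use to produce timestamped output. The following is a minimal sketch, assuming the response has exactly the shape shown above; the helper names format_timestamp and segments_to_lines are illustrative and not part of the Mira API.

```python
import json

# Sample payload with the same fields as the STT response above.
SAMPLE_RESPONSE = """
{
  "text": "This is a sample transcription from the audio file.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 4.82, "text": "This is a sample…" }
  ]
}
"""


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS.ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def segments_to_lines(response: dict) -> list[str]:
    """Turn each segment into a '[start - end] text' line."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text']}"
        for s in response.get("segments", [])
    ]


if __name__ == "__main__":
    for line in segments_to_lines(json.loads(SAMPLE_RESPONSE)):
        print(line)
```

For the sample payload this prints a single line, [00:00.00 - 00:04.82] This is a sample…. A response without a segments field yields an empty list rather than an error.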

Text-to-speech (TTS)

cURL
curl https://api.vmira.ai/v1/audio/speech \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to Mira.",
    "voice": "aria",
    "format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)

# Stream the audio to disk as it arrives instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="mira-voice",
    voice="aria",
    input="Hello, world!",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")
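
If you would rather not depend on the SDK, the JSON body from the cURL example can be posted with the Python standard library. This is a sketch under assumptions: that the endpoint returns the raw audio bytes in the response body (as the cURL --output flag suggests), and that the key is stored in MIRA_API_KEY. The function names are hypothetical, and the default voice and format simply mirror the cURL example; they are not documented API defaults.

```python
import json
import os
import urllib.request

API_URL = "https://api.vmira.ai/v1/audio/speech"


def build_speech_request(text: str, api_key: str, voice: str = "aria",
                         audio_format: str = "mp3",
                         speed: float = 1.0) -> urllib.request.Request:
    """Build (but do not send) a POST request with the same body as the cURL example."""
    body = {"input": text, "voice": voice, "format": audio_format, "speed": speed}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, path: str, **options) -> None:
    """Send the request and write the returned audio to disk in 8 KiB chunks.

    Assumes the response body is the audio itself.
    """
    request = build_speech_request(text, os.environ["MIRA_API_KEY"], **options)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(8192):
            out.write(chunk)
```

A call such as synthesize_to_file("Welcome to Mira.", "speech.mp3", speed=1.25) would then write the result to speech.mp3. Reading in chunks keeps memory use flat when the server streams audio over chunked transfer.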

Available voices

Voice   Type                Languages
aria    female, neutral     All
nova    female, warm        All
onyx    male, deep          All
echo    male, calm          All
sage    neutral, friendly   All

Parameters

STT — /v1/audio/transcribe

  • audio: required multipart field; up to 25 MB
  • language: optional ISO language code for better accuracy
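
A client can check these limits before uploading, which avoids sending a large file only to have it rejected. The sketch below is one way to do that. It takes the format list from the Capabilities section and assumes "25 MB" means 25 × 1024 × 1024 bytes, which the documentation does not specify. check_upload is a hypothetical helper, not part of the API.

```python
from pathlib import Path

# Formats listed under Capabilities.
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "opus", "webm", "flac", "m4a"}
# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

For a file on disk, call it as check_upload(path.name, path.stat().st_size) with a pathlib.Path. The check looks only at the file extension, so it cannot catch a file whose contents do not match its name.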

TTS — /v1/audio/speech

  • input: required text to synthesize
  • voice: voice identifier (see the Available voices table)
  • format: mp3 | wav | ogg | opus | flac
  • speed: 0.5–2.0 (default 1.0)
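
The constraints above can be enforced client-side before a request is sent. The following is a minimal sketch, with the voice names taken from the Available voices table. build_speech_payload is a hypothetical helper, and only the speed default of 1.0 comes from the documentation; voice and format are required arguments here because no defaults are documented.

```python
VOICES = {"aria", "nova", "onyx", "echo", "sage"}
FORMATS = {"mp3", "wav", "ogg", "opus", "flac"}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_speech_payload(text: str, voice: str, audio_format: str,
                         speed: float = 1.0) -> dict:
    """Validate TTS parameters and return the JSON body as a dict."""
    if not text:
        raise ValueError("input is required")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {"input": text, "voice": voice, "format": audio_format, "speed": speed}
```

The TTS format list is narrower than the STT one: webm and m4a are accepted as transcription input but are not listed as synthesis output, so the helper rejects them.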

See /pricing for current per-minute STT and per-1000-character TTS rates, and /docs/api/reference for the full endpoint list.