Mira Voice
Mira Voice bundles speech-to-text (STT) and text-to-speech (TTS) into one product. Use it to transcribe audio files, support real-time dictation, and generate natural-sounding voice-over in dozens of languages.
Capabilities
- Speech-to-text — accurate transcription with auto language detection, timestamps and punctuation
- Text-to-speech — multiple voices, rate and emotion controls
- Multi-language — 50+ languages with native-grade quality for Russian and English
- Streaming — TTS returns audio over chunked transfer; STT supports streaming uploads
- Audio formats — wav, mp3, ogg, opus, webm, flac, m4a
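TTS audio arrives over chunked transfer, so a client can write each chunk to disk as it arrives instead of buffering the whole file in memory. The sketch below is a plain-Python illustration of that idea. write_audio_stream is a hypothetical helper, and its chunks argument stands in for whatever byte iterator your HTTP client exposes. With the OpenAI Python SDK used elsewhere on this page, a streamed response's iter_bytes() is one such source.

```python
import io
from typing import BinaryIO, Iterable


def write_audio_stream(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Copy audio chunks to `out` as they arrive; return total bytes written.

    Hypothetical helper: `chunks` is any iterable of bytes, such as the
    chunk iterator of a streamed HTTP response.
    """
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        out.write(chunk)
        total += len(chunk)
    return total


# Fake chunks stand in for a real TTS stream here.
buffer = io.BytesIO()
written = write_audio_stream([b"ID3", b"", b"\x00\x01\x02"], buffer)
print(written)  # 6
```

In a real client, out would be a file opened with open("speech.mp3", "wb"), so memory use stays bounded by the chunk size rather than the length of the audio.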
Speech-to-text (STT)
Send a multipart/form-data request with the audio file in the audio field. To improve accuracy, you can optionally pass an ISO language code (ru, en, …) in the language field.
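To show what a multipart/form-data request carries, here is a standard-library sketch that builds the body by hand: a file part in the audio field and an optional language field. build_transcribe_body is a hypothetical helper, and the application/octet-stream part type is an assumption. In practice, cURL's -F option or an HTTP library builds this body for you.

```python
import uuid
from typing import Optional, Tuple


def build_transcribe_body(
    audio: bytes, filename: str, language: Optional[str] = None
) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data STT request.

    Field names (audio, language) follow this page; the file part's
    Content-Type of application/octet-stream is an assumption.
    """
    boundary = uuid.uuid4().hex  # random delimiter unlikely to occur in the data
    parts = []

    # Required file part carried in the "audio" field (filename is not escaped).
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(header.encode() + audio + b"\r\n")

    # Optional plain-text "language" field, e.g. "en" or "ru".
    if language is not None:
        field = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="language"\r\n\r\n'
            f"{language}\r\n"
        )
        parts.append(field.encode())

    # The closing delimiter ends the body.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


body, content_type = build_transcribe_body(b"\x00\x01", "meeting.mp3", language="en")
print(content_type.startswith("multipart/form-data; boundary="))  # True
```

To send the body with the standard library, pass it as data to urllib.request.Request with method="POST" and with Content-Type and Authorization headers.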
cURL
curl https://api.vmira.ai/v1/audio/transcribe \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -F "audio=@meeting.mp3" \
  -F "language=en"
Python
from openai import OpenAI
client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)
with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="mira-voice",
        file=f,
        language="en",
    )
print(transcript.text)
STT response
JSON
{
  "text": "This is a sample transcription from the audio file.",
  "language": "en",
  "duration": 4.82,
  "segments": [
    { "start": 0.0, "end": 4.82, "text": "This is a sample…" }
  ]
}
Text-to-speech (TTS)
cURL
curl https://api.vmira.ai/v1/audio/speech \
  -H "Authorization: Bearer $MIRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Welcome to Mira.",
    "voice": "aria",
    "format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
Python
from openai import OpenAI
client = OpenAI(
    api_key="sk-mira-YOUR_API_KEY",
    base_url="https://api.vmira.ai/v1",
)
with client.audio.speech.with_streaming_response.create(
    model="mira-voice",
    voice="aria",
    input="Hello, world!",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")
Available voices
| Voice | Type | Languages |
|---|---|---|
| aria | female, neutral | All |
| nova | female, warm | All |
| onyx | male, deep | All |
| echo | male, calm | All |
| sage | neutral, friendly | All |
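The voice table maps naturally onto a small lookup structure, which is convenient when an application picks a voice by character rather than by name. In this sketch, VOICES and voices_with_trait are hypothetical names. The catalog is hard-coded from the table above because this page documents no endpoint for listing voices.

```python
from typing import Dict, List, Tuple

# Voice catalog copied from the table: name -> descriptors.
VOICES: Dict[str, Tuple[str, ...]] = {
    "aria": ("female", "neutral"),
    "nova": ("female", "warm"),
    "onyx": ("male", "deep"),
    "echo": ("male", "calm"),
    "sage": ("neutral", "friendly"),
}


def voices_with_trait(trait: str) -> List[str]:
    """Return voice names whose descriptors include `trait`, in table order."""
    return [name for name, traits in VOICES.items() if trait in traits]


print(voices_with_trait("male"))     # ['onyx', 'echo']
print(voices_with_trait("neutral"))  # ['aria', 'sage']
```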
Parameters
STT — /v1/audio/transcribe
- audio — required multipart field; up to 25 MB
- language — optional ISO language code for better accuracy
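Checking these limits on the client avoids uploading a file the API will reject. The following is a hypothetical pre-flight check, check_stt_upload. It enforces the 25 MB cap and the formats listed under Capabilities, judging the format by file extension only. Two points are assumptions: that "25 MB" means 25 × 1024 × 1024 bytes, and that language hints are two-letter lowercase codes, based on the ru and en examples.

```python
import os
from typing import List, Optional

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" taken as 25 MiB
SUPPORTED_FORMATS = {"wav", "mp3", "ogg", "opus", "webm", "flac", "m4a"}


def check_stt_upload(
    filename: str, size_bytes: int, language: Optional[str] = None
) -> List[str]:
    """Return a list of problems; an empty list means the upload looks valid."""
    problems = []
    # Compare the file extension with the supported audio formats.
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    # Assumption: language hints are two-letter lowercase codes such as "en".
    if language is not None:
        if not (len(language) == 2 and language.isalpha() and language.islower()):
            problems.append(f"unexpected language code: {language!r}")
    return problems


print(check_stt_upload("meeting.mp3", 4_000_000, "en"))  # []
```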
TTS — /v1/audio/speech
- input — required text to synthesize
- voice — voice identifier (see table)
- format — mp3 | wav | ogg | opus | flac
- speed — 0.5–2.0 (default 1.0)
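A client can validate TTS parameters against these documented ranges before it builds the JSON body used in the cURL example. build_tts_payload is a hypothetical helper. It checks only what this page documents: known voice names, the five output formats, and a speed between 0.5 and 2.0. It requires an explicit format because no default format is stated.

```python
import json

VOICE_NAMES = {"aria", "nova", "onyx", "echo", "sage"}
TTS_FORMATS = {"mp3", "wav", "ogg", "opus", "flac"}


def build_tts_payload(text: str, voice: str, fmt: str, speed: float = 1.0) -> str:
    """Validate TTS parameters and return the JSON request body as a string."""
    if not text:
        raise ValueError("input text is required")
    if voice not in VOICE_NAMES:
        raise ValueError(f"unknown voice: {voice!r}")
    if fmt not in TTS_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    # Field names match the cURL example on this page.
    return json.dumps({"input": text, "voice": voice, "format": fmt, "speed": speed})


print(build_tts_payload("Welcome to Mira.", "aria", "mp3"))
# {"input": "Welcome to Mira.", "voice": "aria", "format": "mp3", "speed": 1.0}
```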
See /pricing for current per-minute STT and per-1000-character TTS rates, and /docs/api/reference for the full endpoint list.
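Because STT is priced per minute and TTS per 1,000 characters, you can estimate usage from the duration field of an STT response and the length of the TTS input. This sketch computes raw units only. It leaves the rates as parameters to fill in from /pricing, and it does not round to a billing increment because this page does not specify one. stt_minutes, tts_kchars, and estimate_cost are hypothetical names. Two further assumptions: billing follows the reported duration, and len() matches how characters are counted.

```python
import json


def stt_minutes(response_json: str) -> float:
    """STT minutes from the 'duration' field (seconds) of a response."""
    return json.loads(response_json)["duration"] / 60


def tts_kchars(text: str) -> float:
    """TTS usage in thousands of characters (assumption: len() counts them)."""
    return len(text) / 1000


def estimate_cost(minutes: float, kchars: float, stt_rate: float, tts_rate: float) -> float:
    """Combine usage with rates taken from /pricing (passed in, not hard-coded)."""
    return minutes * stt_rate + kchars * tts_rate


sample = '{"text": "This is a sample transcription from the audio file.", "language": "en", "duration": 4.82}'
minutes = stt_minutes(sample)            # 4.82 s is about 0.0803 min
kchars = tts_kchars("Welcome to Mira.")  # 16 characters -> 0.016
```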