Speech-to-Text REST API
Process short audio files synchronously and receive the result in the same request. The API supports both transcription and translation and accepts multiple audio formats.
Synchronous Processing
The synchronous endpoint returns the result immediately. It is best suited to quick transcriptions and testing, and accepts audio with a maximum duration of 30 seconds.
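Because the synchronous endpoint is limited to 30 seconds of audio, it can help to check a clip's length locally before uploading it. The following is a minimal sketch using only the Python standard library; it handles PCM WAV data only, and the helper names (`wav_duration_seconds`, `fits_sync_limit`, `make_silent_wav`) are illustrative rather than part of the Sarvam SDK. Treating a clip of exactly 30 seconds as acceptable is an assumption.

```python
import io
import wave

MAX_SYNC_SECONDS = 30.0  # limit stated in this section


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of PCM WAV data in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(wav_bytes: bytes, limit: float = MAX_SYNC_SECONDS) -> bool:
    """True if the clip is short enough for the synchronous endpoint.

    Assumption: a clip of exactly `limit` seconds is accepted.
    """
    return wav_duration_seconds(wav_bytes) <= limit


def make_silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Build a mono 16-bit silent WAV clip (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(fits_sync_limit(make_silent_wav(5)))
    print(fits_sync_limit(make_silent_wav(31)))
```

For a file on disk, the same check can be applied to the bytes read from the file before they are passed to the client.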
Saaras v3: State-of-the-Art Speech Recognition (Recommended)
Saaras v3 is our latest state-of-the-art speech recognition model with flexible output formats. It supports multiple modes for different use cases: transcribe, translate, verbatim, translit, and codemix.
Recommended for new integrations. Saaras v3 offers improved accuracy and flexible output modes. Learn more in the docs introduction.
Output Modes
| Mode | Description |
|---|---|
| transcribe (default) | Standard transcription in the original language |
| translate | Translates speech to English |
| verbatim | Exact word-for-word transcription |
| translit | Romanization to Latin script |
| codemix | Code-mixed text output |
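Since the mode is passed as a plain string, a typo such as "transliterate" would only surface once the request is sent. The sketch below validates the mode locally against the table above and falls back to the documented default. The helper name `build_request_options` is hypothetical and not part of the SDK; only the mode names and the `saaras:v3` model identifier come from this page.

```python
from typing import Optional

# Mode names as listed in the Output Modes table.
SAARAS_V3_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")
DEFAULT_MODE = "transcribe"


def build_request_options(mode: Optional[str] = None) -> dict:
    """Return keyword arguments for a Saaras v3 request.

    Hypothetical helper: rejects unknown modes before any network call.
    """
    chosen = DEFAULT_MODE if mode is None else mode
    if chosen not in SAARAS_V3_MODES:
        allowed = ", ".join(SAARAS_V3_MODES)
        raise ValueError(f"Unknown mode {chosen!r}; expected one of: {allowed}")
    return {"model": "saaras:v3", "mode": chosen}


if __name__ == "__main__":
    print(build_request_options())
    print(build_request_options("translit"))
```

The returned dictionary could then be unpacked into the transcription call, for example as `**build_request_options("verbatim")`.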
Code Examples for Saaras v3
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Transcribe mode (default); the file is closed once the request completes
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",  # or "translate", "verbatim", "translit", "codemix"
    )

print(response)

Explore response fields and errors in the API Response Format section below.
Legacy Models (Deprecated Soon)
The following models will be deprecated soon. We recommend using Saaras v3 for new integrations and migrating existing ones to it.
Saarika v2.5: Speech to Text Transcription
Saarika is a speech-to-text transcription model that excels at multi-speaker content, mixed language content, and conference recordings.
Deprecation Notice: Saarika v2.5 will be deprecated soon. Use Saaras v3 with mode="transcribe" instead.
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Migration target for Saarika v2.5: Saaras v3 in transcribe mode
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",
        language_code="hi-IN",
    )

print(response)

Saaras v2.5: Speech to Text Translation
Saaras v2.5 is available on the Speech-to-Text Translate endpoint for translating speech directly to English.
Deprecation Notice: Saaras v2.5 will be deprecated soon. Use Saaras v3 with mode="translate" instead.
from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Migration target for Saaras v2.5: Saaras v3 in translate mode
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v3",
        mode="translate",
    )

print(response)

API Response Format
Speech to Text Transcription Response
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for the request |
| transcript | string | The transcribed text from the audio file |
| language_code | string | BCP-47 code of the detected language (e.g., hi-IN). Returns null if no language is detected |
{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "नमस्ते, आप कैसे हैं?",
  "language_code": "hi-IN"
}
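When calling the REST endpoint directly rather than through the SDK, the JSON body can be mapped onto a small typed structure. The sketch below is one way to do that, including the case where `language_code` is null. The `TranscriptionResult` class and `parse_transcription` function are illustrative names, not part of any Sarvam library.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    """Fields documented in the transcription response table."""
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when no language was detected


def parse_transcription(body: str) -> TranscriptionResult:
    """Parse a raw JSON response body into a TranscriptionResult."""
    data = json.loads(body)
    return TranscriptionResult(
        request_id=data["request_id"],
        transcript=data["transcript"],
        language_code=data.get("language_code"),  # JSON null becomes None
    )


if __name__ == "__main__":
    sample = json.dumps({
        "request_id": "20241115_12345678-1234-5678-1234-567812345678",
        "transcript": "नमस्ते, आप कैसे हैं?",
        "language_code": "hi-IN",
    })
    result = parse_transcription(sample)
    print(result.language_code or "language not detected")
```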
Speech to Text Translation Response
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique identifier for the request |
| transcript | string | Translated text in English |
| language_code | string | BCP-47 code of the detected source language |
Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN
{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "Hello, how are you?",
  "language_code": "hi-IN"
}
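In a translation response the `transcript` is English while `language_code` identifies the language that was spoken, so it can be useful to label results with their source language. The sketch below does this for the supported codes listed above. The language names are my reading of those codes, and `describe_translation` is a hypothetical helper.

```python
# Supported source language codes from this page, with assumed display names.
SOURCE_LANGUAGES = {
    "hi-IN": "Hindi",
    "bn-IN": "Bengali",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "od-IN": "Odia",
    "pa-IN": "Punjabi",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "gu-IN": "Gujarati",
    "en-IN": "English",
}


def describe_translation(result: dict) -> str:
    """Label an English translation with its detected source language."""
    code = result.get("language_code")
    name = SOURCE_LANGUAGES.get(code, "unknown language")
    return f"[{name} -> English] {result['transcript']}"


if __name__ == "__main__":
    print(describe_translation({
        "request_id": "20241115_12345678-1234-5678-1234-567812345678",
        "transcript": "Hello, how are you?",
        "language_code": "hi-IN",
    }))
```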
Error Responses
All errors return a JSON object with an error field containing details about what went wrong.
Error Response Structure
{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
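For clients that call the REST endpoint directly, the error envelope above can be turned into a Python exception so that callers can branch on `code` and log `request_id`. This is a minimal sketch: `SpeechApiError` and `error_from_body` are hypothetical names, and the fallback values for a body that does not follow the documented structure are assumptions.

```python
import json


class SpeechApiError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(self, status: int, code: str, message: str, request_id: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def error_from_body(status: int, body: str) -> SpeechApiError:
    """Build a SpeechApiError from an HTTP status and raw response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # body was not JSON, e.g. a proxy error page
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return SpeechApiError(
        status,
        error.get("code", "unknown_error"),  # assumed fallback code
        error.get("message", body),
        error.get("request_id", ""),
    )


if __name__ == "__main__":
    body = json.dumps({"error": {
        "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
        "code": "unprocessable_entity_error",
        "request_id": "20241115_abc12345",
    }})
    err = error_from_body(422, body)
    print(err.code, err.request_id)
```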
Error Codes Reference
| HTTP Status | Error Code | When This Happens | What To Do |
|---|---|---|---|
| 400 | invalid_request_error | Missing required parameters or malformed request | Check request format and required fields |
| 403 | invalid_api_key_error | API key is invalid, missing, or expired | Verify your API key in the dashboard |
| 422 | unprocessable_entity_error | Invalid audio format or file too large | Use supported formats: WAV, MP3, AAC, FLAC, OGG |
| 429 | insufficient_quota_error | API quota or rate limit exceeded | Wait for reset or upgrade your plan |
| 500 | internal_server_error | Unexpected server error | Retry the request; contact support if persistent |
| 503 | rate_limit_exceeded_error | Service temporarily overloaded | Retry with exponential backoff |
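The table recommends retrying on 500 and 503, with exponential backoff for 503. The sketch below shows one way to structure that. It assumes a caller-supplied `send` function that performs the request and returns `(status, body)`; the delay values and attempt count are illustrative defaults, not documented limits. A 429 is returned to the caller without retrying, since the table advises waiting for the quota to reset.

```python
import time
from typing import Callable, Tuple

# Statuses the error table marks as worth retrying.
RETRYABLE_STATUSES = frozenset({500, 503})


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before the next try: base * 2^attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    # Simulated responses: two transient failures, then success.
    replies = iter([(503, "{}"), (500, "{}"), (200, '{"transcript": "ok"}')])
    print(call_with_retries(lambda: next(replies), sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the function easy to test; production code would normally leave it at the default and might add random jitter to the delay.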
Example Error Response
{
  "error": {
    "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}
Next Steps
Get API Key
Sign up and get your API key from the Sarvam AI dashboard.
Test integration
Call the API with short sample audio (under 30 seconds) using the code examples above.
Go live
Deploy your integration and monitor usage and errors from your dashboard.
Need help? Contact us on Discord for guidance.