Speech-to-Text REST API

Transcribe or translate short audio files synchronously and receive the result in the same response. Multiple audio formats are supported.

Synchronous Processing

Synchronous processing returns the result immediately and accepts audio up to a maximum duration of 30 seconds. It is best suited to quick transcriptions and testing.
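
Because of the 30-second limit, it can help to check a file's duration locally before uploading it. The following is a minimal sketch using Python's standard `wave` module; it only handles uncompressed WAV input, and the helper names (`wav_duration_seconds`, `fits_sync_limit`) are illustrative, not part of the Sarvam SDK.

```python
import wave

# Maximum duration for synchronous requests, as stated in this section.
MAX_DURATION_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file in seconds.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_limit(source, limit: float = MAX_DURATION_SECONDS) -> bool:
    """Return True if the WAV audio is no longer than `limit` seconds."""
    return wav_duration_seconds(source) <= limit
```

A caller might run `fits_sync_limit("audio.wav")` before sending the request and reject or split longer recordings. Other formats such as MP3 or FLAC would need a different duration check.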

Saaras v3 is our latest state-of-the-art speech recognition model with flexible output formats. It supports five modes for different use cases: transcribe, translate, verbatim, translit (transliteration), and codemix.

Recommended for new integrations. Saaras v3 offers improved accuracy and flexible output modes. Learn more in the docs introduction.

Output Modes

| Mode | Description |
| --- | --- |
| transcribe (default) | Standard transcription in the original language |
| translate | Translates speech to English |
| verbatim | Exact word-for-word transcription |
| translit | Romanization to Latin script |
| codemix | Code-mixed text output |
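
Since the mode is passed as a plain string, a typo such as "transliterate" would only surface as an API error. A small client-side check can catch it earlier. This is a sketch under the assumption that the five values listed above are the complete set; `build_stt_kwargs` is a hypothetical helper, not an SDK function.

```python
# Mode values taken from the Output Modes table above.
VALID_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")


def build_stt_kwargs(mode: str = "transcribe", model: str = "saaras:v3") -> dict:
    """Validate `mode` and return keyword arguments for a transcription call."""
    if mode not in VALID_MODES:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of: {', '.join(VALID_MODES)}"
        )
    return {"model": model, "mode": mode}
```

The result can be unpacked into the SDK call, for example `client.speech_to_text.transcribe(file=audio_file, **build_stt_kwargs("verbatim"))`.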

Code Examples for Saaras v3

    from sarvamai import SarvamAI

    client = SarvamAI(
        api_subscription_key="YOUR_SARVAM_API_KEY",
    )

    # Transcribe mode (default); other modes: "translate", "verbatim", "translit", "codemix"
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            mode="transcribe",
        )

    print(response)

Response fields are described under API Response Format below, and errors under Error Responses.

Legacy Models (Deprecated Soon)

The following models will be deprecated soon. We recommend migrating existing integrations to Saaras v3 and using it for all new ones.

Saarika v2.5: Speech to Text Transcription

Saarika is a speech-to-text transcription model that excels at multi-speaker content, mixed-language content, and conference recordings.

Deprecation Notice: Saarika v2.5 will be deprecated soon. Use Saaras v3 with mode="transcribe" instead.

    from sarvamai import SarvamAI

    client = SarvamAI(
        api_subscription_key="YOUR_SARVAM_API_KEY",
    )

    # Recommended replacement for Saarika v2.5: Saaras v3 in transcribe mode
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saaras:v3",
            mode="transcribe",
            language_code="hi-IN",
        )

    print(response)

Saaras v2.5: Speech to Text Translation

Saaras v2.5 is available on the Speech-to-Text Translate endpoint for translating speech directly to English.

Deprecation Notice: Saaras v2.5 will be deprecated soon. Use Saaras v3 with mode="translate" instead.

    from sarvamai import SarvamAI

    client = SarvamAI(
        api_subscription_key="YOUR_SARVAM_API_KEY",
    )

    # Recommended replacement for Saaras v2.5: Saaras v3 in translate mode
    with open("audio.wav", "rb") as audio_file:
        response = client.speech_to_text.translate(
            file=audio_file,
            model="saaras:v3",
            mode="translate",
        )

    print(response)

API Response Format

Speech to Text Transcription Response

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| transcript | string | The transcribed text from the audio file |
| language_code | string | BCP-47 language code of detected language (e.g., hi-IN). Returns null if no language detected |

json
    {
      "request_id": "20241115_12345678-1234-5678-1234-567812345678",
      "transcript": "नमस्ते, आप कैसे हैं?",
      "language_code": "hi-IN"
    }
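
When working with the raw JSON body rather than the SDK's response object, the three fields can be read into a small typed structure. The sketch below assumes the body has exactly the shape shown above; `TranscriptionResult` and `parse_transcription` are illustrative names, and `language_code` is typed as optional because the field may be null.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when no language was detected


def parse_transcription(body: str) -> TranscriptionResult:
    """Parse a transcription response body into a TranscriptionResult."""
    data = json.loads(body)
    return TranscriptionResult(
        request_id=data["request_id"],
        transcript=data["transcript"],
        language_code=data.get("language_code"),  # JSON null becomes None
    )
```

Code that branches on the detected language should handle the `None` case explicitly instead of assuming a code is always present.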

Speech to Text Translation Response

| Field | Type | Description |
| --- | --- | --- |
| request_id | string | Unique identifier for the request |
| transcript | string | Translated text in English |
| language_code | string | BCP-47 code of the detected source language |

Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

json
    {
      "request_id": "20241115_12345678-1234-5678-1234-567812345678",
      "transcript": "Hello, how are you?",
      "language_code": "hi-IN"
    }

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

json
    {
      "error": {
        "message": "Human-readable error description",
        "code": "error_code_for_programmatic_handling",
        "request_id": "unique_request_identifier"
      }
    }
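
For clients that call the REST endpoint directly, the error object can be turned into an exception that carries the `code` and `request_id` for logging and programmatic handling. This is a sketch only: `SpeechToTextError` and `error_from_body` are hypothetical names, and the official SDK may raise its own exception types instead.

```python
import json
from typing import Optional


class SpeechToTextError(Exception):
    """Error built from the `error` object of an API error response."""

    def __init__(self, message: str, code: str, request_id: Optional[str]):
        super().__init__(f"{code}: {message} (request_id={request_id})")
        self.message = message
        self.code = code
        self.request_id = request_id


def error_from_body(body: str) -> SpeechToTextError:
    """Build a SpeechToTextError from a JSON error response body."""
    err = json.loads(body).get("error", {})
    return SpeechToTextError(
        message=err.get("message", "unknown error"),
        code=err.get("code", "unknown_error"),
        request_id=err.get("request_id"),
    )
```

Including the `request_id` in the exception text makes it easier to quote when contacting support.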

Error Codes Reference

| HTTP Status | Error Code | When This Happens | What To Do |
| --- | --- | --- | --- |
| 400 | invalid_request_error | Missing required parameters or malformed request | Check request format and required fields |
| 403 | invalid_api_key_error | API key is invalid, missing, or expired | Verify your API key in the dashboard |
| 422 | unprocessable_entity_error | Invalid audio format or file too large | Use supported formats: WAV, MP3, AAC, FLAC, OGG |
| 429 | insufficient_quota_error | API quota or rate limit exceeded | Wait for reset or upgrade your plan |
| 500 | internal_server_error | Unexpected server error | Retry the request; contact support if persistent |
| 503 | rate_limit_exceeded_error | Service temporarily overloaded | Retry with exponential backoff |
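
The retry guidance in the table can be sketched as a small wrapper. It assumes that only 500 and 503 are worth retrying automatically (429 is left to the caller, since the table advises waiting for a reset or upgrading), and that `send` is a caller-supplied function returning a `(status, body)` tuple. The names and default delays here are illustrative, not documented values.

```python
import time
from typing import Callable, Tuple

# Statuses the table above suggests retrying; 429 is deliberately excluded.
RETRYABLE_STATUSES = frozenset({500, 503})


def is_retryable(status: int) -> bool:
    """Return True if a response with this HTTP status should be retried."""
    return status in RETRYABLE_STATUSES


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying retryable statuses with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    Returns the last (status, body) once it succeeds, hits a non-retryable
    status, or exhausts `max_retries`.
    """
    attempt = 0
    while True:
        status, body = send()
        if not is_retryable(status) or attempt >= max_retries:
            return status, body
        sleep(base_delay * 2 ** attempt)
        attempt += 1
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting. Production code would typically add random jitter to the delay so that many clients do not retry in lockstep.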

Example Error Response

json
    {
      "error": {
        "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
        "code": "unprocessable_entity_error",
        "request_id": "20241115_abc12345"
      }
    }

Next Steps

  1. Get API Key

    Sign up and get your API key from the Sarvam AI dashboard.

  2. Test integration

    Call the API with short sample audio (under 30 seconds) using the code examples above.

  3. Go live

    Deploy your integration and monitor usage and errors from your dashboard.

Need help? Contact us on Discord for guidance.
