Speech-to-Text REST API

Transcribe or translate short audio files synchronously and receive the result in the same response. Multiple audio formats are supported.

Synchronous Processing

The synchronous endpoint returns the result in the response to the request itself. It accepts audio up to 30 seconds long and is best suited to quick transcriptions and testing.
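The 30-second limit can be checked locally before uploading. The following is a minimal sketch that assumes the input is an uncompressed WAV file and uses only Python's standard `wave` module; the helper names `wav_duration_seconds` and `fits_sync_limit` are illustrative and are not part of the Sarvam SDK.

```python
import wave

MAX_SYNC_SECONDS = 30.0  # maximum duration stated for synchronous requests


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


def fits_sync_limit(path: str, limit: float = MAX_SYNC_SECONDS) -> bool:
    """True if the file is short enough for the synchronous endpoint."""
    return wav_duration_seconds(path) <= limit
```

Other formats such as MP3 or OGG would need a different way of reading the duration; this sketch does not cover them.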

Saaras v3: State-of-the-Art Speech Recognition (Recommended)

Saaras v3 is our latest state-of-the-art speech recognition model with flexible output formats. It supports five output modes for different use cases: `transcribe`, `translate`, `verbatim`, `translit`, and `codemix`.

Recommended for new integrations. Saaras v3 offers improved accuracy and flexible output modes. Learn more in the docs introduction.

Output Modes

Mode	Description
`transcribe` (default)	Standard transcription in the original language
`translate`	Translates speech to English
`verbatim`	Exact word-for-word transcription
`translit`	Romanization to Latin script
`codemix`	Code-mixed text output
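Because the mode is passed as a plain string, a typo such as "transliterate" in place of "translit" is easy to make. A small client-side check, sketched below, can catch this before the request is sent. The mode names are copied from the table above; the helper `build_request_options` is hypothetical and not part of the SDK.

```python
from typing import Dict

# Mode names copied from the Output Modes table.
OUTPUT_MODES = ("transcribe", "translate", "verbatim", "translit", "codemix")
DEFAULT_MODE = "transcribe"


def build_request_options(mode: str = DEFAULT_MODE) -> Dict[str, str]:
    """Return keyword arguments for a Saaras v3 request, rejecting unknown modes."""
    if mode not in OUTPUT_MODES:
        allowed = ", ".join(OUTPUT_MODES)
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {allowed}")
    return {"model": "saaras:v3", "mode": mode}
```

The returned dictionary could then be unpacked into the call shown in the next section, for example `client.speech_to_text.transcribe(file=audio_file, **build_request_options("verbatim"))`.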

Code Examples for Saaras v3

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

# Transcribe mode (default). The with-block closes the file after the request.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",  # or "translate", "verbatim", "translit", "codemix"
    )

print(response)

See the API Response Format and Error Responses sections below for response fields and error codes.

Legacy Models (Deprecated Soon)

The following models will be deprecated soon. We recommend migrating to Saaras v3 for new integrations.

Saarika v2.5: Speech to Text Transcription

Saarika is a speech-to-text transcription model that excels at multi-speaker content, mixed language content, and conference recordings.

Deprecation Notice: Saarika v2.5 will be deprecated soon. Use Saaras v3 with mode="transcribe" instead.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

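# Saaras v3 in transcribe mode, the recommended replacement for Saarika v2.5.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        mode="transcribe",
        language_code="hi-IN",
    )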

print(response)

Saaras v2.5: Speech to Text Translation

Saaras v2.5 is available on the Speech-to-Text Translate endpoint for translating speech directly to English.

Deprecation Notice: Saaras v2.5 will be deprecated soon. Use Saaras v3 with mode="translate" instead.

from sarvamai import SarvamAI

client = SarvamAI(
    api_subscription_key="YOUR_SARVAM_API_KEY",
)

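# Saaras v3 in translate mode, the recommended replacement for Saaras v2.5.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v3",
        mode="translate",
    )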

print(response)

API Response Format

Speech to Text Transcription Response

Field	Type	Description
`request_id`	string	Unique identifier for the request
`transcript`	string	The transcribed text from the audio file
`language_code`	string	BCP-47 language code of detected language (e.g., `hi-IN`). Returns `null` if no language detected

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "नमस्ते, आप कैसे हैं?",
  "language_code": "hi-IN"
}
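When calling the REST endpoint directly rather than through the SDK, the response body can be mapped onto a small typed structure. The sketch below assumes the body is the JSON shown above; the `TranscriptionResult` class and `parse_transcription` function are illustrative names, not SDK types. Note that `language_code` may be `null`, so it is modeled as optional.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptionResult:
    request_id: str
    transcript: str
    language_code: Optional[str]  # None when no language was detected


def parse_transcription(body: str) -> TranscriptionResult:
    """Parse a JSON response body into a TranscriptionResult."""
    data = json.loads(body)
    return TranscriptionResult(
        request_id=data["request_id"],
        transcript=data["transcript"],
        language_code=data.get("language_code"),
    )
```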

Speech to Text Translation Response

Field	Type	Description
`request_id`	string	Unique identifier for the request
`transcript`	string	Translated text in English
`language_code`	string	BCP-47 code of the detected source language

Supported source languages: hi-IN, bn-IN, kn-IN, ml-IN, mr-IN, od-IN, pa-IN, ta-IN, te-IN, gu-IN, en-IN

{
  "request_id": "20241115_12345678-1234-5678-1234-567812345678",
  "transcript": "Hello, how are you?",
  "language_code": "hi-IN"
}
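If an application needs to confirm that the detected source language is one of those listed for translation, a simple membership check is enough. The set below is copied from the supported source languages list; the function name `is_supported_source` is a hypothetical helper.

```python
from typing import Optional

# Copied from the "Supported source languages" list.
SUPPORTED_SOURCE_LANGUAGES = frozenset({
    "hi-IN", "bn-IN", "kn-IN", "ml-IN", "mr-IN", "od-IN",
    "pa-IN", "ta-IN", "te-IN", "gu-IN", "en-IN",
})


def is_supported_source(language_code: Optional[str]) -> bool:
    """True if the code is in the documented list; None (undetected) is not."""
    return language_code in SUPPORTED_SOURCE_LANGUAGES
```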

Error Responses

All errors return a JSON object with an error field containing details about what went wrong.

Error Response Structure

{
  "error": {
    "message": "Human-readable error description",
    "code": "error_code_for_programmatic_handling",
    "request_id": "unique_request_identifier"
  }
}
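For clients that read the raw HTTP body, the structure above can be unpacked defensively, since a proxy or gateway might return a non-JSON body. This is a sketch under that assumption; `ErrorDetails` and `parse_error` are illustrative names and not part of the SDK.

```python
import json
from typing import NamedTuple, Optional


class ErrorDetails(NamedTuple):
    message: str
    code: str
    request_id: Optional[str]


def parse_error(body: str) -> Optional[ErrorDetails]:
    """Return the error details from a response body, or None if there is no `error` object."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return None
    return ErrorDetails(
        message=error.get("message", ""),
        code=error.get("code", ""),
        request_id=error.get("request_id"),
    )
```

Logging `request_id` alongside `code` is likely to help when contacting support about a persistent failure.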

Error Codes Reference

HTTP Status	Error Code	When This Happens	What To Do
`400`	`invalid_request_error`	Missing required parameters or malformed request	Check request format and required fields
`403`	`invalid_api_key_error`	API key is invalid, missing, or expired	Verify your API key in the dashboard
`422`	`unprocessable_entity_error`	Invalid audio format or file too large	Use supported formats: WAV, MP3, AAC, FLAC, OGG
`429`	`insufficient_quota_error`	API quota or rate limit exceeded	Wait for reset or upgrade your plan
`500`	`internal_server_error`	Unexpected server error	Retry the request; contact support if persistent
`503`	`rate_limit_exceeded_error`	Service temporarily overloaded	Retry with exponential backoff
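The "What To Do" column suggests retrying for `internal_server_error` and `rate_limit_exceeded_error`, with exponential backoff for the latter. One way to sketch this is shown below. The `ApiError` class is a hypothetical stand-in for whatever exception your HTTP client or SDK raises, and the base delay, cap, and jitter values are assumptions rather than documented recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Codes from the table whose suggested action is to retry.
RETRYABLE_CODES = {"internal_server_error", "rate_limit_exceeded_error"}


class ApiError(Exception):
    """Hypothetical exception carrying the `code` field of an error response."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(message or code)
        self.code = code


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential delay for a zero-based attempt number, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE_CODES or last_attempt:
                raise
            # Up to 10% jitter so concurrent clients do not retry in lockstep.
            delay = backoff_delay(attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Errors such as `invalid_api_key_error` or `insufficient_quota_error` are raised immediately here, since retrying without changing the key or plan would not be expected to help.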

Example Error Response

{
  "error": {
    "message": "Unsupported audio format. Supported formats: WAV, MP3, AAC, FLAC, OGG",
    "code": "unprocessable_entity_error",
    "request_id": "20241115_abc12345"
  }
}

Next Steps

Get API Key
Sign up and get your API key from the Sarvam AI dashboard.
Test integration
Call the API with short sample audio (under 30 seconds) using the code examples above.
Go live
Deploy your integration and monitor usage and errors from your dashboard.

Need help? Contact us on Discord for guidance.
