Minimax TTS Speech 2.6 HD API Usage Guide
Overview
Minimax TTS Speech 2.6 HD is MiniMax’s latest high-performance text-to-speech model. It generates natural, expressive speech quickly and supports zero-shot voice cloning: given a short reference clip, it can mimic the target voice, including its emotional nuance.
Authentication
All API requests require authentication using an API key. Include your API key in the Authorization header of every request.
Submit Speech Generation Request
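Every request in this guide carries the Authorization header described under Authentication. The following is a minimal Python sketch of building that header. The `Bearer` scheme, the `GMI_API_KEY` environment variable name, and the `build_headers` helper are assumptions for illustration; the guide only states that the API key goes in the Authorization header, so confirm the exact scheme for your account.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key.

    Assumption: the key is sent as a Bearer token. Adjust the scheme
    if your account uses a different one.
    """
    if not api_key:
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Hypothetical environment variable name; falls back to a placeholder.
headers = build_headers(os.environ.get("GMI_API_KEY") or "YOUR_API_KEY")
```

Reading the key from the environment keeps it out of source control; the same `headers` dict can be reused for both the submit and the status requests.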
Base URL
Endpoint
Request Format
Request Parameters
| Parameter | Type | Required | Description | Default | Constraints |
|---|---|---|---|---|---|
| text | string | Yes | Text content to be converted to speech. | - | Required |
| voice_id | string | No | Voice ID for speech synthesis. | "English_expressive_narrator" | Alphanumeric string, underscores allowed |
| speed | float | No | Speech speed multiplier. | 1 | 0.5 to 2 with step 0.1 |
| vol | float | No | Volume level multiplier. | 1 | 0 to 10 with step 0.1 |
| pitch | integer | No | Pitch adjustment in semitones. | 0 | -12 to 12 with step 1 |
| emotion | string | No | Emotion control for synthesized speech. By default, the model automatically selects the most natural emotion based on text. Manual specification is only recommended when explicitly needed. | "auto" | Options: "auto", "calm", "happy", "sad", "angry", "fearful", "disgusted", "surprised" |
| language_boost | string | No | Controls whether recognition for specific minority languages and dialects is enhanced. If the language type is unknown, set to "auto" and the model will automatically detect it. | "auto" | - |
| format | string | No | Specifies the format of the generated audio. Default is mp3. | "mp3" | Options: "mp3", "flac" |
| audio_sample_rate | string | No | Specifies the sampling rate of the generated audio. Default is 32000 Hz. | "32000" | Options: "8000", "16000", "22050", "24000", "32000", "44100" |
| bitrate | string | No | Specifies the bitrate of the generated audio. Default is 128000. Note: This parameter only applies to audio in mp3 format. | "128000" | Options: "32000", "64000", "128000", "256000" |
| channel | string | No | Specifies the number of audio channels. 1 = mono, 2 = stereo. Default is 2 (stereo). | "2" | Options: "1", "2" |
| vm_pitch | integer | No | Voice modification pitch adjustment. Adjusts the pitch of the synthesized voice for voice-changing effects. Range: -100 (lower) to 100 (higher). | 0 | -100 to 100 with step 1 |
| intensity | integer | No | Voice intensity adjustment. Controls the strength/power of the voice. Range: -100 (weaker) to 100 (stronger). | 0 | -100 to 100 with step 1 |
| timbre | integer | No | Voice timbre adjustment. Modifies the tonal quality and character of the voice. Range: -100 to 100. | 0 | -100 to 100 with step 1 |
| sound_effects | string | No | Applies special sound effects to the synthesized voice. These effects can create atmospheric or stylistic variations. | "" | Options: "", "spacious_echo", "auditorium_echo", "lofi_telephone", "robotic" |
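The constraints in the table above can be checked on the client before a request is sent. The sketch below is one way to do that in Python; `build_payload` is a hypothetical helper, not part of the API. It only assembles and validates the parameters listed in the table and does not check step sizes. How these parameters are wrapped in the final request body (for example, alongside a model ID) is not shown in this guide, so treat the returned dict as the parameter set only.

```python
# Allowed values for string parameters, copied from the parameter table.
CHOICES = {
    "emotion": {"auto", "calm", "happy", "sad", "angry",
                "fearful", "disgusted", "surprised"},
    "format": {"mp3", "flac"},
    "audio_sample_rate": {"8000", "16000", "22050", "24000", "32000", "44100"},
    "bitrate": {"32000", "64000", "128000", "256000"},
    "channel": {"1", "2"},
    "sound_effects": {"", "spacious_echo", "auditorium_echo",
                      "lofi_telephone", "robotic"},
}

# Inclusive (min, max) ranges for numeric parameters, from the same table.
RANGES = {
    "speed": (0.5, 2),
    "vol": (0, 10),
    "pitch": (-12, 12),
    "vm_pitch": (-100, 100),
    "intensity": (-100, 100),
    "timbre": (-100, 100),
}

# String parameters the table lists without a fixed set of options.
FREE_FORM = {"voice_id", "language_boost"}


def build_payload(text: str, **options) -> dict:
    """Validate the given parameters and return them as a dict."""
    if not isinstance(text, str) or not text:
        raise ValueError("text is required and must be a non-empty string")
    payload = {"text": text}
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name in CHOICES:
            if value not in CHOICES[name]:
                raise ValueError(f"{name} must be one of {sorted(CHOICES[name])}")
        elif name not in FREE_FORM:
            raise ValueError(f"unknown parameter: {name}")
        payload[name] = value
    return payload


payload = build_payload("Hello, world.", speed=1.2, emotion="happy", format="mp3")
```

Parameters that are omitted are left out of the dict, so the service applies the defaults from the table.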
Response
Check Request Status
Endpoint
Example
Response
Request Status Values
| Status | Description |
|---|---|
| queued | Request is waiting to be processed |
| processing | Speech generation is in progress |
| success | Speech generation completed successfully |
| failed | Speech generation failed |
| cancelled | Request was cancelled |
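Because a request moves through these states asynchronously, a client typically polls the status endpoint until it reaches a terminal state. The sketch below shows one way to do that in Python. It assumes the status response is a JSON object with a `status` field holding one of the values above; the actual response shape is not shown in this guide. `wait_for_result` and `fetch_status` are hypothetical names: `fetch_status` stands for whatever caller-supplied function performs the status request and returns the decoded response.

```python
import time
from typing import Callable

# Statuses after which the request will not change any further.
TERMINAL_STATUSES = {"success", "failed", "cancelled"}


def wait_for_result(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the request reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        response = fetch_status()
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError("request did not finish before the timeout")
        sleep(interval)


# Simulated run with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "success"},
])
result = wait_for_result(lambda: next(responses), interval=0, sleep=lambda _: None)
```

Passing the fetch function in keeps the loop independent of any HTTP library, and the caller still has to inspect the returned status, since `failed` and `cancelled` also end the loop.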