Audio Models - GMI Cloud Documentation

Audio models turn text into speech, clone or style voices, generate music, or edit audio. Capabilities and latency profiles differ by provider and tier.

Technical topics

Text-to-speech (TTS): Natural speech from text; variants tuned for quality or speed.
Voice cloning: Reference audio used to match timbre or style, where supported.
Real-time / low-latency: Models optimized for interactive or live use cases.
Languages & prosody: Multilingual support, emotion, and pacing depend on the specific model.
Music generation: Lyrics- or prompt-driven music, where available.

Model API & platform docs

For serving modes (serverless vs dedicated), billing, rate limits, task polling, and unified API patterns, see the API Reference section.
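
As a rough illustration of the task-polling pattern mentioned above, the sketch below repeatedly checks a task until it reaches a terminal state or a timeout expires. It is not taken from the API Reference: the `poll_task` helper, the `fetch_status` callable, the `status` field, and the state names `queued`, `running`, `succeeded`, and `failed` are all assumptions made for illustration. In real use, `fetch_status` would wrap whatever status request the API Reference defines.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; the real names come from the API Reference.
TERMINAL_STATES = ("succeeded", "failed")


def poll_task(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> Dict[str, Any]:
    """Call fetch_status until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal state in time")
        time.sleep(interval_s)


# Simulated responses standing in for real status requests (assumed shape).
fake_responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "succeeded", "output": "example.wav"},
])
result = poll_task(lambda: next(fake_responses), interval_s=0.0)
print(result["status"])
```

Passing the status check in as a callable keeps the loop independent of any particular HTTP client or endpoint, so the same helper can be reused once the actual request format is known.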
