Kling Custom Voice Integration Guide
| Model | Model ID | Mode |
|---|---|---|
| Kling Custom Voice | kling-custom-voice | Asynchronous — final result delivered via webhook (and persisted on the request record) |
kling-custom-voice clones a target speaker into a reusable Kling voice profile. Submit a clean 5–30 second voice sample (an audio file URL or a previously generated Kling video ID), and Kling returns a globally unique voice_id that can be reused as the speaker for downstream Kling models such as kling-lip-sync or any avatar pipeline that accepts a voice_id.
kling-custom-voice
Asynchronous. The SubmitRequest response returns immediately with status: "dispatched"; the final result arrives via the webhook (and is also persisted on the request record).
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice_name | string | Yes | Display name for the voice profile. Max 20 characters. Must be unique within your Kling account. Voices that are no longer needed can be deleted via Kling’s delete-voice API. |
| voice_url | URL | Conditional | Public URL of the source clip. Accepts .mp3 / .wav audio or .mp4 / .mov video. The clip must be 5–30 seconds, contain only one human voice, and be free of background noise/music. |
| video_id | string | Conditional | ID of a previously generated Kling video to use as the voice source. Eligible videos are those generated on the V2.6 model with sound enabled, via the Avatar API, or via the Lip-Sync API. The referenced clip must satisfy the same 5–30 second / single-voice / clean-audio constraints. |
Provide exactly one of voice_url or video_id. They are mutually exclusive: submitting both, or neither, will be rejected by Kling.
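Some of the parameter rules can be checked client-side before submitting. The sketch below is a hypothetical helper (validate_params is not part of any Kling SDK) that covers only what is visible without downloading the clip: the 20-character name limit, the exactly-one-of rule, and the voice_url file extension. Inferring the format from the URL path's extension is a heuristic assumption, since signed or redirecting URLs may not carry one. Duration, single-voice, clean-audio, and name-uniqueness constraints are still enforced by Kling.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Limits and formats taken from the parameter table.
MAX_VOICE_NAME_LEN = 20
ALLOWED_EXTENSIONS = (".mp3", ".wav", ".mp4", ".mov")


def validate_params(voice_name: str,
                    voice_url: Optional[str] = None,
                    video_id: Optional[str] = None) -> List[str]:
    """Return locally detectable problems; an empty list means none were found.

    Hypothetical helper. Clip duration (5-30 s), single-voice / clean-audio
    constraints and voice_name uniqueness cannot be checked here.
    """
    errors = []
    if not voice_name:
        errors.append("voice_name is required")
    elif len(voice_name) > MAX_VOICE_NAME_LEN:
        errors.append(f"voice_name exceeds {MAX_VOICE_NAME_LEN} characters")

    # Exactly one source; empty strings are treated the same as missing values.
    if bool(voice_url) == bool(video_id):
        errors.append("provide exactly one of voice_url or video_id")

    if voice_url:
        # Heuristic (assumption): infer the format from the URL path's extension.
        path = urlparse(voice_url).path.lower()
        if not path.endswith(ALLOWED_EXTENSIONS):
            errors.append("voice_url must point to .mp3, .wav, .mp4 or .mov")
    return errors
```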
Submit Request — using voice_url
Submit Request — using video_id
The webhook field in the request body is optional; if you omit it, poll GET /requests/{request_id} for the result.
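A minimal sketch of assembling the request body for either source follows. Only voice_name, voice_url, video_id, and webhook come from this guide. The surrounding "model" / "input" envelope is a hypothetical shape, and treating webhook as a URL string is also an assumption, so adjust both to the actual POST /requests schema.

```python
import json
from typing import Optional


def build_request_body(voice_name: str,
                       voice_url: Optional[str] = None,
                       video_id: Optional[str] = None,
                       webhook: Optional[str] = None) -> dict:
    """Assemble a POST /requests body for kling-custom-voice (hypothetical envelope)."""
    if bool(voice_url) == bool(video_id):
        raise ValueError("provide exactly one of voice_url or video_id")

    # Field names below come from the parameter table.
    params = {"voice_name": voice_name}
    if voice_url:
        params["voice_url"] = voice_url
    else:
        params["video_id"] = video_id

    # ASSUMPTION: "model" and "input" are placeholder keys, not a documented schema.
    body = {"model": "kling-custom-voice", "input": params}

    # webhook is optional; here the key is left out entirely when not set.
    if webhook:
        body["webhook"] = webhook
    return body


# Variant using video_id, with no webhook (poll for the result instead).
print(json.dumps(build_request_body("narrator", video_id="vid_123"), indent=2))
```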
Final Outcome
outcome.voices is an array — typically of length 1 — describing the cloned voice profile:
| Field | Description |
|---|---|
| voice_id | Globally unique. Pass this into other Kling models (e.g. as the speaker for kling-lip-sync or avatar generation) to use the cloned voice. |
| voice_name | Echoes the voice_name you submitted. |
| trial_url | Short audio sample of the cloned voice for quick QA/preview. |
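Reading the cloned voice out of the final outcome can look like the sketch below. The helper and type names are hypothetical, and taking the first element of voices is a choice based on the array typically having length 1; tolerating a missing trial_url is likewise a defensive assumption.

```python
from typing import NamedTuple, Optional


class ClonedVoice(NamedTuple):
    voice_id: str             # pass this to downstream Kling models
    voice_name: str           # echoes the submitted voice_name
    trial_url: Optional[str]  # short preview clip for QA


def extract_voice(outcome: dict) -> ClonedVoice:
    """Return the first entry of outcome["voices"] (typically the only one)."""
    voices = outcome.get("voices") or []
    if not voices:
        raise ValueError("outcome has no voices")
    first = voices[0]
    return ClonedVoice(
        voice_id=first["voice_id"],
        voice_name=first["voice_name"],
        trial_url=first.get("trial_url"),
    )
```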
Pricing
| Field | Value |
|---|---|
| Pre-charge | $0.07 per request, deducted at submission. |
| Post-charge adjustment | After the success callback arrives, the charge is reconciled to Kling’s reported final_unit_deduction (converted from Kling resource-pack units to micro-USD at the video-generation rate). The delta is applied as a single positive (charge) or negative (refund) adjustment against your account. |
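The reconciliation is plain arithmetic in micro-USD (1 USD = 1,000,000 micro-USD, so the $0.07 pre-charge is 70,000 micro-USD). The unit-to-micro-USD rate for video generation is not stated in this guide, so it is a caller-supplied parameter below; rounding half up to whole micro-USD is an assumption, and the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

PRE_CHARGE_MICRO_USD = 70_000  # $0.07 in micro-USD (1 USD = 1,000,000 micro-USD)


def reconcile(final_unit_deduction, micro_usd_per_unit: int) -> int:
    """Return the signed adjustment in micro-USD: positive = charge, negative = refund.

    micro_usd_per_unit is the video-generation conversion rate and must be
    supplied by the caller. Half-up rounding to whole micro-USD is an assumption.
    """
    # str() first so float inputs do not carry binary rounding noise into Decimal.
    final_cost = (Decimal(str(final_unit_deduction)) * micro_usd_per_unit).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP)
    return int(final_cost) - PRE_CHARGE_MICRO_USD
```

For example, with a purely illustrative rate of 100,000 micro-USD per unit and a final_unit_deduction of 0.5, the final cost is 50,000 micro-USD and the adjustment is a refund of 20,000 micro-USD.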
Failure & Refund
If Kling reports the task as failed (or our pipeline classifies the response as a failure), the request status becomes failed and the full pre-charge is refunded. No post-charge adjustment is applied.
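Putting the failure and success paths side by side, the net effect on your balance can be sketched as a small ledger. Entry labels are hypothetical, and skipping the adjustment entry when the delta is zero is a choice made here, not documented behavior.

```python
from typing import Iterable, List, Tuple

PRE_CHARGE_MICRO_USD = 70_000  # $0.07 pre-charge in micro-USD

Entry = Tuple[str, int]  # (label, amount); positive = charge, negative = refund


def ledger_entries(failed: bool, adjustment_micro_usd: int = 0) -> List[Entry]:
    """Sketch the account movements for one finished request."""
    entries = [("pre-charge", PRE_CHARGE_MICRO_USD)]
    if failed:
        # Failure: the full pre-charge is refunded and no adjustment is applied.
        entries.append(("refund", -PRE_CHARGE_MICRO_USD))
    elif adjustment_micro_usd != 0:
        # Success: a single signed adjustment reconciles to the final cost.
        entries.append(("adjustment", adjustment_micro_usd))
    return entries


def net_charge(entries: Iterable[Entry]) -> int:
    return sum(amount for _, amount in entries)
```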
End-to-End Flow
- POST /requests with kling-custom-voice, providing exactly one of voice_url or video_id.
- Wait for the webhook (or poll GET /requests/{request_id}).
- On success, capture outcome.voices[0].voice_id.
- Reuse that voice_id as the speaker in subsequent Kling generation requests (e.g. kling-lip-sync, avatar pipelines).
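The flow above can be sketched as a polling loop. fetch_request is a hypothetical, caller-supplied function that performs GET /requests/{request_id} and returns the request record as a dict; base URL, authentication, and transport are left out. The sketch also assumes the record exposes status and outcome at the top level. Because this guide does not name the success status value, the loop treats the presence of outcome.voices as success and status "failed" as failure.

```python
import time
from typing import Callable

# Hypothetical: performs GET /requests/{request_id} and returns the decoded record.
FetchFn = Callable[[str], dict]


def wait_for_voice_id(request_id: str, fetch_request: FetchFn,
                      interval_s: float = 5.0, timeout_s: float = 600.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the request fails or carries outcome.voices; return the voice_id."""
    deadline = time.monotonic() + timeout_s
    while True:
        record = fetch_request(request_id)
        if record.get("status") == "failed":
            raise RuntimeError(f"request {request_id} failed (pre-charge refunded)")
        # ASSUMPTION: outcome sits at the top level of the request record.
        voices = (record.get("outcome") or {}).get("voices") or []
        if voices:
            return voices[0]["voice_id"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

Injecting fetch_request and sleep keeps the loop testable without network access.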