Model ID
kling-custom-voice
Calling method: async

Kling Custom Voice Integration Guide

| Model | Model ID | Mode |
| --- | --- | --- |
| Kling Custom Voice | kling-custom-voice | Asynchronous; final result delivered via webhook (and persisted on the request record) |

kling-custom-voice clones a target speaker into a reusable Kling voice profile. Submit a clean 5–30 second voice sample (an audio file URL or a previously generated Kling video ID), and Kling returns a globally unique voice_id that can be reused as the speaker for downstream Kling models such as kling-lip-sync or any avatar pipeline that accepts a voice_id.

kling-custom-voice

Asynchronous. The SubmitRequest response returns immediately with status: "dispatched"; the final result arrives via the webhook (and is also persisted on the request record).

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice_name | string | Yes | Display name for the voice profile. Max 20 characters. Must be unique within your Kling account. Voices that are no longer needed can be deleted via Kling’s delete-voice API. |
| voice_url | URL | Conditional | Public URL of the source clip. Accepts .mp3 / .wav audio or .mp4 / .mov video. The clip must be 5–30 seconds, contain only one human voice, and be free of background noise/music. |
| video_id | string | Conditional | ID of a previously generated Kling video to use as the voice source. Eligible videos are those generated on the V2.6 model with sound enabled, via the Avatar API, or via the Lip-Sync API. The referenced clip must satisfy the same 5–30 second / single-voice / clean-audio constraints. |

Provide exactly one of voice_url or video_id. They are mutually exclusive: submitting both, or neither, will be rejected by Kling.
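The constraints in the parameter table can be checked on the client before a request is submitted. The following is a minimal sketch, not part of any official SDK: the function name validate_payload and its messages are hypothetical, and the file-extension check is only a heuristic, since a valid URL does not necessarily end in a file extension. Clip duration, audio cleanliness, and name uniqueness cannot be verified locally and are left to Kling.

```python
from urllib.parse import urlparse

MAX_VOICE_NAME_LEN = 20  # "Max 20 characters" from the parameter table
SOURCE_EXTENSIONS = (".mp3", ".wav", ".mp4", ".mov")  # accepted source formats


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means no problem was detected."""
    problems = []

    name = payload.get("voice_name")
    if not isinstance(name, str) or not name:
        problems.append("voice_name is required")
    elif len(name) > MAX_VOICE_NAME_LEN:
        problems.append("voice_name exceeds 20 characters")

    voice_url = payload.get("voice_url")
    video_id = payload.get("video_id")
    # Exactly one source must be set: both set or both missing is an error.
    if bool(voice_url) == bool(video_id):
        problems.append("provide exactly one of voice_url or video_id")

    # Heuristic only (assumption): inspect the extension of the URL path.
    if isinstance(voice_url, str) and voice_url:
        path = urlparse(voice_url).path.lower()
        if not path.endswith(SOURCE_EXTENSIONS):
            problems.append("voice_url should point to .mp3, .wav, .mp4 or .mov")

    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.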

Submit Request — using voice_url

{
    "webhook": {
        "url": "YOUR_WEBHOOK_URL_HERE"
    },
    "model": "kling-custom-voice",
    "payload": {
        "voice_name": "ada",
        "voice_url": "https://p1-kling.klingai.com/kcdn/cdn-kcdn112452/kling-qa-test/voice-sample.mp3"
    }
}

Submit Request — using video_id

{
    "webhook": {
        "url": "YOUR_WEBHOOK_URL_HERE"
    },
    "model": "kling-custom-voice",
    "payload": {
        "voice_name": "ada",
        "video_id": "kling-video-id-9876"
    }
}
Note: the webhook field in the request body is optional. If you omit it, retrieve the final result by polling GET /requests/{request_id}.
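As an illustration, the request bodies above could be assembled in Python along these lines. This is a sketch under stated assumptions: the base URL https://api.example.com is a placeholder, the Bearer authorization header is an assumed auth scheme that this guide does not specify, and build_submit_request is a hypothetical helper name. Only the POST /requests path and the body shape come from this guide.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.example.com"  # placeholder (assumption): use your real base URL


def build_submit_request(payload: dict, api_key: str,
                         webhook_url: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /requests call for kling-custom-voice."""
    body = {"model": "kling-custom-voice", "payload": payload}
    if webhook_url is not None:
        # The webhook field is optional; include it only when a URL is given.
        body["webhook"] = {"url": webhook_url}
    return urllib.request.Request(
        API_BASE + "/requests",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; building and sending are kept separate here so the body can be inspected first.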

Final Outcome

outcome.voices is an array — typically of length 1 — describing the cloned voice profile:
{
    "request_id": "req-uuid-1234",
    "model": "kling-custom-voice",
    "status": "success",
    "outcome": {
        "voices": [
            {
                "voice_id": "kling-voice-id-2468",
                "voice_name": "ada",
                "trial_url": "https://cdn.klingai.com/.../trial.mp3"
            }
        ]
    }
}
| Field | Description |
| --- | --- |
| voice_id | Globally unique. Pass this into other Kling models (e.g. as the speaker for kling-lip-sync or avatar generation) to use the cloned voice. |
| voice_name | Echoes the voice_name you submitted. |
| trial_url | Short audio sample of the cloned voice for quick QA/preview. |

If Kling reports success but returns no voices (rare), the outcome instead carries:
"outcome": {
    "error": "No voices found in the task result"
}
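A webhook receiver or polling client has to handle both outcome shapes shown above. The sketch below shows one way to do so; extract_voice is a hypothetical helper name, and it assumes the final result has already been parsed from JSON into a dict with the fields shown in this section.

```python
def extract_voice(result: dict) -> dict:
    """Return the first cloned voice from a final result, or raise ValueError."""
    status = result.get("status")
    if status != "success":
        raise ValueError("request not successful: status=%r" % (status,))

    outcome = result.get("outcome") or {}
    # Rare case: status is "success" but the outcome carries an error string.
    if "error" in outcome:
        raise ValueError(outcome["error"])

    voices = outcome.get("voices") or []
    if not voices:
        raise ValueError("outcome contains no voices")
    return voices[0]
```

For the success sample above, extract_voice(result)["voice_id"] yields "kling-voice-id-2468".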

Pricing

| Field | Value |
| --- | --- |
| Pre-charge | $0.07 per request, deducted at submission. |
| Post-charge adjustment | After the success callback arrives, the charge is reconciled to Kling’s reported final_unit_deduction (converted from Kling resource-pack units to micro-USD at the video-generation rate). The delta is applied as a single positive (charge) or negative (refund) adjustment against your account. |

The reconciliation is idempotent: duplicate success callbacks for the same task will not double-charge.
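The arithmetic behind the adjustment can be illustrated with a small model. This is only a sketch of the described behavior, not the platform's actual billing code: the Reconciler class is hypothetical, the conversion rate passed as micro_usd_per_unit is a made-up value (the real video-generation rate is not stated in this guide), and the type of final_unit_deduction is assumed to be numeric.

```python
PRE_CHARGE_MICRO_USD = 70_000  # $0.07 expressed in micro-USD


class Reconciler:
    """Toy model of an idempotent post-charge adjustment."""

    def __init__(self, micro_usd_per_unit):
        self.micro_usd_per_unit = micro_usd_per_unit  # hypothetical rate
        self._reconciled = set()  # task IDs that were already adjusted

    def adjustment(self, task_id, final_unit_deduction):
        """Signed delta in micro-USD: positive is a charge, negative a refund."""
        if task_id in self._reconciled:
            return 0  # duplicate success callback: no second adjustment
        self._reconciled.add(task_id)
        final_cost = round(final_unit_deduction * self.micro_usd_per_unit)
        return final_cost - PRE_CHARGE_MICRO_USD
```

With an illustrative rate of 50,000 micro-USD per unit, a task that consumed 1 unit yields an adjustment of -20,000 micro-USD (a $0.02 refund), and a repeated callback for the same task yields 0.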

Failure & Refund

If Kling reports the task as failed (or our pipeline classifies the response as a failure), the request status becomes failed and the full pre-charge is refunded. No post-charge adjustment is applied.

End-to-End Flow

  1. POST /requests with kling-custom-voice, providing exactly one of voice_url or video_id.
  2. Wait for the webhook (or poll GET /requests/{request_id}).
  3. On success, capture outcome.voices[0].voice_id.
  4. Reuse that voice_id as the speaker in subsequent Kling generation requests (e.g. kling-lip-sync, avatar pipelines).
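
For clients that poll instead of receiving the webhook, steps 2 and 3 might look like the sketch below. The helper name wait_for_voice_id, the polling interval, and the timeout are assumptions. The fetch argument stands in for whatever function performs GET /requests/{request_id} and returns the parsed JSON record, which keeps the loop independent of any particular HTTP client.

```python
import time
from typing import Callable


def wait_for_voice_id(request_id: str,
                      fetch: Callable[[str], dict],
                      interval_s: float = 5.0,    # assumed polling interval
                      timeout_s: float = 300.0,   # assumed overall timeout
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the request record until it finishes, then return the voice_id."""
    deadline = time.monotonic() + timeout_s
    while True:
        record = fetch(request_id)
        status = record.get("status")
        if status == "success":
            voices = (record.get("outcome") or {}).get("voices") or []
            if not voices:
                raise RuntimeError("success reported but no voices returned")
            return voices[0]["voice_id"]
        if status == "failed":
            raise RuntimeError("request %s failed" % request_id)
        # Any other status (e.g. "dispatched") means the task is still running.
        if time.monotonic() >= deadline:
            raise TimeoutError("request %s did not finish in time" % request_id)
        sleep(interval_s)
```

The returned voice_id is the value to pass as the speaker in later Kling requests (step 4).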