Lipsync
Sync lips in images or videos to match an audio track. Provide a source image or video and an audio file, and Kolbo generates a video with realistic lip movements.
Smart Select (recommended): Omit the model field and Kolbo automatically picks the best model for your input. This is the default and recommended approach for most use cases.
Model identifiers are Kolbo-specific. Never hardcode model identifiers — always fetch the current list from GET /api/v1/models?type=lipsync first. Models may be added, renamed, or retired at any time.
Endpoint
POST /api/v1/generate/lipsync
Request Body
Accepts multipart/form-data (for file uploads) or application/json (for URL-based inputs).
| Field | Type | Required | Description |
|---|---|---|---|
| source_url | string | No | URL of the source image or video. The file extension determines the type: .mp4, .mov, .webm, .mkv, .avi, and .m4v are treated as video; anything else is treated as an image. |
| image | file | No | Source image file (multipart upload, max 100 MB) |
| video | file | No | Source video file (multipart upload, max 100 MB) |
| audio_url | string | No | URL of the audio track |
| audio | file | No | Audio file (multipart upload, max 100 MB) |
| prompt | string | No | Text prompt for lipsync adjustments |
| model | string | No | Model identifier from GET /api/v1/models?type=lipsync (default: auto-select) |
| bounding_box_target | array | No | Face position as [x, y] in normalized 0-1 coordinates (e.g., [0.5, 0.4] for a face in the upper center) |
You must provide either source_url or an image/video file upload — not both. Similarly, provide either audio_url or an audio file upload.
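The input rules above can be checked client-side before sending a request. The following is a minimal sketch, not part of the Kolbo API: the helper names (detect_source_type, validate_inputs, validate_bounding_box_target) are hypothetical, and three behaviors are assumptions rather than documented facts: that the extension check ignores query strings and letter case, that image and video uploads are mutually exclusive, and that exactly one audio input is expected.

```python
import posixpath
from urllib.parse import urlparse

# Extensions the documentation lists as video; anything else counts as an image.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv", ".avi", ".m4v"}


def detect_source_type(source_url: str) -> str:
    """Return "video" or "image" based on the file extension in the URL path.

    Assumption: the query string is ignored and matching is case-insensitive.
    """
    path = urlparse(source_url).path
    extension = posixpath.splitext(path)[1].lower()
    return "video" if extension in VIDEO_EXTENSIONS else "image"


def validate_inputs(source_url=None, image=None, video=None,
                    audio_url=None, audio=None) -> None:
    """Raise ValueError unless exactly one source and one audio input is given.

    Assumption: image and video uploads are treated as mutually exclusive.
    """
    sources = [v for v in (source_url, image, video) if v is not None]
    if len(sources) != 1:
        raise ValueError("Provide exactly one of source_url, image, or video")
    audios = [v for v in (audio_url, audio) if v is not None]
    if len(audios) != 1:
        raise ValueError("Provide exactly one of audio_url or audio")


def validate_bounding_box_target(point) -> None:
    """Raise ValueError unless point is [x, y] with both values in 0-1."""
    if len(point) != 2 or not all(0 <= value <= 1 for value in point):
        raise ValueError("bounding_box_target must be [x, y] with values in 0-1")
```

For example, detect_source_type("https://example.com/portrait.jpg") returns "image", and validate_inputs raises ValueError when both source_url and an image upload are supplied.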
Examples
cURL with URLs (Smart Select — recommended)
curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://example.com/portrait.jpg",
    "audio_url": "https://example.com/speech.mp3"
  }'
cURL with File Uploads
curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -F "image=@portrait.jpg" \
  -F "audio=@speech.mp3"
cURL with Video Source
curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -F "video=@video.mp4" \
  -F "audio=@speech.mp3"
With Specific Model
To choose a specific model, first fetch identifiers from GET /api/v1/models?type=lipsync, then pass the identifier value:
curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://example.com/portrait.jpg",
    "audio_url": "https://example.com/speech.mp3",
    "model": "your-model-identifier"
  }'
Model identifiers come from GET /api/v1/models?type=lipsync. Always fetch the latest list rather than hardcoding identifiers, as models may change over time.
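One way to follow this advice is to include the model field only when the identifier you want is still in the fetched list. The sketch below assumes the models response carries a models list whose entries have an identifier key, as the examples on this page suggest. The function name build_lipsync_payload is hypothetical, and falling back to Smart Select when the identifier is missing is a design choice of this sketch, not documented API behavior.

```python
def build_lipsync_payload(source_url, audio_url, available_models, preferred=None):
    """Build a JSON request body for URL-based inputs.

    available_models is the "models" list from GET /api/v1/models?type=lipsync.
    The "model" key is added only if `preferred` is currently listed; otherwise
    it is omitted so that Smart Select applies.
    """
    payload = {"source_url": source_url, "audio_url": audio_url}
    identifiers = {model["identifier"] for model in available_models}
    if preferred is not None and preferred in identifiers:
        payload["model"] = preferred
    return payload
```

Pass the result as the json argument of the POST request. If the preferred identifier has been renamed or retired, the request still succeeds with the auto-selected model instead of failing on an unknown identifier.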
JavaScript
const API_KEY = "kolbo_live_YOUR_API_KEY";

// Fetch available lipsync models
async function initModels() {
  const res = await fetch("https://api.kolbo.ai/api/v1/models?type=lipsync", {
    headers: { "X-API-Key": API_KEY },
  });
  const data = await res.json();
  console.log("Available models:", data.models.map((m) => m.identifier));
}

async function main() {
  await initModels();

  // Using URLs (Smart Select)
  const response = await fetch("https://api.kolbo.ai/api/v1/generate/lipsync", {
    method: "POST",
    headers: {
      "X-API-Key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      source_url: "https://example.com/portrait.jpg",
      audio_url: "https://example.com/speech.mp3",
    }),
  });
  const data = await response.json();
  console.log("Generation ID:", data.generation_id);
  console.log("Poll URL:", data.poll_url);

  // Fall back to 8 seconds if the response carries no polling hint
  const pollIntervalMs = (data.poll_interval_hint ?? 8) * 1000;

  // Poll for completion
  const pollForResult = async (generationId) => {
    while (true) {
      await new Promise((r) => setTimeout(r, pollIntervalMs));
      const status = await fetch(
        `https://api.kolbo.ai/api/v1/generate/${generationId}/status`,
        { headers: { "X-API-Key": API_KEY } }
      ).then((r) => r.json());
      console.log("State:", status.state, "Progress:", status.progress);
      if (status.state === "completed") {
        console.log("Video URL:", status.result.urls[0]);
        return status;
      }
      if (status.state === "failed") {
        console.error("Generation failed:", status.error);
        return status;
      }
    }
  };
  await pollForResult(data.generation_id);
}

main();
Python
import requests
import time

API_KEY = "kolbo_live_YOUR_API_KEY"
BASE_URL = "https://api.kolbo.ai/api"
HEADERS = {"X-API-Key": API_KEY}

# Fetch available lipsync models
models_res = requests.get(
    f"{BASE_URL}/v1/models",
    headers=HEADERS,
    params={"type": "lipsync"},
)
print("Available models:", [m["identifier"] for m in models_res.json()["models"]])

# --- Option A: Using URLs (Smart Select) ---
response = requests.post(
    f"{BASE_URL}/v1/generate/lipsync",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "source_url": "https://example.com/portrait.jpg",
        "audio_url": "https://example.com/speech.mp3",
    },
)
data = response.json()
print("Generation ID:", data["generation_id"])

# --- Option B: Using file uploads ---
# with open("portrait.jpg", "rb") as img, open("speech.mp3", "rb") as aud:
#     response = requests.post(
#         f"{BASE_URL}/v1/generate/lipsync",
#         headers=HEADERS,
#         files={"image": img, "audio": aud},
#     )
#     data = response.json()

# Poll for completion
generation_id = data["generation_id"]
poll_interval = data.get("poll_interval_hint", 8)
while True:
    time.sleep(poll_interval)
    status = requests.get(
        f"{BASE_URL}/v1/generate/{generation_id}/status",
        headers=HEADERS,
    ).json()
    print(f"State: {status['state']} Progress: {status.get('progress', 0)}%")
    if status["state"] == "completed":
        print("Video URL:", status["result"]["urls"][0])
        break
    if status["state"] == "failed":
        print("Error:", status.get("error"))
        break
Response
Generation Started
{
  "success": true,
  "generation_id": "lip_abc123",
  "type": "lipsync",
  "model": "auto",
  "credits_charged": 20,
  "poll_url": "/api/v1/generate/lip_abc123/status",
  "poll_interval_hint": 8
}
Completed Status
{
  "success": true,
  "generation_id": "lip_abc123",
  "type": "lipsync",
  "state": "completed",
  "progress": 100,
  "result": {
    "urls": ["https://cdn.kolbo.ai/videos/..."],
    "thumbnail_url": "https://cdn.kolbo.ai/thumbs/...",
    "duration": 12,
    "aspect_ratio": "16:9",
    "prompt_used": "...",
    "model": "auto",
    "created_at": "2026-04-12T10:00:00.000Z"
  }
}
Tips
- Use Smart Select (the default). Omit the model field and Kolbo picks the best model for your input. This is the simplest and most future-proof approach.
- Lipsync generation typically takes 1-5 minutes depending on the source length.
- Source type is auto-detected from the file extension when using source_url. Video extensions (.mp4, .mov, .webm, .mkv, .avi, .m4v) are treated as video; everything else is treated as an image.
- For best results with images, use a clear, front-facing portrait with visible lips.
- When using file uploads, each uploaded file is limited to 100 MB.
- Use bounding_box_target to specify the face position if the model has trouble detecting it automatically.
- Use poll_interval_hint from the initial response to set your polling interval.
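Because generation can take several minutes, a polling loop is safer with an upper bound on total wait time. The sketch below is one possible shape for such a loop, not an official client: poll_until_done is a hypothetical name, fetch_status is any caller-supplied function that returns the parsed status JSON, the 8-second default mirrors the fallback in the Python example, and the 600-second timeout is an arbitrary assumption chosen to sit above the typical 1-5 minute range.

```python
import time


def poll_until_done(fetch_status, interval_s=8.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() every interval_s seconds until a terminal state.

    Returns the status dict once its "state" is "completed" or "failed".
    Raises TimeoutError if no terminal state is seen within timeout_s seconds.
    The sleep and clock parameters exist so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        sleep(interval_s)
        status = fetch_status()
        if status.get("state") in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"No terminal state after {timeout_s} seconds")
```

In practice, fetch_status would wrap the GET request to the status endpoint, and interval_s would be set from poll_interval_hint in the initial response.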