Lipsync

Sync lips in images or videos to match an audio track. Provide a source image or video and an audio file, and Kolbo generates a video with realistic lip movements.

Smart Select (recommended): Omit the model field and Kolbo automatically picks the best model for your input. This is the default behavior and suits most use cases.

Model identifiers are Kolbo-specific. Never hardcode model identifiers — always fetch the current list from GET /api/v1/models?type=lipsync first. Models may be added, renamed, or retired at any time.

Endpoint

POST /api/v1/generate/lipsync

Request Body

Accepts multipart/form-data (for file uploads) or application/json (for URL-based inputs).

Field	Type	Required	Description
`source_url`	string	No	URL of source image or video. File extension determines type — `.mp4`, `.mov`, `.webm`, `.mkv`, `.avi`, `.m4v` = video, otherwise image.
`image`	file	No	Source image file (multipart upload, max 100 MB)
`video`	file	No	Source video file (multipart upload, max 100 MB)
`audio_url`	string	No	URL of the audio track
`audio`	file	No	Audio file (multipart upload, max 100 MB)
`prompt`	string	No	Text prompt for lipsync adjustments
`model`	string	No	Model identifier from `GET /api/v1/models?type=lipsync` (default: auto-select)
`bounding_box_target`	array	No	Face position as `[x, y]` — normalized 0-1 coordinates (e.g., `[0.5, 0.4]` for center-upper face)

You must provide either source_url or an image/video file upload — not both. Similarly, provide either audio_url or an audio file upload.
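The input rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Kolbo API: the function name `build_lipsync_payload` and the `has_source_file` / `has_audio_file` flags are hypothetical and only illustrate the "exactly one source, exactly one audio input" constraint.

```python
from typing import Optional


def build_lipsync_payload(
    source_url: Optional[str] = None,
    audio_url: Optional[str] = None,
    has_source_file: bool = False,  # hypothetical flag: an image/video upload is attached
    has_audio_file: bool = False,   # hypothetical flag: an audio upload is attached
    model: Optional[str] = None,
) -> dict:
    """Validate the one-source / one-audio rule and return the URL-based fields."""
    # Exactly one source: a URL or an uploaded image/video, never both or neither.
    if bool(source_url) == has_source_file:
        raise ValueError("Provide exactly one of source_url or an image/video upload.")
    # Exactly one audio input: a URL or an uploaded audio file.
    if bool(audio_url) == has_audio_file:
        raise ValueError("Provide exactly one of audio_url or an audio upload.")

    payload = {}
    if source_url:
        payload["source_url"] = source_url
    if audio_url:
        payload["audio_url"] = audio_url
    if model:  # Omitting model leaves the choice to Smart Select.
        payload["model"] = model
    return payload
```

Running a check like this first means a malformed request fails locally with a clear message instead of as an API error.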

Examples

cURL with URLs (Smart Select — recommended)

curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://example.com/portrait.jpg",
    "audio_url": "https://example.com/speech.mp3"
  }'

cURL with File Uploads

curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -F "image=@portrait.jpg" \
  -F "audio=@speech.mp3"

cURL with Video Source

curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -F "video=@input.mp4" \
  -F "audio=@speech.mp3"

With Specific Model

To choose a specific model, first fetch identifiers from GET /api/v1/models?type=lipsync, then pass the identifier value:

curl -X POST https://api.kolbo.ai/api/v1/generate/lipsync \
  -H "X-API-Key: kolbo_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_url": "https://example.com/portrait.jpg",
    "audio_url": "https://example.com/speech.mp3",
    "model": "your-model-identifier"
  }'

Model identifiers come from GET /api/v1/models?type=lipsync. Always fetch the latest list rather than hardcoding identifiers, as models may change over time.
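One way to follow this advice is to validate a preferred identifier against the freshly fetched list and fall back to Smart Select when it is no longer available. This is a sketch only: `resolve_model` is a hypothetical helper, and it assumes the models response has the `{"models": [{"identifier": ...}]}` shape used by the JavaScript and Python examples on this page.

```python
from typing import Optional


def resolve_model(models_response: dict, preferred: Optional[str]) -> Optional[str]:
    """Return `preferred` if it is in the fetched list, otherwise None.

    Returning None means "omit the model field", which selects Smart Select.
    """
    available = {m["identifier"] for m in models_response.get("models", [])}
    return preferred if preferred in available else None
```

Pass the parsed JSON from `GET /api/v1/models?type=lipsync` as `models_response`, and include the `model` field in the request only when the result is not `None`.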

JavaScript

const API_KEY = "kolbo_live_YOUR_API_KEY";

// Fetch available lipsync models
async function initModels() {
  const res = await fetch("https://api.kolbo.ai/api/v1/models?type=lipsync", {
    headers: { "X-API-Key": API_KEY },
  });
  const data = await res.json();
  console.log("Available models:", data.models.map((m) => m.identifier));
}

async function main() {
  await initModels();

  // Using URLs (Smart Select)
  const response = await fetch("https://api.kolbo.ai/api/v1/generate/lipsync", {
    method: "POST",
    headers: {
      "X-API-Key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      source_url: "https://example.com/portrait.jpg",
      audio_url: "https://example.com/speech.mp3",
    }),
  });

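  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  console.log("Generation ID:", data.generation_id);
  console.log("Poll URL:", data.poll_url);

  // Poll for completion; fall back to 8 seconds if no hint is returned,
  // otherwise a missing hint would produce a zero-delay polling loop
  const pollIntervalMs = (data.poll_interval_hint ?? 8) * 1000;
  const pollForResult = async (generationId) => {
    while (true) {
      await new Promise((r) => setTimeout(r, pollIntervalMs));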
      const status = await fetch(
        `https://api.kolbo.ai/api/v1/generate/${generationId}/status`,
        { headers: { "X-API-Key": API_KEY } }
      ).then((r) => r.json());

      console.log("State:", status.state, "Progress:", status.progress);

      if (status.state === "completed") {
        console.log("Video URL:", status.result.urls[0]);
        return status;
      }
      if (status.state === "failed") {
        console.error("Generation failed:", status.error);
        return status;
      }
    }
  };

  await pollForResult(data.generation_id);
}

main().catch(console.error);

Python

import requests
import time

API_KEY = "kolbo_live_YOUR_API_KEY"
BASE_URL = "https://api.kolbo.ai/api"
HEADERS = {"X-API-Key": API_KEY}

# Fetch available lipsync models
models_res = requests.get(
    f"{BASE_URL}/v1/models",
    headers=HEADERS,
    params={"type": "lipsync"},
)
print("Available models:", [m["identifier"] for m in models_res.json()["models"]])

# --- Option A: Using URLs (Smart Select) ---
response = requests.post(
    f"{BASE_URL}/v1/generate/lipsync",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "source_url": "https://example.com/portrait.jpg",
        "audio_url": "https://example.com/speech.mp3",
    },
)

response.raise_for_status()  # Stop on HTTP errors before reading the body
data = response.json()
print("Generation ID:", data["generation_id"])

# --- Option B: Using file uploads ---
# with open("portrait.jpg", "rb") as img, open("speech.mp3", "rb") as aud:
#     response = requests.post(
#         f"{BASE_URL}/v1/generate/lipsync",
#         headers=HEADERS,
#         files={"image": img, "audio": aud},
#     )
#     data = response.json()

# Poll for completion
generation_id = data["generation_id"]
poll_interval = data.get("poll_interval_hint", 8)

while True:
    time.sleep(poll_interval)
    status = requests.get(
        f"{BASE_URL}/v1/generate/{generation_id}/status",
        headers=HEADERS,
    ).json()

    print(f"State: {status['state']}  Progress: {status.get('progress', 0)}%")

    if status["state"] == "completed":
        print("Video URL:", status["result"]["urls"][0])
        break
    if status["state"] == "failed":
        print("Error:", status.get("error"))
        break

Response

Generation Started

{
  "success": true,
  "generation_id": "lip_abc123",
  "type": "lipsync",
  "model": "auto",
  "credits_charged": 20,
  "poll_url": "/api/v1/generate/lip_abc123/status",
  "poll_interval_hint": 8
}
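Note that `poll_url` in this response is a path, not an absolute URL. A small sketch of turning the response into polling settings follows. The helper name `poll_settings` is hypothetical, and the origin `https://api.kolbo.ai` is taken from the request examples on this page rather than from the response itself.

```python
from urllib.parse import urljoin

API_ORIGIN = "https://api.kolbo.ai"  # origin used by the request examples


def poll_settings(start_response: dict, default_interval: float = 8.0):
    """Return (absolute poll URL, polling interval in seconds)."""
    # poll_url is a path such as /api/v1/generate/<id>/status.
    poll_url = urljoin(API_ORIGIN, start_response["poll_url"])
    # Fall back to a default if the hint is missing or zero.
    interval = float(start_response.get("poll_interval_hint") or default_interval)
    return poll_url, interval
```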

Completed Status

{
  "success": true,
  "generation_id": "lip_abc123",
  "type": "lipsync",
  "state": "completed",
  "progress": 100,
  "result": {
    "urls": ["https://cdn.kolbo.ai/videos/..."],
    "thumbnail_url": "https://cdn.kolbo.ai/thumbs/...",
    "duration": 12,
    "aspect_ratio": "16:9",
    "prompt_used": "...",
    "model": "auto",
    "created_at": "2026-04-12T10:00:00.000Z"
  }
}
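Reading the result out of a status payload shaped like the one above might look like the following sketch. `extract_video_url` is a hypothetical helper. It assumes that a failed job carries an `error` field (as the code examples on this page read it) and that any state other than `completed` or `failed` means the job is still running.

```python
from typing import Optional


def extract_video_url(status: dict) -> Optional[str]:
    """Return the first result URL, or None if the job is still in progress."""
    state = status.get("state")
    if state == "failed":
        raise RuntimeError(f"Lipsync generation failed: {status.get('error')}")
    if state != "completed":
        return None  # Assumed to be still running; keep polling.
    urls = (status.get("result") or {}).get("urls") or []
    if not urls:
        raise RuntimeError("Completed status contained no result URLs")
    return urls[0]
```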

Tips

Use Smart Select (the default). Omit the model field and Kolbo picks the best model for your input. This is the simplest and most future-proof approach.
Lipsync generation typically takes 1-5 minutes depending on the source length.
Source type is auto-detected from the file extension when using source_url. Video extensions (.mp4, .mov, .webm, .mkv, .avi, .m4v) are treated as video; everything else is treated as an image.
For best results with images, use a clear, front-facing portrait with visible lips.
When using file uploads, each file (image, video, or audio) can be at most 100 MB.
Use bounding_box_target to specify the face region if the model has trouble detecting it automatically.
Use poll_interval_hint from the initial response to set your polling interval.
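Two of the tips above lend themselves to client-side checks: predicting how a source_url will be classified, and validating bounding_box_target before sending it. The sketch below is an approximation, not Kolbo's implementation. Both helper names are hypothetical, and the case-insensitive match and the choice to ignore the query string are assumptions about how the server reads the extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Video extensions listed in the tips above.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv", ".avi", ".m4v"}


def detect_source_type(source_url: str) -> str:
    """Guess whether a source_url will be treated as "video" or "image"."""
    # Look only at the URL path, so a query string does not hide the extension.
    suffix = PurePosixPath(urlparse(source_url).path).suffix.lower()
    return "video" if suffix in VIDEO_EXTENSIONS else "image"


def is_valid_bounding_box_target(target) -> bool:
    """Check that target is [x, y] with both values normalized to 0-1."""
    return (
        isinstance(target, (list, tuple))
        and len(target) == 2
        and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) and 0 <= v <= 1
            for v in target
        )
    )
```

For example, `detect_source_type("https://example.com/portrait.jpg")` yields `"image"`, and `[0.5, 0.4]` passes the bounding box check.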
