Kolbo AI Docs
Developer API

Claude Code & MCP Setup

Use Kolbo AI as native tools in Claude Code via the MCP server.

Use Kolbo AI directly from Claude Code, Kolbo Code, or Claude Desktop as native tools. The MCP (Model Context Protocol) server handles polling internally — you call generate_image and get back the final URL.

Kolbo Code (automatic)

If you use Kolbo Code, the MCP server and skill are configured automatically when you log in:

kolbo auth login

That's it. After login, the agent already knows about your Kolbo tools and how to use them — no manual config required.

Manual Setup (Claude Code / Claude Desktop)

1. Get an API Key

Create a key at the Developer Console or via the Authentication API.

2. Configure MCP Server

Add to your Claude Code project settings (.claude/settings.json):

{
  "mcpServers": {
    "kolbo": {
      "command": "npx",
      "args": ["-y", "@kolbo/mcp"],
      "env": {
        "KOLBO_API_KEY": "kolbo_live_..."
      }
    }
  }
}

Or for Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "kolbo": {
      "command": "npx",
      "args": ["-y", "@kolbo/mcp"],
      "env": {
        "KOLBO_API_KEY": "kolbo_live_..."
      }
    }
  }
}
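
Either way, a malformed settings file is the most common reason the server never appears. If you want a quick programmatic sanity check, here is a sketch — the `check_mcp_config` helper is illustrative, not part of Kolbo:

```python
import json

def check_mcp_config(text: str) -> bool:
    """Return True if the settings JSON parses and contains a usable kolbo entry."""
    cfg = json.loads(text)
    server = cfg.get("mcpServers", {}).get("kolbo", {})
    # Matches the shape of the config blocks above: npx launcher + API key in env.
    return server.get("command") == "npx" and "KOLBO_API_KEY" in server.get("env", {})
```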

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `KOLBO_API_KEY` | Yes | Your Kolbo API key |
| `KOLBO_API_URL` | No | Custom API URL (default: `https://api.kolbo.ai/api`) |
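
The same requirements can be expressed as a tiny loader — a sketch only; `load_kolbo_env` is a hypothetical helper, not part of the Kolbo SDK:

```python
import os

def load_kolbo_env(env=os.environ):
    # KOLBO_API_KEY is required; KOLBO_API_URL falls back to the documented default.
    api_key = env.get("KOLBO_API_KEY")
    if not api_key:
        raise RuntimeError("KOLBO_API_KEY environment variable is required")
    return api_key, env.get("KOLBO_API_URL", "https://api.kolbo.ai/api")
```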

Available Tools

Once configured, Claude Code has access to 30 Kolbo tools grouped by purpose:

Generation

| Tool | Description |
|---|---|
| `generate_image` | Text → image |
| `generate_image_edit` | Existing image(s) + prompt → edited image |
| `generate_video` | Text → video |
| `generate_video_from_image` | Still image + motion prompt → video |
| `generate_video_from_video` | Input video + prompt → restyled video (video-to-video) |
| `generate_elements` | Reference images/videos + prompt → animated video |
| `generate_first_last_frame` | First frame + last frame → interpolated video |
| `generate_lipsync` | Source image/video + audio → lipsynced video |
| `generate_creative_director` | One brief → N coordinated scenes (image or video) |
| `generate_music` | Text (+ optional lyrics) → song |
| `generate_speech` | Text + voice → spoken audio |
| `generate_sound` | Text → sound effect |
| `generate_3d` | Text or reference images → 3D model (GLB/FBX/OBJ/USDZ) |
| `transcribe_audio` | Audio/video URL or file → text + SRT subtitles |

Every image/video/creative-director tool accepts visual_dna_ids and moodboard_id for character/style consistency — you can compose create_visual_dna → generate_image (with the DNA applied to the output) in a single agent turn. generate_creative_director additionally accepts moodboard_ids (plural) for blending multiple styles. generate_music supports vocal_gender and custom lyrics; generate_speech accepts a voice by id or display name. Every tool that takes files accepts both public URLs and absolute local file paths (where applicable).

Chat

| Tool | Description |
|---|---|
| `chat_send_message` | Talk to any Kolbo chat model; supports web search, deep think, multi-turn sessions |
| `chat_list_conversations` | List SDK chat threads |
| `chat_get_messages` | Fetch messages in a conversation |

Visual DNA (reusable character/style/product profiles)

| Tool | Description |
|---|---|
| `create_visual_dna` | Create a profile from URLs or local files (max 4 images + 1 video + 1 audio) |
| `list_visual_dnas` | List your profiles |
| `get_visual_dna` | Fetch one profile |
| `delete_visual_dna` | Delete a profile |

Moodboards

| Tool | Description |
|---|---|
| `list_moodboards` | Browse presets + your moodboards |
| `get_moodboard` | Fetch one moodboard with all image URLs |

Media Library

| Tool | Description |
|---|---|
| `upload_media` | Upload a local file (or URL) → stable Kolbo CDN URL for reuse |
| `list_media` | Browse your uploaded media with type filter and pagination |

Discovery & Account

| Tool | Description |
|---|---|
| `list_models` | Current model catalog with costs and capabilities |
| `list_voices` | TTS voices (presets + your cloned voices) |
| `list_presets` | Generation presets across image/video/music/text-to-video catalogs |
| `check_credits` | Check credit balance |
| `get_generation_status` | Poll a generation by ID (fallback if a tool times out) |

Usage Examples

In Claude Code, just ask naturally:

  • "Generate an image of a sunset over mountains"
  • "Create a 5-second video of waves crashing on a beach"
  • "Remove the background from this image" (attach a URL)
  • "Restyle this video in Studio Ghibli style" — video-to-video
  • "Animate these 4 product shots into a 10-second video" — generate_elements
  • "Morph from this before photo to this after photo" — first/last frame interpolation
  • "Lipsync this audio clip to this talking head video"
  • "Generate a 3D model of a medieval helmet I can import into Blender"
  • "Transcribe this podcast and give me the SRT file"
  • "Build a 4-scene storyboard for a coffee shop ad campaign"
  • "Upload this local image to my library so I can reuse it across generations"
  • "Make a lo-fi hip hop beat with male vocals"
  • "Read this paragraph out loud with a British female voice"
  • "Ask Claude what's new in AI this week, with web search on and deep think"
  • "Create a Visual DNA profile called 'Alex' from these images, then make 4 shots of Alex in different outfits" — the agent creates the DNA, then passes visual_dna_ids to four generate_image calls so Alex stays consistent across all four outputs
  • "Apply the 'Cyberpunk Neon' moodboard style to my new character render" — the agent finds the moodboard and passes moodboard_id to the generation
  • "What video presets are available?" — list_presets
  • "Check my credit balance"

Skill File

Kolbo Code installs this skill automatically. For Claude Code or Claude Desktop, add it manually as .claude/commands/kolbo.md in your project:

# Kolbo AI Creative Generation

You have access to the Kolbo AI platform via MCP. Kolbo routes to 100+ AI models for images, videos, music, speech, sound effects, multi-scene campaigns, and conversational chat — all behind a unified API with Smart Select model routing.

**IMPORTANT: Model identifiers are dynamic.** Never hardcode model names — always call `list_models` with a `type` filter to get current identifiers. Omit `model` entirely to let Smart Select pick the best model automatically (this is the recommended default).

## Tools

**Image generation**
- `generate_image` — text → image (~10–30s). Accepts `visual_dna_ids`, `moodboard_id`, `reference_images`, `num_images`, `enable_web_search`.
- `generate_image_edit` — image(s) + prompt → edited image (background removal, color change, object swap). Accepts `source_images`, `visual_dna_ids`, `moodboard_id`, `enable_web_search`.

**Video generation** (pick the one that matches the inputs)
- `generate_video` — text → video (~1–5min). Accepts `visual_dna_ids`, `reference_images`, `duration`, `aspect_ratio`.
- `generate_video_from_image` — still image + motion prompt → video.
- `generate_video_from_video` — input video + prompt → restyled video (video-to-video transformation). For style transfer, scene restyle, subject swap.
- `generate_elements` — reference images/videos + prompt → animated video. Use when animating specific uploaded assets like a product shot or character rig. Accepts `reference_images` (URLs) OR `files` (URLs or local paths), plus `preset_id`, `motion`, `visual_dna_ids`.
- `generate_first_last_frame` — first frame + last frame → interpolated video. Provide two frames as `first_frame_url`/`last_frame_url` (URL mode) OR `first_frame`/`last_frame` (URL-or-local-path mode). Do NOT mix URL and file inputs.
- `generate_lipsync` — `source` (image or video) + `audio` → lipsynced video. Both inputs accept URL or local path.

**Multi-scene**
- `generate_creative_director` — one brief → N coordinated scenes (1–8). `workflow_type: "image"` (default) or `"video"`. Use for storyboards, product showcases, ad campaigns. Accepts `visual_dna_ids` and `moodboard_id` (or `moodboard_ids` for blending) to keep characters/styles consistent across every scene — this is the ideal composition for "same character in 8 scenes" type asks.

**Audio / voice**
- `generate_music` — description (+ optional lyrics/style) → song. Accepts `instrumental`, `lyrics`, `vocal_gender` ("male"/"female"), `style`.
- `generate_speech` — text + voice → spoken audio. Voice can be a `voice_id` from `list_voices` OR a display name like "Rachel".
- `generate_sound` — description → sound effect (foley, ambience, UI sounds — NOT music, NOT speech).
- `transcribe_audio` — audio or video source (URL or local path) → text + SRT subtitles + downloadable .srt/.txt URLs. Works on podcasts, videos with audio tracks, voice memos.

**3D**
- `generate_3d` — text prompt or reference image(s) → 3D model in GLB/FBX/OBJ/USDZ. Modes auto-detect from inputs: text mode (prompt only), single mode (1 image), multi mode (multiple images for multi-view reconstruction).

**Chat**
- `chat_send_message` — talk to any chat model on Kolbo. Supports `web_search`, `deep_think`, and `system_prompt` (new sessions only). Omit `session_id` to start a new thread; pass it back on subsequent calls to keep context.
- `chat_list_conversations` — list past SDK chat threads
- `chat_get_messages` — fetch a thread's history

**Visual DNA** (reusable character/style/product profiles)
- `create_visual_dna` — build a profile from reference media. Accepts public URLs OR absolute local file paths in the same `images` array. Max 4 images + optional 1 video + 1 audio. 25MB/file cap. `dna_type`: `character`, `style`, `product`, or `scene`.
- `list_visual_dnas` / `get_visual_dna` / `delete_visual_dna` — CRUD over the user's profiles

**Moodboards**
- `list_moodboards` — presets + the user's moodboards (each has a `master_prompt` and `style_guide`)
- `get_moodboard` — full moodboard with all image URLs

**Media Library** (upload once, reuse everywhere)
- `upload_media` — upload a local file (or re-host a remote URL) to the user's Kolbo media library and get back a stable Kolbo CDN URL. Use this when the user wants to reference a local file in MULTIPLE subsequent generation calls — upload once, then pass the returned URL to `generate_image` / `generate_video` / `create_visual_dna` / etc. For a single-use reference, you can skip this and pass a URL directly.
- `list_media` — browse the user's uploaded media with `type` filter and pagination. Helpful to check if the user already has a file in their library before uploading a new one.

**Discovery & account**
- `list_models` — current catalog with `identifier`, `credit`, `supported_aspect_ratios`, `supported_durations`. Filter by `type`: `image`, `image_edit`, `video`, `video_from_image`, `music`, `speech`, `sound`, `chat`, `three_d`, `lipsync`. **This is the only source of truth for model identifiers** — never guess or hardcode them (e.g., `fal-ai/flux-2`, `seedance-1.5-pro-image-to-video`, `nano-banana-pro`). They change as models are added, renamed, or retired.
- `list_voices` — TTS voices (presets + cloned). Filter by `language`, `gender`, `provider`.
- `list_presets` — generation presets across image / video / music / text_to_video catalogs. Pass `type` to filter. Returns presets with `id`, `name`, `description`, `thumbnail_url`, `category`. Pass the returned `id` as `preset_id` to the matching generation tool.
- `check_credits` — remaining balance
- `get_generation_status` — poll a generation by id (fallback if a tool times out)

## Routing — user intent → first tool

| User says | Call |
|---|---|
| "make/create/generate an image of…" | `generate_image` |
| "edit/change/remove [x] in this image" | `generate_image_edit` with `source_images` |
| "make a video of…" | `generate_video` |
| "animate this image" / "make this photo move" | `generate_video_from_image` |
| "restyle this video" / "video in anime style" / "video-to-video" | `generate_video_from_video` with `source_video` |
| "animate these product shots" / "put these assets into a video" | `generate_elements` |
| "morph from this image to this image" / "transition between these two frames" | `generate_first_last_frame` |
| "sync this audio to this video" / "lipsync" / "make it talk" | `generate_lipsync` |
| "make a 3D model of…" / "generate a GLB / USDZ" | `generate_3d` |
| "transcribe this audio" / "subtitle this video" / "get SRT from this" | `transcribe_audio` |
| "storyboard / multi-scene ad / product showcase" | `generate_creative_director` |
| "write me a song" / "make a beat" | `generate_music` |
| "read this out loud" / "text to speech" | `list_voices` → `generate_speech` |
| "sound effect of…" | `generate_sound` |
| "ask GPT/Claude/Gemini…" / "chat about…" | `chat_send_message` |
| "continue our conversation" | `chat_list_conversations` → `chat_send_message` with `session_id` |
| "use the same character across N images" | `list_visual_dnas` → `create_visual_dna` if missing → `generate_image` with `visual_dna_ids: [id]` for each output |
| "use the same character across an 8-scene campaign" | `list_visual_dnas` / `create_visual_dna` → `generate_creative_director` with `visual_dna_ids: [id]` and `scene_count: 8` |
| "apply this moodboard style to [thing]" | `list_moodboards` → `generate_image` / `generate_image_edit` / `generate_creative_director` with `moodboard_id` |
| "use this preset" / "what presets are available for video" | `list_presets` → pass `preset_id` to the generation tool |
| "upload this local file so I can reuse it" | `upload_media` → use the returned URL in subsequent calls |
| "what files have I uploaded" / "find my previous upload" | `list_media` |
| "match this existing look" (user pastes image URL) | `generate_image` with `reference_images: [url]` (style reference, not edit source) |
| "4 variations of this" | `generate_image` with `num_images: 4` |
| "current events" / "based on today's news" | `generate_image` or `chat_send_message` with `enable_web_search: true` / `web_search: true` |
| "what's my credit balance" | `check_credits` |
| "what models are available for X" | `list_models` type=X |
| "use Flux 2 / Kling / Seedance / specific model" | `list_models` type=Y to find exact `identifier` → pass to generation tool |

## Rules

1. **Never hardcode model identifiers.** They change without notice. Call `list_models` with a `type` filter to get current identifiers only when the user cares about model choice — otherwise omit `model` entirely and let Smart Select pick the best model automatically.
2. **Default to Smart Select.** Omit the `model` parameter unless the user explicitly names a model or their ask implies a specific capability (e.g., "4K", "Kling", "Seedance", "Suno"). When the user does name a model, call `list_models` to find the matching `identifier` — do not guess the identifier format.
3. **Keep `enhance_prompt: true`** (the default) for image and video generation. Only disable if the user says "don't change my prompt."
4. **Chat sessions are sticky.** On every follow-up `chat_send_message` in the same conversation, pass back the `session_id` from the first response. Starting a new session each turn loses memory.
5. **Polling is automatic.** Generation tools block until the result is ready. If a tool returns a timeout error, the error message includes the `generation_id` — call `get_generation_status` with that id to check the latest state. The generation is almost certainly still running server-side.
6. **Check credits only when it matters.** Before Creative Director with many scenes, long videos, or 4K batches, call `check_credits`. Skip it for single images or chat messages — it's noise.
7. **Visual DNA is compositional — use it in the same turn.** When the user wants a character/style/product/scene consistent across multiple outputs:
   1. `list_visual_dnas` to check if one already exists
   2. `create_visual_dna` if needed (accepts URLs or absolute local paths, max 4 images + optional video + audio, 25MB per file)
   3. **Pass the profile's `id` as `visual_dna_ids: [id]` to every subsequent `generate_image` / `generate_image_edit` / `generate_video` / `generate_video_from_image` / `generate_creative_director` call.** This is not a suggestion — without this step the DNA has no effect on the output. You can create the DNA and use it in the same agent turn.
8. **Moodboards are compositional — same pattern.** `list_moodboards` → find the style the user wants → pass `moodboard_id` (or `moodboard_ids` for `generate_creative_director` when blending) to the generation tool. Don't manually fold the moodboard's `master_prompt` into your text prompt — pass the id and let the server handle it.
9. **Multiple variations = `num_images`, not loops.** If the user asks for "4 variations of X", call `generate_image` once with `num_images: 4` (or 5, 6 etc.). Only make multiple tool calls if each output needs a distinct prompt or DNA.
10. **`reference_images` ≠ `source_images`.** `reference_images` on `generate_image` / `generate_video` / `generate_creative_director` are STYLE/COMPOSITION guidance — the model uses them as inspiration. `source_images` on `generate_image_edit` are EDIT SOURCES — the output is a modification of those images. Picking the wrong one causes wrong results.
11. **Absolute paths only for local files.** When `create_visual_dna` takes a local file path, it must be absolute (no `~`, no relative paths). URLs must be publicly reachable (not localhost, not behind auth).
12. **Error codes matter.** SDK errors now carry structured codes in the message like `[INSUFFICIENT_CREDITS]` or `[NOT_FOUND]`. If you see one, act on it: `INSUFFICIENT_CREDITS` → surface balance via `check_credits` and point user to app.kolbo.ai; `NOT_FOUND` → the id is wrong or the resource was deleted; `ACCESS_DENIED` → the user can't act on that resource.
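
Rule 5's fallback can be sketched as a polling loop. This assumes a generic `call_tool` dispatcher and a `state` field on the status response — both illustrative; only the `get_generation_status` tool and its `generation_id` parameter come from the tool list above:

```python
import time

def wait_for_generation(call_tool, generation_id, interval_s=10.0, max_wait_s=600.0):
    # Re-poll get_generation_status until the generation settles or we give up.
    waited = 0.0
    while True:
        status = call_tool("get_generation_status", {"generation_id": generation_id})
        if status.get("state") in ("completed", "failed"):
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"generation {generation_id} still pending after {max_wait_s}s")
        time.sleep(interval_s)
        waited += interval_s
```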

## Workflows

**Image**
1. If the user did NOT name a specific model, skip straight to step 2 — Smart Select picks the best model automatically.
2. If the user named a model (e.g., "use Flux 2"): `list_models` type=image → find the matching `identifier` from the response → pass it as `model`.
3. `generate_image` with prompt + optional `model` + `aspect_ratio`.
4. Return the URL(s) as markdown links.

**Video**
1. If the user did NOT name a model, omit `model` — Smart Select handles it. If the user named one or needs specific durations: `list_models` type=video → find the `identifier` and check its `supported_durations` and `supported_aspect_ratios`.
2. `generate_video` with prompt, `duration`, `aspect_ratio`, and optional `model`.
3. On timeout: `get_generation_status` with the returned `generation_id`.

**Multi-scene campaign**
1. `generate_creative_director` with `prompt`, `scene_count` (1–8), `workflow_type`, and optional `reference_images` for style guidance.
2. Returns all completed scenes in one response.

**Chat thread**
1. First turn: `chat_send_message` with `message` (+ optional `system_prompt`). Save the returned `session_id`.
2. Follow-ups: `chat_send_message` with new `message` AND saved `session_id`.
3. Flip `web_search: true` for current-events questions, `deep_think: true` for hard reasoning.
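
Sketched in code (the `call_tool` dispatcher and the `reply` response field are assumptions; `session_id` and `system_prompt` are documented above):

```python
def chat_thread(call_tool, messages, system_prompt=None):
    # Carry session_id forward so every turn after the first keeps context (rule 4).
    session_id, replies = None, []
    for message in messages:
        params = {"message": message}
        if session_id is None and system_prompt:
            params["system_prompt"] = system_prompt  # new sessions only
        if session_id is not None:
            params["session_id"] = session_id
        resp = call_tool("chat_send_message", params)
        session_id = resp["session_id"]
        replies.append(resp["reply"])
    return replies
```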

**Character consistency across multiple outputs** (the most important composition pattern)
1. `list_visual_dnas` — the user may already have the character/style/product they want.
2. If not, `create_visual_dna` with `name`, `dna_type: "character"` (or `"style"` / `"product"` / `"scene"`), and reference media (URLs from a prior generation work great; absolute local paths also OK). Save the returned `id`.
3. **For each subsequent image/video/edit**, pass `visual_dna_ids: [id]` to the generation tool. Example: "make 4 shots of Alex in different outfits" →
   - `create_visual_dna(name: "Alex", dna_type: "character", images: [...])` → id = "abc123"
   - `generate_image(prompt: "Alex wearing a tuxedo at a formal event", visual_dna_ids: ["abc123"])`
   - `generate_image(prompt: "Alex in hiking gear on a mountain trail", visual_dna_ids: ["abc123"])`
   - ...and so on. Alex stays consistent because the DNA is applied server-side.
4. For a multi-scene campaign with one character, prefer `generate_creative_director` with `visual_dna_ids: [id]` and `scene_count: N` — one call, N consistent scenes.
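
The whole pattern, sketched with a hypothetical `call_tool` dispatcher — the tool names and `visual_dna_ids` parameter are from the skill file, while the `id` and `url` response fields are assumptions:

```python
def consistent_shots(call_tool, name, reference_images, prompts):
    # Create the Visual DNA once, then apply it to every generation (step 3 above).
    dna = call_tool("create_visual_dna", {
        "name": name, "dna_type": "character", "images": reference_images,
    })
    return [
        call_tool("generate_image", {"prompt": p, "visual_dna_ids": [dna["id"]]})["url"]
        for p in prompts
    ]
```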

**Image edit**
1. `generate_image_edit` with `source_images: [url]` and a clear edit instruction as the prompt.
2. You can pipe URLs directly from a prior `generate_image` output.
3. If editing should keep a known character consistent, also pass `visual_dna_ids`.

**TTS with a specific voice**
1. `list_voices` filtered by `language` / `gender` if the user specified them → pick a `voice_id`.
2. `generate_speech` with text + voice (can be voice_id OR display name like "Rachel").

**Moodboard-guided generation**
1. `list_moodboards` → pick the one the user wants (or they name it directly). Grab the `id`.
2. Pass `moodboard_id: "the_id"` directly to `generate_image` / `generate_image_edit` / `generate_creative_director`. The server resolves the moodboard's master_prompt and style_guide into the generation. **Do NOT manually paste the master_prompt into your text prompt** — that duplicates signal and is less effective than passing the id.
3. For `generate_creative_director`, you can pass `moodboard_ids: [id1, id2]` to blend multiple moodboards across scenes.

**Batch variations of the same prompt**
1. `generate_image` with `num_images: N` (where N is how many variations the user wants). Single call, N outputs. Don't loop.

**Current-events / real-world grounding**
1. `generate_image` or `chat_send_message` with `enable_web_search: true` / `web_search: true`. The server grounds the prompt in real web results.

**Recovering from a timeout**
1. Generation tools throw an error message like `"Generation timed out after 120s... call get_generation_status with generation_id='xyz'"`.
2. Extract the `generation_id` from the error message (it's always quoted).
3. Call `get_generation_status` with that id. The generation is almost certainly still running; just wait a bit and check again, or return the current state to the user.
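
Step 2 reduces to one regex — a sketch matching the single-quoted `generation_id='…'` shape in the error message above:

```python
import re

def extract_generation_id(error_message):
    # The timeout message always quotes the id as generation_id='...'.
    match = re.search(r"generation_id='([^']+)'", error_message)
    return match.group(1) if match else None
```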

**Animating reference assets (elements)**
1. For "animate this product", "put these characters in a scene", "make this logo move": call `generate_elements` with `reference_images: [url]` (or `files: [path]` for local files) plus a prompt.
2. If the user wants the same character / product kept consistent across multiple outputs, pass `visual_dna_ids: [id]` as well.
3. Returns a video URL.

**First-to-last frame interpolation**
1. If the user wants a smooth transition between two specific images (e.g., "morph from this before-shot to this after-shot"): `generate_first_last_frame`.
2. URL mode: pass `first_frame_url` + `last_frame_url`.
3. File mode: pass `first_frame` + `last_frame` (URL or absolute local path each).
4. Do NOT mix modes — the tool will reject "one URL + one file".

**Lipsync**
1. For "sync this audio to this talking head", "dub this video with this voice track": `generate_lipsync`.
2. `source` is the image or video of the face (URL or local path); `audio` is the voice track (URL or local path).
3. Model auto-selects unless the user names one. Some lipsync models support a text performance prompt — pass `text_prompt` in that case. Call `list_models` with `type: "lipsync"` to see which models are available.

**Video restyling (video-to-video)**
1. For "restyle this footage in anime style", "change this scene to night time", "swap the subject for a tiger": `generate_video_from_video`.
2. `source_video` accepts URL or absolute local path. `prompt` describes the transformation.
3. Accepts `visual_dna_ids` if you want the restyle to preserve a known character.

**3D generation**
1. Text mode: `generate_3d` with just a `prompt`.
2. Single-image mode: `generate_3d` with `reference_images: [url]`.
3. Multi-view mode: `generate_3d` with `reference_images: [url1, url2, url3...]` for better reconstruction. Different angles of the same object work best.
4. Returns URLs for GLB, FBX, OBJ, USDZ formats — pick the one the user's target pipeline needs.
5. **Be patient** — 3D is slow. Expect up to 15 min. If you see a timeout, fall back to `get_generation_status`.

**Transcription / subtitling**
1. `transcribe_audio` with `source` as a URL or absolute local path. Works on audio (mp3, wav, m4a, flac) and video (mp4, mov, webm).
2. Returns `text` (full transcript), `srt_url` and `txt_url` (downloadable files), and `duration`.
3. For long content (podcasts > 30min), expect polling to take a while — the timeout is 30 minutes for this tool.

**Upload-once, reference-many (media library)**
1. User says "I have these 5 product shots locally, make a campaign with them": first call `upload_media` for each local file → collect the returned Kolbo CDN URLs.
2. Now pass those stable URLs to `generate_creative_director` with `reference_images: [...]` or `visual_dna_ids` after `create_visual_dna`.
3. Advantage: the local files don't need to be re-resolved on every generation call, and the URLs are permanent so you can reference them across a whole session.
4. For discovering previously uploaded files: `list_media` with a type filter.

**Preset-driven generation**
1. When the user asks "use the Cyberpunk preset" or "what video presets are available": `list_presets` with an optional `type` filter.
2. Find the preset the user wants, grab its `id`.
3. Pass `preset_id: <id>` to the matching generation tool (`generate_image`, `generate_video`, `generate_music`, `generate_elements`). The server folds in the preset's prompt template and style automatically.
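
Steps 1–3, sketched with a hypothetical `call_tool` dispatcher (`id` and `name` are documented preset fields):

```python
def find_preset_id(call_tool, preset_name, preset_type=None):
    # Case-insensitive lookup over the preset catalog; returns None if absent.
    params = {"type": preset_type} if preset_type else {}
    for preset in call_tool("list_presets", params):
        if preset["name"].lower() == preset_name.lower():
            return preset["id"]
    return None
```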

## Gotchas

- **Model identifiers change without notice.** NEVER guess or hardcode an identifier like `"gpt-4o"` or `"claude-sonnet"` — those are not Kolbo identifiers. Kolbo identifiers look like `fal-ai/flux-2`, `seedance-1.5-pro-image-to-video`, `nano-banana-pro`, etc. Always call `list_models` to get the current list, or omit `model` to let Smart Select choose.
- **`create_visual_dna` file rules**: absolute paths only, 25MB per file, 4 images max, URLs must be publicly reachable (not localhost, not behind auth).
- **Visual DNA only works if you pass the id.** `create_visual_dna` alone does nothing for the current generation — you MUST pass `visual_dna_ids: [id]` to the subsequent generation tool. Same for `moodboard_id`.
- **Generation timeouts include the id.** The error message looks like `"Generation timed out after 120s. Call get_generation_status with generation_id='xyz' to check."` — extract the id and call the fallback. Don't claim the generation failed; it's probably still running.
- **Deep-think chat gets 10 minutes.** `chat_send_message` with `deep_think: true` automatically uses a longer polling window. Web search gets 4 minutes. Default chat is 2 minutes. So don't preemptively set timeouts yourself.
- **Error codes carry signal.** `[INSUFFICIENT_CREDITS]` in an error message means call `check_credits` and tell the user to top up. `[NOT_FOUND]` means the id is wrong. `[ACCESS_DENIED]` means the user doesn't own the resource.
- **Don't over-generate.** If the user asked for one image, make one. Use `num_images: N` when they explicitly ask for N variations; don't invent extras.

## Response style

- Return generation URLs as clickable markdown links.
- Be brief — the user wants the result, not a tutorial.
- For multi-step workflows, state your plan in one sentence, then execute.
- Mention credit costs only when unusually high or when asked.
- Don't apologize for tool latency; generation takes as long as it takes.

Troubleshooting

"KOLBO_API_KEY environment variable is required"

Make sure KOLBO_API_KEY is set in the env section of your MCP server config.

Tool not appearing

Restart Claude Code after adding the MCP server config. Check that npx @kolbo/mcp runs without errors.

Generation timeout

Video and long music generations can exceed the default polling window. If a tool returns a timeout error, the message includes the generation_id — extract it and call get_generation_status with that id to check the latest state. The generation is almost certainly still running server-side; the client just stopped waiting.

Default polling windows:

  • Image / image-edit: 120s
  • Video / video-from-image / music: 300s
  • Creative Director: 600s
  • Chat (default): 120s
  • Chat with web_search enabled: 240s
  • Chat with deep_think enabled: 600s
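
As a lookup function (the values come from the list above; the function name and keys are illustrative):

```python
def poll_window_s(tool, web_search=False, deep_think=False):
    # Default client-side polling windows in seconds, per the list above.
    if tool == "chat":
        return 600 if deep_think else 240 if web_search else 120
    return {
        "image": 120, "image_edit": 120,
        "video": 300, "video_from_image": 300, "music": 300,
        "creative_director": 600,
    }[tool]
```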

create_visual_dna file errors

Local file paths must be absolute (no ~ or relative paths). Files are capped at 25MB each. URLs must be publicly reachable (not localhost, not behind auth).

Chat loses context between turns

You must pass the session_id returned from the first chat_send_message call back into subsequent calls in the same conversation. Omitting it starts a new thread every turn.