Kolbo AI Docs
Developer API

Claude Code & MCP Setup

Use Kolbo AI as native tools in Claude Code via the MCP server.

Use Kolbo AI directly from Claude Code, Kolbo Code, or Claude Desktop as native tools. The MCP (Model Context Protocol) server handles polling internally — you call generate_image and get back the final URL.

Kolbo Code (automatic)

If you use Kolbo Code, the MCP server and skill are configured automatically when you log in:

kolbo auth login

That's it. After login, the agent already knows about your Kolbo tools and how to use them — no manual config required.

Manual Setup (Claude Code / Claude Desktop)

1. Get an API Key

Create a key at the Developer Console or via the Authentication API.

2. Configure MCP Server

Add to your Claude Code project settings (.claude/settings.json):

{
  "mcpServers": {
    "kolbo": {
      "command": "npx",
      "args": ["-y", "@kolbo/mcp"],
      "env": {
        "KOLBO_API_KEY": "kolbo_live_..."
      }
    }
  }
}

Or for Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "kolbo": {
      "command": "npx",
      "args": ["-y", "@kolbo/mcp"],
      "env": {
        "KOLBO_API_KEY": "kolbo_live_..."
      }
    }
  }
}
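
Either way, a malformed settings file is the most common reason the server never appears. If you want a quick programmatic sanity check, here is a sketch — the `check_mcp_config` helper is illustrative, not part of Kolbo:

```python
import json

def check_mcp_config(text: str) -> bool:
    """Return True if the settings JSON parses and contains a usable kolbo entry."""
    cfg = json.loads(text)
    server = cfg.get("mcpServers", {}).get("kolbo", {})
    # Matches the shape of the config blocks above: npx launcher + API key in env.
    return server.get("command") == "npx" and "KOLBO_API_KEY" in server.get("env", {})
```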

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `KOLBO_API_KEY` | Yes | Your Kolbo API key |
| `KOLBO_API_URL` | No | Custom API URL (default: `https://api.kolbo.ai/api`) |
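
The same requirements can be expressed as a tiny loader — a sketch only; `load_kolbo_env` is a hypothetical helper, not part of the Kolbo SDK:

```python
import os

def load_kolbo_env(env=os.environ):
    # KOLBO_API_KEY is required; KOLBO_API_URL falls back to the documented default.
    api_key = env.get("KOLBO_API_KEY")
    if not api_key:
        raise RuntimeError("KOLBO_API_KEY environment variable is required")
    return api_key, env.get("KOLBO_API_URL", "https://api.kolbo.ai/api")
```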

Available Tools

Once configured, Claude Code has access to 30 Kolbo tools grouped by purpose:

Generation

| Tool | Description |
|---|---|
| `generate_image` | Text → image |
| `generate_image_edit` | Existing image(s) + prompt → edited image |
| `generate_video` | Text → video |
| `generate_video_from_image` | Still image + motion prompt → video |
| `generate_video_from_video` | Input video + prompt → restyled video (video-to-video) |
| `generate_elements` | Reference images/videos + prompt → animated video |
| `generate_first_last_frame` | First frame + last frame → interpolated video |
| `generate_lipsync` | Source image/video + audio → lipsynced video |
| `generate_creative_director` | One brief → N coordinated scenes (image or video) |
| `generate_music` | Text (+ optional lyrics) → song |
| `generate_speech` | Text + voice → spoken audio |
| `generate_sound` | Text → sound effect |
| `generate_3d` | Text or reference images → 3D model (GLB/FBX/OBJ/USDZ) |
| `transcribe_audio` | Audio/video URL or file → text + SRT subtitles |

Every image/video/creative-director tool accepts visual_dna_ids and moodboard_id for character/style consistency — you can compose create_visual_dna → generate_image (with the DNA applied to the output) in a single agent turn. generate_creative_director additionally accepts moodboard_ids (plural) for blending multiple styles. generate_music supports vocal_gender and custom lyrics; generate_speech accepts a voice by id or display name. Every tool that takes files accepts both public URLs and absolute local file paths (where applicable).

Chat

| Tool | Description |
|---|---|
| `chat_send_message` | Talk to any Kolbo chat model; supports web search, deep think, multi-turn sessions |
| `chat_list_conversations` | List SDK chat threads |
| `chat_get_messages` | Fetch messages in a conversation |

Visual DNA (reusable character/style/product profiles)

| Tool | Description |
|---|---|
| `create_visual_dna` | Create a profile from URLs or local files (max 4 images + 1 video + 1 audio) |
| `list_visual_dnas` | List your profiles |
| `get_visual_dna` | Fetch one profile |
| `delete_visual_dna` | Delete a profile |

Moodboards

| Tool | Description |
|---|---|
| `list_moodboards` | Browse presets + your moodboards |
| `get_moodboard` | Fetch one moodboard with all image URLs |

Media Library

| Tool | Description |
|---|---|
| `upload_media` | Upload a local file (or URL) → stable Kolbo CDN URL for reuse |
| `list_media` | Browse your uploaded media with type filter and pagination |

Discovery & Account

| Tool | Description |
|---|---|
| `list_models` | Current model catalog with costs and capabilities |
| `list_voices` | TTS voices (presets + your cloned voices) |
| `list_presets` | Generation presets across image/video/music/text-to-video catalogs |
| `check_credits` | Check credit balance |
| `get_generation_status` | Poll a generation by ID (fallback if a tool times out) |

Usage Examples

In Claude Code, just ask naturally:

  • "Generate an image of a sunset over mountains"
  • "Create a 5-second video of waves crashing on a beach"
  • "Remove the background from this image" (attach a URL)
  • "Restyle this video in Studio Ghibli style" — video-to-video
  • "Animate these 4 product shots into a 10-second video" — generate_elements
  • "Morph from this before photo to this after photo" — first/last frame interpolation
  • "Lipsync this audio clip to this talking head video"
  • "Generate a 3D model of a medieval helmet I can import into Blender"
  • "Transcribe this podcast and give me the SRT file"
  • "Build a 4-scene storyboard for a coffee shop ad campaign"
  • "Upload this local image to my library so I can reuse it across generations"
  • "Make a lo-fi hip hop beat with male vocals"
  • "Read this paragraph out loud with a British female voice"
  • "Ask Claude what's new in AI this week, with web search on and deep think"
  • "Create a Visual DNA profile called 'Alex' from these images, then make 4 shots of Alex in different outfits" — the agent creates the DNA, then passes visual_dna_ids to four generate_image calls so Alex stays consistent across all four outputs
  • "Apply the 'Cyberpunk Neon' moodboard style to my new character render" — the agent finds the moodboard and passes moodboard_id to the generation
  • "What video presets are available?" — list_presets
  • "Check my credit balance"

Skill File

Kolbo Code installs this skill automatically. For Claude Code or Claude Desktop, add it manually as .claude/commands/kolbo.md in your project:

# Kolbo AI Creative Generation

You have access to the Kolbo AI platform via MCP. Kolbo routes to 100+ AI models for images, videos, music, speech, sound effects, multi-scene campaigns, and conversational chat — all behind a unified API with Smart Select model routing.

**IMPORTANT: Model identifiers are dynamic.** Never hardcode model names — always call `list_models` with a `type` filter to get current identifiers. Omit `model` entirely to let Smart Select pick the best model automatically (this is the recommended default).

## Tools

**Image generation**
- `generate_image` — text → image (~10–30s). Accepts `visual_dna_ids`, `moodboard_id`, `reference_images`, `num_images`, `enable_web_search`.
- `generate_image_edit` — image(s) + prompt → edited image (background removal, color change, object swap). Accepts `source_images`, `visual_dna_ids`, `moodboard_id`, `enable_web_search`.

**Video generation** (pick the one that matches the inputs)
- `generate_video` — text → video (~1–5min). Accepts `visual_dna_ids`, `reference_images`, `duration`, `aspect_ratio`.
- `generate_video_from_image` — still image + motion prompt → video.
- `generate_video_from_video` — input video + prompt → restyled video (video-to-video transformation). For style transfer, scene restyle, subject swap.
- `generate_elements` — reference images/videos + prompt → animated video. Use when animating specific uploaded assets like a product shot or character rig. Accepts `reference_images` (URLs) OR `files` (URLs or local paths), plus `preset_id`, `motion`, `visual_dna_ids`.
- `generate_first_last_frame` — first frame + last frame → interpolated video. Provide two frames as `first_frame_url`/`last_frame_url` (URL mode) OR `first_frame`/`last_frame` (URL-or-local-path mode). Do NOT mix URL and file inputs.
- `generate_lipsync` — `source` (image or video) + `audio` → lipsynced video. Both inputs accept URL or local path.

**Multi-scene**
- `generate_creative_director` — one brief → N coordinated scenes (1–8). `workflow_type: "image"` (default) or `"video"`. Use for storyboards, product showcases, ad campaigns. Accepts `visual_dna_ids` and `moodboard_id` (or `moodboard_ids` for blending) to keep characters/styles consistent across every scene — this is the ideal composition for "same character in 8 scenes" type asks.

**Audio / voice**
- `generate_music` — description (+ optional lyrics/style) → song. Accepts `instrumental`, `lyrics`, `vocal_gender` ("male"/"female"), `style`.
- `generate_speech` — text + voice → spoken audio. Voice can be a `voice_id` from `list_voices` OR a display name like "Rachel".
- `generate_sound` — description → sound effect (foley, ambience, UI sounds — NOT music, NOT speech).
- `transcribe_audio` — audio or video source (URL or local path) → text + SRT subtitles + downloadable .srt/.txt URLs. Works on podcasts, videos with audio tracks, voice memos.

**3D**
- `generate_3d` — text prompt or reference image(s) → 3D model in GLB/FBX/OBJ/USDZ. Modes auto-detect from inputs: text mode (prompt only), single mode (1 image), multi mode (multiple images for multi-view reconstruction).

**Chat**
- `chat_send_message` — talk to any chat model on Kolbo. Supports `web_search`, `deep_think`, and `system_prompt` (new sessions only). Omit `session_id` to start a new thread; pass it back on subsequent calls to keep context.
- `chat_list_conversations` — list past SDK chat threads
- `chat_get_messages` — fetch a thread's history

**Visual DNA** (reusable character/style/product profiles)
- `create_visual_dna` — build a profile from reference media. Accepts public URLs OR absolute local file paths in the same `images` array. Max 4 images + optional 1 video + 1 audio. 25MB/file cap. `dna_type`: `character`, `style`, `product`, or `scene`.
- `list_visual_dnas` / `get_visual_dna` / `delete_visual_dna` — CRUD over the user's profiles

**Moodboards**
- `list_moodboards` — presets + the user's moodboards (each has a `master_prompt` and `style_guide`)
- `get_moodboard` — full moodboard with all image URLs

**Media Library** (upload once, reuse everywhere)
- `upload_media` — upload a local file (or re-host a remote URL) to the user's Kolbo media library and get back a stable Kolbo CDN URL. Use this when the user wants to reference a local file in MULTIPLE subsequent generation calls — upload once, then pass the returned URL to `generate_image` / `generate_video` / `create_visual_dna` / etc. For a single-use reference, you can skip this and pass a URL directly.
- `list_media` — browse the user's uploaded media with `type` filter and pagination. Helpful to check if the user already has a file in their library before uploading a new one.

**Discovery & account**
- `list_models` — current catalog with `identifier`, `credit`, `supported_aspect_ratios`, `supported_durations`. Filter by `type`: `image`, `image_edit`, `video`, `video_from_image`, `music`, `speech`, `sound`, `chat`, `three_d`, `lipsync`. **This is the only source of truth for model identifiers** — never guess or hardcode them (e.g., `fal-ai/flux-2`, `seedance-1.5-pro-image-to-video`, `nano-banana-pro`). They change as models are added, renamed, or retired.
- `list_voices` — TTS voices (presets + cloned). Filter by `language`, `gender`, `provider`.
- `list_presets` — generation presets across image / video / music / text_to_video catalogs. Pass `type` to filter. Returns presets with `id`, `name`, `description`, `thumbnail_url`, `category`. Pass the returned `id` as `preset_id` to the matching generation tool.
- `check_credits` — remaining balance
- `get_generation_status` — poll a generation by id (fallback if a tool times out)

## Routing — user intent → first tool

| User says | Call |
|---|---|
| "make/create/generate an image of…" | `generate_image` |
| "edit/change/remove [x] in this image" | `generate_image_edit` with `source_images` |
| "make a video of…" | `generate_video` |
| "animate this image" / "make this photo move" | `generate_video_from_image` |
| "restyle this video" / "video in anime style" / "video-to-video" | `generate_video_from_video` with `source_video` |
| "animate these product shots" / "put these assets into a video" | `generate_elements` |
| "morph from this image to this image" / "transition between these two frames" | `generate_first_last_frame` |
| "sync this audio to this video" / "lipsync" / "make it talk" | `generate_lipsync` |
| "make a 3D model of…" / "generate a GLB / USDZ" | `generate_3d` |
| "transcribe this audio" / "subtitle this video" / "get SRT from this" | `transcribe_audio` |
| "storyboard / multi-scene ad / product showcase" | `generate_creative_director` |
| "write me a song" / "make a beat" | `generate_music` |
| "read this out loud" / "text to speech" | `list_voices` → `generate_speech` |
| "sound effect of…" | `generate_sound` |
| "ask GPT/Claude/Gemini…" / "chat about…" | `chat_send_message` |
| "continue our conversation" | `chat_list_conversations` → `chat_send_message` with `session_id` |
| "use the same character across N images" | `list_visual_dnas` → `create_visual_dna` if missing → `generate_image` with `visual_dna_ids: [id]` for each output |
| "use the same character across an 8-scene campaign" | `list_visual_dnas` / `create_visual_dna` → `generate_creative_director` with `visual_dna_ids: [id]` and `scene_count: 8` |
| "apply this moodboard style to [thing]" | `list_moodboards` → `generate_image` / `generate_image_edit` / `generate_creative_director` with `moodboard_id` |
| "use this preset" / "what presets are available for video" | `list_presets` → pass `preset_id` to the generation tool |
| "upload this local file so I can reuse it" | `upload_media` → use the returned URL in subsequent calls |
| "what files have I uploaded" / "find my previous upload" | `list_media` |
| "match this existing look" (user pastes image URL) | `generate_image` with `reference_images: [url]` (style reference, not edit source) |
| "4 variations of this" | `generate_image` with `num_images: 4` |
| "current events" / "based on today's news" | `generate_image` or `chat_send_message` with `enable_web_search: true` / `web_search: true` |
| "what's my credit balance" | `check_credits` |
| "what models are available for X" | `list_models` type=X |
| "use Flux 2 / Kling / Seedance / specific model" | `list_models` type=Y to find exact `identifier` → pass to generation tool |

## Rules

1. **Never hardcode model identifiers.** They change without notice. Call `list_models` with a `type` filter to get current identifiers only when the user cares about model choice — otherwise omit `model` entirely and let Smart Select pick the best model automatically.
2. **Default to Smart Select.** Omit the `model` parameter unless the user explicitly names a model or their ask implies a specific capability (e.g., "4K", "Kling", "Seedance", "Suno"). When the user does name a model, call `list_models` to find the matching `identifier` — do not guess the identifier format.
3. **Keep `enhance_prompt: true`** (the default) for image and video generation. Only disable if the user says "don't change my prompt."
4. **Chat sessions are sticky.** On every follow-up `chat_send_message` in the same conversation, pass back the `session_id` from the first response. Starting a new session each turn loses memory.
5. **Polling is automatic.** Generation tools block until the result is ready. If a tool returns a timeout error, the error message includes the `generation_id` — call `get_generation_status` with that id to check the latest state. The generation is almost certainly still running server-side.
6. **Check credits only when it matters.** Before Creative Director with many scenes, long videos, or 4K batches, call `check_credits`. Skip it for single images or chat messages — it's noise.
7. **Visual DNA is compositional — use it in the same turn.** When the user wants a character/style/product/scene consistent across multiple outputs:
   1. `list_visual_dnas` to check if one already exists
   2. `create_visual_dna` if needed (accepts URLs or absolute local paths, max 4 images + optional video + audio, 25MB per file)
   3. **Pass the profile's `id` as `visual_dna_ids: [id]` to every subsequent `generate_image` / `generate_image_edit` / `generate_video` / `generate_video_from_image` / `generate_creative_director` call.** This is not a suggestion — without this step the DNA has no effect on the output. You can create the DNA and use it in the same agent turn.
8. **Moodboards are compositional — same pattern.** `list_moodboards` → find the style the user wants → pass `moodboard_id` (or `moodboard_ids` for `generate_creative_director` when blending) to the generation tool. Don't manually fold the moodboard's `master_prompt` into your text prompt — pass the id and let the server handle it.
9. **Multiple variations = `num_images`, not loops.** If the user asks for "4 variations of X", call `generate_image` once with `num_images: 4` (or 5, 6 etc.). Only make multiple tool calls if each output needs a distinct prompt or DNA.
10. **`reference_images` ≠ `source_images`.** `reference_images` on `generate_image` / `generate_video` / `generate_creative_director` are STYLE/COMPOSITION guidance — the model uses them as inspiration. `source_images` on `generate_image_edit` are EDIT SOURCES — the output is a modification of those images. Picking the wrong one causes wrong results.
11. **Absolute paths only for local files.** When `create_visual_dna` takes a local file path, it must be absolute (no `~`, no relative paths). URLs must be publicly reachable (not localhost, not behind auth).
12. **Error codes matter.** SDK errors now carry structured codes in the message like `[INSUFFICIENT_CREDITS]` or `[NOT_FOUND]`. If you see one, act on it: `INSUFFICIENT_CREDITS` → surface balance via `check_credits` and point user to app.kolbo.ai; `NOT_FOUND` → the id is wrong or the resource was deleted; `ACCESS_DENIED` → the user can't act on that resource.
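
Rule 5's fallback can be sketched as a polling loop. This assumes a generic `call_tool` dispatcher and a `state` field on the status response — both illustrative; only the `get_generation_status` tool and its `generation_id` parameter come from the tool list above:

```python
import time

def wait_for_generation(call_tool, generation_id, interval_s=10.0, max_wait_s=600.0):
    # Re-poll get_generation_status until the generation settles or we give up.
    waited = 0.0
    while True:
        status = call_tool("get_generation_status", {"generation_id": generation_id})
        if status.get("state") in ("completed", "failed"):
            return status
        if waited >= max_wait_s:
            raise TimeoutError(f"generation {generation_id} still pending after {max_wait_s}s")
        time.sleep(interval_s)
        waited += interval_s
```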

## Workflows

**Image**
1. If the user did NOT name a specific model, skip straight to step 2 — Smart Select picks the best model automatically.
2. If the user named a model (e.g., "use Flux 2"): `list_models` type=image → find the matching `identifier` from the response → pass it as `model`.
3. `generate_image` with prompt + optional `model` + `aspect_ratio`.
4. Return the URL(s) as markdown links.

**Video**
1. If the user did NOT name a model, omit `model` — Smart Select handles it. If the user named one or needs specific durations: `list_models` type=video → find the `identifier` and check its `supported_durations` and `supported_aspect_ratios`.
2. `generate_video` with prompt, `duration`, `aspect_ratio`, and optional `model`.
3. On timeout: `get_generation_status` with the returned `generation_id`.

**Multi-scene campaign**
1. `generate_creative_director` with `prompt`, `scene_count` (1–8), `workflow_type`, and optional `reference_images` for style guidance.
2. Returns all completed scenes in one response.

**Chat thread**
1. First turn: `chat_send_message` with `message` (+ optional `system_prompt`). Save the returned `session_id`.
2. Follow-ups: `chat_send_message` with new `message` AND saved `session_id`.
3. Flip `web_search: true` for current-events questions, `deep_think: true` for hard reasoning.
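
Sketched in code (the `call_tool` dispatcher and the `reply` response field are assumptions; `session_id` and `system_prompt` are documented above):

```python
def chat_thread(call_tool, messages, system_prompt=None):
    # Carry session_id forward so every turn after the first keeps context (rule 4).
    session_id, replies = None, []
    for message in messages:
        params = {"message": message}
        if session_id is None and system_prompt:
            params["system_prompt"] = system_prompt  # new sessions only
        if session_id is not None:
            params["session_id"] = session_id
        resp = call_tool("chat_send_message", params)
        session_id = resp["session_id"]
        replies.append(resp["reply"])
    return replies
```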

**Character consistency across multiple outputs** (the most important composition pattern)
1. `list_visual_dnas` — the user may already have the character/style/product they want.
2. If not, `create_visual_dna` with `name`, `dna_type: "character"` (or `"style"` / `"product"` / `"scene"`), and reference media (URLs from a prior generation work great; absolute local paths also OK). Save the returned `id`.
3. **For each subsequent image/video/edit**, pass `visual_dna_ids: [id]` to the generation tool. Example: "make 4 shots of Alex in different outfits" →
   - `create_visual_dna(name: "Alex", dna_type: "character", images: [...])` → id = "abc123"
   - `generate_image(prompt: "Alex wearing a tuxedo at a formal event", visual_dna_ids: ["abc123"])`
   - `generate_image(prompt: "Alex in hiking gear on a mountain trail", visual_dna_ids: ["abc123"])`
   - ...and so on. Alex stays consistent because the DNA is applied server-side.
4. For a multi-scene campaign with one character, prefer `generate_creative_director` with `visual_dna_ids: [id]` and `scene_count: N` — one call, N consistent scenes.
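
The whole pattern, sketched with a hypothetical `call_tool` dispatcher — the tool names and `visual_dna_ids` parameter are from the skill file, while the `id` and `url` response fields are assumptions:

```python
def consistent_shots(call_tool, name, reference_images, prompts):
    # Create the Visual DNA once, then apply it to every generation (step 3 above).
    dna = call_tool("create_visual_dna", {
        "name": name, "dna_type": "character", "images": reference_images,
    })
    return [
        call_tool("generate_image", {"prompt": p, "visual_dna_ids": [dna["id"]]})["url"]
        for p in prompts
    ]
```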

**Image edit**
1. `generate_image_edit` with `source_images: [url]` and a clear edit instruction as the prompt.
2. You can pipe URLs directly from a prior `generate_image` output.
3. If editing should keep a known character consistent, also pass `visual_dna_ids`.

**TTS with a specific voice**
1. `list_voices` filtered by `language` / `gender` if the user specified them → pick a `voice_id`.
2. `generate_speech` with text + voice (can be voice_id OR display name like "Rachel").

**Moodboard-guided generation**
1. `list_moodboards` → pick the one the user wants (or they name it directly). Grab the `id`.
2. Pass `moodboard_id: "the_id"` directly to `generate_image` / `generate_image_edit` / `generate_creative_director`. The server resolves the moodboard's master_prompt and style_guide into the generation. **Do NOT manually paste the master_prompt into your text prompt** — that duplicates signal and is less effective than passing the id.
3. For `generate_creative_director`, you can pass `moodboard_ids: [id1, id2]` to blend multiple moodboards across scenes.

**Batch variations of the same prompt**
1. `generate_image` with `num_images: N` (where N is how many variations the user wants). Single call, N outputs. Don't loop.

**Current-events / real-world grounding**
1. `generate_image` or `chat_send_message` with `enable_web_search: true` / `web_search: true`. The server grounds the prompt in real web results.

**Recovering from a timeout**
1. Generation tools throw an error message like `"Generation timed out after 120s... call get_generation_status with generation_id='xyz'"`.
2. Extract the `generation_id` from the error message (it's always quoted).
3. Call `get_generation_status` with that id. The generation is almost certainly still running; just wait a bit and check again, or return the current state to the user.
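
Step 2 reduces to one regex — a sketch matching the single-quoted `generation_id='…'` shape in the error message above:

```python
import re

def extract_generation_id(error_message):
    # The timeout message always quotes the id as generation_id='...'.
    match = re.search(r"generation_id='([^']+)'", error_message)
    return match.group(1) if match else None
```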

**Animating reference assets (elements)**
1. For "animate this product", "put these characters in a scene", "make this logo move": call `generate_elements` with `reference_images: [url]` (or `files: [path]` for local files) plus a prompt.
2. If the user wants the same character / product kept consistent across multiple outputs, pass `visual_dna_ids: [id]` as well.
3. Returns a video URL.

**First-to-last frame interpolation**
1. If the user wants a smooth transition between two specific images (e.g., "morph from this before-shot to this after-shot"): `generate_first_last_frame`.
2. URL mode: pass `first_frame_url` + `last_frame_url`.
3. File mode: pass `first_frame` + `last_frame` (URL or absolute local path each).
4. Do NOT mix modes — the tool will reject "one URL + one file".

**Lipsync**
1. For "sync this audio to this talking head", "dub this video with this voice track": `generate_lipsync`.
2. `source` is the image or video of the face (URL or local path); `audio` is the voice track (URL or local path).
3. Model auto-selects unless the user names one. Some lipsync models support a text performance prompt — pass `text_prompt` in that case. Call `list_models` with `type: "lipsync"` to see which models are available.

**Video restyling (video-to-video)**
1. For "restyle this footage in anime style", "change this scene to night time", "swap the subject for a tiger": `generate_video_from_video`.
2. `source_video` accepts URL or absolute local path. `prompt` describes the transformation.
3. Accepts `visual_dna_ids` if you want the restyle to preserve a known character.

**3D generation**
1. Text mode: `generate_3d` with just a `prompt`.
2. Single-image mode: `generate_3d` with `reference_images: [url]`.
3. Multi-view mode: `generate_3d` with `reference_images: [url1, url2, url3...]` for better reconstruction. Different angles of the same object work best.
4. Returns URLs for GLB, FBX, OBJ, USDZ formats — pick the one the user's target pipeline needs.
5. **Be patient** — 3D is slow. Expect up to 15 min. If you see a timeout, fall back to `get_generation_status`.

**Transcription / subtitling**
1. `transcribe_audio` with `source` as a URL or absolute local path. Works on audio (mp3, wav, m4a, flac) and video (mp4, mov, webm).
2. Returns `text` (full transcript), `srt_url` and `txt_url` (downloadable files), and `duration`.
3. For long content (podcasts > 30min), expect polling to take a while — the timeout is 30 minutes for this tool.

**Upload-once, reference-many (media library)**
1. User says "I have these 5 product shots locally, make a campaign with them": first call `upload_media` for each local file → collect the returned Kolbo CDN URLs.
2. Now pass those stable URLs to `generate_creative_director` with `reference_images: [...]` or `visual_dna_ids` after `create_visual_dna`.
3. Advantage: the local files don't need to be re-resolved on every generation call, and the URLs are permanent so you can reference them across a whole session.
4. For discovering previously uploaded files: `list_media` with a type filter.

**Preset-driven generation**
1. When the user asks "use the Cyberpunk preset" or "what video presets are available": `list_presets` with an optional `type` filter.
2. Find the preset the user wants, grab its `id`.
3. Pass `preset_id: <id>` to the matching generation tool (`generate_image`, `generate_video`, `generate_music`, `generate_elements`). The server folds in the preset's prompt template and style automatically.
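
Steps 1–3, sketched with a hypothetical `call_tool` dispatcher (`id` and `name` are documented preset fields):

```python
def find_preset_id(call_tool, preset_name, preset_type=None):
    # Case-insensitive lookup over the preset catalog; returns None if absent.
    params = {"type": preset_type} if preset_type else {}
    for preset in call_tool("list_presets", params):
        if preset["name"].lower() == preset_name.lower():
            return preset["id"]
    return None
```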

## Gotchas

- **Model identifiers change without notice.** NEVER guess or hardcode an identifier like `"gpt-4o"` or `"claude-sonnet"` — those are not Kolbo identifiers. Kolbo identifiers look like `fal-ai/flux-2`, `seedance-1.5-pro-image-to-video`, `nano-banana-pro`, etc. Always call `list_models` to get the current list, or omit `model` to let Smart Select choose.
- **`create_visual_dna` file rules**: absolute paths only, 25MB per file, 4 images max, URLs must be publicly reachable (not localhost, not behind auth).
- **Visual DNA only works if you pass the id.** `create_visual_dna` alone does nothing for the current generation — you MUST pass `visual_dna_ids: [id]` to the subsequent generation tool. Same for `moodboard_id`.
- **Generation timeouts include the id.** The error message looks like `"Generation timed out after 120s. Call get_generation_status with generation_id='xyz' to check."` — extract the id and call the fallback. Don't claim the generation failed; it's probably still running.
- **Deep-think chat gets 10 minutes.** `chat_send_message` with `deep_think: true` automatically uses a longer polling window. Web search gets 4 minutes. Default chat is 2 minutes. So don't preemptively set timeouts yourself.
- **Error codes carry signal.** `[INSUFFICIENT_CREDITS]` in an error message means call `check_credits` and tell the user to top up. `[NOT_FOUND]` means the id is wrong. `[ACCESS_DENIED]` means the user doesn't own the resource.
- **Don't over-generate.** If the user asked for one image, make one. Use `num_images: N` when they explicitly ask for N variations; don't invent extras.

## Response style

- Return generation URLs as clickable markdown links.
- Be brief — the user wants the result, not a tutorial.
- For multi-step workflows, state your plan in one sentence, then execute.
- Mention credit costs only when unusually high or when asked.
- Don't apologize for tool latency; generation takes as long as it takes.

Troubleshooting

"KOLBO_API_KEY environment variable is required"

Make sure KOLBO_API_KEY is set in the env section of your MCP server config.

Tool not appearing

Restart Claude Code after adding the MCP server config. Check that npx @kolbo/mcp runs without errors.

Generation timeout

Video and long music generations can exceed the default polling window. If a tool returns a timeout error, the message includes the generation_id — extract it and call get_generation_status with that id to check the latest state. The generation is almost certainly still running server-side; the client just stopped waiting.

Default polling windows:

  • Image / image-edit: 120s
  • Video / video-from-image / music: 300s
  • Creative Director: 600s
  • Chat (default): 120s
  • Chat with web_search enabled: 240s
  • Chat with deep_think enabled: 600s
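
As a lookup function (the values come from the list above; the function name and keys are illustrative):

```python
def poll_window_s(tool, web_search=False, deep_think=False):
    # Default client-side polling windows in seconds, per the list above.
    if tool == "chat":
        return 600 if deep_think else 240 if web_search else 120
    return {
        "image": 120, "image_edit": 120,
        "video": 300, "video_from_image": 300, "music": 300,
        "creative_director": 600,
    }[tool]
```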

create_visual_dna file errors

Local file paths must be absolute (no ~ or relative paths). Files are capped at 25MB each. URLs must be publicly reachable (not localhost, not behind auth).

Chat loses context between turns

You must pass the session_id returned from the first chat_send_message call back into subsequent calls in the same conversation. Omitting it starts a new thread every turn.