Voice Input (Push-to-Talk)

Dictate your next message by holding Ctrl+Y in the terminal or clicking the mic button in the desktop app

Kolbo Code has built-in voice input, so you don't need a separate transcription tool. Hold Ctrl+Y at the terminal prompt, or click the microphone button in the desktop app, and speak.

Desktop app

Click the mic button next to the + (attach) button in the chat input.
Speak. A "Listening…" chip shows the live transcript as you talk.
As each segment of speech is committed (finalized), its text is typed into the input automatically. Click the mic again to stop.

The desktop app uses your system microphone directly; the OS asks for permission the first time. It uses the same realtime engine and the same zero-credit billing as the terminal.

How it works (terminal)

Focus the prompt in the TUI.
Press and hold Ctrl+Y. The prompt switches to a listening state and recording begins immediately.
Speak your message. Partial transcripts stream in live so you can see what the model is hearing.
Release Ctrl+Y when you are done. The final transcript is appended to the prompt buffer; press Enter to send.
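The hold-to-talk flow above can be modeled as a small state machine. The sketch below is hypothetical: the class and method names (PushToTalk, key_down, on_partial, on_committed, key_up) are invented for illustration and are not Kolbo Code's API. It also assumes that any partial text still pending on release is treated as final; the docs only say that the final transcript is appended to the prompt buffer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class State(Enum):
    IDLE = auto()
    LISTENING = auto()


@dataclass
class PushToTalk:
    """Hypothetical model of the hold-to-talk flow (not Kolbo Code's implementation)."""

    prompt_buffer: str = ""
    state: State = State.IDLE
    partial: str = ""  # live, still-changing transcript; display only
    committed: List[str] = field(default_factory=list)  # finalized segments

    def key_down(self) -> None:
        # Ctrl+Y pressed: switch to the listening state and start a fresh session.
        if self.state is State.IDLE:
            self.state = State.LISTENING
            self.partial = ""
            self.committed.clear()

    def on_partial(self, text: str) -> None:
        # Each partial replaces the previous one; nothing is stored permanently.
        if self.state is State.LISTENING:
            self.partial = text

    def on_committed(self, text: str) -> None:
        # A committed segment is final: keep it and clear the pending partial.
        if self.state is State.LISTENING:
            self.committed.append(text)
            self.partial = ""

    def key_up(self) -> str:
        # Ctrl+Y released: append the final transcript to the prompt buffer.
        if self.state is not State.LISTENING:
            return self.prompt_buffer
        # Assumption: leftover partial text is treated as final on release.
        pieces = self.committed + ([self.partial] if self.partial else [])
        final = " ".join(p.strip() for p in pieces if p.strip())
        if final:
            needs_space = bool(self.prompt_buffer) and not self.prompt_buffer.endswith(" ")
            self.prompt_buffer += (" " if needs_space else "") + final
        self.state = State.IDLE
        self.partial = ""
        self.committed.clear()
        return self.prompt_buffer
```

Partial transcripts overwrite one another and are only displayed, while committed segments accumulate. Keeping the two apart lets a UI show live text without appending the same words to the buffer twice.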

Why Ctrl+Y?

It is a dedicated shortcut, so normal typing (including spaces) is never affected.
It is easy to hold with one hand while you think, and easy to release the instant you are done speaking.
It works consistently across macOS, Linux, and Windows terminals.
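Terminals deliver Ctrl plus a letter as a single control byte: the letter's ASCII code masked to its low five bits. In raw mode, Ctrl+Y therefore arrives as 0x19, a byte that ordinary typing (space is 0x20) never produces and that never occurs inside a UTF-8 multi-byte sequence. The minimal sketch below separates that byte from typed text; split_input is a hypothetical name, not Kolbo Code's input handler. It only recognizes the key itself: plain terminal input streams do not report key releases, and how Kolbo Code detects release is not shown here.

```python
from typing import Tuple

# Ctrl+<letter> is sent as the letter's ASCII code masked to its low five bits.
CTRL_Y = ord("y") & 0x1F  # 0x79 & 0x1F == 0x19 (decimal 25)


def split_input(chunk: bytes) -> Tuple[str, int]:
    """Separate typed text from Ctrl+Y bytes in one chunk of raw-mode terminal input."""
    text = bytearray()
    presses = 0
    for byte in chunk:  # iterating over bytes yields ints
        if byte == CTRL_Y:
            presses += 1
        else:
            text.append(byte)
    # 0x19 never occurs inside a UTF-8 multi-byte sequence, so removing it
    # cannot split a character.
    return text.decode("utf-8", errors="replace"), presses
```

For example, split_input(b"fix the bug \x19") keeps the typed text, trailing space included, and counts one Ctrl+Y press.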

What happens under the hood

Audio is captured with the bundled FFmpeg binary (no separate install required), using the avfoundation input device on macOS, pulse (PulseAudio) on Linux, and dshow (DirectShow) on Windows.
Raw 16-bit PCM audio (mono, 16 kHz) is streamed over a Socket.IO connection to api.kolbo.ai, which proxies it to ElevenLabs Scribe v2 Realtime.
Partial and committed transcripts stream back in real time.
Cost: voice transcription from Kolbo Code is billed as source: "chat", which is free, so no credits are deducted.
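The capture step can be approximated with a plain ffmpeg command line. This is a sketch, not the exact command Kolbo Code runs: the device defaults (":0" for avfoundation, "default" for pulse) and the requirement to pass a dshow device name explicitly are assumptions. The options used (-f, -i, -ac, -ar, -acodec pcm_s16le, -f s16le) are standard ffmpeg flags. The chunk_bytes helper shows the stream arithmetic: 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes per second, so a 100 ms chunk is 3,200 bytes.

```python
import sys
from typing import List, Optional

SAMPLE_RATE_HZ = 16_000  # 16 kHz, as documented
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # PCM16 = 16-bit signed little-endian samples


def capture_args(ffmpeg: str, platform: str = sys.platform,
                 device: Optional[str] = None) -> List[str]:
    """Build an ffmpeg argv that writes raw PCM16 mono 16 kHz audio to stdout."""
    if platform == "darwin":
        # avfoundation takes "video:audio"; ":0" means no video, first audio device.
        source = ["-f", "avfoundation", "-i", device or ":0"]
    elif platform.startswith("linux"):
        # PulseAudio input; "default" selects the default source.
        source = ["-f", "pulse", "-i", device or "default"]
    elif platform == "win32":
        # DirectShow needs an explicit device name (e.g. from device enumeration).
        if not device:
            raise ValueError("dshow capture needs an audio device name")
        source = ["-f", "dshow", "-i", f"audio={device}"]
    else:
        raise ValueError(f"unsupported platform: {platform}")
    output = ["-ac", str(CHANNELS), "-ar", str(SAMPLE_RATE_HZ),
              "-acodec", "pcm_s16le", "-f", "s16le", "pipe:1"]
    return [ffmpeg, "-hide_banner", "-loglevel", "error", *source, *output]


def chunk_bytes(ms: int) -> int:
    """Number of bytes in `ms` milliseconds of PCM16 mono 16 kHz audio."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * ms // 1000
```

To read audio, you could start the command with subprocess.Popen(capture_args("ffmpeg"), stdout=subprocess.PIPE) and read chunk_bytes(100) bytes at a time from its stdout before sending each chunk over the socket.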

Requirements

A working microphone the OS can see.
You must be signed in (kolbo auth login) — the socket uses your Kolbo API key for authentication.
FFmpeg is bundled, so no install is needed. If the bundled binary is missing for any reason, Kolbo Code falls back to a system ffmpeg on your PATH, then to sox on your PATH.
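The fallback order above (bundled FFmpeg, then ffmpeg on PATH, then sox on PATH) can be sketched with shutil.which. The function and exception names are hypothetical; the which parameter exists only so the PATH lookup can be swapped out in tests.

```python
import os
import shutil
from typing import Callable, Optional, Tuple


class NoMicBackend(RuntimeError):
    """Stand-in for the documented "No mic backend" error (name is hypothetical)."""


def resolve_backend(
    bundled_ffmpeg: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[str, str]:
    """Return (backend kind, executable path) using the documented fallback order."""
    # 1. The bundled FFmpeg, if the file is present and executable.
    if bundled_ffmpeg and os.path.isfile(bundled_ffmpeg) and os.access(bundled_ffmpeg, os.X_OK):
        return "ffmpeg", bundled_ffmpeg
    # 2. A system ffmpeg on PATH, then 3. a system sox on PATH.
    for kind in ("ffmpeg", "sox"):
        path = which(kind)
        if path:
            return kind, path
    raise NoMicBackend("No mic backend: bundled FFmpeg, system ffmpeg, and sox all missing")
```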

Troubleshooting

"Not logged in" → run kolbo auth login.
"No mic backend" → Kolbo Code could not find the bundled FFmpeg, a system ffmpeg, or sox. Reinstall Kolbo Code to restore the bundled FFmpeg.
Backend timeout → the CLI waited more than 5 seconds for the server to acknowledge the session. Check your network and try again.
Permission denied on macOS → grant your terminal app microphone access in System Settings → Privacy & Security → Microphone.
Windows: no audio device found → Kolbo Code enumerates dshow audio devices automatically. If none are listed, plug in a mic or set a default recording device in Windows sound settings.
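The "Backend timeout" case amounts to waiting on a server acknowledgement with a deadline. The sketch below uses asyncio.wait_for with the documented 5-second limit. BackendTimeout, wait_for_session_ack, and the simulated acknowledgement payload are hypothetical; a real client would resolve the future from its Socket.IO acknowledgement callback rather than from a timer.

```python
import asyncio
from typing import Any

ACK_TIMEOUT_S = 5.0  # documented: the CLI gives up after waiting more than 5 s


class BackendTimeout(Exception):
    """Stand-in for the documented "Backend timeout" error (name is hypothetical)."""


async def wait_for_session_ack(ack: "asyncio.Future[Any]", timeout: float = ACK_TIMEOUT_S) -> Any:
    """Wait for the server's session acknowledgement, or raise BackendTimeout."""
    try:
        return await asyncio.wait_for(ack, timeout)
    except asyncio.TimeoutError:
        raise BackendTimeout(f"no session acknowledgement within {timeout:g}s") from None


async def demo(server_delay: float, timeout: float) -> str:
    """Simulate a server that acknowledges the session after server_delay seconds."""
    loop = asyncio.get_running_loop()
    ack = loop.create_future()
    # Stand-in for a Socket.IO acknowledgement callback; the payload is made up.
    handle = loop.call_later(server_delay, ack.set_result, {"status": "ready"})
    try:
        await wait_for_session_ack(ack, timeout)
        return "ready"
    except BackendTimeout:
        return "timeout"
    finally:
        # Avoid calling set_result on a future that wait_for already cancelled.
        handle.cancel()
```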

Debug logs for push-to-talk sessions are written to:

<data-dir>/log/push-to-talk.log

where <data-dir> is ~/.local/share/kolbo (Linux), ~/Library/Application Support/kolbo (macOS), or %LOCALAPPDATA%\kolbo (Windows).
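The data-directory rules above map to a small lookup. This sketch hard-codes the three documented locations, assumes any platform other than Windows and macOS follows the Linux layout, and does not model environment overrides such as XDG_DATA_HOME, which the docs do not mention. The function name is hypothetical.

```python
import os
import sys
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Mapping, Optional


def push_to_talk_log(platform: str = sys.platform,
                     env: Optional[Mapping[str, str]] = None) -> PurePath:
    """Return the documented push-to-talk log path for the given platform."""
    env = os.environ if env is None else env
    if platform == "win32":
        data_dir = PureWindowsPath(env["LOCALAPPDATA"]) / "kolbo"
    elif platform == "darwin":
        data_dir = PurePosixPath(env["HOME"]) / "Library" / "Application Support" / "kolbo"
    else:  # assumption: other Unix-likes use the Linux layout
        data_dir = PurePosixPath(env["HOME"]) / ".local" / "share" / "kolbo"
    return data_dir / "log" / "push-to-talk.log"
```

For example, on Linux with HOME=/home/ada this yields /home/ada/.local/share/kolbo/log/push-to-talk.log.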