Voice Input (Push-to-Talk)

Dictate your next message by holding Ctrl+Y in the Kolbo Code prompt

Kolbo Code has built-in push-to-talk voice input. There is no need to open a separate transcription tool: hold Ctrl+Y in the prompt and speak.

How it works

  1. Focus the prompt in the TUI.
  2. Press and hold Ctrl+Y. The prompt switches to a listening state and recording begins immediately.
  3. Speak your message. Partial transcripts stream in live so you can see what the model is hearing.
  4. Release Ctrl+Y when you are done. The final transcript is appended to the prompt buffer; press Enter to send.
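
The four steps above can be read as a small state machine. The following is a minimal sketch of that flow, not Kolbo Code's actual implementation: the class name, method names, and the rule of inserting a single space before the appended transcript are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PushToTalkPrompt:
    """Hypothetical model of the prompt's push-to-talk states."""

    buffer: str = ""         # text already in the prompt
    listening: bool = False  # True while Ctrl+Y is held
    partial: str = ""        # latest live (uncommitted) transcript

    def key_down(self) -> None:
        # Step 2: holding Ctrl+Y switches to the listening state.
        self.listening = True
        self.partial = ""

    def on_partial(self, text: str) -> None:
        # Step 3: each partial transcript replaces the previous one.
        if self.listening:
            self.partial = text

    def key_up(self, final_transcript: str) -> None:
        # Step 4: the final transcript is appended to the prompt buffer.
        self.listening = False
        self.partial = ""
        if final_transcript:
            # Assumption: separate existing text and transcript with one space.
            sep = " " if self.buffer and not self.buffer.endswith(" ") else ""
            self.buffer += sep + final_transcript
```

Note that releasing the key only appends text. Sending stays a separate, explicit step, which matches the "press Enter to send" behavior described above.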

Why Ctrl+Y?

  • It is a dedicated shortcut, so normal typing (including spaces) is never affected.
  • It is easy to hold with one hand while you think, and easy to release the instant you are done speaking.
  • It works consistently across macOS, Linux, and Windows terminals.

What happens under the hood

  • Audio is captured via the bundled FFmpeg binary (no separate install required). It uses the avfoundation input device on macOS, pulse on Linux, and dshow on Windows.
  • Raw 16-bit PCM audio (PCM16, mono, 16 kHz) is streamed over a Socket.IO connection to api.kolbo.ai, which proxies to ElevenLabs Scribe v2 Realtime.
  • Partial and committed transcripts stream back in real time.
  • Cost: voice transcription from Kolbo Code is billed as source: "chat", which is free, so no credits are deducted.
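
To make the capture format concrete, here is a rough sketch of how an FFmpeg command line for each platform could be assembled, plus the byte arithmetic for PCM16 mono at 16 kHz. The function names and the choice of "default" as the device are assumptions for illustration; the exact arguments Kolbo Code passes are not documented here. The FFmpeg options themselves (-f, -i, -ac, -ar, s16le, pipe:1) are standard.

```python
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1
BYTES_PER_SAMPLE = 2  # PCM16 = 16 bits per sample


def capture_args(platform: str, device: str = "default") -> list[str]:
    """Build ffmpeg arguments that write raw PCM16 mono 16 kHz to stdout.

    `platform` follows sys.platform ("darwin", "linux", "win32").
    """
    if platform == "darwin":
        # avfoundation takes "[video]:[audio]"; an empty video part means audio only.
        source = ["-f", "avfoundation", "-i", f":{device}"]
    elif platform.startswith("linux"):
        source = ["-f", "pulse", "-i", device]
    elif platform == "win32":
        # dshow needs the device's real name, so "default" will not work here.
        source = ["-f", "dshow", "-i", f"audio={device}"]
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error", *source,
        "-ac", str(CHANNELS), "-ar", str(SAMPLE_RATE_HZ),
        "-f", "s16le", "pipe:1",
    ]


def chunk_bytes(duration_ms: int) -> int:
    """Bytes of PCM16 mono 16 kHz audio in `duration_ms` milliseconds."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * duration_ms // 1000
```

At this format one second of audio is 32,000 bytes, so a 100 ms chunk is 3,200 bytes. The chunk duration actually used on the socket is not stated in this page.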

Requirements

  • A working microphone the OS can see.
  • You must be signed in (kolbo auth login), because the socket uses your Kolbo API key for authentication.
  • FFmpeg is bundled; no install needed. If the bundled binary is missing for any reason, Kolbo Code falls back to the system ffmpeg, then to the system sox, on your PATH.
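
The fallback order can be sketched as a simple lookup chain. This is an illustrative sketch only: the function name, the returned labels, and the injectable `which`/`exists` parameters are hypothetical, and only the order (bundled FFmpeg, system ffmpeg, system sox) comes from the text above.

```python
import os
import shutil
from typing import Callable, Optional, Tuple


def resolve_mic_backend(
    bundled_ffmpeg: Optional[str],
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Tuple[str, str]:
    """Return (label, path) for the first usable capture backend."""
    # 1. Prefer the binary shipped with the install.
    if bundled_ffmpeg and exists(bundled_ffmpeg):
        return ("bundled-ffmpeg", bundled_ffmpeg)
    # 2. Then look for ffmpeg, and finally sox, on PATH.
    for label, name in (("system-ffmpeg", "ffmpeg"), ("system-sox", "sox")):
        path = which(name)
        if path:
            return (label, path)
    # 3. Nothing found: this corresponds to the "No mic backend" error.
    raise RuntimeError("No mic backend")
```

Passing `which` and `exists` as parameters keeps the lookup easy to test without touching the real filesystem.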

Troubleshooting

  • "Not logged in" → run kolbo auth login.
  • "No mic backend" → Kolbo Code could not find the bundled FFmpeg, a system FFmpeg, or sox. Reinstall Kolbo Code so the bundled FFmpeg is restored.
  • Backend timeout → the CLI waited more than 5 seconds for the server to acknowledge the session. Check your network and try again.
  • Permission denied on macOS → grant your terminal app microphone access in System Settings → Privacy & Security → Microphone.
  • Windows: no audio device found → Kolbo Code enumerates dshow audio devices automatically. If none are listed, plug in a mic or set a default recording device in Windows sound settings.
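
For readers curious about the backend timeout case, the pattern is a bounded wait on an acknowledgement. The sketch below shows the general idea with a `threading.Event`; the names are hypothetical and the real client presumably waits on a Socket.IO acknowledgement rather than an event object. Only the 5-second limit comes from the list above.

```python
import threading

ACK_TIMEOUT_S = 5.0  # limit quoted in the troubleshooting entry


class BackendTimeout(Exception):
    """Raised when the server does not acknowledge the session in time."""


def wait_for_session_ack(ack: threading.Event,
                         timeout_s: float = ACK_TIMEOUT_S) -> None:
    # Event.wait returns False if the timeout elapsed before the event was set.
    if not ack.wait(timeout_s):
        raise BackendTimeout(
            f"no session acknowledgement after {timeout_s:g}s"
        )
```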

Debug logs for push-to-talk sessions are written to:

<data-dir>/log/push-to-talk.log

where <data-dir> is ~/.local/share/kolbo (Linux), ~/Library/Application Support/kolbo (macOS), or %LOCALAPPDATA%\kolbo (Windows).
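
As a convenience, the log location can be computed from the platform. This helper is a hypothetical sketch that hard-codes the three directories listed above; it does not account for any overrides Kolbo Code may support.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def push_to_talk_log(platform: str, home: str,
                     local_app_data: Optional[str] = None) -> str:
    """Return the push-to-talk log path for a sys.platform-style value."""
    if platform == "win32":
        if not local_app_data:
            raise ValueError("LOCALAPPDATA is required on Windows")
        base = PureWindowsPath(local_app_data) / "kolbo"
    elif platform == "darwin":
        base = PurePosixPath(home) / "Library" / "Application Support" / "kolbo"
    else:
        # Linux location as listed in this page.
        base = PurePosixPath(home) / ".local" / "share" / "kolbo"
    return str(base / "log" / "push-to-talk.log")
```

In practice you would call it with `sys.platform`, `os.path.expanduser("~")`, and `os.environ.get("LOCALAPPDATA")`.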