
VoiceClaw — Give Your Agent a Voice

April 20, 2026


I kept wanting to talk to my agent while I walked. Not dictate — actually think out loud with it.

How it started

Mid-February I started using OpenClaw and fell in love with it. But the official voice path was Twilio + Vapi, and that never made sense to me. Why hand out a real phone number that anyone on the internet could dial? How do you even secure that? Vapi doesn't have a mobile app either.

So the first version was just: build a mobile app for Vapi and play around.

It worked. It was also slow.

Vapi latency — 5 seconds per turn

And to be clear: the bottleneck wasn't Vapi or the STT/TTS pipeline. Those were fine. It was the agent itself. Each turn took ~5 seconds because the agent was doing real work: tools, memory, reasoning. I got it down to ~3 seconds and ran out of ideas. A fast inference provider like Cerebras would shave off more, but only by running a different model, and then you lose the agent's voice: literally, the model that makes it your agent.

The unlock

Then I came across Anthropic's write-up on the advisor strategy — a cheaper executor model calling out to a stronger one when it needs help. Around the same time, Garry Tan (YC) open-sourced gbrain, his personal agent with an ask_brain pattern on top.

That was the idea. Flip the stack: put a fast realtime voice model in front, and let it call out to your agent only when it needs tools or memory. The voice stays snappy, the agent stays yours. No Twilio. No phone number. No inbound attack surface.

I got it working on OpenAI Realtime. Voice activity detection (VAD) was rough, echo cancellation was worse, and it was still slower than I wanted.

Then Gemini 3.1 Live Flash dropped in mid-April. I tried it and that was the moment — quality and latency both landed.

What it is now

VoiceClaw is a thin voice layer. You talk to Gemini Live or OpenAI Realtime, and when the voice model needs tools or memory, it calls ask_brain, which routes the request to your agent. Any OpenAI-compatible endpoint works.

Ships with an iOS app (Expo), a macOS app (Electron with screen share), and a Node.js relay. MIT licensed.
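To make the ask_brain idea concrete, here is a minimal TypeScript sketch of the kind of routing a Node.js relay might do. It is not VoiceClaw's actual code: the tool's description and single `question` parameter, the `askBrain` and `handleToolCall` helpers, and the `AgentConfig` fields are all hypothetical. Gemini Live and OpenAI Realtime deliver tool calls in different event formats, so the sketch assumes the relay has already extracted a tool name and a JSON-encoded argument string. The agent call uses the standard Chat Completions request shape (`POST {baseUrl}/chat/completions` with `model` and `messages`) that OpenAI-compatible servers accept.

```typescript
// Hypothetical sketch, not VoiceClaw's actual code: route a realtime voice
// model's ask_brain tool call to an agent behind an OpenAI-compatible endpoint.

// Tool declaration the voice model would be given (JSON-Schema parameters).
// The description and the single "question" field are illustrative assumptions.
const askBrainTool = {
  name: "ask_brain",
  description:
    "Ask the user's agent anything that needs tools, memory, or deeper reasoning.",
  parameters: {
    type: "object",
    properties: {
      question: { type: "string", description: "What the agent should answer or do." },
    },
    required: ["question"],
  },
} as const;

// The subset of a fetch-style function this sketch needs; Node 18+'s global
// fetch satisfies it, and tests can pass a stub instead.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical config for the agent endpoint.
interface AgentConfig {
  baseUrl: string; // base URL ending in /v1, e.g. "http://localhost:8080/v1"
  apiKey: string;
  model: string;
}

// Only the part of the Chat Completions response that this sketch reads.
interface ChatCompletionResponse {
  choices: { message: { content: string | null } }[];
}

// Forward one question to the agent and return its text reply.
async function askBrain(question: string, cfg: AgentConfig, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(`${cfg.baseUrl}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({
      model: cfg.model,
      messages: [{ role: "user", content: question }],
    }),
  });
  if (!res.ok) throw new Error(`agent endpoint returned HTTP ${res.status}`);
  const data = (await res.json()) as ChatCompletionResponse;
  return data.choices[0]?.message.content ?? "";
}

// Dispatch a tool call that the relay has already pulled out of the realtime
// session as a tool name plus JSON-encoded arguments.
async function handleToolCall(
  name: string,
  argsJson: string,
  cfg: AgentConfig,
  fetchImpl: FetchLike,
): Promise<string> {
  if (name !== askBrainTool.name) throw new Error(`unknown tool: ${name}`);
  const args = JSON.parse(argsJson) as { question?: unknown };
  if (typeof args.question !== "string") throw new Error("ask_brain needs a string 'question'");
  return askBrain(args.question, cfg, fetchImpl);
}
```

On Node 18 or later you would pass the built-in `fetch` as `fetchImpl`; injecting it keeps the handler testable without a live agent. The reply string is what the relay would hand back to the voice model as the tool result. The voice model only waits on this round-trip when it actually calls ask_brain, which is why ordinary conversational turns stay fast.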

github.com/yagudaev/voiceclaw — stars and PRs welcome.

What would you build on top of it?

© 2026 Michael Yagudaev