I built a real-time voice chat with two talking dogs. You can apply for early access below.
For the last few months I've been running a side project called The Doodle Cast — an AI-generated video podcast starring two animated dogs, Rusty and Oreo. Rusty is the deadpan operator; Oreo is chaos with a golden coat. It's a sandbox where I get to push pipelines for image-to-video generation, dialogue writing with LLMs, and voice synthesis into a production workflow.
The latest addition is something I've wanted for a while: a live voice chat where you can actually talk to one of the dogs, out loud, and hear them talk back in character — in real time.
What it actually does
You open the link on your phone or laptop, tap the mic, and speak. The page captures your voice through the browser's built-in speech recognition and sends the transcript to a local LLM primed with the chosen character's personality and the show bible. The response streams back and is piped into a TTS voice cloned for each dog from hours of existing Doodle Cast dialogue. Round-trip latency is low enough that it feels like a real conversation, not a walkie-talkie.
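To make the "streams the response back and pipes it into TTS" step concrete, here is a minimal sketch of one common approach: buffer the streamed text and hand it to the voice model one sentence at a time, so speech can start before the full reply has arrived. This is an assumption about how such a pipeline might work, not the actual Doodle Cast code; `SentenceBuffer` and its methods are hypothetical names.

```typescript
// Hypothetical helper: accumulates streamed LLM text and emits complete
// sentences as soon as they appear, so each one can be sent to TTS early.
class SentenceBuffer {
  private pending = "";

  // Feed one streamed fragment; returns any sentences it completed.
  push(fragment: string): string[] {
    this.pending += fragment;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace. Requiring the
    // whitespace keeps tokens like "v2.5" from being split in the middle.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next fragment.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call once when the stream ends to collect any leftover text.
  flush(): string | null {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? rest : null;
  }
}
```

Each string returned by `push` would go to the TTS request as soon as it is available, and `flush` picks up a final sentence that arrived without trailing whitespace.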
Each character has a distinct profile:
- Rusty — measured, strategic, quietly judgmental. Will answer honestly but treat most questions like a tactical briefing.
- Oreo — enthusiastic to the point of malfunction. Do not mention squirrels, the vacuum, or anything involving water unless you want a meltdown.
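One way to prime a model with profiles like these is to prepend the character's system prompt to the message list on every turn. The sketch below assumes a chat-style message format with `role` and `content` fields; the `PERSONAS` text is paraphrased from the two profiles above, and every name in the block is hypothetical. The real prompts and show bible are not shown here.

```typescript
type CharacterId = "rusty" | "oreo";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical persona prompts, paraphrased from the profiles above.
const PERSONAS: Record<CharacterId, string> = {
  rusty:
    "You are Rusty: measured, strategic, quietly judgmental. " +
    "Answer honestly, but treat most questions like a tactical briefing.",
  oreo:
    "You are Oreo: enthusiastic to the point of malfunction. " +
    "Squirrels, the vacuum, and anything involving water cause a meltdown.",
};

// Builds the message list for one turn: persona first, then the prior
// conversation, then the new transcript from speech recognition.
function buildMessages(
  character: CharacterId,
  history: ChatMessage[],
  userText: string,
): ChatMessage[] {
  return [
    { role: "system", content: PERSONAS[character] },
    // Drop any stale system messages so only one persona is active.
    ...history.filter((m) => m.role !== "system"),
    { role: "user", content: userText.trim() },
  ];
}
```

Rebuilding the list on each turn keeps the persona at the top of the context no matter how long the conversation runs.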
The stack, briefly
It's deliberately modest — no Realtime APIs, no WebRTC gymnastics. The whole thing is held together with:
- The browser's built-in SpeechRecognition for input (Chrome, Safari/iOS, Edge).
- A locally hosted LLM behind a streamed SSE endpoint on the Doodle Cast API, tuned with the character's system prompt on every turn.
- ElevenLabs Flash v2.5 for the voice — it's fast enough for conversational turn-taking and the voice clones hold up well over short lines.
- Playback through the Web Audio API (decodeAudioData + BufferSource) instead of <audio> elements, because iOS Safari really does not enjoy the handoff between microphone recording and audio playback, and the buffer-source path survives it.
- Session-scoped invites so the whole thing is gated and the compute budget stays sane.
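For readers curious what the buffer-source playback path looks like, here is a minimal sketch using the standard Web Audio calls `decodeAudioData` and `createBufferSource`. It is an illustration under assumptions, not the code running on the site: `playClip` and `ClipQueue` are hypothetical names, and the clips are assumed to arrive as encoded audio bytes (for example MP3) from the TTS service.

```typescript
// Decodes one encoded clip and plays it through a buffer source node.
// Resolves when playback of that clip has finished.
async function playClip(ctx: AudioContext, bytes: ArrayBuffer): Promise<void> {
  const audioBuffer = await ctx.decodeAudioData(bytes);
  return new Promise<void>((resolve) => {
    const source = ctx.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(ctx.destination);
    source.onended = () => resolve();
    source.start();
  });
}

// Plays clips strictly one after another so spoken lines never overlap.
class ClipQueue {
  private ctx: AudioContext;
  private tail: Promise<void> = Promise.resolve();

  constructor(ctx: AudioContext) {
    this.ctx = ctx;
  }

  // Returns a promise for this clip; a failed clip does not block later ones.
  enqueue(bytes: ArrayBuffer): Promise<void> {
    const done = this.tail.then(() => playClip(this.ctx, bytes));
    this.tail = done.catch(() => undefined);
    return done;
  }
}
```

In a browser, the `AudioContext` would typically be created or resumed inside the mic tap handler, since browsers require a user gesture before audio can start. Reusing that single context for every clip avoids creating a fresh playback element per response.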
The last point is why this is invite-only for now. It's running on a single VPS in front of a local model, and I want to see how people actually use it before opening it up further. Every conversation costs a little bit of CPU, a little bit of TTS budget, and a nonzero amount of my time watching logs.
Apply for early access
If you want to try it, fill out the form below. I'll review applications manually, and approved users get a single-use invite link emailed to them. The link opens the voice chat page with your own private session — no account, no tracking beyond what's needed for rate limiting.
What happens after you apply
- I get an email with your application and review it.
- If approved, an invite link lands in your inbox — one session, 30-day expiry.
- Open the link on an iPhone, iPad, or a Chrome/Edge desktop, pick Rusty or Oreo, tap the mic, and just talk.
- If you opted in to the newsletter, you'll start getting new-episode notifications. No spam.
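The invite rules described here (single use, 30-day expiry) can be sketched as a small server-side check. This is an illustration only: the `Invite` shape, `checkInvite`, and `redeem` are hypothetical, and the real gating logic is not shown in this post.

```typescript
interface Invite {
  token: string;
  issuedAt: number; // milliseconds since the Unix epoch
  redeemedAt: number | null; // null until the link has been opened
}

// 30-day expiry, as stated in the list above.
const INVITE_TTL_MS = 30 * 24 * 60 * 60 * 1000;

type InviteStatus = "ok" | "expired" | "already_used";

// Decides whether an invite can still start a session at time `now`.
function checkInvite(invite: Invite, now: number): InviteStatus {
  if (invite.redeemedAt !== null) return "already_used";
  if (now - invite.issuedAt > INVITE_TTL_MS) return "expired";
  return "ok";
}

// Marks a valid invite as used; throws if it cannot be redeemed.
// Returns a new object rather than mutating the stored invite.
function redeem(invite: Invite, now: number): Invite {
  const status = checkInvite(invite, now);
  if (status !== "ok") {
    throw new Error(`invite rejected: ${status}`);
  }
  return { ...invite, redeemedAt: now };
}
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the check easy to test against fixed timestamps.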
If something goes sideways — the mic won't unlock, a response freezes, audio cuts out mid-sentence — I genuinely want to hear about it. This is still early, and iPhones are where most of the interesting edge cases live.