I built a voice chat with two talking dogs

A real-time voice chat on thedoodlecast.com lets you speak to Rusty and Oreo — and hear them talk back in character. Here's how it works, the stack behind it, and how to apply for early access.


I built a real-time voice chat with two talking dogs. You can apply for early access below.

For the last few months I've been running a side project called The Doodle Cast — an AI-generated video podcast starring two animated dogs, Rusty and Oreo. Rusty is the deadpan operator; Oreo is chaos with a golden coat. It's a sandbox where I get to push pipelines for image-to-video generation, dialogue writing with LLMs, and voice synthesis into a production workflow.

The latest addition is something I've wanted for a while: a live voice chat where you can actually talk to one of the dogs, out loud, and hear them talk back in character — in real time.

The voice chat interface, mid-conversation with Rusty. His avatar sits inside a pulsing orange ring above a live audio-level waveform, and the reply bubble reads: "The mailman is a known operative. I've been tracking his movements for years. Trust no one in uniform." The orb pulses while he's thinking, and the waveform animates while he's speaking.

What it actually does

You open the link on your phone or laptop, tap the mic, and speak. The page captures your voice with the browser's built-in speech recognition and sends the transcript to a locally hosted LLM that has been primed with each character's personality and the show bible. The response streams back and is piped into a TTS voice cloned from hours of existing Doodle Cast dialogue, one voice per dog. Round-trip latency is low enough that it feels like a real conversation, not a walkie-talkie.
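How the streamed LLM text is handed to the TTS step isn't spelled out here. One common way to keep latency down, assumed for this sketch, is to hold incoming tokens in a buffer and send each sentence to TTS as soon as it is complete, so the dog can start speaking before the whole reply has arrived. `takeSentences` and its punctuation rule are hypothetical, not the actual Doodle Cast code:

```typescript
// Hypothetical helper: split buffered LLM output into complete sentences
// (ready for TTS) and the unfinished remainder (kept for the next chunk).
function takeSentences(buffer: string): { sentences: string[]; rest: string } {
  const sentences: string[] = [];
  // A sentence ends at . ! or ? (optionally followed by closing quotes/brackets) plus whitespace.
  const boundary = /[.!?]+["')\]]*\s+/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(buffer)) !== null) {
    const end = match.index + match[0].length;
    const sentence = buffer.slice(start, end).trim();
    if (sentence.length > 0) sentences.push(sentence);
    start = end;
  }
  return { sentences, rest: buffer.slice(start) };
}

// Usage: append each streamed chunk, then speak whatever sentences are complete.
let pending = "";
for (const chunk of ["The mailman is a known ", "operative. I've been tracking", " his movements. Trust"]) {
  pending += chunk;
  const { sentences, rest } = takeSentences(pending);
  pending = rest;
  for (const s of sentences) console.log("speak:", s);
}
```

When the stream ends, whatever is left in `pending` would go out as the final utterance. The naive boundary rule also splits on abbreviations such as "Dr. ", which is usually tolerable for short conversational lines.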

Each character has a distinct profile:

  • Rusty — measured, strategic, quietly judgmental. Will answer honestly but treat most questions like a tactical briefing.
  • Oreo — enthusiastic to the point of malfunction. Do not mention squirrels, the vacuum, or anything involving water unless you want a meltdown.
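The stack notes below mention that each character's system prompt goes out with every turn. A minimal sketch of assembling such a request, assuming an OpenAI-style list of `{ role, content }` messages (the real request format isn't described). The persona strings are condensed from the profiles above rather than taken from the actual show bible, and `buildMessages` and `MAX_HISTORY` are hypothetical:

```typescript
type Character = "rusty" | "oreo";
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical persona prompts, condensed from the character profiles.
const PERSONAS: Record<Character, string> = {
  rusty: "You are Rusty, a deadpan dog. Measured, strategic, quietly judgmental. Answer honestly, but like a tactical briefing.",
  oreo: "You are Oreo, a golden-coated dog. Enthusiastic to the point of malfunction. Squirrels, the vacuum, and water trigger meltdowns.",
};

// Assumed cap on remembered messages, to keep each request small.
const MAX_HISTORY = 8;

// Rebuild the full message list on every turn: persona first, then recent history, then the new transcript.
function buildMessages(character: Character, history: ChatMessage[], transcript: string): ChatMessage[] {
  return [
    { role: "system", content: PERSONAS[character] },
    ...history.slice(-MAX_HISTORY),
    { role: "user", content: transcript.trim() },
  ];
}

const msgs = buildMessages("rusty", [], "  Is the mailman friendly?  ");
console.log(msgs.map((msg) => msg.role).join(", "));
```

Resending the persona each turn keeps the server stateless between turns, at the cost of a slightly larger prompt every time.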

The stack, briefly

It's deliberately modest — no Realtime APIs, no WebRTC gymnastics. The whole thing is held together with:

  • The browser's built-in SpeechRecognition for input (Chrome, Safari/iOS, Edge).
  • A locally hosted LLM behind a streaming SSE endpoint on the Doodle Cast API, with the character's system prompt sent on every turn.
  • ElevenLabs Flash v2.5 for the voice — it's fast enough for conversational turn-taking and the voice clones hold up well over short lines.
  • Playback through the Web Audio API (decodeAudioData + BufferSource) instead of <audio> elements, because iOS Safari really does not enjoy the handoff between microphone recording and audio playback, and the buffer-source path survives it.
  • Session-scoped invites so the whole thing is gated and the compute budget stays sane.
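The browser's built-in `EventSource` only makes GET requests, so if the transcript is POSTed to the SSE endpoint (an assumption; the request shape isn't described), the stream has to be read with `fetch` and parsed by hand. Each chunk would come from `response.body.getReader()`, decoded with a `TextDecoder` in streaming mode. A minimal parser sketch; `SseParser` is a hypothetical name and it only handles the `data:` field:

```typescript
// Minimal incremental parser for Server-Sent Events, handling only the `data:` field.
// Assumes "\n" or "\r\n" line endings (a bare "\r" separator is not handled).
class SseParser {
  private buffer = "";
  private dataLines: string[] = [];

  // Feed one decoded text chunk; returns the payloads of any events it completes.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece may be a partial line, so keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const events: string[] = [];
    for (const raw of lines) {
      const line = raw.endsWith("\r") ? raw.slice(0, -1) : raw;
      if (line === "") {
        // A blank line ends the event; multiple data lines are joined with "\n", as in the SSE spec.
        if (this.dataLines.length > 0) events.push(this.dataLines.join("\n"));
        this.dataLines = [];
      } else if (line.startsWith("data:")) {
        // Strip one optional space after the colon.
        const value = line.slice(5);
        this.dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
      // Comment lines (":...") and the event/id/retry fields are ignored in this sketch.
    }
    return events;
  }
}

// Usage: a network chunk can end mid-line, and the parser waits for the rest.
const parser = new SseParser();
for (const chunk of ["data: Trust no", " one\n\ndata: in uni", "form.\n\n"]) {
  for (const text of parser.push(chunk)) console.log(text);
}
```

Buffering the partial line matters because nothing guarantees that network chunks line up with SSE line boundaries.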

The last point is why this is invite-only for now. It's running on a single VPS in front of a local model, and I want to see how people actually use it before opening it up further. Every conversation costs a little bit of CPU, a little bit of TTS budget, and a nonzero amount of my time watching logs.

Apply for early access

If you want to try it, fill out the form below. I'll review applications manually, and approved users get a single-use invite link emailed to them. The link opens the voice chat page with your own private session — no account, no tracking beyond what's needed for rate limiting.
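Keeping tracking down to what rate limiting needs could look something like the sketch below: a fixed-window counter keyed only by the session ID from the invite. This is an assumption about the design, and `SessionRateLimiter` and its limits are hypothetical:

```typescript
// Hypothetical per-session limiter: at most `limit` turns per `windowMs` milliseconds.
// Keyed only by the session ID from the invite, so nothing else about the user is stored.
class SessionRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if this turn may proceed, false if the session has hit its limit.
  allow(sessionId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(sessionId);
    if (current === undefined || now - current.start >= this.windowMs) {
      // First turn for this session, or the previous window has expired: start a new one.
      this.windows.set(sessionId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// Usage: 20 turns per 10 minutes (illustrative numbers, not the real limits).
const limiter = new SessionRateLimiter(20, 10 * 60 * 1000);
console.log(limiter.allow("session-abc"));
```

On a single VPS an in-memory Map is enough; it resets on restart, which is arguably acceptable for a soft budget guard.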

Request access

Takes 30 seconds. You'll hear back by email.

While you wait: subscribe to The Doodle Cast on YouTube. That's where the real episodes live, and it's where priority access tends to go first.

What happens after you apply

  1. I get an email with your application and review it.
  2. If approved, an invite link lands in your inbox — one session, 30-day expiry.
  3. Open the link on an iPhone, an iPad, or a desktop running Chrome or Edge, pick Rusty or Oreo, tap the mic, and just talk.
  4. If you opted in to the newsletter, you'll start getting new-episode notifications. No spam.
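The invite rules in step 2 (one session, 30-day expiry) could be enforced with a check like this. It is a sketch under assumptions: it treats "one session" as "the link can be redeemed once", keeps invites in an in-memory Map, and `Invite`, `redeemInvite`, and the result strings are hypothetical names:

```typescript
// Hypothetical invite record; the real storage schema isn't described.
interface Invite {
  token: string;
  issuedAt: number; // ms since epoch
  redeemedAt?: number; // set once the single session has been started
}

const INVITE_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day expiry

type RedeemResult = "ok" | "unknown" | "expired" | "already-used";

// Check an invite and, if valid, mark it used so the link only opens one session.
function redeemInvite(invites: Map<string, Invite>, token: string, now: number = Date.now()): RedeemResult {
  const invite = invites.get(token);
  if (invite === undefined) return "unknown";
  if (now - invite.issuedAt > INVITE_TTL_MS) return "expired";
  if (invite.redeemedAt !== undefined) return "already-used";
  invite.redeemedAt = now;
  return "ok";
}
```

Passing `now` in explicitly keeps the expiry logic easy to test without waiting 30 days.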

If something goes sideways — the mic won't unlock, a response freezes, audio cuts out mid-sentence — I genuinely want to hear about it. This is still early, and iPhones are where most of the interesting edge cases live.
