Doodle Cast Voice Chat — Talk to Rusty and Oreo

I built a real-time voice chat with two talking dogs. You can apply for early access below.

For the last few months I've been running a side project called The Doodle Cast — an AI-generated video podcast starring two animated dogs, Rusty and Oreo. Rusty is the deadpan operator; Oreo is chaos with a golden coat. It's a sandbox where I get to push pipelines for image-to-video generation, dialogue writing with LLMs, and voice synthesis into a production workflow.

The latest addition is something I've wanted for a while: a live voice chat where you can actually talk to one of the dogs, out loud, and hear them talk back in character — in real time.

The voice chat interface, mid-conversation with Rusty. The orb pulses while he's thinking, the waveform animates while he's speaking.

What it actually does

You open the link on your phone or laptop, tap the mic, and speak. The page captures your voice through the browser's built-in speech recognition and sends the transcript to a local LLM primed with the chosen character's personality and the show bible. The response streams back and is piped into a TTS voice cloned from hours of existing Doodle Cast dialogue, one voice per dog. Round-trip latency is low enough that it feels like a real conversation, not a walkie-talkie.
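The post doesn't say how the streamed LLM text is split up before it reaches TTS. One common approach is to buffer incoming fragments and hand off each sentence as soon as it is complete, so the first line can start synthesizing while the rest is still streaming. Here is a sketch of that idea. The `SentenceChunker` class is hypothetical and not taken from the actual project.

```typescript
// Hypothetical helper (not from the real project): buffers streamed LLM text
// and releases complete sentences so each can be sent to TTS immediately.
class SentenceChunker {
  private buffer = "";

  // Append a streamed fragment; return any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace, so "v2.5" is not split.
    const boundary = /[.!?]+\s+/g;
    let lastIndex = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(lastIndex, end).trim();
      if (sentence.length > 0) sentences.push(sentence);
      lastIndex = end;
    }
    // Keep the unfinished tail for the next fragment.
    this.buffer = this.buffer.slice(lastIndex);
    return sentences;
  }

  // Release whatever is left once the stream has ended.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Because the chunker waits for whitespace after the punctuation, the last sentence of a reply is only released by `flush()`. That costs a little latency on the final line but avoids cutting decimals and version numbers in half.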

Each character has a distinct profile:

Rusty — measured, strategic, quietly judgmental. Will answer honestly but treat most questions like a tactical briefing.
Oreo — enthusiastic to the point of malfunction. Do not mention squirrels, the vacuum, or anything involving water unless you want a meltdown.

The stack, briefly

It's deliberately modest — no Realtime APIs, no WebRTC gymnastics. The whole thing is held together with:

The browser's built-in SpeechRecognition for input (Chrome, Safari/iOS, Edge).
A locally hosted LLM behind a streaming SSE endpoint on the Doodle Cast API, with the character's system prompt sent on every turn.
ElevenLabs Flash v2.5 for the voice — it's fast enough for conversational turn-taking and the voice clones hold up well over short lines.
Playback through the Web Audio API (decodeAudioData + AudioBufferSourceNode) instead of <audio> elements, because iOS Safari really does not enjoy the handoff between microphone recording and audio playback, and the buffer-source path survives it.
Session-scoped invites so the whole thing is gated and the compute budget stays sane.
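The stack list mentions a streamed SSE endpoint but doesn't show the client side. Assuming the endpoint emits plain `data:` events, a page can read the response incrementally and parse it with something like the hypothetical `SseParser` below. For brevity it ignores `event:`, `id:` and `retry:` fields.

```typescript
// Minimal incremental parser for text/event-stream bodies (hypothetical helper).
// Feed it decoded text chunks as they arrive; it returns the data payload
// of every event completed so far.
class SseParser {
  private pending = "";            // partial line carried over between chunks
  private dataLines: string[] = []; // data: lines of the event being built

  feed(chunk: string): string[] {
    this.pending += chunk;
    const events: string[] = [];
    const lines = this.pending.split(/\r?\n/);
    // The last element is an unfinished line ("" if the chunk ended on a newline).
    this.pending = lines.pop() ?? "";
    for (const line of lines) {
      if (line === "") {
        // A blank line dispatches the event, if it carried any data.
        if (this.dataLines.length > 0) {
          events.push(this.dataLines.join("\n"));
          this.dataLines = [];
        }
      } else if (line.startsWith("data:")) {
        // Strip the field name and one optional leading space.
        let value = line.slice(5);
        if (value.startsWith(" ")) value = value.slice(1);
        this.dataLines.push(value);
      }
      // Comment lines (":...") and other fields are ignored in this sketch.
    }
    return events;
  }
}
```

In the browser you would call `feed(decoder.decode(value, { stream: true }))` for each chunk read from `response.body.getReader()`. Reading through `fetch` rather than `EventSource` matters if the endpoint expects a POST, since `EventSource` only issues GET requests.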

The last point is why this is invite-only for now. It's running on a single VPS in front of a local model, and I want to see how people actually use it before opening it up further. Every conversation costs a little bit of CPU, a little bit of TTS budget, and a nonzero amount of my time watching logs.
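The post doesn't say how the per-conversation cost is kept in check beyond invites and rate limiting. As an assumption, one simple way to bound each session's CPU and TTS spend is a token bucket keyed by session ID. The `SessionRateLimiter` class and its parameters are hypothetical.

```typescript
// Hypothetical per-session token bucket: each session can burst up to
// `capacity` turns, then regains `refillPerSec` turns per second.
class SessionRateLimiter {
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private buckets = new Map<string, { tokens: number; updatedAt: number }>();

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Returns true if the session may start another turn at time `nowMs`.
  tryConsume(sessionId: string, nowMs: number): boolean {
    // New sessions start with a full bucket.
    const bucket = this.buckets.get(sessionId) ?? { tokens: this.capacity, updatedAt: nowMs };
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.updatedAt = nowMs;
    let allowed = false;
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      allowed = true;
    }
    this.buckets.set(sessionId, bucket);
    return allowed;
  }
}
```

The capacity and refill rate are placeholders; the post doesn't state any actual limits.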

Apply for early access

If you want to try it, fill out the form below. I'll review applications manually, and approved users get a single-use invite link emailed to them. The link opens the voice chat page with your own private session — no account, no tracking beyond what's needed for rate limiting.
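The post describes invites as single-use with a 30-day expiry. A minimal server-side check for those two rules might look like this. The `Invite` shape, the `redeemInvite` function and the in-memory `Map` are hypothetical stand-ins for whatever storage the real service uses.

```typescript
// Hypothetical invite record: one session per invite, 30-day expiry.
interface Invite {
  token: string;
  issuedAtMs: number;
  redeemed: boolean;
}

const INVITE_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days in milliseconds

type RedeemResult = "ok" | "unknown" | "expired" | "already-used";

// Tries to redeem `token` at time `nowMs`; marks it used on success so the
// same link cannot open a second session.
function redeemInvite(invites: Map<string, Invite>, token: string, nowMs: number): RedeemResult {
  const invite = invites.get(token);
  if (!invite) return "unknown";
  if (nowMs - invite.issuedAtMs > INVITE_TTL_MS) return "expired";
  if (invite.redeemed) return "already-used";
  invite.redeemed = true;
  return "ok";
}
```

A real deployment would persist invites and generate tokens from a cryptographically secure source such as `crypto.randomUUID()`.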

Request access

Takes 30 seconds. You'll hear back by email.

The form asks for your name and why you want access, plus an optional box to sign up for The Doodle Cast newsletter (new episodes, occasional behind-the-scenes notes, easy to unsubscribe).

While you wait, subscribe to The Doodle Cast on YouTube. That's where the real episodes live, and subscribers there tend to be first in line for access.


What happens after you apply

I get an email with your application and review it.
If approved, an invite link lands in your inbox — one session, 30-day expiry.
Open the link on an iPhone, iPad, or a Chrome/Edge desktop, pick Rusty or Oreo, tap the mic, and just talk.
If you opted in to the newsletter, you'll start getting new-episode notifications. No spam.

If something goes sideways — the mic won't unlock, a response freezes, audio cuts out mid-sentence — I genuinely want to hear about it. This is still early, and iPhones are where most of the interesting edge cases live.

#voice ai #elevenlabs #llm #the doodle cast #webaudio #ios safari

