Create Studio: Hybrid Cloud-GPU Architecture for AI Video Generation

A deep dive into the distributed architecture powering Create Studio — an AI video ad generator that seamlessly bridges a cloud VPS with a local RTX 5090 GPU workstation.

1. System Overview

Create Studio is an AI-powered video ad generator that allows users to upload photos of businesses, arrange them on an NLE-style timeline, add AI-generated voiceover and music, and render broadcast-quality video ads — all from a browser.

The system runs on a hybrid architecture spanning two physical locations: a DigitalOcean VPS in the cloud and a local workstation with an NVIDIA RTX 5090 GPU. The architecture is designed so that every feature works when the GPU PC is offline, with the 5090 providing enhanced performance when available.

Fig 1 — High-Level Architecture
CLOUD (VPS, 157.230.202.18)
  • Express Server (Node.js, :3013)
  • Apache Proxy (SSL + reverse proxy)
  • FFmpeg (VPS fallback render)
  • Google Drive (OAuth2 storage)
  • ElevenLabs API (TTS + music, cloud)
  • Google Veo 3 (AI video, cloud)
  • Gemini API (script generation fallback)
  • Jobs & Queue (data/jobs.json)
LOCAL (5090 GPU, 192.168.1.85)
  • Flask API (Python, :8189)
  • Cloudflare Tunnel (images.thedoodlecast)
  • FFmpeg (5090 GPU render)
  • SDXL (image generation)
  • ACE-Step 1.5 (music generation)
  • MusicGen (music fallback)
  • Wan2.1 14B (video generation, local)
  • Ollama (script generation)
Link labels: HTTPS, HTTP Upload

Key Design Principles

  • Graceful degradation — Every feature works without the 5090. Cloud APIs (ElevenLabs, Veo 3, Gemini) provide fallbacks.
  • Transparent switching — The is5090Online() health check (30s cache) automatically routes to the best available backend.
  • Zero downtime transitions — The 5090 can go offline mid-session (e.g., user starts sim racing) without breaking in-progress renders.

2. Request Flow & Routing

Every request from the browser follows this path:

Fig 2 — Request Flow
Browser (user) → HTTPS → Apache :443 (SSL proxy) → Express :3013 (Node.js)
  • 5090 online → 5090 Flask (GPU processing) → Result
  • 5090 offline → Cloud APIs (ElevenLabs/Veo/Gemini) → Result

Session & Authentication

Authentication is multi-user and based on session cookies. POST /auth validates the credentials and stores the username in req.session.user. All API routes run behind the requireAuth middleware, and gallery items are tagged per user so they can be filtered.

// server.js — Auth flow
const USERS = { rick: 'rick', jens: 'jens' };

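app.post('/auth', (req, res) => {
  const { username, password } = req.body ?? {};
  // Require both fields: USERS[undefined] === undefined would otherwise pass
  const valid = typeof username === 'string' && typeof password === 'string'
    && Object.hasOwn(USERS, username) && USERS[username] === password;
  if (!valid) {
    // Respond explicitly on failure so the request does not hang
    return res.status(401).json({ success: false, error: 'Invalid credentials' });
  }
  req.session.user = username;
  res.json({ success: true, user: username });
});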
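
The requireAuth middleware and the per-user gallery filter are named above but not shown. A minimal sketch of what they might look like follows; the function bodies, the 401 response shape, and the `user` field on gallery items are assumptions for illustration, not the project's actual code.

```javascript
// Hypothetical sketch: the real requireAuth is not shown in this document.
// Lets the request through only when the session carries a logged-in user.
function requireAuth(req, res, next) {
  if (req.session && req.session.user) return next();
  res.status(401).json({ error: 'Not authenticated' });
}

// Hypothetical helper: assumes each gallery item has a `user` field,
// mirroring the `user` field in the job data shape.
function filterGalleryForUser(items, username) {
  return items.filter((item) => item.user === username);
}
```

With Express, this would typically be attached per route, e.g. `app.get('/api/gallery', requireAuth, handler)`, where the route path is again only an example.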

3. GPU Health Check & Service Discovery

The VPS tracks the 5090's availability through a cached health check that runs on demand, at most once every 30 seconds. This is the foundation of the hybrid architecture: every subsystem queries it before deciding where to route work.

Fig 3 — Health Check State Machine
  • ONLINE (gpu5090Online=true) → OFFLINE when /health times out or errors
  • OFFLINE (gpu5090Online=false) → ONLINE when /health returns 200 OK
  • State is cached for 30s (GPU_CHECK)
// Cached 5090 health check
let gpu5090Online = false;
let gpu5090LastCheck = 0;
const GPU_CHECK_INTERVAL = 30000; // 30 seconds

async function is5090Online() {
  if (Date.now() - gpu5090LastCheck < GPU_CHECK_INTERVAL)
    return gpu5090Online;
  try {
    const r = await fetch(`${IMAGE_API_URL}/health`, {
      signal: AbortSignal.timeout(5000)
    });
    gpu5090Online = r.ok;
  } catch {
    gpu5090Online = false;
  }
  gpu5090LastCheck = Date.now();
  return gpu5090Online;
}

The 30-second cache prevents hammering the health endpoint during burst operations while keeping the state reasonably fresh. The 5-second timeout prevents the check itself from blocking render pipelines.
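
To illustrate how a subsystem might consume the health check, here is a minimal sketch of a "try local, fall back to cloud" wrapper. The helper name withGpuFallback and its injected callbacks are hypothetical; the source only states that each subsystem consults is5090Online() before routing.

```javascript
// Hypothetical helper: try the local GPU backend when the health check passes,
// and use the cloud backend when the 5090 is offline or the local call fails.
async function withGpuFallback(isOnline, runLocal, runCloud) {
  if (await isOnline()) {
    try {
      return { backend: 'local', result: await runLocal() };
    } catch (err) {
      // Local call failed mid-flight (for example, the GPU PC went offline
      // after the cached check); fall through to the cloud path.
    }
  }
  return { backend: 'cloud', result: await runCloud() };
}
```

A caller would pass is5090Online as the first argument. Because the cached state can be up to 30 seconds stale, the inner try/catch matters as much as the check itself.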

4. Video Rendering Pipeline

The render pipeline is the most complex subsystem. It builds an FFmpeg filter_complex graph from the timeline state, then decides where to execute it based on 5090 availability.

Fig 4 — Render Pipeline Decision Tree
Generate Video (POST /api/generate) → Validate Clips (10 clips max) → TTS Voiceover (ElevenLabs API) → Generate Music (Hybrid Dispatcher) → AI Clips, if any (Veo 3 / Wan2.1) → 5090 Online?
  • YES → 5090 FFmpeg (HTTP offload + upload); falls back to VPS FFmpeg if the 5090 render fails
  • NO → VPS FFmpeg (local render)
Both branches → Upload to Drive (Google Drive API)

4.1 FFmpeg Offload to 5090

When the 5090 is online, the VPS offloads FFmpeg rendering for dramatically faster encode times (GPU-accelerated vs. VPS CPU). The process works via a fully HTTP-based transfer protocol — no SSH/SCP required:

Fig 5 — FFmpeg Offload Sequence
Participants: VPS (Express), Cloudflare Tunnel, 5090 (Flask)
  1. VPS: build filter_complex
  2. VPS: create render token + file download URLs
  3. VPS → 5090: POST /render-video (6KB JSON)
  4. 5090 → VPS: GET /api/render-file/{token}/{idx} (x12)
  5. 5090: run FFmpeg (libx264, drawtext, xfade)
  6. VPS → 5090: GET /render-status/{jobId} (poll 3s), returning e.g. { status: "rendering", progress: 45 }
  7. 5090 → VPS: PUT /api/render-upload/{token} (video bytes)
  8. 5090 → VPS: { status: "done", progress: 100 }
  9. VPS: upload to Google Drive
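
Step 6 of the sequence (polling /render-status/{jobId} every 3 seconds) could be sketched as follows. The function name, the injected fetchStatus callback, the maxPolls cap, and the "error" status are assumptions; only the "rendering" and "done" statuses and the 3-second interval come from the figure.

```javascript
// Hypothetical polling loop. fetchStatus is an injected async function that
// returns { status, progress }, e.g. by calling GET /render-status/{jobId}.
async function pollRenderStatus(fetchStatus, options = {}) {
  const { intervalMs = 3000, maxPolls = 200, onProgress = () => {} } = options;
  for (let i = 0; i < maxPolls; i++) {
    const state = await fetchStatus();
    onProgress(state.progress);
    if (state.status === 'done') return state;
    // Assumption: the render service reports failures with status "error".
    if (state.status === 'error') throw new Error(state.error || 'render failed');
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('render status polling timed out');
}
```

Capping the number of polls keeps a stalled render from holding the job open indefinitely, at which point the caller can fall back to the VPS render.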

Font Path Mapping

The VPS builds filter_complex strings with Linux font paths. Before sending to the 5090, these are mapped to Windows equivalents with escaped colons (since : is a delimiter in FFmpeg filter syntax):

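const WIN_FONT_MAP = {
  // '\\:' in JS source yields a literal '\:' in the string; a bare '\:' would
  // collapse to ':' and FFmpeg would read the drive colon as a delimiter
  '/usr/share/fonts/.../OpenSans-Bold.ttf':
    'C\\:/Windows/Fonts/arialbd.ttf',
  '/usr/share/fonts/.../DejaVuSerif.ttf':
    'C\\:/Windows/Fonts/times.ttf',
  // ... 12 mappings total
};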
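
Applying such a map to a filter string is a plain substring replacement. The sketch below uses a made-up Linux font path and a hypothetical helper name (mapFontPaths), since the real paths are elided above and the actual replacement code is not shown.

```javascript
// Hypothetical mapping entry for illustration only; the Linux path is invented.
const EXAMPLE_FONT_MAP = {
  '/usr/share/fonts/example/Example-Bold.ttf': 'C\\:/Windows/Fonts/arialbd.ttf',
};

// Replace every Linux font path in a filter_complex string with its Windows
// equivalent. split/join replaces all occurrences without regex escaping.
function mapFontPaths(filterComplex, fontMap) {
  let out = filterComplex;
  for (const [linuxPath, winPath] of Object.entries(fontMap)) {
    out = out.split(linuxPath).join(winPath);
  }
  return out;
}

const linuxFilter = "drawtext=fontfile=/usr/share/fonts/example/Example-Bold.ttf:text='Hello'";
const windowsFilter = mapFontPaths(linuxFilter, EXAMPLE_FONT_MAP);
console.log(windowsFilter);
```

The printed string contains `C\:/Windows/Fonts/arialbd.ttf`, with a single backslash in front of the drive colon, while the colon that separates drawtext options stays unescaped.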

5. Hybrid Music Generation

Music generation uses a dispatcher pattern that routes to the best available engine based on user preference and 5090 availability.

Fig 6 — Music Engine Routing
Music Request (prompt + duration) → Engine Select
  • local → ACE-Step 1.5 (5090 via Flask API), output WAV
  • elevenlabs → ElevenLabs (Cloud SFX API), output MP3
  • auto → Auto Mode Logic: 5090 online? → ACE-Step; 5090 offline? → ElevenLabs; local + offline? → auto-fallback
ACE-Step 1.5 (Local)
  • Free, unlimited generations
  • Better quality for longer tracks
  • Lazy-loaded on first request (~60s)
  • Retry loop: 7 attempts, 15s intervals
  • Output: WAV, variable length
ElevenLabs (Cloud)
  • Pay-per-generation
  • Always available, no warmup
  • Max ~22 seconds per generation
  • API: POST /v1/sound-generation
  • Output: MP3, fixed duration
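
The engine routing from Fig 6 can be sketched as a small pure function. The function name, the return shape, and the reading of "local + offline → auto-fallback" as "use ElevenLabs instead" are assumptions made for this example.

```javascript
// Sketch of the music dispatcher decision. `preference` is 'local',
// 'elevenlabs', or 'auto'; `gpuOnline` is the result of is5090Online().
function selectMusicEngine(preference, gpuOnline) {
  if (preference === 'elevenlabs') {
    return { engine: 'elevenlabs', format: 'mp3', fellBack: false };
  }
  if (gpuOnline) {
    // 'local' or 'auto' with the 5090 available: use ACE-Step.
    return { engine: 'ace-step', format: 'wav', fellBack: false };
  }
  // 5090 offline: 'auto' picks the cloud engine; 'local' falls back to it
  // (assumed meaning of "auto-fallback" in the figure).
  return { engine: 'elevenlabs', format: 'mp3', fellBack: preference === 'local' };
}
```

Keeping the decision separate from the network calls makes it easy to test and lets the caller surface a warning whenever fellBack is true.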

6. AI Video Clip Generation

Create Studio supports three clip modes: Ken Burns (zoom/pan on static images), AI Image-to-Video (I2V), and AI Text-to-Video (T2V). The AI modes use different backends based on availability:

Fig 7 — AI Video Generation Matrix
Model Location I2V T2V Duration Availability
Google Veo 3 Cloud (Google API) 4-8 seconds Always
Wan2.1 14B Local (5090 GPU) ✓ (via SDXL) ~5 seconds 5090 only

The UI dynamically shows/hides the Wan2.1 option based on 5090 status. When offline, clips default to Veo 3 and the "Advanced" toggle is hidden. The VPS can call Veo 3 directly via the Google Generative AI REST API without needing the 5090.

7. LLM Script Generation

Voiceover scripts and ad copy are generated via LLM with a two-tier fallback:

Fig 8 — LLM Fallback Chain
callLLM(prompt + timeout) → try first: Ollama (5090 / llama3.2) → on fail/timeout: Gemini Flash (Cloud API)
  • Ollama: 8s timeout (AbortController)
  • Gemini: 15s timeout, model: gemini-2.0-flash
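async function callLLM(prompt, timeoutMs = 15000) {
  // Try Ollama first (local, free, fast when available)
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 8000); // 8s Ollama budget
  try {
    const res = await fetch(OLLAMA_URL, { /* ... */ signal: controller.signal });
    if (res.ok) {
      const text = (await res.json()).response?.trim();
      if (text) return text; // an empty reply falls through to Gemini
    }
  } catch {
    // Ollama unreachable or timed out: fall through to Gemini
  } finally {
    clearTimeout(timer); // always clear, including on errors
  }
  // Fallback: Gemini API (cloud, does not depend on the 5090)
  try {
    // Build the URL on one logical line; a line break inside a template
    // literal would end up in the request URL
    const url = 'https://generativelanguage.googleapis.com/v1beta/'
      + `models/gemini-2.0-flash:generateContent?key=${KEY}`;
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
      signal: AbortSignal.timeout(timeoutMs)
    });
    if (res.ok) {
      const data = await res.json();
      return data?.candidates?.[0]?.content?.parts?.[0]?.text?.trim() ?? null;
    }
  } catch {
    // Gemini failed or timed out
  }
  return null;
}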

8. Job Lifecycle & Data Model

Fig 9 — Job State Machine
States: pending → processing → done (Google Drive upload + cleanup). Other states: error, paused, cancelled (reached via cancel).
// Job data shape (data/jobs.json)
{
  id: "uuid",
  user: "jens",
  brandName: "Joe's Pizza",
  status: "processing",    // pending | processing | done | error | paused | cancelled
  step: "Rendering on 5090 GPU PC...",
  progress: 67,            // 0-100
  outputUrl: "/videos/uuid.mp4",
  error: null,
  createdAt: "2026-03-13T...",
  completedAt: null,
  comment: "",
  rating: 0,
  driveFileId: "1abc...",
  driveFileSize: 45000000,
  warnings: []             // AI clip fallback warnings
}

9. NLE Timeline Editor

The editor UI follows a traditional non-linear editing layout with a media library panel and a multi-track timeline:

Fig 10 — NLE Editor Layout
Media Library (250px fixed)
  • Upload
  • My Library: persistent per-user (data/user-photos.json)
  • Project Photos: Google Drive backed
  • AI Scenes: T2V placeholders
Drag to timeline (MIME: application/x-media-photo)
Timeline (fluid width), time ruler from 0s to 40s
  • VIDEO: Clip 1 (4s), AI (5s), Clip 3 (4s), Clip 4, T2V (6s)
  • VOICE: Voiceover, draggable + resizable
  • MUSIC: Background Music, full timeline width

10. Deployment & Infrastructure

VPS (DigitalOcean)
  • OS: Ubuntu 24.04 LTS
  • Runtime: Node.js + PM2 (cluster)
  • Proxy: Apache 2.4 + SSL
  • Port: 3013 (internal)
  • Domain: create.overdigital.ai
  • Storage: Google Drive (OAuth2)
  • Deploy: SCP + PM2 restart
5090 GPU Workstation
  • OS: Windows 11
  • GPU: NVIDIA RTX 5090 (32GB)
  • Runtime: Python Flask + CUDA
  • Port: 8189 (internal)
  • Tunnel: Cloudflare (images.*)
  • Startup: Windows Scheduled Task
  • FFmpeg: Shared build in PATH

File Transfer Protocol

All file transfers between VPS and 5090 use HTTP — no SCP/SSH required from the 5090 side. This avoids issues with SSH key access in Windows Scheduled Task contexts:

Fig 11 — File Transfer Architecture
VPS Express (token-based file server) ↔ 5090 Flask (urllib downloads)
  • GET /api/render-file/{token}/{idx}: the 5090 pulls input files from the VPS
  • PUT /api/render-upload/{token}: the 5090 pushes the rendered video to the VPS
  • Render token (crypto.randomBytes): TTL 10 minutes; maps idx → file path; no auth required (the token IS the auth)
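
A render token of this kind could be implemented roughly as below. This is a sketch under assumptions: the in-memory Map, the function names, and the 32-byte token length are illustrative; only crypto.randomBytes, the 10-minute TTL, and the idx → file path mapping come from the figure.

```javascript
const crypto = require('node:crypto');

const TOKEN_TTL_MS = 10 * 60 * 1000; // 10-minute TTL, as in Fig 11
const renderTokens = new Map(); // token -> { files, expiresAt }

// Create an unguessable token that maps file indexes to paths on the VPS.
// The 32-byte length is an assumption for this sketch.
function createRenderToken(filePaths, now = Date.now()) {
  const token = crypto.randomBytes(32).toString('hex');
  renderTokens.set(token, { files: [...filePaths], expiresAt: now + TOKEN_TTL_MS });
  return token;
}

// Resolve {token}/{idx} to a file path, or null when the token is unknown,
// expired, or the index is not a valid position in the file list.
function resolveRenderFile(token, idx, now = Date.now()) {
  const entry = renderTokens.get(token);
  if (!entry) return null;
  if (now > entry.expiresAt) {
    renderTokens.delete(token);
    return null;
  }
  // Accept only plain non-negative integers so values like "length" are rejected.
  if (!/^\d+$/.test(String(idx))) return null;
  return entry.files[Number(idx)] ?? null;
}
```

Because the token is the only credential, the client never supplies a file path: it can request only indexes that the VPS registered for that render.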

11. Technology Stack

Frontend
  • Vanilla HTML/CSS/JS (single file)
  • ~7000 lines in index.html
  • CSS custom properties for theming
  • HTML5 Drag & Drop API
  • Web Audio API for previews
Backend (VPS)
  • Node.js + Express
  • PM2 (cluster mode)
  • express-session
  • Google APIs (Drive, Veo 3)
  • FFmpeg (fallback render)
GPU Backend (5090)
  • Python Flask
  • PyTorch + CUDA
  • diffusers 0.37.0
  • ACE-Step, SDXL, Wan2.1
  • FFmpeg (primary render)
External APIs
  • ElevenLabs (TTS + SFX)
  • Google Veo 3 (video gen)
  • Google Gemini (LLM fallback)
  • Google Drive (storage)
  • Google Places (photo import)

12. Lessons Learned

  • SCP doesn't work reliably from Windows Scheduled Tasks — SSH key access differs between interactive and service contexts. HTTP-based file transfer solved this permanently.
  • FFmpeg filter_complex and Windows paths don't mix — The : in C: is a filter delimiter. Escaping with \: is required.
  • Base64 in JSON has ~33% overhead — Sending 10 photos as base64 turned a 30MB payload into 40MB+. Token-based download URLs reduced the submission payload to 6KB.
  • Cloudflare blocks Python urllib — The default Python-urllib/3.x user agent triggers Cloudflare's bot detection. A custom User-Agent header fixed 403 errors.
  • Design for offline-first — Making every feature work without the GPU from the start would have saved significant refactoring. The hybrid fallback pattern (try local → fallback to cloud) should be the default architecture, not an afterthought.
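
The ~33% base64 overhead in the list above follows from the encoding itself: every 3 input bytes become 4 output characters. A short worked check, with helper and variable names chosen for this example:

```javascript
// Base64 encodes each group of 3 bytes as 4 characters, padding the last group.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const MB = 1024 * 1024;
const rawBytes = 30 * MB;
const encodedBytes = base64Length(rawBytes);
const overheadPercent = (encodedBytes / rawBytes - 1) * 100;

// Cross-check the formula against Node's own encoder on a small buffer.
const sampleMatches = Buffer.alloc(1000).toString('base64').length === base64Length(1000);

console.log(`${encodedBytes / MB} MB encoded, ${overheadPercent.toFixed(1)}% overhead, formula verified: ${sampleMatches}`);
```

For a 30 MB payload this gives 40 MB before any JSON framing, which matches the "40MB+" figure reported above.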

This is a private architecture document for Create Studio. Built by Jens Loeffler. Last updated March 2026.