🛒 E-commerce / Retail

AI Voice Agent + Video Call

Text chat (Llama 3.1), voice mode (Nova-3 STT + Aura-2 TTS), and RealtimeKit WebRTC video call — all from a single Cloudflare Worker. Zero backend changes to your origin.

The Problem

"HÖMSTYLE customers can't get instant furniture advice — support email takes 3–5 days."

The Outcome

1 Worker to deploy. Text chat → opt-in voice → RealtimeKit video call. All in one script.

Live demo below

HÖMSTYLE AI Advisor

AI Chat · Llama 3.1

RealtimeKit Video Call

Click "Video Call with Agent" in the chat above, or open the video call directly.

Productionising this

What changes when you ship this for real

API token scope

RTK_API_TOKEN must be a Cloudflare API token with the Realtime permission only. Never reuse a global token. Rotate quarterly.

Per-meeting auth tokens

Participant tokens are JWTs whose exp claim is set roughly 24 hours out. Don't cache them; generate one per meeting per participant. The /api/rtk/meeting endpoint already does this.
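
To illustrate why caching is risky, here is a minimal sketch that reads the exp claim from a participant token and reports whether it is still usable. It assumes a standard three-part JWT with a numeric exp claim in seconds. The function names and the 60-second safety margin are hypothetical, and the helper only inspects the token; it does not verify the signature.

```typescript
// Hypothetical helper: reads the `exp` claim of a JWT WITHOUT verifying it.
// Assumes the usual header.payload.signature layout and `exp` in seconds.
function tokenSecondsRemaining(jwt: string, nowMs: number = Date.now()): number {
  const parts = jwt.split(".");
  if (parts.length !== 3) throw new Error("not a three-part JWT");

  // Convert base64url to base64 and pad to a multiple of 4 before decoding.
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64 + "=".repeat((4 - (b64.length % 4)) % 4);
  const payload = JSON.parse(atob(padded)) as { exp?: unknown };

  if (typeof payload.exp !== "number") {
    throw new Error("token has no numeric exp claim");
  }
  return payload.exp - Math.floor(nowMs / 1000);
}

// Treat a token as unusable slightly before it expires (margin is arbitrary).
function isTokenUsable(jwt: string, nowMs: number = Date.now(), marginSec = 60): boolean {
  return tokenSecondsRemaining(jwt, nowMs) > marginSec;
}
```

A check like this is useful for logging or debugging a rejected join; it is not a substitute for issuing a fresh token per meeting per participant.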

Custom presets

The default group_call_participant / group_call_host presets are the demo's starting point. For production, create custom presets in the dashboard with the exact permissions each role needs (mute, kick, screen-share, etc.).
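
One way to keep role-to-preset mapping explicit in the Worker is a small lookup that fails closed on unknown roles. This is only a sketch: the role names and the hs_supervisor preset are hypothetical stand-ins for whatever you create in the dashboard, while the two default preset names come from the demo.

```typescript
// Application roles are illustrative; adjust to your own model.
type AppRole = "customer" | "advisor" | "supervisor";

const PRESET_BY_ROLE: Record<AppRole, string> = {
  customer: "group_call_participant", // demo default
  advisor: "group_call_host",         // demo default
  supervisor: "hs_supervisor",        // hypothetical custom preset
};

// Reject unknown roles instead of silently falling back to a powerful preset.
function presetForRole(role: string): string {
  if (!Object.prototype.hasOwnProperty.call(PRESET_BY_ROLE, role)) {
    throw new Error(`unknown role: ${role}`);
  }
  return PRESET_BY_ROLE[role as AppRole];
}
```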

STT word error rate

Nova-3 has a lower word error rate (WER) than Whisper for conversational English; Whisper-Turbo wins for long-form and multilingual audio. Pick per use case; both are billed at ~$0.0005/audio-minute.
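
The per-use-case choice can be sketched as a small selector. Everything here is an assumption to check against your own setup: the model identifiers should be verified against the Workers AI catalog, and the 10-minute long-form threshold is arbitrary.

```typescript
interface SttRequest {
  language: string;        // BCP 47 tag such as "en-GB" or "de"
  expectedMinutes: number; // rough expected audio length
}

// Assumed Workers AI model identifiers; verify before use.
const NOVA_3 = "@cf/deepgram/nova-3";
const WHISPER_TURBO = "@cf/openai/whisper-large-v3-turbo";

// Short English conversations go to Nova-3; everything else goes to Whisper-Turbo.
function pickSttModel(req: SttRequest, longFormMinutes = 10): string {
  const isEnglish = req.language.toLowerCase().startsWith("en");
  const isLongForm = req.expectedMinutes >= longFormMinutes;
  return isEnglish && !isLongForm ? NOVA_3 : WHISPER_TURBO;
}
```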

Recording + retention

Stream the call audio to R2 via the RealtimeKit recording API. Encrypt at rest. Set an R2 lifecycle rule to delete recordings after 90 days unless flagged.
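
R2 lifecycle rules are scoped by key prefix, so one way to express "unless flagged" is to keep flagged recordings under a separate prefix that the 90-day rule does not cover. The sketch below assumes you control the object keys (or copy objects after a recording is flagged); the prefix names and key layout are hypothetical.

```typescript
// Hypothetical key layout: the 90-day lifecycle rule would target only
// "recordings/standard/", leaving "recordings/flagged/" untouched.
const STANDARD_PREFIX = "recordings/standard";
const FLAGGED_PREFIX = "recordings/flagged";

function recordingKey(meetingId: string, startedAt: Date, flagged: boolean): string {
  const day = startedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const prefix = flagged ? FLAGGED_PREFIX : STANDARD_PREFIX;
  return `${prefix}/${day}/${meetingId}`;
}
```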

Observability

The /api/rtk/meeting handler now surfaces cloudflareStatus + cloudflareError on failures. Pipe these into your incident dashboard so a 401 from a rotated token gets caught immediately.
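
A minimal sketch of how those two fields might be turned into an alert severity before they reach a dashboard. The field types (a numeric cloudflareStatus and a string cloudflareError) and the severity levels are assumptions, not the handler's documented contract.

```typescript
// Assumed shape of the failure payload returned by /api/rtk/meeting.
interface RtkFailure {
  cloudflareStatus?: number;
  cloudflareError?: string;
}

type Severity = "page" | "warn" | "info";

function classifyRtkFailure(f: RtkFailure): { severity: Severity; reason: string } {
  const status = f.cloudflareStatus;
  // Auth failures usually mean a rotated or mis-scoped token: alert immediately.
  if (status === 401 || status === 403) {
    return { severity: "page", reason: "credentials rejected" };
  }
  if (status === 429) return { severity: "warn", reason: "rate limited" };
  if (status !== undefined && status >= 500) {
    return { severity: "warn", reason: "upstream error" };
  }
  return { severity: "info", reason: f.cloudflareError ?? "unclassified failure" };
}
```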