One-sentence pitch:
"A Cloudflare Worker sits in front of your existing site and adds AI, cuts infrastructure costs, and localises for any market — in hours, without touching your backend or waiting on engineering."
Flow at a Glance
Time | What You're Doing | Business Point
0:00 | Set the stage: the one-concept setup | Worker = thin layer in front, nothing changes behind
2:30 | Routes: 60-second architecture explainer | Why none of this requires backend changes
4:00 | AI Chatbot: inject without touching origin | Support deflection, 30-min deploy, 24/7
8:00 | R2: kill the egress bill | $0 egress, auditable metadata, no URL changes
12:00 | Search: the silence moment | Lost revenue from zero results
16:00 | Local pricing at the edge | 20–40% conversion uplift, no backend sprint
19:00 | Auto subtitles | 48× cheaper than AWS Transcribe
22:00 | Voice AI agent | 24/7 voice support, no SDK needed
24:00 | Collaborative Room Designer | Real-time multi-user sync; one DO replaces 5 infrastructure pieces
25:30 | SEO: Rich Results | Stars + price in Google Search, zero backend changes
26:30 | Pricing calculator: put a number on it | Real BOM, downloadable CSV for finance
29:00 | Try it Yourself + Close | Pilot in 2 weeks, engineering can start today
0:00–2:30 — Set the Stage
Navigate to: dev.pongpisit.com/entertainment
"I want to show you something before I explain anything. This is a live streaming platform — real film data, real AI behind every button. No slides, no mocks."
"Everything I'm going to demo in the next 30 minutes was added to this site without changing a single line of its backend code. The secret is one concept I'll show you in 60 seconds."
Discovery: "What's the one feature your product team has been asking engineering for — that keeps getting pushed back?"
2:30–4:00 — The Foundation: Workers Intercept Everything
Navigate to: dev.pongpisit.com/demo/routes
"Here's the one thing you need to understand. A Cloudflare Worker is a small piece of code deployed at the edge — in front of your existing site. When a request comes in, the Worker intercepts it first."
"Your origin server never knows. It gets a completely normal request and returns its normal response. The Worker reads it, modifies it, adds to it — and returns something better to the user. That's it."
"No Kubernetes. No redeployment. One command — wrangler deploy — and it's live in 330 cities in 10 seconds."
Click the route animation — show request intercepting before origin
Key phrase to land: "Your engineering team keeps their sprint. We add a layer in front. Nothing breaks, everything gets better."
4:00–8:00 — AI Chatbot: Injected Without Touching the Server
Navigate to: dev.pongpisit.com/entertainment → scroll to Developer Guide → open AI Chat guide card → click Activate
"Watch the bottom right corner of the screen."
Click Activate — chat button appears instantly
"That button was not in the page before. The server returned exactly the same HTML it always does. A Worker intercepted the response and injected the chat widget using HTMLRewriter — a streaming HTML parser built into the Workers runtime."
"No backend deploy. No frontend PR. No waiting on engineering. 30 minutes of work."
Type into the chat: What should I watch if I loved Parasite?
"Instant AI recommendations, grounded in the catalog via AI Search with AutoRAG. The chatbot can answer questions about any film in the database — no manual FAQ writing, no stale content. It retrieves current data every time."
"No support ticket. No wait time. No human agent at 3am on a Sunday."
Discovery: "Do you have a chatbot today? What did it cost to build and what are you paying monthly to maintain it?"
Business framing: Average support ticket in media/e-commerce: $8–$15. If this handles 500 queries a month, that's $4,000–$7,500 in avoided support cost — from a feature that took 30 minutes to deploy.
8:00–12:00 — R2: Zero Egress, Auditable Cache
Navigate to: dev.pongpisit.com/demo/r2 → Run Demo
"Now let me show you something that hits your finance team immediately."
"Every image, every thumbnail your users load — AWS charges you to send that data out. They call it egress. It's typically 5–10% of cloud bills for any company with user-facing content."
"Cloudflare R2 stores the same content, serves it from the nearest Cloudflare PoP — and charges zero egress fees. Permanently. Not zero with a cap. Zero."
Click "Run Demo" — watch the cost meters flip from red to green
"But here's what makes it interesting for your engineering team — not just your finance team. When the Worker stores an image in R2, it writes custom metadata: exactly when that asset was first cached, its original file size, and where it came from."
"You can open the R2 dashboard right now and see every object with those three fields. You don't just know it's cached — you know when it was cached and from where. That's auditable infrastructure."
Discovery: "Do you have a sense of your current CDN or data transfer line item? Even a ballpark per month?"
Numbers to use: AWS S3 egress = $0.085/GB. At 50 TB/month (typical media platform) that's $4,250/month — permanently gone the moment R2 is in front. R2 storage = $0.015/GB. No URL changes for users, no migration of existing content.
12:00–16:00 — Semantic Search: The Silence Moment
Navigate to: dev.pongpisit.com/entertainment → search bar at the top
Type: mind-bending sci-fi with emotional depth — pause 3 full seconds before saying anything
"Inception is in this catalog. Interstellar is in this catalog. The user typed exactly what they want to watch tonight — and got nothing. What do they do? They leave."
Discovery: "Do you track search abandonment or zero-result rate today?"
Navigate to: dev.pongpisit.com/demo/search?ctx=entertainment → Run Semantic Search
"Now the platform understands what they mean, not just what they typed. Inception. Interstellar. WALL·E. Same query, completely different results. The user stays, finds something, watches it."
"Your search engine wasn't rewritten. A Worker sits in front, embeds the query with AI, and returns semantically matched results. Your existing database is untouched."
Business framing: If 20% of daily searches return zero results and recovering 30% of those converts to a session — at 50,000 daily searches that's 3,000 recovered sessions per day. That's not a technical metric, that's subscriber retention.
16:00–19:00 — Local Pricing: Convert Every Market
Navigate to: dev.pongpisit.com/entertainment/subscribe
"When someone in Thailand opens your subscription page and sees $9.99/month — they do the math. Is this available here? Why dollars? That friction kills conversions."
"Watch — this page shows local pricing automatically. Thai users see ฿350. Indonesian users see Rp162,000. Singaporean users see S$13. Same price point, right local context."
"No backend change. The Worker reads the visitor's country from request.cf.country — it's already there on every request, for free, from Cloudflare's network."
Discovery: "Which markets are you most focused on this year? Are you seeing different conversion rates by country?"
Business framing: Platforms that localise pricing typically see 20–40% improvement in free-to-paid conversion from non-USD markets. That's not a design change — it's a Worker and a rate table.
19:00–22:00 — Auto Subtitles: 48× Cheaper
Navigate to: dev.pongpisit.com/demo/subtitles
"If you produce video — lectures, films, training content — subtitles are a requirement in most markets. The standard approach is Amazon Transcribe at $0.024 per minute, plus an S3 bucket, IAM policies, and a batch pipeline."
Upload the local-language audio clip (MP3, 1–2 min)
"Workers AI Whisper does this at $0.0005 per minute — 48 times cheaper. One API call. No batch pipeline, no IAM policies, no S3 bucket."
"The browser splits the audio into 45-second chunks with 5-second overlap, sends 3 in parallel. Two AI passes clean the output — a Southeast Asian language model per chunk, then GPT-OSS 120B reads the full transcript and corrects domain vocabulary globally. You download a subtitle file ready for any video player."
"For 100 hours of new content per month — the difference between a $144 subtitle bill and a $3 subtitle bill."
Click a timestamp to seek the player → Download .vtt file
22:00–24:00 — Voice AI: Support That Never Sleeps
Navigate to: dev.pongpisit.com/demo/voice → click Voice
Speak: "What's a good film for family movie night?"
"Speech-to-text, AI reasoning, text-to-speech — three models, one endpoint, under 2 seconds. No third-party SDK, no per-minute billing to a call centre platform."
Click "Video Call with Agent" — show Visitor and Presenter links
"And if you want a live agent option — this opens a WebRTC video session. Visitor link for the customer, presenter link for your agent. No Zoom subscription. All routed through Cloudflare."
Discovery: "What does your current support infrastructure cost monthly? Is it 24/7 or business hours only?"
24:00–25:30 — Collaborative Room Designer: Real-Time Without the Infrastructure
Navigate to: dev.pongpisit.com/demo/room-designer
"Here's one that usually silences the room with product teams. A collaborative room planner — customers place HÖMSTYLE furniture in a floor plan, see it in 3D, design together in real time. Your product team has been asking for this for two years. Engineering scoped it at six months: WebSocket server, Redis pub/sub, Socket.io cluster, load balancer, real-time database. One Durable Object replaces all five."
Point at the 2D floor plan — KIVIK sofa, HEMNES bed, BEKANT desk already placed
"Drag the sofa. Watch the green border — you hold the lock. Nobody else can move it simultaneously. That's the DO's SQLite lock system. No Redis, no coordination service."
Open a second browser tab (incognito) to the same URL
"Two users, two tabs. Drag in one — it moves in the other instantly. Live cursors show exactly where each person is in the room. Add a piece in Tab 2 — appears in Tab 1 immediately."
"The entire WebSocket server, the SQLite database, the pub/sub broadcast, the lock manager — it's one TypeScript class. Deploy time: under two minutes."
Discovery: "What features has your product team been requesting that engineering keeps pushing back? Is real-time collaboration one of them?"
Business framing: Room planners increase furniture conversion by 20–30% — customers who visualise the fit buy with confidence and return less. The traditional build cost is 3–6 months of engineering time. With Durable Objects: one Worker class, one wrangler deploy. The DO hibernates when idle — 1,000 open connections cost $0/month. First user wakes it in under 5ms.
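The lock behaviour shown in this section can be sketched as a small table that maps each furniture item to the user currently dragging it. This is a minimal illustration, not the demo's implementation: the `LockTable` class, its method names, and the in-memory `Map` (used here instead of the Durable Object's SQLite storage) are all assumptions made for brevity.

```typescript
// Hypothetical lock table: item id -> user id of the current holder.
class LockTable {
  private holders = new Map<string, string>();

  // Returns true if `userId` holds the lock on `itemId` after this call.
  acquire(itemId: string, userId: string): boolean {
    const holder = this.holders.get(itemId);
    if (holder !== undefined && holder !== userId) return false; // someone else is dragging
    this.holders.set(itemId, userId);
    return true;
  }

  // Only the current holder can release; returns true if a lock was removed.
  release(itemId: string, userId: string): boolean {
    if (this.holders.get(itemId) !== userId) return false;
    return this.holders.delete(itemId);
  }

  // Drop every lock held by a user, for example when their WebSocket closes.
  releaseAll(userId: string): number {
    let released = 0;
    this.holders.forEach((holder, itemId) => {
      if (holder === userId) {
        this.holders.delete(itemId);
        released++;
      }
    });
    return released;
  }
}

const locks = new LockTable();
locks.acquire("sofa", "tab-1"); // true: tab-1 gets the green border
locks.acquire("sofa", "tab-2"); // false: tab-2 cannot move the sofa yet
locks.release("sofa", "tab-1"); // true: the sofa is free again
```

Because all users in a room talk to the same object instance, a plain lookup like this is enough to decide who may move an item; no external coordination service is involved in the sketch.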
Land and expand: Room designer is the Durable Objects beachhead. Once DOs are in the account, the conversation opens to KV (edge caching), R2 (image storage), Workers AI (product recommendations). The room planner is the use case that gets the deal started.
25:30–26:30 — SEO: Rich Results Without Touching the Backend
Navigate to: dev.pongpisit.com/demo/seo
"One more that's directly tied to revenue — search visibility. Your product pages are a React SPA. Googlebot crawls every URL and sees the same generic title: your site name. No meta description. No structured data. Products don't appear in Google. When they do, there are no star ratings, no price — nothing to make someone click."
"A Worker intercepts the HTML, reads the product data from your existing database, and injects the right title, meta description, and JSON-LD structured data for every URL. Googlebot now sees a unique, keyword-rich title for each product. Your pages become eligible for Rich Results — stars, price, availability — directly in Google Search."
Click "Inject SEO Tags" → switch to Google Preview tab
"That's what shows up in Google when someone searches for your product. Unique title, meta description, star rating, price. Zero backend changes."
Discovery: "Do your product pages show up in Google Search today? Are you seeing organic traffic from product-specific queries?"
Business framing: Rich Results (stars + price in Google) increase CTR by 20–30% on average. For an e-commerce site with 10,000 product pages that were previously invisible — the SEO uplift compounds every day.
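For anyone who asks what the injected structured data looks like, here is a rough sketch of building a schema.org Product object and serialising it as a JSON-LD script tag. The `Product` interface, its field names, and the helper functions are hypothetical; the schema.org types (`Product`, `Offer`, `AggregateRating`) are the standard vocabulary for this kind of markup.

```typescript
// Hypothetical product shape; field names are illustrative.
interface Product {
  name: string;
  description: string;
  price: number;
  currency: string; // ISO 4217 code, e.g. "USD"
  inStock: boolean;
  ratingValue: number; // average star rating
  reviewCount: number;
}

// Build a schema.org Product object with Offer and AggregateRating.
function productJsonLd(p: Product): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    name: p.name,
    description: p.description,
    offers: {
      "@type": "Offer",
      price: p.price.toFixed(2),
      priceCurrency: p.currency,
      availability: p.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
    aggregateRating: {
      "@type": "AggregateRating",
      ratingValue: p.ratingValue,
      reviewCount: p.reviewCount,
    },
  };
}

// Serialise as the script tag a Worker could append to the page head.
function jsonLdTag(p: Product): string {
  // Escape "<" so product text cannot close the script element early.
  const json = JSON.stringify(productJsonLd(p)).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

In a Worker, the returned string would be appended to the head element with HTMLRewriter, using product data fetched from wherever the catalog already lives.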
26:30–29:00 — Put a Real Number on It
Navigate to: dev.pongpisit.com/developer-pricing → click the closest preset to their use case
"Let me build an estimate based on what you've told me about your volumes."
Adjust Workers, R2, Workers AI, and AI Gateway sliders to approximate customer numbers
"This is your approximate monthly Cloudflare cost across everything we just showed. The R2 line is especially worth noting — this is what you stop paying AWS for egress. Download this as a spreadsheet for your finance team."
Click "Download CSV"
Discovery: "Is infrastructure cost a decision your team makes, or does it go through FinOps or procurement?"
29:00–30:00 — Try It Yourself + Close
Navigate to: dev.pongpisit.com/try
"Everything you've seen is deployable on your own domain today. Two scenarios here — AI chatbot on any existing site, and R2 image cache with zero egress. Both have copy-paste Worker code. The R2 one needs zero configuration — it works on whatever hostname it's deployed to."
"Your engineering team can have either of these running in under an hour."
"To summarise what we covered:"
— AI Chatbot injected via HTMLRewriter — 30-min deploy, no backend change
— R2 — $0 egress permanently, every cached asset auditable with timestamp + metadata
— Semantic search — understands intent, reduces abandonment
— Local pricing — auto-converts for any market via request.cf.country
— Subtitles — 48× cheaper than AWS Transcribe, dual-model pipeline (Nova-3 for English, Whisper for local languages)
— Voice AI — STT + LLM + TTS in one endpoint, 24/7
— Collaborative Room Designer — real-time multi-user sync, one DO replaces WebSocket server + Redis + load balancer + real-time DB
— SEO — unique title + meta description + JSON-LD Rich Results for every product URL
— Image resizing — 1.8 MB product photo → 180 KB WebP on mobile, just a URL prefix
— AI review assistant — pill tags + full review drafts, 3× review completion
None of this touched your backend. None required a sprint.
Ask for the meeting: "Which of these would have the highest business impact for your team in the next 90 days? I'd like to get our SE team on a 45-minute call with your engineering lead to scope a 2-week pilot. Would Tuesday or Thursday work?"
Objection Handling
"We're already on AWS / GCP and it works fine."
"Totally — those are great platforms. Cloudflare doesn't replace them. We sit in front. Your AWS infrastructure stays exactly as it is. We add AI, eliminate egress fees, and handle localisation at the edge. Your cloud bill gets smaller. Your product gets better. Nothing breaks."
"Our engineering team is really busy right now."
"That's exactly the point. What you just saw — the chatbot, the R2 cache, the local pricing — none of it required your engineers to touch their existing sprint. Workers are additive. We deploy in front. Your team stays focused on what they're already building."
"We already have a chatbot / search tool."
"What are you paying for it monthly? Most teams we talk to are paying $2,000–$10,000/month for third-party search and chat. Workers AI consolidates that into your Cloudflare bill at a fraction of the cost — and you own the model, the data, and the deployment."
"How is this different from OpenAI or Azure AI?"
"OpenAI gives you a model. Azure AI gives you a model. You still have to build the servers, the APIs, the caching, the scaling. Workers AI gives you models already integrated into the edge network — in 330 cities, with no GPU servers to manage and no egress fees when your data moves between services. You're calling one API instead of building infrastructure."
FAQ — Technical & Business Questions
A Cloudflare Worker is deployed on a route in front of your existing domain. When a request arrives, Cloudflare runs the Worker before the request reaches your origin server. The Worker can:
Read and modify the request before passing it on
Call your origin and receive the response
Transform, enrich, or replace the response before the browser sees it
Respond entirely from the edge — without calling origin at all
Your origin never sees the Worker code. It receives a normal HTTP request and returns a normal HTTP response. The Worker is the layer in between — transparent to the origin, invisible to the user, but powerful enough to add AI, caching, localisation, and entirely new API endpoints.
Browser → Cloudflare Worker → Your origin (unchanged)
Browser ← Cloudflare Worker ← Your origin response
Analogy: Think of it as a translator sitting between two people who speak different languages — both sides communicate normally and neither has to change how they speak.
No. Workers are designed with three safety layers:
Instant rollback — wrangler rollback redeploys the previous version in under 10 seconds, globally.
Passthrough pattern — the standard Worker pattern calls return fetch(request) for anything it doesn't handle. If the Worker throws, Cloudflare can be configured to fall back to origin automatically.
Request isolation — Workers run in V8 isolates, and an uncaught exception in one request doesn't affect any other.
In practice, Workers have the same deployment risk profile as a CDN configuration change — which teams do routinely.
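The passthrough and fail-open layers above can be sketched in a few lines. This is a minimal example under stated assumptions: the `handledAtEdge` helper and the `/api/chat` path are hypothetical, and the `Ctx` interface is a local stand-in for the part of the Workers execution context used here. `passThroughOnException()` is the runtime call that asks Cloudflare to forward the request to origin if the Worker throws.

```typescript
// Minimal slice of the Workers execution context used in this sketch.
interface Ctx {
  passThroughOnException(): void;
}

// Paths this Worker answers itself; everything else goes to origin untouched.
// "/api/chat" is a hypothetical example path.
function handledAtEdge(pathname: string): boolean {
  return pathname === "/api/chat";
}

const worker = {
  async fetch(request: Request, _env: unknown, ctx: Ctx): Promise<Response> {
    // If anything below throws, the request is forwarded to origin (fail open).
    ctx.passThroughOnException();

    const { pathname } = new URL(request.url);
    if (!handledAtEdge(pathname)) {
      return fetch(request); // passthrough: origin sees a normal request
    }
    return new Response(JSON.stringify({ reply: "Hello from the edge" }), {
      headers: { "content-type": "application/json" },
    });
  },
};
```

In a real project the `worker` object would be the module's default export.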
Workers run in V8 isolates — not containers, not VMs. Cold start is under 1ms. After the first request, the isolate stays warm. Compare that to a Lambda cold start of 100–300ms.
For most use cases — pass-through, header injection, simple redirects — the Worker adds 0–2ms. For AI inference (Llama, Whisper), the model runs at the Cloudflare PoP nearest to the user, which is often faster than a round-trip to a centralised AI API in a single US region.
If the Worker serves content from R2 or KV instead of proxying to origin, the end-to-end latency is often lower than before the Worker was added.
Workers process requests at the PoP that receives the user's connection — data doesn't leave that region unnecessarily. For AI inference, Workers AI runs on Cloudflare's own GPU infrastructure, not a third-party provider. Cloudflare does not use your inputs to train models.
For regulated industries: Cloudflare is SOC 2 Type II, ISO 27001, PCI DSS Level 1, and HIPAA-eligible. R2 supports regional data residency. Workers can be restricted to specific regions. This is a detailed scoping conversation we can have with your security team — but the short answer is: data stays where you configure it to stay.
Yes — and that's actually the ideal starting point. If your domain is already proxied through Cloudflare (orange cloud in DNS), deploying a Worker is a 10-second operation. You don't change DNS, you don't change your origin. You just add a route and deploy code. Everything you already have — DDoS protection, WAF, CDN — keeps working alongside it.
Cloudflare provides built-in observability for Workers:
Workers Logs — real-time structured logs with filtering, tail logs via CLI or dashboard
Analytics Engine — write custom metrics from Workers, query via SQL API
AI Gateway — logs every AI request with token usage, cost, latency, and model — queryable dashboard
Deployments & rollbacks — every deploy is versioned. Rollback is one command.
Alerts — CPU time, error rate, request count — all alertable via Cloudflare or your existing alerting stack (PagerDuty, OpsGenie)
Workers also integrate with your existing CI/CD — GitHub Actions, GitLab CI, CircleCI — via Wrangler CLI. Deploy on merge, run integration tests, rollback automatically on failure.
Workers Paid plan is $5/month and includes 10M requests/month. Overage is $0.30 per million requests. AWS Lambda lists $0.20 per million requests plus $0.0000166667 per GB-second of duration, so the comparison depends on how long each invocation runs, not on the per-request rate alone. R2 has zero egress fees, so whatever you're currently paying for CDN egress disappears. Workers AI is billed per neuron; most models are free up to daily limits on the Paid plan.
The pricing calculator at dev.pongpisit.com/developer-pricing lets you build a full BOM with your actual numbers. Most teams find the Cloudflare bill is offset entirely by what they stop paying for egress and third-party tools.
It depends on what you're deploying. Real timelines from customers:
AI Chatbot injection: 30 minutes to 2 hours
R2 image cache: 1–2 hours including bucket setup
Local pricing: 2–4 hours with currency table
Semantic search: 1–3 days (index seeding + tuning)
Auto subtitles: 4–8 hours for the end-to-end pipeline
Full AI platform: 2–4 weeks, production-hardened
The 2-week pilot we recommend covers the 2–3 highest-impact features for your stack, with Cloudflare SE support throughout. That's enough time to measure real uplift before any larger commitment.
Discovery Questions Bank
Use throughout — do not front-load
Infrastructure Cost
"What's the biggest line item on your cloud bill right now?"
"How much are you paying for CDN or data transfer monthly?"
AI & Support
"Do you have a chatbot today? How long did it take to build?"
"Is your support 24/7? What's your average ticket cost?"
Search & Discovery
"Do you know your zero-result search rate?"
"What do users complain about most with search on your platform?"
Market Expansion
"Which markets are you most focused on this year?"
"Are you seeing lower conversion from non-USD markets?"
Product Velocity
"How long does a feature request take from approval to production?"
"What's on your roadmap that engineering says will take the longest?"
Decision Process
"Who needs to be in the room for a pilot decision?"
Goal: Establish Workers architecture clearly, then hit the two best wow moments (chatbot injection + R2 metadata) early while attention is high, then go for breadth.
Timing Cheat Sheet
Time | Section | URL
0:00 | Opening question + architecture setup | /entertainment
2:30 | Routes: deep architecture walkthrough | /demo/routes
5:00 | AI Chatbot via HTMLRewriter: injection pattern | /entertainment → guide card
8:30 | R2 + X-R2-Bypass + customMetadata + cache checker | /demo/r2 → /try
13:30 | AI Sentiment: 5-star trap demo | /demo/sentiment
16:00 | Semantic search: the silence moment | /entertainment → /demo/search
19:00 | Local pricing at the edge | /entertainment/subscribe
21:00 | KV edge caching | /demo/caching
22:30 | Auto subtitles: live upload | /demo/subtitles
25:30 | Voice AI + WebRTC video call | /demo/voice
27:00 | Collaborative Room Designer: Durable Objects | /demo/room-designer
28:30 | SEO: Rich Results via HTMLRewriter + JSON-LD | /demo/seo?ctx=entertainment
29:00 | APIs at the edge | /demo/api-edge?ctx=entertainment
29:30 | Pricing BOM live build + Close | /developer-pricing → /try
0:00–2:30 — Opening: The One Concept That Unlocks Everything
Navigate to: dev.pongpisit.com/entertainment
Opening question: "When your engineering team gets a new feature request — AI search, a chatbot, local pricing — what's the typical journey from approval to production? Three months? Six?"
"What you're looking at is a live demo environment. 15 use cases across AI, storage, search, caching, SEO, real-time collaboration, and more — deployed across two real websites: a furniture store and a streaming platform. Real Cloudflare APIs. Real AI models. Real data. No mocks."
"Every single feature was deployed without touching either origin server. Not a line of backend code changed. That's possible because of one concept I want to establish before anything else."
2:30–5:00 — Routes: How Workers Intercept the Request Lifecycle
Navigate to: dev.pongpisit.com/demo/routes
"A Worker is deployed against a route pattern — *yourdomain.com/* or more specific like *yourdomain.com/images/*. Every matching request hits the Worker before it reaches your origin."
"The Worker has full access to the request: headers, body, URL, method, and Cloudflare's free metadata — country, city, PoP, ASN. It can fetch the origin response, read it, transform it, and return something different. The origin has no idea any of this happened."
Click the animation — walk through the request → Worker → origin → Worker → browser flow
Deploy model: wrangler deploy goes live in seconds across 330 cities in 125+ countries. Roll back in one command. No Kubernetes, no EC2, no containers, no cold-start overhead: each Worker runs in a V8 isolate with under 1ms startup. The network is within ~50ms of 95% of the world's internet-connected population.
Execution model: Workers are stateless by default. State lives in KV (eventually consistent), D1 (SQLite at edge), R2 (object storage), or Durable Objects (strongly consistent, single-instance coordination). Each tool has a precise use case — not everything needs a database.
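The request → Worker → origin → Worker → browser flow can be sketched as follows. This is an illustrative example, not the demo's code: the `CfMetadata` interface covers only the fields named above, and the `describeVisitor` helper and the `x-edge-visitor` header name are hypothetical.

```typescript
// Subset of the metadata Cloudflare attaches to each request as request.cf.
interface CfMetadata {
  country?: string;
  city?: string;
  colo?: string; // PoP code
  asn?: number;
}

// Summarise visitor metadata for a debug header; absent fields become "unknown".
function describeVisitor(cf: CfMetadata | undefined): string {
  const country = cf?.country ?? "unknown";
  const colo = cf?.colo ?? "unknown";
  return `${country} via ${colo}`;
}

const worker = {
  async fetch(request: Request): Promise<Response> {
    const cf = (request as Request & { cf?: CfMetadata }).cf;

    // The origin receives the request exactly as the browser sent it.
    const originResponse = await fetch(request);

    // Copy the response so its headers become mutable, then annotate it.
    const response = new Response(originResponse.body, originResponse);
    response.headers.set("x-edge-visitor", describeVisitor(cf)); // hypothetical header
    return response;
  },
};
```

The origin is untouched in this sketch; the only change happens on the way back to the browser.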
5:00–8:30 — AI Chatbot via HTMLRewriter: The Injection Pattern
Navigate to: dev.pongpisit.com/entertainment → Developer Guide section → AI Chat guide card → Activate
"Now I want to show you something immediately practical. Watch the bottom right corner."
Click Activate — chat widget appears on the page
"That widget was not in the HTML the server sent. The origin returned the same response it always does. The Worker piped that response through HTMLRewriter — a streaming HTML transformer built into the Workers runtime — and appended the chat widget before the closing </body> tag."
"The Worker also created a new API endpoint — POST /api/chat — that didn't exist on the origin. Worker B wraps Worker A. You can layer capabilities without merge conflicts, without touching source code."
Type: What should I watch on a rainy Sunday afternoon?
Technical depth: HTMLRewriter uses a CSS-selector API. .on('body', { element(el) { el.append(script, {html:true}) } }) — the transform is streaming, so it doesn't buffer the full HTML. Response time to first byte is unchanged.
Model: @cf/meta/llama-3.1-8b-instruct-fast, covered by the free allocation on the Workers Paid plan. Conversation history is limited to the last 6 messages. System prompt is site-specific and swappable per route.
Pattern applies to: any chatbot, any A/B test, any personalisation injection, any analytics tag — without a frontend deploy.
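A minimal sketch of the injection pattern, assuming a hypothetical widget script URL. The `ElementLike` and `RewriterLike` interfaces are local stand-ins for the parts of HTMLRewriter used here, so the function can be read and exercised outside the Workers runtime; in a Worker you would pass `new HTMLRewriter()` and the origin response.

```typescript
// Structural stand-ins for the parts of HTMLRewriter used below.
interface ElementLike {
  append(content: string, options?: { html?: boolean }): void;
}
interface RewriterLike<R> {
  on(selector: string, handlers: { element(el: ElementLike): void }): RewriterLike<R>;
  transform(response: R): R;
}

// Append a script tag at the end of <body> as the HTML streams through.
function injectChatWidget<R>(rewriter: RewriterLike<R>, response: R, scriptUrl: string): R {
  const tag = `<script src="${scriptUrl}" defer></script>`;
  return rewriter
    .on("body", {
      element(el) {
        // html: true inserts markup rather than escaped text.
        el.append(tag, { html: true });
      },
    })
    .transform(response);
}
```

Inside a fetch handler the call might look like `injectChatWidget(new HTMLRewriter(), await fetch(request), "/chat-widget.js")`, where `/chat-widget.js` is a placeholder path.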
8:30–13:30 — R2: Zero Egress + X-R2-Bypass + customMetadata
Navigate to: dev.pongpisit.com/demo/r2 → Run Demo
"Now let's talk about infrastructure cost. Every image request your users make — your cloud provider charges egress. AWS S3: $0.085/GB. GCP: $0.12. Azure: $0.087. It's not visible day-to-day but it compounds."
"R2 stores the same objects, serves them from the nearest Cloudflare PoP, charges zero egress. Permanently."
Click "Run Demo" — watch cost meters
Worker architecture — every detail:
1. Any hostname, zero config: No ORIGIN constant. Worker intercepts /images/* on whatever domain it's deployed to — works on lazada.com, shopee.com, any custom hostname.
2. Self-loop prevention via X-R2-Bypass: On a cache miss, the Worker needs to fetch from origin, but origin is behind the same route. So it re-fetches the request with an X-R2-Bypass: 1 header. The next Worker invocation sees that header and immediately returns fetch(request), so Cloudflare routes to the real origin. No infinite loop, no external config.
3. Non-blocking cache write: ctx.waitUntil() stores to R2 in the background, so the user gets their response immediately and doesn't wait for the write to complete.
4. customMetadata on every put: Three fields written on first cache: cached-at (ISO timestamp), size-bytes (original file size), origin-url (source URL). Visible in R2 dashboard. Returned as X-R2-Cached-At, X-R2-Size, X-R2-Origin response headers.
5. HEAD vs GET: The gallery cache checker uses env.R2.head(key) — reads all metadata without downloading the image body. 30 parallel HEAD requests verify the full gallery in under 2 seconds, zero bandwidth cost.
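The five points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the demo's source: `R2BucketLike`, `R2ObjectLike`, and `Ctx` are local stand-ins for the R2 binding and execution context, the helper names are hypothetical, and content-type handling on cache hits is omitted for brevity.

```typescript
// Structural stand-ins for the R2 binding and execution context.
interface R2ObjectLike {
  body: ReadableStream<Uint8Array>;
  customMetadata?: Record<string, string>;
}
interface R2BucketLike {
  get(key: string): Promise<R2ObjectLike | null>;
  put(
    key: string,
    value: ArrayBuffer,
    options?: { customMetadata?: Record<string, string> },
  ): Promise<unknown>;
}
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

const BYPASS_HEADER = "X-R2-Bypass";

// True when this invocation is the Worker's own re-fetch and must pass through.
function isBypass(headers: Headers): boolean {
  return headers.get(BYPASS_HEADER) === "1";
}

// The three custom metadata fields written on first cache.
function buildMetadata(originUrl: string, sizeBytes: number, now: Date): Record<string, string> {
  return {
    "cached-at": now.toISOString(),
    "size-bytes": String(sizeBytes),
    "origin-url": originUrl,
  };
}

async function handleImage(request: Request, bucket: R2BucketLike, ctx: Ctx): Promise<Response> {
  if (isBypass(request.headers)) return fetch(request);

  const key = new URL(request.url).pathname.slice(1);
  const cached = await bucket.get(key);
  if (cached) {
    const meta = cached.customMetadata ?? {};
    return new Response(cached.body, {
      headers: {
        "X-R2-Cached-At": meta["cached-at"] ?? "",
        "X-R2-Size": meta["size-bytes"] ?? "",
        "X-R2-Origin": meta["origin-url"] ?? "",
      },
    });
  }

  // Cache miss: re-fetch the same URL, marked so the next invocation passes it through.
  const headers = new Headers(request.headers);
  headers.set(BYPASS_HEADER, "1");
  const originResponse = await fetch(new Request(request, { headers }));
  if (!originResponse.ok) return originResponse;

  // Store a copy in the background; the user's response is not delayed by the write.
  const copy = originResponse.clone();
  ctx.waitUntil(
    copy.arrayBuffer().then((bytes) =>
      bucket.put(key, bytes, {
        customMetadata: buildMetadata(request.url, bytes.byteLength, new Date()),
      }),
    ),
  );
  return originResponse;
}
```

A HEAD-style check would read the same `customMetadata` fields without streaming the body, which is what keeps the gallery checker cheap.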
Navigate to: dev.pongpisit.com/try → R2 Image Cache scenario
"The demo ZIP here contains a built-in cache checker. Upload it to Pages as your origin, deploy the Worker, load the gallery once — then paste your Worker URL into the checker. 30 HEAD requests fire in parallel. Every card shows a green badge with the exact ISO timestamp from customMetadata, the file size, and the origin URL."
"Open the R2 dashboard right now — every object has those three metadata fields. That's not just object storage. That's a queryable audit log of when every asset entered your edge infrastructure."
Discovery: "Do you have visibility today into which specific assets are cached and when they were first stored? Or is it just a hit/miss rate?"
13:30–16:00 — AI Sentiment: The 5-Star Trap
Navigate to: dev.pongpisit.com/demo/sentiment → Run AI Sentiment Analysis
"Before I show this — quick question. Do you look at star ratings on reviews? Most e-commerce teams do. The aggregate number looks fine, maybe 4.2 stars."
"Watch what the AI finds."
Wait for results — point to Wanchai B.'s review with the red mismatch banner
"Five stars. The star average includes this. But read what it says — shelf collapsed, nearly hit a child, filing a complaint. The reviewer explicitly wrote 'I am giving 5 stars so this review gets seen.'"
"distilbert-sst-2-int8 catches it immediately — NEGATIVE, high confidence. The star score said nothing. The AI saw through it."
"The Worker wraps your existing reviews API. Fetches what you already have. Runs each review through the model in parallel via Promise.all(). Returns enriched JSON with sentiment scores appended. Your reviews endpoint is unchanged."
Pattern: Wrap existing API → add AI enrichment in-flight → return richer JSON. Zero origin changes. Works for any JSON endpoint — reviews, support tickets, form submissions, social comments.
Parallel execution: Promise.all(reviews.map(r => env.AI.run(...))) scores all reviews concurrently, not sequentially. Total wall-clock time is close to that of the slowest single call rather than the sum of all calls; each inference is still billed individually.
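A minimal sketch of the wrap-and-enrich pattern. The classifier is injected as a function so the example runs anywhere; in a Worker it would wrap a call such as `env.AI.run("@cf/huggingface/distilbert-sst-2-int8", { text })` (the full model ID and the array-of-labels output shape are assumptions here). The `Review` shape, the 4-star cutoff, and the 0.9 confidence threshold are hypothetical.

```typescript
interface Review {
  id: string;
  stars: number;
  text: string;
}
interface Sentiment {
  label: string; // "POSITIVE" or "NEGATIVE"
  score: number; // confidence between 0 and 1
}
// Injected classifier; assumed to return at least one label per text.
type Classifier = (text: string) => Promise<Sentiment[]>;

// Highest-scoring label from the classifier output.
function topSentiment(results: Sentiment[]): Sentiment {
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

// A high star rating paired with confidently negative text is flagged.
function isMismatch(stars: number, s: Sentiment, minScore = 0.9): boolean {
  return stars >= 4 && s.label === "NEGATIVE" && s.score >= minScore;
}

async function enrichReviews(reviews: Review[], classify: Classifier) {
  // Start every classification at once, then wait for all of them.
  const scored = await Promise.all(reviews.map((r) => classify(r.text)));
  return reviews.map((r, i) => {
    const sentiment = topSentiment(scored[i]);
    return { ...r, sentiment, mismatch: isMismatch(r.stars, sentiment) };
  });
}
```

The original review fields are spread into the result unchanged, so existing consumers of the endpoint keep working and new consumers can read `sentiment` and `mismatch`.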
16:00–19:00 — Semantic Search: The Silence Moment
Navigate to: dev.pongpisit.com/entertainment → search bar
Type: mind-bending sci-fi with emotional depth — pause 3 seconds, say nothing
"Inception is in this catalog. Interstellar is in this catalog. Zero results because SQL LIKE looks for the literal string."
Navigate to: dev.pongpisit.com/demo/search?ctx=entertainment → Run Semantic Search
"The Worker embeds the query using @cf/baai/bge-m3 — a multilingual embedding model that handles Thai, Indonesian, Vietnamese, and 100+ languages out of the box. Queries Vectorize for cosine similarity. Enriches from D1. Under 50ms from any PoP."
Stack: Workers AI (bge-m3 embeddings) + Vectorize (ANN index, cosine similarity) + D1 (SQLite metadata enrichment). All in one Worker deployment.
Index setup: One-time seeding: embed your catalog, insert into Vectorize. Incremental updates on new content. Query latency stays low as the catalog grows because Vectorize uses approximate nearest-neighbour search rather than a brute-force scan.
Multilingual: bge-m3 is trained on 100+ languages. Same index serves Thai, English, Indonesian queries without separate models.
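To make "cosine similarity" concrete for a technical audience, here is a toy ranking example. It is a brute-force illustration with made-up three-dimensional vectors, not how Vectorize is implemented; real embeddings have many more dimensions and the index avoids scanning every vector. The `Doc` shape and the sample vectors are hypothetical.

```typescript
interface Doc {
  title: string;
  vector: number[];
}

// Cosine of the angle between two vectors: 1 = same direction, 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k by similarity to the query vector.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.doc);
}

// Toy catalog: the first two dimensions loosely stand for "sci-fi" and "emotional".
const catalog: Doc[] = [
  { title: "Cooking documentary", vector: [0.1, 0.0, 0.9] },
  { title: "Interstellar", vector: [0.8, 0.9, 0.2] },
  { title: "Inception", vector: [0.9, 0.8, 0.1] },
];
const matches = topK([1, 1, 0], catalog, 2).map((d) => d.title);
```

With these toy vectors the two films rank above the documentary even though the query shares no literal words with their titles, which is the point of the silence moment.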
19:00–21:00 — Local Pricing: request.cf.country, $0 Cost
Navigate to: dev.pongpisit.com/entertainment/subscribe
"request.cf.country, request.cf.city, request.cf.colo — free metadata on every single request. No geo-IP API subscription. No database lookup. No latency."
"This page auto-detects your location and shows local pricing. Worker reads the country, looks up the exchange rate, rewrites the price elements via HTMLRewriter. Origin sends the same HTML. User sees their currency."
Use the currency switcher to flip between THB / IDR / SGD / MYR / JPY / GBP — 11 currencies total — show prices updating instantly
Comparison: MaxMind GeoIP2 City = $24/month + SDK + database download. AWS Location Service = $0.50/1,000 queries. request.cf.country = $0, 0ms, already in the request object on every invocation.
Pattern extends to: content language defaulting, regulatory compliance (GDPR banners only for EU), market-specific feature flags — all without a separate geo service.
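The country lookup and rate table can be sketched as a pure function. The rates and rounding steps below are hypothetical placeholders chosen only so the output resembles the prices on the demo page; a production table would come from finance, not from a constant in code.

```typescript
// Hypothetical rate table: exchange rate from USD and a rounding step per market.
interface Market {
  currency: string;
  rate: number;
  roundTo: number;
}

const MARKETS: Record<string, Market> = {
  TH: { currency: "THB", rate: 35, roundTo: 10 },
  ID: { currency: "IDR", rate: 16200, roundTo: 1000 },
  SG: { currency: "SGD", rate: 1.3, roundTo: 1 },
};

interface LocalPrice {
  currency: string;
  amount: number;
}

// Convert a USD price for the visitor's country; unknown countries keep USD.
function localPrice(usd: number, country: string | undefined): LocalPrice {
  const market = country ? MARKETS[country] : undefined;
  if (!market) return { currency: "USD", amount: usd };
  const converted = usd * market.rate;
  const rounded = Math.round(converted / market.roundTo) * market.roundTo;
  return { currency: market.currency, amount: rounded };
}
```

In a Worker the second argument would be `request.cf?.country`, and the result would be written into the price elements with HTMLRewriter.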
21:00–22:30 — KV Edge Caching
Navigate to: dev.pongpisit.com/demo/caching → Run Demo
"At 100,000 daily active users, you're potentially making 100,000 database calls for the same catalog data that hasn't changed in hours. KV checks the edge first. Hit: under 10ms from the PoP handling the request. Miss: query D1 once, write to KV with a 60-second TTL. The next 9,999 requests in that window never touch the database."
KV characteristics: Eventually consistent globally — writes propagate to all PoPs in ~60 seconds. Reads are always local to the PoP — no cross-region roundtrip. 10M reads/month included on Workers Paid plan.
When to use KV vs D1 vs R2: KV = high-read, low-write, eventually consistent (catalog, config, feature flags). D1 = relational, ACID, SQLite API (user data, orders, sessions). R2 = objects, blobs, binary (images, files, backups).
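The hit/miss flow described above is a cache-aside read. A minimal sketch, assuming a `KvLike` interface that mirrors only the `get` and `put` calls of a KV namespace binding; `cachedJson` and the `load` callback (standing in for the D1 query) are names invented for illustration.

```typescript
// Minimal structural subset of a KV namespace binding (get, and put with a TTL).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Cache-aside read: serve from KV on a hit, otherwise load once and store with a TTL.
async function cachedJson<T>(
  kv: KvLike,
  key: string,
  load: () => Promise<T>,
  ttlSeconds = 60,
): Promise<{ value: T; hit: boolean }> {
  const cached = await kv.get(key);
  if (cached !== null) {
    return { value: JSON.parse(cached) as T, hit: true };
  }

  const value = await load(); // for example, a D1 query in the real Worker
  await kv.put(key, JSON.stringify(value), { expirationTtl: ttlSeconds });
  return { value, hit: false };
}
```

Because KV is eventually consistent and cached per PoP, each location may take its own first miss. The database still sees a small number of calls per TTL window rather than one per request.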
22:30–25:30 — Auto Subtitles: Three-Model Pipeline
Navigate to: dev.pongpisit.com/demo/subtitles
"AWS Transcribe: $0.024/min + S3 + IAM + batch pipeline. Workers AI Whisper: $0.0005/min — 48× cheaper. One API call."
"The pipeline: the browser decodes the audio to 16kHz mono PCM, splits into 45-second chunks with 5-second overlap, sends 3 chunks in parallel. Model routing is automatic — Nova-3 ($0.0052/min) for English, Whisper ($0.0005/min) for Thai and other ASEAN languages. Then two AI correction passes: @cf/aisingapore/gemma-sea-lion-v4-27b-it per chunk for regional language nuance, then @cf/openai/gpt-oss-120b reads the full assembled transcript, detects the video domain (medical, tech, legal), and corrects domain vocabulary globally. Two AI passes — one for language, one for domain accuracy."
Upload the local-language audio clip
While processing: "Audio decoded to 16kHz mono PCM, split into 45-second chunks with 5s overlap, sent in batches of 3 simultaneously — two AI passes clean the output"
Result arrives → click a timestamp to seek → toggle VTT view → Download .vtt
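The chunking, model routing, and batch-of-3 steps from the pipeline can be sketched as below. This is an illustration, not the demo's code: `chunkRanges`, `pickSttModel`, and `mapInBatches` are hypothetical helper names, and routing on a language tag that starts with "en" is an assumption about how English is detected.

```typescript
interface ChunkRange {
  startSec: number;
  endSec: number;
}

// Split a recording into 45 s windows where each window overlaps the previous one by 5 s.
function chunkRanges(totalSec: number, chunkSec = 45, overlapSec = 5): ChunkRange[] {
  if (chunkSec <= overlapSec) throw new Error("chunkSec must exceed overlapSec");
  const stride = chunkSec - overlapSec; // each chunk starts 40 s after the previous one
  const ranges: ChunkRange[] = [];
  for (let start = 0; start < totalSec; start += stride) {
    const end = Math.min(start + chunkSec, totalSec);
    ranges.push({ startSec: start, endSec: end });
    if (end === totalSec) break; // the last window reached the end of the audio
  }
  return ranges;
}

// Route English to Nova-3 and everything else to Whisper (assumed rule).
function pickSttModel(language: string): string {
  return language.toLowerCase().startsWith("en")
    ? "@cf/deepgram/nova-3"
    : "@cf/openai/whisper-large-v3-turbo";
}

// Run fn over items with at most batchSize calls in flight, preserving order.
async function mapInBatches<T, R>(
  items: T[],
  batchSize: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(fn))));
  }
  return results;
}
```

For a 100-second clip, `chunkRanges(100)` yields three windows: 0 to 45, 40 to 85, and 80 to 100. The 5-second overlap gives the correction pass shared context across chunk boundaries.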
Models used:
— @cf/deepgram/nova-3 — STT for English, $0.0052/min
— @cf/openai/whisper-large-v3-turbo — STT for Thai and other non-English languages, $0.0005/min
— @cf/aisingapore/gemma-sea-lion-v4-27b-it — per-chunk ASEAN language cleanup
— @cf/openai/gpt-oss-120b — global pass: domain vocabulary, speaker consistency
Cost at 200 hrs/month:
— English via Nova-3: $0.0052 × 12,000 min = $62 (vs AWS $288). Still 4.6× cheaper, no infrastructure.
— Thai/ASEAN via Whisper: $0.0005 × 12,000 min = $6 (vs AWS $288). 48× cheaper.
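The arithmetic behind these figures, as a small sketch that can be rerun with a prospect's own hours. The per-minute rates are the ones quoted above; `monthlyCostUsd` and `timesCheaper` are hypothetical helper names.

```typescript
// Per-minute rates quoted in this section (USD).
const RATE_PER_MIN = {
  novaEnglish: 0.0052,
  whisper: 0.0005,
  awsTranscribe: 0.024,
} as const;

// Monthly cost for a given number of audio hours at a per-minute rate.
function monthlyCostUsd(hours: number, ratePerMinute: number): number {
  return hours * 60 * ratePerMinute;
}

// How many times cheaper the candidate is than the baseline.
function timesCheaper(baselineUsd: number, candidateUsd: number): number {
  return baselineUsd / candidateUsd;
}

const hours = 200; // 200 hrs/month = 12,000 minutes
const aws = monthlyCostUsd(hours, RATE_PER_MIN.awsTranscribe); // about 288
const nova = monthlyCostUsd(hours, RATE_PER_MIN.novaEnglish); // about 62.4
const whisper = monthlyCostUsd(hours, RATE_PER_MIN.whisper); // about 6
```

The ratios come out at roughly 4.6 for Nova-3 and 48 for Whisper, matching the figures above.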
Why two models? Nova-3 has lower word error rate for conversational English — better for customer support recordings. Whisper handles 100+ languages including Thai, Indonesian, Vietnamese — Nova-3 hallucinates on ASEAN phoneme clusters.
25:30–27:00 — Voice AI: Three Models, One Endpoint
Navigate to: dev.pongpisit.com/demo/voice → click Voice
Speak: "What's a good film for family movie night?"
"@cf/deepgram/nova-3 for STT — lower word error rate than Whisper for conversational speech, critical for a support use case where users speak naturally. @cf/meta/llama-3.1-8b-instruct-fast for LLM reasoning. @cf/deepgram/aura-2-en — arcas voice — for TTS. Three models, one Worker endpoint, under 2 seconds end-to-end."
Click "Video Call with Agent" — show Visitor and Presenter links
"RealtimeKit spins up a WebRTC session via Cloudflare Calls. Visitor link for the customer, presenter link for the agent. No Zoom, no Twilio, no third-party SDK billing. The media is routed through Cloudflare — which means it works without the WebRTC latency penalty of a US-only TURN server."
27:00–28:30 — Collaborative Room Designer
"This is the Durable Objects demo. One class — WebSocket server, SQLite database, pub/sub broadcast, furniture lock manager, live cursors — all in a single TypeScript file. No Redis. No Socket.io. No load balancer."
Point at the 2D floor plan — 3 pre-seeded rooms visible in the dropdown (Vardagsrum, Sovrum, Hemmakontor)
"Three rooms, three separate DO instances. Each room's state is completely independent. env.ROOM_DESIGNER.idFromName('vardagsrum') — that's the whole routing layer."
Click the KIVIK sofa to select it — drag it across the floor
"Green border — you hold the SQLite lock. UPDATE furniture SET x=?, y=? WHERE id=? AND locked_by=? — atomic write, broadcast delta to every connected WebSocket. Release on pointer-up."
Press R to rotate — then Del to remove — then open the bottom Add Furniture panel and click BILLY Bookcase to add it back
"Every message type — grab, move, rotate, release, add, remove, cursor, reset — is handled in webSocketMessage(). One handler, no external broker."
Click 3D button — orbit the scene — click the sofa in 3D — drag the ↕ Height slider
"Three.js renders the room with recognisable multi-part furniture shapes. Click any piece to select it — raycasting hits the bounding volume, finds the group, reads the furniture ID from the mesh key. Height slider adjusts the group's Y position client-side — useful for placing a lamp on a table."
Open a second tab (incognito) — drag furniture in Tab 1
"WebSocket Hibernation: the DO sleeps when all connections are idle. Memory cost drops to zero. When the first message arrives, it wakes in under 5ms — ctx.acceptWebSocket(server) with server.serializeAttachment(session) survives the sleep cycle. 1,000 idle connections cost nothing per month."
Point at the Live Events bar — show DO version counter incrementing
"DO v{n} — every state change increments this counter via SQLite. It's not in memory. It survives a Worker restart. The presence strip shows every connected user's name and colour — this.ctx.getWebSockets() iterates the live sessions."
Press ↺ Reset — watch both tabs snap back to the seed layout simultaneously
Technical depth:
— DO class registered with new_sqlite_classes, not new_classes — required to enable ctx.storage.sql
— WebSocket Hibernation: ctx.acceptWebSocket() (not native ws.accept()) — sessions serialised via serializeAttachment() survive hibernation
— SQLite schema: furniture table (id, catalog_id, label, width, depth, x, y, rotation, color, locked_by) + room_meta table (key/value pairs for room dimensions and version)
— Lock system: locked_by column — grab sets it, release clears it, WebSocket close releases all locks held by that session
— Seed layout: 3 rooms seeded on first access — real HÖMSTYLE products (ek01–ek20) with real dimensions from the product catalog
— Protocol messages: join / grab / move / rotate / release / add / remove / cursor / reset / ping
— Broadcast patterns: broadcastAll() for state changes, broadcast(msg, skip) for cursors (skip sender)
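The lock rules in the list above can be modelled in memory to show why the conditional write is safe. This is a sketch, not the demo's Durable Object: `FurnitureLocks` and `Piece` are hypothetical names, only the `move` condition comes from the SQL quoted earlier, and the exact grab and release behaviour is an assumption.

```typescript
interface Piece {
  id: string;
  x: number;
  y: number;
  lockedBy: string | null;
}

// In-memory model of the furniture table's locked_by column.
class FurnitureLocks {
  private readonly pieces = new Map<string, Piece>();

  constructor(seed: Piece[]) {
    for (const piece of seed) this.pieces.set(piece.id, { ...piece });
  }

  // grab: succeeds only if the piece is free or already held by this session (assumed rule).
  grab(id: string, session: string): boolean {
    const piece = this.pieces.get(id);
    if (!piece || (piece.lockedBy !== null && piece.lockedBy !== session)) return false;
    piece.lockedBy = session;
    return true;
  }

  // move: mirrors UPDATE furniture SET x=?, y=? WHERE id=? AND locked_by=?
  move(id: string, session: string, x: number, y: number): boolean {
    const piece = this.pieces.get(id);
    if (!piece || piece.lockedBy !== session) return false; // zero rows updated
    piece.x = x;
    piece.y = y;
    return true;
  }

  // release: clears the lock if this session holds it.
  release(id: string, session: string): boolean {
    const piece = this.pieces.get(id);
    if (!piece || piece.lockedBy !== session) return false;
    piece.lockedBy = null;
    return true;
  }

  // releaseAll: what a WebSocket close would trigger for the departing session.
  releaseAll(session: string): number {
    let released = 0;
    for (const piece of this.pieces.values()) {
      if (piece.lockedBy === session) {
        piece.lockedBy = null;
        released += 1;
      }
    }
    return released;
  }

  get(id: string): Piece | undefined {
    const piece = this.pieces.get(id);
    return piece ? { ...piece } : undefined;
  }
}
```

Because a Durable Object processes a room's messages in a single instance, a check-then-write like this needs no external lock service. The SQL version gets the same guarantee from the `locked_by=?` condition in the WHERE clause.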
28:30–29:00 — SEO: Rich Results via HTMLRewriter + JSON-LD
Navigate to: dev.pongpisit.com/demo/seo?ctx=entertainment
"React SPA. Every film URL sends <title>STREAMVAULT</title> to Googlebot. No meta description. No structured data. The film pages don't rank individually."
"The Worker reads the film ID from the URL, fetches metadata from D1 or KV, and uses HTMLRewriter to rewrite the <title>, inject <meta name="description">, and append a <script type="application/ld+json"> Movie schema block — all in the response stream before the browser sees it. Origin sends the same HTML it always did."
Click "Inject SEO Tags" → switch to Google Preview tab → then JSON-LD tab → then HTML Head tab
Technical depth:
— HTMLRewriter is streaming — no buffering, TTFB unchanged
— JSON-LD Movie schema: aggregateRating.ratingValue, director, dateCreated, genre — all from D1
— KV caches metadata per film for 1h — origin D1 called once, edge serves thereafter
— Same pattern works for Product (e-commerce), Article (blogs), Course (ed-tech)
— Google Rich Results eligibility: star ratings, runtime, and release year visible directly in SERPs
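Building the JSON-LD block itself is plain string work, sketched below. The `FilmMeta` shape and the helper names `movieJsonLd` and `jsonLdScriptTag` are assumptions for illustration. In the demo the field values come from D1 or KV.

```typescript
// Hypothetical metadata shape. The real columns live in D1.
interface FilmMeta {
  title: string;
  description: string;
  director: string;
  releaseDate: string; // ISO 8601 date
  genres: string[];
  ratingValue: number;
  ratingCount: number;
}

// Map film metadata onto a schema.org Movie object.
function movieJsonLd(film: FilmMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Movie",
    name: film.title,
    description: film.description,
    director: { "@type": "Person", name: film.director },
    dateCreated: film.releaseDate,
    genre: film.genres,
    aggregateRating: {
      "@type": "AggregateRating",
      ratingValue: film.ratingValue,
      ratingCount: film.ratingCount,
    },
  };
}

// Serialise to a script tag that is safe to append to <head>.
function jsonLdScriptTag(data: Record<string, unknown>): string {
  // Escape "<" so film text can never close the script element early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

The resulting string is what the Worker would hand to an HTMLRewriter element handler on `head`, appended with `{ html: true }`.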
29:00–29:30 — APIs at the Edge: New Endpoints, Zero Origin
Navigate to: dev.pongpisit.com/demo/api-edge?ctx=entertainment
Click each endpoint: Mood Discovery → Content Check → Watch Order → Plan My List
"Four API endpoints. None existed on the origin. The Worker handles them entirely at the edge — origin never receives these requests. Each is independently deployable. If one throws an error, the others keep running. No microservices orchestration, no container cluster."
Pattern: Worker matches pathname → handles at edge → returns JSON. Origin is never called for these routes. Because Workers are route-matched, you can add endpoints to any existing domain without touching the origin's routing table.
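The match-pathname-or-pass-through pattern can be sketched as a small route table. Everything named here is hypothetical: the `/api/...` paths, the placeholder payloads, and the `handleAtEdge` helper are invented for illustration and are not the demo's actual endpoints.

```typescript
type EdgeHandler = (query: Record<string, string>) => unknown;

interface EdgeResult {
  status: number;
  body: string; // JSON text
}

// Hypothetical paths and placeholder payloads standing in for the four demo endpoints.
const demoRoutes = new Map<string, EdgeHandler>([
  ["/api/mood-discovery", (q) => ({ endpoint: "mood-discovery", mood: q["mood"] ?? "any" })],
  ["/api/content-check", () => ({ endpoint: "content-check" })],
  ["/api/watch-order", () => ({ endpoint: "watch-order" })],
  ["/api/plan-my-list", () => ({ endpoint: "plan-my-list" })],
]);

// Returns a JSON result for edge routes, or null to signal "pass through to origin".
function handleAtEdge(
  routes: Map<string, EdgeHandler>,
  pathname: string,
  query: Record<string, string> = {},
): EdgeResult | null {
  const handler = routes.get(pathname);
  if (!handler) return null; // not an edge route: the caller forwards the request to origin

  try {
    return { status: 200, body: JSON.stringify(handler(query)) };
  } catch {
    // A failing handler is contained here and does not affect the other routes.
    return { status: 500, body: JSON.stringify({ error: "edge handler failed" }) };
  }
}
```

In a Worker's fetch handler, a non-null result would be wrapped in a JSON response and a null result would fall through to `fetch(request)`.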
29:30–30:00 — Pricing: Build the BOM Live + Close
Navigate to: dev.pongpisit.com/developer-pricing
Ask first: "Roughly how many API requests per month? What's your storage footprint — images, videos, files? How many AI calls per day across search, chatbot, voice?"
Click closest preset → adjust sliders to their actual numbers
Key numbers to walk through:
Workers: 10M requests/month included at $5/month base — overage $0.30/million
AI Gateway Enterprise: 200K logs included, $8/100K overage — each AI query = 2 logs (request + response)
Vectorize: 10M stored + 50M queried dimensions included — scales to hundreds of millions
KV: 10M reads/month included — most catalog caching use cases are free tier
Workers AI: per-neuron billing — most models free up to daily limits on Workers Paid
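For the Workers line specifically, the arithmetic is simple enough to check by hand. A minimal sketch using only the request figures quoted above ($5 base, 10M requests included, $0.30 per additional million). `workersRequestCostUsd` is a hypothetical helper, and it deliberately ignores every other line item in the calculator.

```typescript
const WORKERS_BASE_USD = 5;
const INCLUDED_REQUESTS = 10_000_000;
const OVERAGE_USD_PER_MILLION = 0.3;

// Monthly cost of the Workers request line only (base fee plus request overage).
function workersRequestCostUsd(requestsPerMonth: number): number {
  const overage = Math.max(0, requestsPerMonth - INCLUDED_REQUESTS);
  return WORKERS_BASE_USD + (overage / 1_000_000) * OVERAGE_USD_PER_MILLION;
}
```

At 25M requests a month this gives the $5 base plus 15 million overage requests at $0.30 per million, or $9.50 in total.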
Click "Copy Summary" → paste to Salesforce/CPQ · Click "Download CSV" → spreadsheet for their finance team
TCO comparison: When comparing vs AWS, remove the egress line items entirely. At enterprise content scale that alone is $5,000–$20,000/month that disappears from the bill.
29:30–30:00 — Close: Try it Yourself
Navigate to: dev.pongpisit.com/try
"Two scenarios here — AI chatbot on any existing site, and R2 image cache with zero egress. Both have copy-paste Worker code. The R2 Worker needs zero configuration — no ORIGIN to set. It adds X-R2-Bypass: 1 when fetching origin, breaking the self-loop. Deploy it to any hostname, bind an R2 bucket, done. Self-configuring."
"Two more if their engineering team wants to explore further: Image resizing — 1.8 MB product photo → 180 KB WebP on mobile, just a /cdn-cgi/image/ URL prefix, zero new infrastructure. And AI Content Generation at /demo/feedback — Workers AI generates contextual review pill tags and full draft reviews, 3× review completion rate."
Final question: "Which of these has the highest business impact for your team in the next 90 days — and what would a 2-week proof of concept look like for you?"
Stop. Let them answer.
FAQ — Technical Deep Dives
The Worker is additive, not a replacement. For SSR apps, the standard pattern is:
const res = await fetch(request) // your SSR origin responds normally
return new HTMLRewriter() // the Worker transforms the response stream
  .on('body', { element(el) { el.append(widget, { html: true }) } }) // widget: the HTML string to inject
  .transform(res) // streaming: the body is rewritten as it passes through
HTMLRewriter is a streaming HTML parser — it does not buffer the full response. TTFB is unchanged. The origin generates the full SSR response as normal; the Worker transforms it in flight as bytes pass through. Your React/Next/Nuxt/Rails app has no idea this is happening.
For routes where you don't want the Worker to intervene: return fetch(request) — it passes through unmodified. You can route at path-level granularity.
This is a real concern with route-intercepting Workers and there's a clean pattern to solve it — you saw it in the R2 demo. The Worker adds a header on its internal fetch:
// First line of the Worker: bypass check
if (request.headers.get('X-R2-Bypass') === '1') return fetch(request)
// On cache miss: re-fetch with the bypass header set
const headers = new Headers(request.headers)
headers.set('X-R2-Bypass', '1')
const originReq = new Request(request, { headers })
const res = await fetch(originReq) // Worker sees bypass → passes through
On the second invocation, the Worker sees the bypass header and immediately calls return fetch(request), which Cloudflare routes to the real origin without triggering the Worker again. No loop. Works on any hostname without hardcoding a URL.
Three meaningful differences:
Dimension | OpenAI from backend | Workers AI at edge
Latency | Your server → US API → back = 200–800ms round-trip for ASEAN users | Model runs at the nearest PoP: a Bangkok user calls a Bangkok GPU
Data egress | Your prompt/context travels from server → OpenAI, billed as egress | Stays on Cloudflare network, $0 egress between Workers and Workers AI
Infrastructure | You manage rate limiting, retries, caching, API key rotation | AI Gateway handles all of this: built-in rate limits, caching, logs
Workers AI doesn't prevent you from calling OpenAI — you can still use GPT-4o for tasks that require it. Workers AI covers the high-volume, latency-sensitive use cases (search, tagging, classification) at a cost that makes those use cases viable at scale.
R2 customMetadata is a Record<string, string> stored alongside the object, similar to S3 user-defined metadata but with a higher size limit than S3's 2 KB. It's written on put and readable on get or head:
// Read (head — no body download)
const obj = await env.R2.head(key)
const ts = obj.customMetadata?.['cached-at'] // "2025-04-09T08:23:01Z"
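The write side pairs with the read above. A minimal sketch, assuming an `R2Like` interface that mirrors only the `put` and `head` calls of an R2 bucket binding; `putWithTimestamp` and `cachedAt` are hypothetical helper names, and the `cached-at` key follows the read example.

```typescript
// Structural subset of an R2 bucket binding: put with customMetadata, head without the body.
interface R2Like {
  put(
    key: string,
    value: string | ArrayBuffer,
    options?: { customMetadata?: Record<string, string> },
  ): Promise<unknown>;
  head(key: string): Promise<{ customMetadata?: Record<string, string> } | null>;
}

// Write: store the object and stamp it with the time it was cached.
async function putWithTimestamp(
  bucket: R2Like,
  key: string,
  body: string | ArrayBuffer,
  now: Date = new Date(),
): Promise<void> {
  await bucket.put(key, body, { customMetadata: { "cached-at": now.toISOString() } });
}

// Read: fetch only the metadata and return the timestamp, or null if absent.
async function cachedAt(bucket: R2Like, key: string): Promise<string | null> {
  const obj = await bucket.head(key);
  return obj?.customMetadata?.["cached-at"] ?? null;
}
```

Passing `now` as a parameter keeps the helper deterministic in tests. In a Worker the default `new Date()` is enough.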
The metadata is visible in the R2 dashboard, returned in the API, and can be surfaced as response headers (X-R2-Cached-At) for client-side verification. Unlike S3 object tags, metadata doesn't require a separate API call — it's part of the object itself.
KV is eventually consistent with a propagation time of roughly 60 seconds globally. For product catalogs, pricing tables, configuration, and feature flags — 60-second staleness is completely acceptable and usually invisible to users.
For data that needs stronger consistency — user sessions, order state, real-time inventory — use Durable Objects (strongly consistent, single-instance) or D1 (SQLite, ACID transactions). The choice of storage primitive depends on the consistency requirement of the specific data, not the whole application.
In practice: KV for read-heavy, infrequently-changing data. D1 for relational data with writes. R2 for objects. Durable Objects for coordination and real-time state. These aren't competing — they're complementary.
Workers are designed for request-response workloads, not long-running processes. Key limits on the Paid plan:
CPU time: 30s per request (wall clock time can be longer for I/O-bound work)
Memory: 128 MB per isolate
Subrequests: 1,000 fetch() calls per request
For tasks that exceed these limits (batch AI processing, large file transcoding, report generation), use Cloudflare Queues + Workers: the Worker enqueues the job and returns immediately. A consumer Worker processes it asynchronously with up to 15 minutes of wall time. For even longer jobs, Cloudflare Workflows provides durable execution with automatic retries and state persistence.
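The enqueue-and-return pattern can be sketched with two small functions. This is an illustration: `QueueLike` and `MessageLike` mirror only the `send`, `body`, `ack`, and `retry` members of the Queues API, while `Job`, `enqueueJob`, and `consumeBatch` are names invented here.

```typescript
// Hypothetical job payload.
interface Job {
  id: string;
  kind: string;
}

// Structural subsets of a queue producer binding and a consumer message.
interface QueueLike<T> {
  send(body: T): Promise<void>;
}
interface MessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}

// Producer: enqueue the job and answer immediately with 202 Accepted.
async function enqueueJob(
  queue: QueueLike<Job>,
  job: Job,
): Promise<{ status: number; jobId: string }> {
  await queue.send(job);
  return { status: 202, jobId: job.id };
}

// Consumer: process each message, acknowledging successes and retrying failures.
async function consumeBatch(
  messages: MessageLike<Job>[],
  handle: (job: Job) => Promise<void>,
): Promise<void> {
  for (const message of messages) {
    try {
      await handle(message.body);
      message.ack();
    } catch {
      message.retry(); // redelivered later, so one bad job does not block the batch
    }
  }
}
```

Acknowledging per message rather than per batch means a single failing job is retried on its own instead of forcing the whole batch to be redelivered.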
Workers run on the V8 engine with a Web Standards API surface — not Node.js. Most npm packages that don't depend on Node.js built-ins (fs, net, child_process) work fine. Packages that use only Web APIs, pure JS, or have Workers-compatible builds are fully supported.
For Node.js compatibility: Workers supports the nodejs_compat flag which polyfills a growing set of Node.js APIs. Major packages like Stripe, Prisma (edge client), Zod, date-fns, and most utility libraries work without changes. The Cloudflare Workers npm compatibility table lists what's supported.
The Workers in this demo use zero npm dependencies — everything is Web Standard APIs. That's intentional for simplicity, but real production Workers commonly use Hono (router), Zod (validation), Drizzle (D1 ORM), and similar packages.
Discovery Questions to Weave In
Scatter these at natural transitions — never front-load:
"What's your current approach to caching — are you running something in front of your origin today?" (before Routes)
"Do you have a chatbot today? What's the deploy model — separate service, or injected somehow?" (before chatbot)
"Do you have visibility into which assets are cached and when they were first stored?" (during R2 customMetadata)
"Do you look at star ratings as a signal in your reviews system? How are you surfacing problems?" (before sentiment)
"What does your search look like today — SQL LIKE, Elasticsearch, something else?" (before search)
"Which markets are you most focused on — are you seeing different conversion by country?" (before local pricing)
"How many hours of video or audio content do you produce per month? What languages?" (before subtitles — note: Nova-3 for English, Whisper for local languages)
"What features has your product team been asking for that engineering keeps scoping as 3–6 month projects? Is real-time collaboration one of them?" (before Room Designer)
"Do your product pages rank individually in Google, or does every URL look the same to Googlebot?" (before SEO)
"What's the largest image file size your users download on mobile?" (before image resizing — if relevant)
"Who makes infrastructure cost decisions at your company — is FinOps involved?" (before pricing)