jpskill.com

🛠️ Cloudflare Workers Architect

cloudflare-workers-architect

Leverages Cloudflare Workers for websites and applications, choosing the optimal …

⏱ MCP server implementation: 1 day → 2 hours


📜 Original English description (reference)

Design Cloudflare Workers solutions end-to-end — pick the right runtime tier (Workers vs Pages vs Durable Objects vs Workers AI), the right storage (KV vs D1 vs R2 vs Durable Object Storage vs Hyperdrive), the right state pattern (singleton DOs, sharded DOs, hibernating WebSockets, RPC-bound services), and the right limits (CPU time, wall time, subrequest count, request size). Covers R2 multipart uploads, Queues-backed pipelines, Cron Triggers, Tail Workers, Smart Placement, Workers AI model selection, Vectorize embeddings, Hyperdrive for legacy Postgres/MySQL, and migration playbooks from Lambda@Edge, Vercel Edge, Deno Deploy, and AWS API Gateway. Triggers on "cloudflare workers", "cloudflare pages", "durable objects", "workers kv", "d1 database", "r2 storage", "cloudflare queues", "vectorize", "workers ai", "hyperdrive", "smart placement", "tail worker", "cron triggers", "rpc bindings", "wrangler", "service bindings", "edge function", "lambda@edge migration", "vercel edge migration", "deno deploy migration".

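🇯🇵 Notes for Japanese creators

In a nutshell

Leverages Cloudflare Workers for websites and applications, choosing the optimal …

※ Supplementary commentary written by the jpskill.com editorial team for Japanese business settings. It is reference information independent of the Skill's actual behavior.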

⚡ Recommended: one-line install (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o cloudflare-workers-architect.zip https://jpskill.com/download/4590.zip && unzip -o cloudflare-workers-architect.zip && rm cloudflare-workers-architect.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/4590.zip -OutFile "$d\cloudflare-workers-architect.zip"; Expand-Archive "$d\cloudflare-workers-architect.zip" -DestinationPath $d -Force; ri "$d\cloudflare-workers-architect.zip"

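When it finishes, restart Claude Code. The Skill then triggers automatically on ordinary requests such as "Design a real-time collab editor on Cloudflare".

💾 Manual download (if the command line is unfamiliar)
  1. Press the blue button below to download cloudflare-workers-architect.zip
  2. Double-click the ZIP file to extract it; this creates a cloudflare-workers-architect folder
  3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
  4. Restart Claude Code

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.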

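🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for work in this area.

📦 Installation (3 steps)

  1. Press the "Download" button above to get the .skill file
  2. Rename the extension from .skill to .zip and extract it (macOS can extract it automatically)
  3. Place the extracted folder in .claude/skills/ under your home folder
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.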

Last updated: 2026-05-17
Retrieved: 2026-05-18
Bundled files: 1

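💬 Just ask like this: sample prompts

  • Using Cloudflare Workers Architect, show me a minimal sample
  • Explain the main uses and caveats of Cloudflare Workers Architect
  • Explain how to integrate Cloudflare Workers Architect into an existing project

Paste one of these into Claude Code and the Skill triggers automatically.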

📜 Original SKILL.md (the English/Chinese source that Claude reads)

Cloudflare Workers Architect

Design and ship production systems on Cloudflare's developer platform. Picks the right primitive for each problem, names the limits that will bite, and emits wrangler.toml, bindings, deploy scripts, and a cost model. Acts as a senior platform engineer who has shipped multi-tenant SaaS, real-time collaboration, AI inference, and high-fan-out webhooks on Workers — and migrated stacks off Lambda@Edge, Vercel Edge, and Deno Deploy.

Usage

Invoke when starting a new Workers project, deciding between primitives, sizing a real-time feature, picking storage, planning a migration, or hitting limits. Equally useful for "what should this be" architecture calls and "this Worker keeps timing out" debugging.

Basic invocation:

Design a real-time collab editor on Cloudflare
Should this be a Worker, a Pages function, or a Durable Object?
Migrate our 80 Lambda@Edge handlers to Workers

With context:

Here's the API surface: pick storage and write the wrangler.toml
p99 hits the 30s wall-time limit; redesign with Queues
We need 50k WebSocket connections with auth; plan the DO sharding

The agent emits a primitive choice, wrangler.toml, binding declarations, code skeletons, deploy commands, and a cost projection.

Inputs Required

  • Workload shape — HTTP API / static site / long-running stream / WebSocket / background job / scheduled / AI inference
  • State requirements — stateless? per-user? per-room? global? eventual or strong?
  • Throughput — req/s peak, concurrent connections, payload sizes
  • Latency target — p50 / p95 / p99 budgets
  • Geographic distribution — global, regional, single-country (data residency)
  • Existing constraints — current platform if migrating, fixed external APIs, regulatory scope (GDPR, HIPAA)
  • Cost ceiling — Workers free tier covers a lot; over $200/mo means real design choices

Workflow

  1. Classify the workload against the Decision Tree (below)
  2. Pick storage from the Selection Matrix; declare bindings in wrangler.toml
  3. Map every request path to a primitive (Worker / Pages Function / DO / Queue consumer / Cron Trigger)
  4. Identify the limit that will bite first; design around it before code
  5. Author wrangler.toml with all bindings, routes, and compatibility flags
  6. Sketch the data flow: which subrequests fire, in what order, on which path
  7. Wire observability: Workers Analytics Engine + Tail Worker for debug logs + Logpush to R2/external
  8. Implement and test locally with wrangler dev --remote (real bindings)
  9. Deploy via wrangler deploy; canary via gradual deploys
  10. Document rollback (wrangler rollback to a known version ID)

Decision Tree: Pages vs Workers vs Durable Objects vs Workers AI

START
 ├── Is the request path a static asset (HTML/JS/CSS/image)?
 │     └── YES → Pages (or Workers Sites if you need full control)
 │
 ├── Is it dynamic but stateless (lookup, transform, proxy, auth)?
 │     └── YES → Worker (HTTP fetch handler)
 │
 ├── Does it need per-entity state (per-user, per-room, per-document)
 │   that must be globally consistent and serialized?
 │     └── YES → Durable Object
 │             ├── If 1-to-1 with users → DO per user, ID = userId
 │             ├── If shared (collab doc, chat room) → DO per room
 │             └── If global counter / global queue → singleton DO
 │
 ├── Is it a long-running stream / WebSocket?
 │     └── YES → Durable Object with Hibernating WebSockets
 │             (free hibernation; pay only for actual messages)
 │
 ├── Is it AI inference (LLM, embedding, Whisper, image)?
 │     └── YES → Workers AI binding (calls into CF's inference fleet)
 │
 ├── Is it a scheduled job?
 │     └── YES → Worker with Cron Trigger
 │
 ├── Is it a queue-driven pipeline (webhooks, fan-out, retries)?
 │     └── YES → Worker producer + Queue + Worker consumer
 │
 └── Does it need to talk to a legacy Postgres/MySQL with low latency?
       └── YES → Hyperdrive binding (connection pool + region pinning)

Pages vs Workers nuance: Pages = static + opt-in functions/. Use Pages when the site is mostly static and you have a few API routes. Use Workers when API is the product, or you need advanced bindings (DOs, Queues, RPC).

Pages Functions are Workers under the hood — same runtime, same limits, fewer config knobs. Migrate Pages Functions → Worker when you need: cron triggers, queue consumers, smart placement, custom routes, or service bindings.
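
The decision tree can also be read as a small classifier. The sketch below is one possible encoding, assuming a hypothetical workload descriptor: the field names (staticAsset, websocket, perEntityState, and so on) are illustrative and not part of any Cloudflare API. It checks the more specific branches first and falls back to a plain Worker.

```javascript
// Hypothetical workload descriptor -> primitive name, following the tree above.
// Field names are illustrative assumptions, not a Cloudflare API.
function pickPrimitive(w) {
  if (w.staticAsset) return "Pages";
  if (w.websocket) return "Durable Object + Hibernating WebSockets";
  if (w.perEntityState === "user") return "Durable Object per user";
  if (w.perEntityState === "room") return "Durable Object per room";
  if (w.perEntityState === "global") return "Singleton Durable Object";
  if (w.aiInference) return "Worker + Workers AI binding";
  if (w.scheduled) return "Worker + Cron Trigger";
  if (w.queueDriven) return "Worker producer + Queue + Worker consumer";
  if (w.legacySql) return "Worker + Hyperdrive binding";
  return "Worker (fetch handler)"; // dynamic but stateless
}

console.log(pickPrimitive({ perEntityState: "room" }));
console.log(pickPrimitive({ queueDriven: true }));
```

Real systems usually combine several branches (for example a Worker front door plus per-room DOs), so a descriptor per request path tends to be more useful than one per application.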

Storage Selection Matrix

| Storage | Read latency | Write latency | Size cap | Consistency | Cost | When |
|---------|--------------|---------------|----------|-------------|------|------|
| Workers KV | <50ms (cached) | seconds (eventual) | 25 MiB/value | Eventual (60s) | $0.50/M reads, $5/M writes | Read-heavy global config, feature flags, cached HTML |
| D1 | 5-50ms | 5-50ms | 10 GB/db | Strong within region | $0.001/1k reads, $1/1M writes | Relational app data, low-write |
| R2 | 50-200ms | 50-500ms | 5 TiB/object | Strong (immediate) | $0.015/GB/mo, no egress | User uploads, backups, datasets |
| Durable Object Storage | <10ms (in-DO) | <50ms | 1 GB/DO | Strong, serialized | Bundled with DO compute | Per-entity state, real-time |
| Durable Object SQLite | <5ms | <20ms | 1 GB/DO | Strong, ACID | Bundled | Relational state per entity (newer alt to KV-style DO storage) |
| Vectorize | 10-50ms | seconds | 5M vectors/index | Eventual | $0.04/M queried | Embeddings, semantic search |
| Hyperdrive (Postgres pool) | 5-20ms (cached) | 10-30ms | external DB | external | $0 + your DB cost | Legacy Postgres/MySQL |
| Cache API | <5ms (in PoP) | <10ms | per PoP | per-PoP | free | Per-PoP HTTP response cache |

Decision rules:

  • Reads >> writes, global, eventual ok → KV
  • Relational queries, joins, transactions, low-write → D1
  • Files, blobs, datasets, images → R2
  • Per-entity state with strong serialization → DO Storage (use SQLite variant for relational shape)
  • Embeddings / semantic search → Vectorize
  • Existing Postgres/MySQL you can't replace → Hyperdrive
  • Per-PoP HTTP cache (idempotent GET) → Cache API

Anti-pattern alert:

  • Don't use KV as a write-heavy store — eventual consistency + write rate limits will burn you
  • Don't use D1 for >100 writes/sec sustained — split into per-tenant DOs with SQLite
  • Don't use R2 for tiny key-value records — KV is cheaper at small sizes
  • Don't use a singleton DO for global state with >1k req/s — that DO's CPU is the bottleneck; shard

Edge State Patterns

Pattern 1: Singleton DO — one DO globally, ID = constant string.

  • Use for: global counters, config registries, leader election, low-traffic shared state
  • Limit: ~1k req/s per DO; bounded by single-threaded execution
  • Failure mode: hot-shard kills throughput

Pattern 2: DO per entity, via idFromName(userId) or idFromName(roomId).

  • Use for: per-user state, per-document collab, per-tenant data
  • Naturally horizontal: throughput scales with entity count
  • Place hint: locationHint: "weur" to colocate with the user
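
A minimal sketch of Pattern 2, assuming a Durable Object namespace bound as ROOMS and a room ID passed in a ?room= query parameter (both are illustrative choices). The Worker derives the DO ID from the room name and forwards the request to that object.

```javascript
// Sketch: route each request to the Durable Object that owns its room.
// ROOMS is an assumed binding name; ?room= is an assumed URL convention.
const roomRouter = {
  async fetch(request, env) {
    const roomId = new URL(request.url).searchParams.get("room");
    if (!roomId) return new Response("missing ?room=", { status: 400 });

    const id = env.ROOMS.idFromName(roomId);                   // same name -> same DO
    const stub = env.ROOMS.get(id, { locationHint: "weur" });  // hint only applies on first creation
    return stub.fetch(request);
  },
};
// In a real Worker module: export default roomRouter;
```

Because the ID is derived from the name, every request for the same room lands on the same object, which is what gives the per-entity serialization described above.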

Pattern 3: Sharded DOs, via idFromName(`shard-${hash(key) % N}`).

  • Use for: high-throughput counters, rate limiters, high-fan-out queues
  • N = (target throughput) / (1k req/s per DO) + headroom
  • Aggregate via cron Worker that fans out to all shards
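
The sizing rule and shard naming above can be sketched as follows. The 1.5× headroom factor and the FNV-1a hash are assumptions made for illustration; the pattern itself only requires some stable hash of the key.

```javascript
const PER_DO_RPS = 1000; // per-DO throughput figure quoted above

// N = (target throughput) / (1k req/s per DO), scaled by an assumed headroom factor.
function shardCount(targetRps, headroom = 1.5) {
  return Math.max(1, Math.ceil((targetRps / PER_DO_RPS) * headroom));
}

// FNV-1a (32-bit) over UTF-16 code units: a small deterministic string hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Name passed to idFromName() so a given key always maps to the same shard.
function shardName(key, n) {
  return `shard-${fnv1a(key) % n}`;
}

const n = shardCount(5000);
console.log(n, shardName("user-42", n));
```

Changing N remaps most keys to different shards, so counters or rate-limit windows held in the old shards need to be drained or merged when resizing.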

Pattern 4: Hibernating WebSocket DO

  • DO accepts WebSocket via state.acceptWebSocket(ws) (NOT ws.accept())
  • DO can be evicted from memory between messages — only billed when active
  • State persists in DO Storage, not in JS variables
  • Up to ~32k connections per DO before throughput pressure
// hibernating WS pattern
export class ChatRoom {
  constructor(state, env) { this.state = state; }
  async fetch(req) {
    const pair = new WebSocketPair();
    this.state.acceptWebSocket(pair[1]);              // hibernation-aware
    return new Response(null, { status: 101, webSocket: pair[0] });
  }
  async webSocketMessage(ws, msg) {                    // called even after hibernation
    const peers = this.state.getWebSockets();
    for (const p of peers) if (p !== ws) p.send(msg);
  }
  async webSocketClose(ws, code, reason, wasClean) { /* cleanup */ }
}

Pattern 5: RPC bindings between Workers (RPC over a service binding; the modern alternative to calling a bound service via fetch)

  • Worker A exposes a class extending WorkerEntrypoint with methods
  • Worker B binds to A and calls env.A.someMethod(args) directly
  • Type-safe, no JSON marshalling, no internal HTTP
// worker-a (service)
import { WorkerEntrypoint } from "cloudflare:workers";

export class AuthAPI extends WorkerEntrypoint {
  // Looks up the session for a token; resolves to null when the key is absent
  async verify(token) { return await this.env.KV.get(`session:${token}`); }
}
// worker-b (consumer), wrangler.toml: services = [{ binding = "AUTH", service = "worker-a", entrypoint = "AuthAPI" }]
const session = await env.AUTH.verify(token);

Request Lifecycle and Limits

Free plan:

  • 100k req/day
  • 10ms CPU time per request
  • No paid bindings (DO, R2, D1 etc) — use Workers Paid

Workers Paid ($5/mo):

  • 10M req/mo included; $0.30/M after
  • 30s CPU time per request by default (covers most usage)
  • Legacy Bundled usage model: 50ms CPU time per request, $0.50/M req over the included cap
  • Legacy Unbound usage model: 30s CPU time per request, billed on duration

Hard limits (design around these):

| Limit | Value | Notes |
|-------|-------|-------|
| CPU time per request | 30s (Paid), 50ms (Bundled), 10ms (free) | CPU time, not wall time; waiting on fetch doesn't count |
| Wall time per request | unlimited (in practice) | But TCP timeouts and client behavior limit it |
| Subrequests per request | 50 (free) / 1000 (paid) | Includes fetches to your own services |
| Request body | 100 MB (paid) / 1 MB (KV bodies) | Use R2 multipart for larger |
| Response body | unlimited streaming | Buffered up to memory |
| Worker memory | 128 MB | Hard ceiling; large parses fail |
| Script size | 10 MB compressed | After bundling |
| DO concurrent requests | 1k+ but serialized within a DO | Single-threaded execution |
| WebSocket messages/sec/DO | ~1k | Above this, shard |

Subrequest budget tactics:

  • Batch external calls (one fetch with multiple keys vs N fetches)
  • Use ctx.waitUntil(promise) for fire-and-forget logging: it doesn't add to the request's user-visible latency but still counts against the subrequest budget
  • Stream-pipe rather than buffer-then-forward when proxying
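
The batching tactic can be sketched as below, assuming a hypothetical fetchBatch(keys) function that asks the origin for many keys in one call and resolves to a key/value object. With 250 keys and a batch size of 100, the Worker spends 3 subrequests instead of 250.

```javascript
// Split an array into groups of at most `size` items.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) groups.push(items.slice(i, i + size));
  return groups;
}

// fetchBatch is a hypothetical caller-supplied function: (keys[]) => Promise<{ [key]: value }>.
// Each call to it is assumed to cost exactly one subrequest.
async function loadAll(keys, fetchBatch, batchSize = 100) {
  const results = {};
  for (const group of chunk(keys, batchSize)) {
    Object.assign(results, await fetchBatch(group));
  }
  return results;
}
```

The groups are fetched sequentially here to keep the sketch simple; running them with Promise.all trades memory for latency without changing the subrequest count.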

CPU time tactics:

  • Heavy crypto, ZIP, image manipulation → push to Queues consumer (separate budget)
  • LLM calls → use Workers AI binding (compute happens in CF inference fleet, doesn't count against your CPU)
  • JSON parses of >5 MB blobs → use a streaming JSON parser instead of a single JSON.parse call
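
A minimal sketch of the first tactic: the HTTP-facing Worker only enqueues, and a queue consumer does the heavy work under its own CPU budget. The JOBS binding name and the processJob helper are assumptions made for illustration.

```javascript
// Hypothetical heavy task (ZIP, image resize, ...); here it only validates the job.
async function processJob(job) {
  if (!job.key) throw new Error("job missing key");
}

// Producer: accepts the request, enqueues the job, returns immediately.
const producer = {
  async fetch(request, env) {
    const job = await request.json();
    await env.JOBS.send(job); // JOBS is an assumed queue producer binding
    return new Response("queued", { status: 202 });
  },
};

// Consumer: runs per batch; ack on success, retry on failure.
const consumer = {
  async queue(batch, env) {
    for (const msg of batch.messages) {
      try {
        await processJob(msg.body);
        msg.ack();
      } catch (err) {
        msg.retry(); // redelivered until max_retries, then sent to the DLQ if one is configured
      }
    }
  },
};
```

Acknowledging per message rather than per batch means one bad job does not force the whole batch to be redelivered.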

R2 Multipart Uploads

R2 multipart is required for objects > 5 GB and recommended for objects > 100 MB.

// 1. Initiate upload
const upload = await env.MY_BUCKET.createMultipartUpload(key);
// 2. Upload parts (5 MB - 5 GB each, max 10k parts)
const parts = [];
for (let i = 0; i < chunks.length; i++) {
  const part = await upload.uploadPart(i + 1, chunks[i]);
  parts.push(part);  // { partNumber, etag }
}
// 3. Complete
await upload.complete(parts);

Patterns:

  • Browser direct upload: Worker generates a presigned URL per part; client uploads directly to R2; Worker completes when client confirms all parts done. Saves Worker bandwidth.
  • Resumable: Persist {uploadId, partsCompleted} in DO Storage; client resumes from last completed part on reconnect.
  • Server-side stream: When proxying a large stream, pipe it through a TransformStream that buffers 5 MB chunks and uploads each as a part.
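
For the resumable pattern, it helps to compute the part layout up front so client and Worker agree on part numbers. The sketch below assumes the 5 MB minimum is 5 MiB and uses a hypothetical 8 MiB default part size; it enforces the 10k-part ceiling mentioned above.

```javascript
const MiB = 1024 * 1024;
const MIN_PART = 5 * MiB;  // assumed reading of the "5 MB" minimum
const MAX_PARTS = 10000;

// Returns [{ partNumber, offset, length }]; every part is partSize except possibly the last.
function planParts(totalBytes, partSize = 8 * MiB) {
  if (partSize < MIN_PART) throw new RangeError("part size below the 5 MiB minimum");
  const count = Math.ceil(totalBytes / partSize);
  if (count > MAX_PARTS) throw new RangeError("more than 10k parts; raise partSize");
  const parts = [];
  for (let i = 0; i < count; i++) {
    const offset = i * partSize;
    parts.push({ partNumber: i + 1, offset, length: Math.min(partSize, totalBytes - offset) });
  }
  return parts;
}

// Given the part numbers already persisted (e.g. in DO Storage), return what is left to upload.
function remainingParts(plan, completedPartNumbers) {
  const done = new Set(completedPartNumbers);
  return plan.filter((p) => !done.has(p.partNumber));
}
```

On reconnect, the client would request remainingParts for its uploadId and continue from there; the stored etags are still needed for the final complete call.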

Smart Placement

Smart Placement runs your Worker close to your origin (your DB, third-party API) instead of close to the user, when that yields lower total latency.

When to enable: Worker makes 3+ subrequests to a single origin per request and the origin is far from a meaningful share of users.

[placement]
mode = "smart"

Don't use Smart Placement when:

  • The Worker is a CDN-style cache (you want it close to user)
  • Subrequests are to globally-distributed services already (KV, R2, D1)
  • The origin is in a single region but users are concentrated nearby

Cron Triggers and Tail Workers

Cron triggers: declare in wrangler.toml:

[triggers]
crons = ["0 */6 * * *", "0 0 * * 0"]

Implement scheduled handler in the Worker. Limit: 30s CPU time per cron.
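
A minimal sketch of a scheduled handler for the two cron expressions above. The job bodies are placeholders (assumptions for illustration); the handler dispatches on controller.cron, which carries the expression that fired.

```javascript
// Hypothetical jobs keyed by the cron expression declared in wrangler.toml.
const JOBS = {
  "0 */6 * * *": async (env) => "refresh",       // e.g. refresh cached config every 6 hours
  "0 0 * * 0": async (env) => "weekly-report",   // e.g. build a weekly report on Sundays
};

const cronWorker = {
  async scheduled(controller, env, ctx) {
    const job = JOBS[controller.cron];
    if (!job) {
      console.warn(`no job registered for cron "${controller.cron}"`);
      return;
    }
    ctx.waitUntil(job(env)); // keep the invocation alive until the job settles
  },
};
// In a real Worker module: export default cronWorker;
```

Keying jobs by the exact cron string keeps the handler and wrangler.toml in sync; a mismatch shows up as a warning rather than a silent no-op.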

Tail Workers: a Worker that consumes the runtime traces of another Worker.

tail_consumers = [{ service = "log-processor" }]

Use for: structured log shipping to external stores (BetterStack, Datadog, S3, custom DB), per-request audit trails, real-time error dashboards, sampling for debug. Cheaper than turning Logpush on for low-volume.
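
A sketch of what the log-processor Tail Worker might look like, forwarding only failed invocations to an external sink. LOG_SINK_URL is a hypothetical variable, and the event fields used (scriptName, outcome, exceptions, eventTimestamp) are the ones this sketch assumes each trace item exposes.

```javascript
// Reduce trace events to a compact list of failures.
function summarizeFailures(events) {
  return events
    .filter((e) => e.outcome !== "ok" || (e.exceptions || []).length > 0)
    .map((e) => ({
      script: e.scriptName,
      outcome: e.outcome,
      errors: (e.exceptions || []).map((x) => x.message),
      at: e.eventTimestamp,
    }));
}

const logProcessor = {
  async tail(events, env, ctx) {
    const failures = summarizeFailures(events);
    if (failures.length === 0) return; // nothing worth shipping
    ctx.waitUntil(
      fetch(env.LOG_SINK_URL, {        // hypothetical external log endpoint
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(failures),
      })
    );
  },
};
// In a real Worker module: export default logProcessor;
```

Filtering before shipping is what keeps this cheaper than a full log export at low volume: successful requests never leave the platform.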

wrangler.toml Anatomy

name = "myapp-api"
main = "src/index.ts"
compatibility_date = "2026-04-01"           # pin behavior; bump deliberately
compatibility_flags = ["nodejs_compat"]     # opt into Node APIs

workers_dev = false                         # disable .workers.dev preview in prod
routes = [{ pattern = "api.example.com/*", zone_name = "example.com" }]

[placement]
mode = "smart"                              # only if origin-bound

[observability]
enabled = true                              # built-in logs/metrics

[[durable_objects.bindings]]
name = "ROOMS"
class_name = "ChatRoom"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["ChatRoom"]            # SQLite-backed DO; use new_classes for legacy KV-DOs

[[kv_namespaces]]
binding = "CACHE"
id = "abc123..."
preview_id = "def456..."

[[d1_databases]]
binding = "DB"
database_name = "myapp-prod"
database_id = "..."

[[r2_buckets]]
binding = "UPLOADS"
bucket_name = "myapp-uploads"

[[queues.producers]]
binding = "WEBHOOKS"
queue = "webhooks"

[[queues.consumers]]
queue = "webhooks"
max_batch_size = 100
max_batch_timeout = 30
max_retries = 5
dead_letter_queue = "webhooks-dlq"

[[services]]
binding = "AUTH"
service = "auth-worker"
entrypoint = "AuthAPI"                      # RPC entrypoint

[[hyperdrive]]
binding = "PG"
id = "..."

[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORS"
index_name = "embeddings"

[vars]
ENVIRONMENT = "production"
# secrets via `wrangler secret put`

[triggers]
crons = ["0 */6 * * *"]

# array-of-tables form, so this key is not parsed as part of the [triggers] table above
[[tail_consumers]]
service = "log-processor"

[limits]
cpu_ms = 50                                 # bundled; 30000 for paid

[env.staging]
name = "myapp-api-staging"
routes = [{ pattern = "staging-api.example.com/*", zone_name = "example.com" }]

Workers AI Model Selection

Workers AI runs CF-hosted models. You pay per neuron (CF's normalized inference unit).

| Task | Model | Cost (rough) | Latency |
|------|-------|--------------|---------|
| Chat (general) | @cf/meta/llama-3.1-8b-instruct | $0.011/1M tokens | 200-800ms first token |
| Chat (high quality) | @cf/meta/llama-3.1-70b-instruct | $0.59/1M | 500ms-2s |
| Code completion | @cf/qwen/qwen2.5-coder-32b-instruct | $0.10/1M | 300ms-1s |
| Embeddings (small, fast) | @cf/baai/bge-base-en-v1.5 | $0.012/1M | 50-150ms |
| Embeddings (multilingual) | @cf/baai/bge-m3 | $0.012/1M | 80-200ms |
| Speech-to-text | @cf/openai/whisper | $0.005/min | 1-3s/min audio |
| Image generation | @cf/black-forest-labs/flux-1-schnell | per-image | 1-3s |
| Image classification | @cf/microsoft/resnet-50 | $0.005/req | 50ms |

const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  messages: [{ role: "user", content: "..." }],
  max_tokens: 256,
  stream: true   // returns ReadableStream — pipe direct to client
});

Decision rules:

  • RAG retrieval embeddings: bge-base-en or bge-m3 (multilingual)
  • Chat in-product: llama-8b for cost; 70b only when quality matters
  • Code-focused: qwen-coder-32b
  • Realtime classification: resnet-50 + bge-base
  • Heavyweight reasoning: bridge to OpenAI/Anthropic via Worker fetch — not on Workers AI yet
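
The chat rules above can be sketched as a small helper that picks a model by tier and streams the result to the client. The tier names (cheap, quality, code) are assumptions for illustration; the model IDs come from the table.

```javascript
// Assumed tier names mapped to the model IDs listed above.
const MODELS = {
  cheap: "@cf/meta/llama-3.1-8b-instruct",
  quality: "@cf/meta/llama-3.1-70b-instruct",
  code: "@cf/qwen/qwen2.5-coder-32b-instruct",
};

// Runs the chosen model with streaming on and hands the stream straight to the client.
async function chat(env, prompt, tier = "cheap") {
  const model = MODELS[tier] ?? MODELS.cheap; // unknown tiers fall back to the cheap model
  const stream = await env.AI.run(model, {
    messages: [{ role: "user", content: prompt }],
    max_tokens: 256,
    stream: true,
  });
  return new Response(stream, { headers: { "content-type": "text/event-stream" } });
}
```

Passing the stream through without buffering keeps first-token latency low and avoids holding the full completion in Worker memory.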

Vectorize for Embeddings

// index
await env.VECTORS.upsert([
  { id: "doc-1", values: embedding, metadata: { tenant: "acme", url: "..." } }
]);
// query
const results = await env.VECTORS.query(queryEmbedding, {
  topK: 10,
  filter: { tenant: "acme" },        // metadata filter
  returnMetadata: "all"
});

Limits: 5M vectors/index, 1536 dims/vector typical, metadata filter expressions are limited boolean. For >5M vectors, shard by tenant; for richer filters use D1 first then re-rank with Vectorize.
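
Putting the embedding model and the index together, a tenant-scoped semantic search might look like the sketch below. It assumes the AI and VECTORS bindings from the wrangler.toml above, and that the embedding call resolves to an object whose data array holds one vector per input text.

```javascript
// Embed the question, then query Vectorize restricted to one tenant's vectors.
async function semanticSearch(env, tenant, question, topK = 10) {
  const embedded = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  const queryVector = embedded.data[0]; // assumed response shape: { data: [[...floats]] }

  const result = await env.VECTORS.query(queryVector, {
    topK,
    filter: { tenant },        // metadata filter keeps tenants isolated
    returnMetadata: "all",
  });

  return result.matches.map((m) => ({ id: m.id, score: m.score, url: m.metadata?.url }));
}
```

The same embedding model must be used at index time and query time; mixing models produces vectors that are not comparable.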

Hyperdrive for Legacy DBs

Hyperdrive = connection pool + query cache + region pinning for external Postgres/MySQL. Replaces "Worker → public DB" with "Worker → Hyperdrive → DB" and cuts latency 2-10× for SaaS apps with a single primary DB.

# wrangler.toml
[[hyperdrive]]
binding = "PG"
id = "..."

// Worker code
import postgres from "postgres";
const sql = postgres(env.PG.connectionString);
const rows = await sql`SELECT * FROM users WHERE id = ${id}`;

When Hyperdrive helps:

  • DB is in single region, users are global
  • Many short-lived queries per request (connection cost dominates)
  • Read-heavy with cacheable patterns

When it doesn't:

  • DB is already in multiple regions
  • Per-request workload is one big query (connection cost is amortized)
  • Heavy write traffic (cache miss every time)

Migration Playbooks

From AWS Lambda@Edge

| Lambda@Edge | Workers |
|-------------|---------|
| Viewer Request → header rewrite | Worker fetch handler |
| Origin Request → cache key manipulation | Worker + cf request properties |
| Viewer Response | Worker mutates Response before return |
| Origin Response | Same: Worker sits between origin fetch and response |
| CloudFront cache | Cloudflare cache (default) + Cache API for explicit caching |
| Lambda@Edge limits (5s/1MB) | Workers limits (30s/100MB) |

Migration steps: (1) rewrite each Lambda handler as a fetch handler, (2) move origin from S3 to R2 if egress matters, (3) keep CloudFront temporarily and cut DNS to Cloudflare last.
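
Step (1) can be sketched as a single function that covers both the Viewer Request and Viewer Response roles. The x-device-class header and the HSTS value are illustrative stand-ins for whatever the original Lambda handlers rewrote; fetchOrigin is injectable so the logic can be exercised without a network.

```javascript
// One Worker-style handler replacing a Viewer Request + Viewer Response pair.
async function handleViewerRequest(request, fetchOrigin = fetch) {
  // Viewer Request: rewrite headers before going to origin.
  const headers = new Headers(request.headers);
  const ua = headers.get("user-agent") || "";
  headers.set("x-device-class", /Mobile/i.test(ua) ? "mobile" : "desktop");
  const upstream = await fetchOrigin(new Request(request, { headers }));

  // Viewer Response: copy the origin response so its headers can be modified.
  const response = new Response(upstream.body, {
    status: upstream.status,
    statusText: upstream.statusText,
    headers: new Headers(upstream.headers),
  });
  response.headers.set("strict-transport-security", "max-age=31536000");
  return response;
}
// In a real Worker module: export default { fetch: (req) => handleViewerRequest(req) };
```

Both hooks collapse into one function because a Worker sits on the whole request/response path, which is why the table maps four Lambda@Edge triggers onto one fetch handler.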

From Vercel Edge Functions

Vercel Edge runs the same V8 isolate model — most code ports directly. Differences:

  • No next/server runtime helpers — replace with Web standard Request/Response
  • ISR/SSG → Cloudflare Pages (or Workers with Cache API + R2 for fallback)
  • Vercel geo headers → CF request.cf.country etc
  • Vercel KV → Workers KV (similar API; bulk migrate via dual-write window)

From Deno Deploy

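Closest analogue. Deno's Deno.serve → Workers fetch handler. Deno KV → Workers KV for read-heavy data; Deno KV reads are strongly consistent by default while Workers KV is eventually consistent, so move consistency-sensitive keys to D1 or DO Storage. Deno Cron → Cron Triggers. Most adapters port; check NPM compat (compatibility_flags = ["nodejs_compat"] if needed).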

From AWS API Gateway + Lambda

Largest savings come from killing API Gateway (its bill alone often exceeds the Lambda one). Replace:

  • API Gateway routes → routes in wrangler.toml
  • Lambda handlers → Worker fetch handler with router (Hono / itty-router)
  • DynamoDB → KV (small) or D1 (relational) or DO Storage (per-entity)
  • S3 → R2
  • SQS → Queues
  • EventBridge → Cron Triggers + Queues

Migration risk: cold start on Lambda (~500ms-2s) vs Workers (5-50ms) is usually a win, but watch for API Gateway custom authorizers, since you'll have to re-implement that auth in the Worker.

Anti-patterns

  • Storing per-user data in KV with kv.put(`user:${id}`, json): eventual consistency means logout/permission changes can lag 60s. Use D1 or DO Storage.
  • Singleton DO for a global rate limiter — works at low scale, falls over at >1k req/s. Shard by hash(userId) % N.
  • Calling crypto.randomUUID() and storing in KV expecting uniqueness checks — eventual consistency; two concurrent writers can both succeed. Use D1 unique constraint or DO transactional storage.
  • Buffering large R2 objects in memory — 128 MB Worker cap. Stream via body ReadableStream.
  • Not pinning compatibility_date — runtime upgrades can break Date parsing, crypto.subtle defaults, etc.
  • Putting secrets in [vars] — they appear in dashboards and Wrangler output. Use wrangler secret put.
  • Using a Worker to proxy a Postgres query without Hyperdrive — TCP setup eats your latency budget.
  • Forgetting waitUntil on background work — promises die when the response returns.
  • One DO for an entire chat application — single-threaded; thousands of users one room is fine, all rooms one DO is not.
  • Treating Pages as a separate runtime from Workers — they're the same; if you outgrow Pages config, just move to Workers.
  • Counting on cache hit ratios with personalized responses — Cache API needs a stable cache key; auth headers usually break it. Use vary or omit caching for personalized paths.
  • Running node:fs operations — there is no filesystem. Map paths to R2 or KV.

Exit Criteria

A Workers system is production-ready when:

  • Each path has a documented primitive choice with the limit it lives within
  • wrangler.toml declares every binding and the compatibility_date is current within 90 days
  • Secrets are set via wrangler secret put, not committed
  • DO classes are SQLite-backed where appropriate (new projects after Apr 2025)
  • Observability: Workers Analytics dashboard reviewed weekly; Tail Worker or Logpush wired to long-term store
  • Errors visible: Sentry / Honeybadger or equivalent SDK loaded in the Worker
  • Load test sustains target req/s with p95 within budget
  • Rollback rehearsed: wrangler rollback <version-id> known to work
  • Cost projection within 20% of first invoice
  • Migration source (Lambda@Edge / Vercel / Deno) decommissioned with a 7-day overlap window
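
The 90-day compatibility_date criterion is easy to automate, for example as a CI check. A minimal sketch, assuming the date string has already been read out of wrangler.toml (parsing the file is left out):

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days between a YYYY-MM-DD compatibility_date (taken as UTC midnight) and `now`.
function compatDateAgeDays(compatDate, now = new Date()) {
  const pinned = Date.parse(`${compatDate}T00:00:00Z`);
  if (Number.isNaN(pinned)) throw new Error(`invalid compatibility_date: ${compatDate}`);
  return Math.floor((now.getTime() - pinned) / DAY_MS);
}

// True when the pinned date is not in the future and at most maxAgeDays old.
function isCompatDateCurrent(compatDate, now = new Date(), maxAgeDays = 90) {
  const age = compatDateAgeDays(compatDate, now);
  return age >= 0 && age <= maxAgeDays;
}

console.log(isCompatDateCurrent("2026-04-01", new Date("2026-05-17T00:00:00Z")));
```

A failing check is a prompt to bump the date deliberately and re-run tests, not to auto-update it.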