Merge pull request #9 from zonghaoyuan/feat/cloudflare-migration

feat: add Cloudflare Workers deployment alongside Vercel
2026-06-02 22:14:52 +08:00
parent dd8b60c06b 203e63edc2
commit d263437756
18 changed files with 3660 additions and 182 deletions
@@ -14,6 +14,9 @@ out
 .turbo
 .claude

+.open-next
+.wrangler
+
 .DS_Store
 *.log
 npm-debug.log*
@@ -22,3 +25,4 @@ pnpm-debug.log*
 repomix-output.xml

 users.md
+.dev.vars
@@ -39,9 +39,11 @@ Free to play, no setup required: [infiplot.com](https://infiplot.com)

 ## One-click deploy

-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.en.md%23configuration-guide)
+InfiPlot deploys to both Vercel and Cloudflare Workers — pick whichever you prefer.

-After deploy, set your environment variables in the Vercel project — see the [Configuration guide](#configuration-guide) below. The Vercel project's **Root Directory** must be `apps/web` (the deploy button passes this automatically; if you configure manually, set it in Project Settings).
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.en.md%23configuration-guide) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot/tree/main/apps/web)
+
+After deploy, fill in the environment variables — see the [Configuration guide](#configuration-guide) below. Both platforms need `apps/web` as the project root (Vercel's button passes this automatically; on Cloudflare, set the build root to `apps/web` and the build command to `pnpm --filter @infiplot/web build:cf`).

 ---

@@ -135,13 +137,17 @@ InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenA

 **2. Set the environment variables**

-Set these in your Vercel project (**Settings → Environment Variables**), or in `apps/web/.env.local` for local runs. Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:
+Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:

 | Variable | Effect |
 |---|---|
 | `MOCK_IMAGE=true` | Skip image generation; the renderer returns a static placeholder. Story, voice, and choices still run normally. Great for iterating on TTS without burning Runware credits. |

-See `apps/web/.env.example` for the exact shape.
+Where to set them (see `apps/web/.env.example` for the exact shape):
+
+- **Local dev** — `apps/web/.env.local`
+- **Vercel** — Project Settings → Environment Variables
+- **Cloudflare Workers** — from `apps/web/`, run `wrangler secret put <NAME>` for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). For a private staging instance, gate the Worker behind [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) — zero-code email-whitelist auth in front of the Worker.

 **3. Mind the cost**

@@ -39,9 +39,11 @@ InfiPlot は、AI がコンテンツをリアルタイムに生成するイン

 ## ワンクリックデプロイ

-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.ja.md%23%E8%A8%AD%E5%AE%9A%E3%82%AC%E3%82%A4%E3%83%89)
+InfiPlot は Vercel と Cloudflare Workers の両方にそのままデプロイできます —— お好みの方をお選びください。

-デプロイ後、Vercel プロジェクトで環境変数を設定してください —— 下記の[設定ガイド](#設定ガイド)を参照。Vercel プロジェクトの **Root Directory** は `apps/web` に設定する必要があります（デプロイボタンが自動で渡します。手動設定の場合は Project Settings で指定してください）。
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.ja.md%23%E8%A8%AD%E5%AE%9A%E3%82%AC%E3%82%A4%E3%83%89) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot/tree/main/apps/web)
+
+デプロイ後、環境変数を設定してください —— 下記の[設定ガイド](#設定ガイド)を参照。両方のプラットフォームで、プロジェクトのルートを `apps/web` に設定する必要があります（Vercel のデプロイボタンが自動で渡します。Cloudflare では build root を `apps/web`、ビルドコマンドを `pnpm --filter @infiplot/web build:cf` に設定してください）。

 ---

@@ -112,13 +114,17 @@ InfiPlot は 4 種類のモデルプロバイダと通信します。**テキス

 **2. 環境変数を設定する**

-Vercel プロジェクト（**Settings → Environment Variables**）、またはローカル実行時は `apps/web/.env.local` に設定します。9 つの変数が必須で、TTS は任意です（空欄なら無音で動作）。低コストなテスト用のフラグもあります。
+9 つの変数が必須で、TTS は任意です（空欄なら無音で動作）。低コストなテスト用のフラグもあります。

 | 変数 | 効果 |
 |---|---|
 | `MOCK_IMAGE=true` | 画像生成をスキップし、レンダラが静的なプレースホルダを返します。ストーリー・音声・選択肢は通常どおり動作します。Runware のクレジットを消費せずに TTS を調整するのに最適です。 |

-正確なフォーマットは `apps/web/.env.example` を参照してください。
+設定場所（正確なフォーマットは `apps/web/.env.example` を参照）：
+
+- **ローカル開発** —— `apps/web/.env.local`
+- **Vercel** —— Project Settings → Environment Variables
+- **Cloudflare Workers** —— `apps/web/` から各変数について `wrangler secret put <NAME>` を実行するか、ダッシュボード（Workers → infiplot → Settings → Variables and Secrets）で設定します。ステージング環境にアクセス制限を掛けたい場合は、Worker の前に [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) を挟むと、ゼロコードでメール許可リスト方式の認証が利用できます。

 **3. コストに注意**

@@ -39,9 +39,11 @@ InfiPlot是一款AI实时生成内容的互动剧情游戏，这里没有预设

 ## 一键部署

-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot%23%E9%85%8D%E7%BD%AE%E6%95%99%E7%A8%8B)
+InfiPlot 同时支持部署到 Vercel 与 Cloudflare Workers —— 任选其一即可。

-部署完成后，在 Vercel 项目里填好环境变量 —— 详见下方的[配置教程](#配置教程)。Vercel 项目的 **Root Directory** 必须设为 `apps/web`（一键部署按钮会自动带上；若手动配置，请在 Project Settings 里设置）。
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot%23%E9%85%8D%E7%BD%AE%E6%95%99%E7%A8%8B) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot/tree/main/apps/web)
+
+部署完成后，填好环境变量 —— 详见下方的[配置教程](#配置教程)。两个平台都需要把项目根目录设为 `apps/web`（Vercel 一键部署按钮会自动带上；在 Cloudflare 上请把 build root 设为 `apps/web`，构建命令设为 `pnpm --filter @infiplot/web build:cf`）。

 ---

@@ -134,13 +136,17 @@ InfiPlot 会与四类模型供应商通信。**文本（Text）和视觉（Visio

 **2. 填写环境变量**

-在 Vercel 项目里设置（**Settings → Environment Variables**），或在本地运行时写进 `apps/web/.env.local`。九个变量为必填；TTS 可选（留空则静音运行）。此外还有一个用于低成本测试的开关：
+九个变量为必填；TTS 可选（留空则静音运行）。此外还有一个用于低成本测试的开关：

 | 变量 | 作用 |
 |---|---|
 | `MOCK_IMAGE=true` | 跳过图像生成，渲染器返回一张静态占位图。剧情、语音、选项照常运行。非常适合在不消耗 Runware 额度的情况下调试 TTS。 |

-确切的字段格式见 `apps/web/.env.example`。
+在哪里设置（确切字段见 `apps/web/.env.example`）：
+
+- **本地开发** —— `apps/web/.env.local`
+- **Vercel** —— Project Settings → Environment Variables
+- **Cloudflare Workers** —— 在 `apps/web/` 目录下逐个执行 `wrangler secret put <NAME>`，或在 dashboard 里设置（Workers → infiplot → Settings → Variables and Secrets）。如果要给 staging 加访问限制，可以在 Worker 前面挂一个 [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/)（零代码，邮箱白名单）。

 **3. 注意成本**

@@ -6,6 +6,11 @@ import { loadEngineConfig } from "@/lib/config";
 export const runtime = "nodejs";
 export const maxDuration = 60;

+// Browser annotator resizes to 768 px wide → typically 200-800 KB of base64.
+// The 3 MB cap (on encoded length) blocks abusive direct-API payloads, which
+// would inflate vision LLM costs, and leaves ~4x headroom for real inputs.
+const MAX_ANNOTATED_BYTES = 3 * 1024 * 1024;
+
 export async function POST(req: Request) {
  let body: VisionRequest;
  try {
@@ -14,12 +19,27 @@ export async function POST(req: Request) {
    return NextResponse.json({ error: "Invalid JSON" }, { status: 400 });
  }

-  if (!body.session || !body.prevImageUrl || !body.click) {
+  if (!body.session) {
    return NextResponse.json(
-      { error: "session, prevImageUrl, click are required" },
+      { error: "session is required" },
      { status: 400 },
    );
  }
+  if (
+    typeof body.annotatedImageBase64 !== "string" ||
+    body.annotatedImageBase64.length === 0
+  ) {
+    return NextResponse.json(
+      { error: "annotatedImageBase64 must be a non-empty string" },
+      { status: 400 },
+    );
+  }
+  if (body.annotatedImageBase64.length > MAX_ANNOTATED_BYTES) {
+    return NextResponse.json(
+      { error: `annotatedImageBase64 exceeds ${MAX_ANNOTATED_BYTES} bytes` },
+      { status: 413 },
+    );
+  }

  try {
    const config = loadEngineConfig();
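
Review note: the size guard above compares the length of the base64 string, so the decoded PNG is roughly three quarters of that figure. A minimal sketch of the arithmetic, assuming standard padded base64 with no whitespace; `decodedByteLength` is a hypothetical helper and is not part of this PR:

```typescript
// Hypothetical helper (not in the PR): estimate how many raw bytes a padded
// base64 string decodes to. Every 4 encoded characters carry 3 bytes, and
// each trailing "=" removes one byte from the final group.
function decodedByteLength(b64: string): number {
  const len = b64.length;
  if (len === 0) return 0;
  let padding = 0;
  if (b64.charAt(len - 1) === "=") padding += 1;
  if (len > 1 && b64.charAt(len - 2) === "=") padding += 1;
  return Math.floor((len * 3) / 4) - padding;
}

// Same value as MAX_ANNOTATED_BYTES in the route, repeated here so the
// sketch is self-contained.
const MAX_ENCODED_CHARS = 3 * 1024 * 1024;

// Largest decoded payload that still passes a cap on encoded length.
const maxDecodedBytes = Math.floor((MAX_ENCODED_CHARS * 3) / 4);
```

Under these assumptions a 3 MB cap on the encoded string admits about 2.25 MiB of decoded image data, which still leaves comfortable headroom over the 200-800 KB range quoted in the route's comment.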
@@ -11,6 +11,7 @@ import {
  useState,
 } from "react";
 import { PlayCanvas, type Phase } from "@/components/PlayCanvas";
+import { annotateClick } from "@/lib/annotateClient";
 import { PRESETS } from "@/lib/presets";
 import type {
  Beat,
@@ -746,10 +747,11 @@ function PlayInner() {
    setPendingClick(click);

    try {
+      const annotatedImageBase64 = await annotateClick(imageUrl, click);
      const visionRes = await fetch("/api/vision", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
-        body: JSON.stringify({ session, prevImageUrl: imageUrl, click }),
+        body: JSON.stringify({ session, annotatedImageBase64 }),
      });
      if (!visionRes.ok) {
        const j = (await visionRes.json().catch(() => ({}))) as {
@@ -0,0 +1,80 @@
+const TARGET_WIDTH = 768;
+
+// Browser-side equivalent of the former engine/src/annotate.ts. Redraws the
+// scene image with the player's click marker on a Canvas 2D and returns the
+// raw PNG base64 (no `data:` prefix) — interpretClick wraps it back into a
+// data URL before posting to the vision LLM.
+//
+// crossOrigin="anonymous" + the CDN's Access-Control-Allow-Origin header are
+// both required to keep the canvas un-tainted; without them toDataURL throws
+// SecurityError. Runware's image CDN supports anonymous CORS; data: URIs
+// (MOCK_IMAGE mode) load without CORS.
+export async function annotateClick(
+  imageUrl: string,
+  click: { x: number; y: number },
+): Promise<string> {
+  const img = await loadImage(imageUrl);
+
+  const scale = Math.min(1, TARGET_WIDTH / img.naturalWidth);
+  const w = Math.max(1, Math.round(img.naturalWidth * scale));
+  const h = Math.max(1, Math.round(img.naturalHeight * scale));
+
+  const canvas = document.createElement("canvas");
+  canvas.width = w;
+  canvas.height = h;
+  const ctx = canvas.getContext("2d");
+  if (!ctx) throw new Error("Canvas 2D context unavailable");
+
+  ctx.drawImage(img, 0, 0, w, h);
+
+  const cx = Math.round(click.x * w);
+  const cy = Math.round(click.y * h);
+  const r = Math.max(8, Math.round(Math.min(w, h) * 0.025));
+  const stroke = Math.max(2, Math.round(r * 0.25));
+
+  ctx.beginPath();
+  ctx.arc(cx, cy, r, 0, Math.PI * 2);
+  ctx.fillStyle = "rgba(255,40,40,0.55)";
+  ctx.fill();
+  ctx.lineWidth = stroke;
+  ctx.strokeStyle = "rgba(255,255,255,0.95)";
+  ctx.stroke();
+
+  ctx.beginPath();
+  ctx.arc(cx, cy, Math.max(2, Math.round(r * 0.25)), 0, Math.PI * 2);
+  ctx.fillStyle = "rgba(255,255,255,1)";
+  ctx.fill();
+
+  const dataUrl = canvas.toDataURL("image/png");
+  return dataUrl.replace(/^data:image\/png;base64,/, "");
+}
+
+// 10s timeout: the old server-side annotator's 5s fetch budget plus headroom
+// for browser decode. Without it, a hung CDN response would strand the
+// player in `vision-thinking` forever.
+function loadImage(
+  url: string,
+  timeoutMs = 10_000,
+): Promise<HTMLImageElement> {
+  return new Promise((resolve, reject) => {
+    const img = new Image();
+    const timer = setTimeout(() => {
+      // removeAttribute, not `src = ""` — setting empty string can trigger
+      // a navigation to the current document URL in some browsers.
+      img.removeAttribute("src");
+      reject(new Error(`Image load timed out after ${timeoutMs}ms`));
+    }, timeoutMs);
+    img.crossOrigin = "anonymous";
+    img.onload = () => {
+      clearTimeout(timer);
+      resolve(img);
+    };
+    img.onerror = () => {
+      clearTimeout(timer);
+      reject(
+        new Error(`Failed to load image for annotation: ${url.slice(0, 80)}`),
+      );
+    };
+    img.src = url;
+  });
+}
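
Review note: the marker math in `annotateClick` above can be checked without a DOM if it is pulled out as a pure function. The sketch below repeats the same formulas from the diff; `markerGeometry` and its return shape are hypothetical names used only for illustration, and it assumes `click.x` / `click.y` are fractions in the 0..1 range, as the `click.x * w` scaling implies.

```typescript
// Hypothetical refactor (not in the PR): the scaling and marker-size math
// from annotateClick, isolated so it can be unit-tested without a canvas.
type MarkerGeometry = {
  w: number; // canvas width after downscaling
  h: number; // canvas height after downscaling
  cx: number; // marker centre, x in canvas pixels
  cy: number; // marker centre, y in canvas pixels
  r: number; // outer circle radius
  stroke: number; // outer circle stroke width
  dot: number; // inner white dot radius
};

function markerGeometry(
  naturalWidth: number,
  naturalHeight: number,
  click: { x: number; y: number },
  targetWidth = 768,
): MarkerGeometry {
  // Downscale to at most targetWidth; never enlarge small images.
  const scale = Math.min(1, targetWidth / naturalWidth);
  const w = Math.max(1, Math.round(naturalWidth * scale));
  const h = Math.max(1, Math.round(naturalHeight * scale));

  // Radius is 2.5% of the shorter side, with a floor of 8 px.
  const r = Math.max(8, Math.round(Math.min(w, h) * 0.025));
  const stroke = Math.max(2, Math.round(r * 0.25));
  const dot = Math.max(2, Math.round(r * 0.25));

  return {
    w,
    h,
    cx: Math.round(click.x * w),
    cy: Math.round(click.y * h),
    r,
    stroke,
    dot,
  };
}

// Example: a centre click on the 1792x1024 mock placeholder.
const centre = markerGeometry(1792, 1024, { x: 0.5, y: 0.5 });
```

Keeping the geometry separate from the Canvas 2D calls is one possible design; the PR itself inlines the math, which is also reasonable for a function this small.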
@@ -10,7 +10,6 @@ const config: NextConfig = {
    "@infiplot/types",
    "@infiplot/tts-client",
  ],
-  serverExternalPackages: ["sharp"],
  turbopack: {
    root: path.join(__dirname, "..", ".."),
  },
@@ -0,0 +1,5 @@
+import { defineCloudflareConfig } from "@opennextjs/cloudflare";
+
+// Minimal config — the project is fully stateless (sessions live on the
+// client), so no R2/KV/D1 incremental cache is needed.
+export default defineCloudflareConfig();
@@ -8,7 +8,10 @@
    "build": "next build",
    "start": "next start",
    "lint": "next lint",
-    "typecheck": "tsc --noEmit"
+    "typecheck": "tsc --noEmit",
+    "build:cf": "opennextjs-cloudflare build",
+    "preview:cf": "opennextjs-cloudflare preview",
+    "deploy:cf": "opennextjs-cloudflare deploy"
  },
  "dependencies": {
    "@infiplot/ai-client": "workspace:*",
@@ -16,16 +19,18 @@
    "@infiplot/types": "workspace:*",
    "next": "^16.0.0",
    "react": "^19.0.0",
-    "react-dom": "^19.0.0",
-    "sharp": "^0.33.5"
+    "react-dom": "^19.0.0"
  },
  "devDependencies": {
+    "@opennextjs/cloudflare": "^1.19.11",
+    "sharp": "^0.33.5",
    "@types/node": "^22.9.0",
    "@types/react": "^19.0.0",
    "@types/react-dom": "^19.0.0",
    "autoprefixer": "^10.4.20",
    "postcss": "^8.4.49",
    "tailwindcss": "^3.4.15",
-    "typescript": "^5.6.3"
+    "typescript": "^5.6.3",
+    "wrangler": "^4.96.0"
  }
 }
@@ -0,0 +1,20 @@
+{
+  "$schema": "node_modules/wrangler/config-schema.json",
+  "name": "infiplot",
+  "main": ".open-next/worker.js",
+  "compatibility_date": "2025-03-25",
+  "compatibility_flags": ["nodejs_compat"],
+  "assets": {
+    "binding": "ASSETS",
+    "directory": ".open-next/assets"
+  },
+  "observability": {
+    "enabled": true
+  },
+  // 60s mirrors apps/web/vercel.json maxDuration for the scene pipeline tail
+  // (multi-agent LLM, ~30-45s p95). Requires Workers Paid — Free is capped
+  // at 10ms CPU. I/O wait does not count against this budget.
+  "limits": {
+    "cpu_ms": 60000
+  }
+}
@@ -15,7 +15,6 @@
    "@infiplot/ai-client": "workspace:*",
    "@infiplot/tts-client": "workspace:*",
    "@infiplot/types": "workspace:*",
-    "jsonrepair": "^3.14.0",
-    "sharp": "^0.33.5"
+    "jsonrepair": "^3.14.0"
  }
 }
@@ -1,111 +0,0 @@
-import sharp from "sharp";
-
-const FETCH_TIMEOUT_MS = 5000;
-const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // 10 MB
-
-// Validate that an imageUrl is safe to fetch server-side.
-// Only https: and data: URIs are allowed; http: is rejected to
-// prevent SSRF via private IPs / cloud metadata endpoints.
-function assertSafeUrl(url: string): void {
-  if (url.startsWith("data:")) return;
-  const parsed = new URL(url);
-  if (parsed.protocol !== "https:") {
-    throw new Error(
-      `prevImageUrl must use https: or data: protocol, got ${parsed.protocol}`,
-    );
-  }
-  const host = parsed.hostname;
-  if (
-    host === "localhost" ||
-    host === "127.0.0.1" ||
-    host === "0.0.0.0" ||
-    host.startsWith("192.168.") ||
-    host.startsWith("10.") ||
-    /^172\.(1[6-9]|2\d|3[0-1])\./.test(host) ||
-    host === "169.254.169.254"
-  ) {
-    throw new Error(
-      `prevImageUrl resolves to a private/reserved IP: ${host}`,
-    );
-  }
-}
-
-// Pull the bytes from an image URL or data URI into a Buffer suitable for
-// sharp. Data URIs are decoded inline (no network); https: URLs are fetched
-// with a short timeout — if Runware's CDN is slow we'd rather fail the vision
-// step quickly than tie up a 60s Vercel function on a single image read.
-async function loadImageBuffer(imageUrl: string): Promise<Buffer> {
-  assertSafeUrl(imageUrl);
-
-  if (imageUrl.startsWith("data:")) {
-    const comma = imageUrl.indexOf(",");
-    if (comma === -1) throw new Error("Malformed data URI in prevImageUrl");
-    const b64 = imageUrl.slice(comma + 1);
-    return Buffer.from(b64, "base64");
-  }
-
-  const ctrl = new AbortController();
-  const timer = setTimeout(() => ctrl.abort(), FETCH_TIMEOUT_MS);
-  try {
-    const res = await fetch(imageUrl, { signal: ctrl.signal });
-    if (!res.ok) {
-      throw new Error(
-        `Failed to fetch prevImageUrl (${res.status}): ${imageUrl.slice(0, 120)}`,
-      );
-    }
-    const contentLength = res.headers.get("content-length");
-    if (contentLength && Number(contentLength) > MAX_IMAGE_BYTES) {
-      throw new Error(
-        `prevImageUrl response too large (${contentLength} bytes, max ${MAX_IMAGE_BYTES})`,
-      );
-    }
-    const arr = await res.arrayBuffer();
-    if (arr.byteLength > MAX_IMAGE_BYTES) {
-      throw new Error(
-        `prevImageUrl response too large (${arr.byteLength} bytes, max ${MAX_IMAGE_BYTES})`,
-      );
-    }
-    return Buffer.from(arr);
-  } finally {
-    clearTimeout(timer);
-  }
-}
-
-// Marks the player's click point on the scene image so the vision LLM can see
-// WHERE they tapped. Output is base64 because the vision LLM is called over
-// the OpenAI-compatible chat endpoint, which only accepts image_url data URIs
-// — we can't hand it a Runware CDN URL directly.
-export async function annotateClick(
-  imageUrl: string,
-  click: { x: number; y: number },
-): Promise<string> {
-  const buf = await loadImageBuffer(imageUrl);
-
-  const resized = await sharp(buf)
-    .resize({ width: 768, withoutEnlargement: true, fit: "inside" })
-    .png()
-    .toBuffer();
-
-  const meta = await sharp(resized).metadata();
-  const w = meta.width ?? 768;
-  const h = meta.height ?? 1152;
-
-  const cx = Math.round(click.x * w);
-  const cy = Math.round(click.y * h);
-  const r = Math.max(8, Math.round(Math.min(w, h) * 0.025));
-  const stroke = Math.max(2, Math.round(r * 0.25));
-
-  const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="${w}" height="${h}" viewBox="0 0 ${w} ${h}">
-    <circle cx="${cx}" cy="${cy}" r="${r}" fill="rgba(255,40,40,0.55)"
-            stroke="rgba(255,255,255,0.95)" stroke-width="${stroke}" />
-    <circle cx="${cx}" cy="${cy}" r="${Math.round(r * 0.25)}"
-            fill="rgba(255,255,255,1)" />
-  </svg>`;
-
-  const out = await sharp(resized)
-    .composite([{ input: Buffer.from(svg), top: 0, left: 0 }])
-    .png({ compressionLevel: 9 })
-    .toBuffer();
-
-  return out.toString("base64");
-}
@@ -5,7 +5,6 @@ export {
  requestInsertBeat,
  requestBeatAudio,
 } from "./orchestrator";
-export { annotateClick } from "./annotate";
 export { synthesizeBeat } from "./voice";
 export { mergeCharacters } from "./director";
 export type { SceneResult } from "./director";
@@ -1,29 +1,25 @@
-import sharp from "sharp";
+// Static SVG placeholder used when MOCK_IMAGE=true, so we can exercise the
+// TTS path without paying for image generation. Returned as a data URI so the
+// rest of the pipeline can treat it as an `imageUrl` interchangeably with
+// real Runware URLs (the client's <img src> accepts both, and we never feed
+// a mock image to Runware's referenceImages because mockImage mode
+// short-circuits the Painter entirely).
+//
+// Previously rendered to PNG via sharp; switched to a self-describing SVG
+// data URI so the engine has zero Node-native dependencies and runs on
+// Cloudflare Workers. SVG also stays crisp at any display size.

-let cachedDataUri: string | undefined;
+const W = 1792;
+const H = 1024;
+const SVG = `<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}">
+  <rect width="${W}" height="${H}" fill="#161109"/>
+  <rect x="2" y="2" width="${W - 4}" height="${H - 4}" fill="none" stroke="#5a4628" stroke-width="3" stroke-dasharray="14 10"/>
+  <text x="50%" y="45%" fill="#b88f4a" font-family="Georgia, serif" font-size="72" letter-spacing="6" text-anchor="middle">MOCK IMAGE</text>
+  <text x="50%" y="53%" fill="#6e5430" font-family="Georgia, serif" font-size="30" letter-spacing="3" text-anchor="middle">TTS TEST — image generation skipped</text>
+</svg>`;
+
+const DATA_URI = `data:image/svg+xml;charset=utf-8,${encodeURIComponent(SVG)}`;

-// A static 16:9 placeholder used when MOCK_IMAGE=true, so we can exercise the
-// TTS path without paying for image generation. Generated once, then memoized.
-// Returned as a data URI so the rest of the pipeline can treat it as an
-// `imageUrl` interchangeably with real Runware URLs (the client's <img src>
-// accepts both, and we never feed a mock image to Runware's referenceImages
-// because mockImage mode short-circuits the Painter entirely).
 export async function mockImageDataUri(): Promise<string> {
-  if (cachedDataUri) return cachedDataUri;
-
-  const W = 1792;
-  const H = 1024;
-  const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}">
-    <rect width="${W}" height="${H}" fill="#161109"/>
-    <rect x="2" y="2" width="${W - 4}" height="${H - 4}" fill="none"
-          stroke="#5a4628" stroke-width="3" stroke-dasharray="14 10"/>
-    <text x="50%" y="45%" fill="#b88f4a" font-family="Georgia, serif"
-          font-size="72" letter-spacing="6" text-anchor="middle">MOCK IMAGE</text>
-    <text x="50%" y="53%" fill="#6e5430" font-family="Georgia, serif"
-          font-size="30" letter-spacing="3" text-anchor="middle">TTS TEST — image generation skipped</text>
-  </svg>`;
-
-  const png = await sharp(Buffer.from(svg)).png().toBuffer();
-  cachedDataUri = `data:image/png;base64,${png.toString("base64")}`;
-  return cachedDataUri;
+  return DATA_URI;
 }
@@ -13,7 +13,6 @@ import type {
  VisionResponse,
 } from "@infiplot/types";
 import { runArchitect } from "./agents/architect";
-import { annotateClick } from "./annotate";
 import { directInsertBeat, directScene } from "./director";
 import { synthesizeBeat } from "./voice";
 import { interpret } from "./vision";
@@ -109,9 +108,8 @@ export async function visionDecide(
  config: EngineConfig,
  req: VisionRequest,
 ): Promise<VisionResponse> {
-  const annotated = await annotateClick(req.prevImageUrl, req.click);
  const current = req.session.history.at(-1)?.scene ?? null;
-  return interpret(config.vision, annotated, current);
+  return interpret(config.vision, req.annotatedImageBase64, current);
 }

 // ──────────────────────────────────────────────────────────────────────
@@ -67,10 +67,11 @@ export type Scene = {
  imageUuid?: string;
  /**
   * Public CDN URL of this Scene's generated image. Returned to the client for
-   * `<img src>` rendering, and is what the client passes back to `/api/vision`
-   * as `prevImageUrl` so the server can re-fetch the bytes for click annotation.
+   * `<img src>` rendering; the client also feeds it through a Canvas 2D click
+   * annotator before posting to `/api/vision` (see
+   * `VisionRequest.annotatedImageBase64`).
   *
-   * For MOCK_IMAGE=true this is a `data:image/png;base64,...` data URI, not a
+   * For MOCK_IMAGE=true this is a `data:image/svg+xml;...` data URI, not a
   * Runware URL — the client renders both forms transparently.
   */
  imageUrl?: string;
@@ -306,12 +307,16 @@ export type BeatAudioResponse = {
 export type VisionRequest = {
  session: Session;
  /**
-   * Public CDN URL (or data URI in MOCK_IMAGE mode) of the scene the player
-   * just clicked. The server re-fetches the bytes to annotate the click and
-   * pass an OpenAI-compatible image_url to the vision LLM.
+   * Raw PNG base64 (no `data:` prefix) of the scene image WITH the player's
+   * click marker already drawn on it by the browser's Canvas 2D. The server
+   * forwards this straight to the vision LLM as an OpenAI-compatible
+   * image_url.
+   *
+   * Annotation lives client-side so the engine has no Node-native image
+   * dependency (sharp doesn't run on Cloudflare Workers) and we save a
+   * server-side image re-fetch per click.
   */
-  prevImageUrl: string;
-  click: { x: number; y: number };
+  annotatedImageBase64: string;
 };

 export type VisionResponse = {
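
Review note: the `VisionRequest` comment above says the server forwards the raw base64 to the vision LLM as an OpenAI-compatible `image_url`. The engine's actual wrapping code is not part of this diff, so the following is only a sketch of what that step might look like; `toImageUrlPart` and `ImageUrlPart` are hypothetical names. The content-part shape follows the OpenAI chat completions convention for image inputs.

```typescript
// Hypothetical helper (not in the PR): wrap raw PNG base64 (no `data:`
// prefix, as produced by the browser annotator) into an OpenAI-compatible
// chat content part.
type ImageUrlPart = {
  type: "image_url";
  image_url: { url: string };
};

function toImageUrlPart(rawPngBase64: string): ImageUrlPart {
  if (rawPngBase64.length === 0) {
    throw new Error("annotatedImageBase64 must be a non-empty string");
  }
  return {
    type: "image_url",
    image_url: { url: "data:image/png;base64," + rawPngBase64 },
  };
}

// Usage: "iVBORw0KGgo=" is the base64 of the 8-byte PNG signature, used
// here only as a short stand-in for a real annotated image.
const part = toImageUrlPart("iVBORw0KGgo=");
```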