Merge pull request #29 from zonghaoyuan/staging

Merge staging → main
2026-06-04 11:39:54 +08:00
parent 1298e99339 af155ac107
commit 1b1d5ce1c5
147 changed files with 1540 additions and 342 deletions
@@ -56,6 +56,29 @@ TTS_SPEECH_MODEL=mimo-v2.5-tts
 # Text/story/voice still run normally. Great for iterating on TTS.
 MOCK_IMAGE=false

+# ---- 5b. Image proxy (Cloudflare Worker, OPTIONAL) -----------------
+# Leave NEXT_PUBLIC_IMAGE_PROXY_URL blank (the default) and the browser
+# fetches images directly from the provider — exactly as the app worked
+# before this proxy existed. The ALLOWED_HOSTS value below is inert until
+# a proxy URL is set, so you're completely unaffected; skip this section.
+#
+# Why you might want it: Chrome's direct fetch of im.runware.ai is unreliable
+# on some networks (ERR_QUIC_PROTOCOL_ERROR mid-stream → partial bytes →
+# <img> paints progressively top-to-bottom). Routing the fetch through a tiny
+# Cloudflare Worker re-fetches server-to-server (no QUIC fragility) and serves
+# over HTTP/2 — atomic paint, plus edge caching + CORS.
+#
+# Deploy your own in ~1 min (one-click "Deploy to Cloudflare" button):
+#   https://github.com/zonghaoyuan/infiplot-image-proxy
+# Then paste the workers.dev URL it prints below. NEXT_PUBLIC_ vars are
+# inlined at BUILD time — set them in Vercel/Cloudflare project settings.
+NEXT_PUBLIC_IMAGE_PROXY_URL=
+# Hostnames the proxy is allowed to fetch (comma-separated). Default covers
+# Runware's CDN. If your IMAGE_BASE_URL points at another provider, add that
+# provider's image host here so its URLs take the proxy path too. Anything
+# not listed stays on the direct fetch. Only matters when the URL above is set.
+NEXT_PUBLIC_IMAGE_PROXY_ALLOWED_HOSTS=im.runware.ai
+
 # ---- 6. Analytics · Umami (optional — leave blank to disable) ------
 # Privacy-friendly, cookieless page-view stats — no Cookie consent banner.
 # Cloud: sign up at https://cloud.umami.is, add your site, copy its ID into
@@ -63,7 +86,18 @@ MOCK_IMAGE=false
 #   NEXT_PUBLIC_UMAMI_SRC=https://cloud.umami.is/script.js
 # Self-host later: point SRC at your own instance — the integration is identical
 # (no code change), e.g. NEXT_PUBLIC_UMAMI_SRC=https://stats.example.com/script.js
-# Both blank → no script is injected (zero tracking). NEXT_PUBLIC_ vars are
-# inlined at BUILD time, so set them in the build env (Vercel project settings).
+# Both blank → no script is injected (zero tracking; every track() call no-ops).
+# Beyond page views the app emits content-free custom events (game start, scene
+# reached, choice picked, ...) — only enums/counts/booleans, never your prompts,
+# uploaded images or any per-user ID. The visitor's Do-Not-Track is honoured.
+# NEXT_PUBLIC_ vars are inlined at BUILD time, so set them in the build env
+# (Vercel project settings).
 NEXT_PUBLIC_UMAMI_SRC=
 NEXT_PUBLIC_UMAMI_WEBSITE_ID=
+
+# Optional hostname allowlist — defense-in-depth on top of the blank-to-disable
+# gate above. The tracker fires only when window.location.hostname EXACTLY
+# matches an entry, so a fork that copied these vars stays silent on its own
+# domain. Comma-separated, exact match: apex ≠ www (list both), no wildcards.
+# Blank → track on all hosts. e.g. infiplot.com,www.infiplot.com
+NEXT_PUBLIC_UMAMI_DOMAINS=
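Review note: the allowlist rule documented for `NEXT_PUBLIC_UMAMI_DOMAINS` (comma-separated, exact match, blank means all hosts) can be sketched as a small pure function. This is a minimal illustration only; `isTrackingAllowed` is a hypothetical name and is not taken from the app's `@/lib/analytics` module.

```typescript
// Hypothetical helper illustrating the documented allowlist rule.
// Not the app's actual implementation.
function isTrackingAllowed(
  hostname: string,
  domainsEnv: string | undefined,
): boolean {
  const allowed = (domainsEnv ?? "")
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
  // Blank list: track on all hosts.
  if (allowed.length === 0) return true;
  // Exact match only: apex and www are distinct entries, no wildcards.
  return allowed.includes(hostname);
}
```

In the browser the first argument would be `window.location.hostname`; a fork that copied the env file but runs on its own domain then gets `false`.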
@@ -155,6 +155,10 @@ Where to set them (see `.env.example` for the exact shape):

 With the recommended trio, each scene's cost comes mainly from the image generation model. The FLUX.2 [klein] 9B KV image is roughly **\$0.00078** per scene (1792×1024, 4 steps, sub-second); the text model uses `deepseek-v4-flash`, so text costs are negligible by comparison. Tapping through a scene's beats is free. To keep transitions instant, the engine also pre-generates scenes you might pick but ultimately don't — so real spend runs somewhat higher than the scenes you actually see.

+**4. Image proxy (optional)**
+
+By default the browser fetches images directly from the provider — no setup needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and you're completely unaffected. You only want this if you hit progressive "top-to-bottom" image loading (Chrome's `ERR_QUIC_PROTOCOL_ERROR` on some networks paints partial PNGs row by row): deploy a tiny Cloudflare Worker that re-fetches images server-side and serves them atomically over HTTP/2. One-click deploy at **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then paste the `workers.dev` URL it prints into `NEXT_PUBLIC_IMAGE_PROXY_URL`.
+
 ---

 ## Roadmap
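Review note: the optional image proxy described in this hunk amounts to a URL rewrite gated by two settings. The sketch below assumes the Worker accepts the upstream image address in a `url` query parameter; that parameter name and the `resolveImageUrl` helper are assumptions for illustration, not confirmed by this diff or the proxy repository.

```typescript
// Hypothetical sketch: route an image URL through the proxy only when a
// proxy URL is configured AND the image host is on the allowlist.
function resolveImageUrl(
  imageUrl: string,
  proxyUrl: string,
  allowedHostsCsv: string,
): string {
  // Blank proxy URL: behave exactly as before, direct fetch.
  if (!proxyUrl) return imageUrl;
  try {
    const host = new URL(imageUrl).hostname;
    const allowed = allowedHostsCsv
      .split(",")
      .map((h) => h.trim())
      .filter((h) => h.length > 0);
    // Hosts that are not listed stay on the direct fetch.
    if (!allowed.includes(host)) return imageUrl;
    const proxied = new URL(proxyUrl);
    // Assumed query shape; the real Worker may expect something else.
    proxied.searchParams.set("url", imageUrl);
    return proxied.toString();
  } catch {
    // Unparseable input: fall back to the original URL.
    return imageUrl;
  }
}
```

With `NEXT_PUBLIC_IMAGE_PROXY_URL` blank the function is an identity, which matches the "completely unaffected" default described above.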
@@ -154,6 +154,10 @@ InfiPlot は 4 種類のモデルプロバイダと通信します。**テキス

 推奨の 3 点セットでは、各シーンのコストは主に画像生成モデルによるものです。FLUX.2 [klein] 9B KV の画像は 1 シーンあたり概ね **$0.00078**（1792×1024、4 ステップ、サブ秒）。テキストモデルは `deepseek-v4-flash` を使用するため、テキストコストは比較になりません。シーン内のビートをタップしていくのは無料です。切り替えを一瞬に保つため、エンジンは選ぶ可能性はあるが最終的に選ばないシーンも先行生成します —— そのため実際の支出は、あなたが実際に見るシーン数よりやや高くなります。

+**4. 画像プロキシ（オプション）**
+
+デフォルトではブラウザが画像プロバイダーに直接アクセスするため、設定は不要です —— `NEXT_PUBLIC_IMAGE_PROXY_URL` を空欄のままにすれば、まったく影響ありません。画像が「上から順に」表示される現象（一部のネットワークで Chrome の `ERR_QUIC_PROTOCOL_ERROR` により PNG が行ごとに描画される）に遭遇した場合のみ必要です。小さな Cloudflare Worker をデプロイすると、画像をサーバー側で再取得し HTTP/2 で一括返却します。ワンクリックデプロイは **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)** を参照し、出力された `workers.dev` の URL を `NEXT_PUBLIC_IMAGE_PROXY_URL` に設定してください。
+
 ---

 ## Roadmap
@@ -154,6 +154,10 @@ InfiPlot 会与四类模型供应商通信。**文本（Text）和视觉（Visio

 使用推荐的三件套时，每一幕场景的开销主要来自图像生成模型。FLUX.2 [klein] 9B KV 的图像大约 **$0.00078** 一张（1792×1024，4 步，亚秒级）；文本模型使用 `deepseek-v4-flash` 时，成本极低。逐拍点过一个场景是免费的。为了让切换瞬间完成，引擎还会预测式地生成那些你可能选、但最终可能没选的场景 —— 所以真实花费会比你实际看到的场景数略高一些。

+**4. 图片代理（可选）**
+
+默认浏览器直连图片供应商，无需任何配置 —— 留空 `NEXT_PUBLIC_IMAGE_PROXY_URL` 即可，完全不受影响。只有当你遇到图片「层层加载」（Chrome 在某些网络下 `ERR_QUIC_PROTOCOL_ERROR` 导致 PNG 逐行渲染）时才需要它：部署一个极小的 Cloudflare Worker，把图片改为服务端转发 + HTTP/2 原子返回。一键部署见 **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**，然后把它给出的 `workers.dev` 地址填进 `NEXT_PUBLIC_IMAGE_PROXY_URL`。
+
 ---

 ## Roadmap
@@ -0,0 +1,83 @@
+import { analyzeImageDataUrl } from "@infiplot/ai-client";
+import type {
+  ParseStyleImageRequest,
+  ParseStyleImageResponse,
+} from "@infiplot/types";
+import { NextResponse } from "next/server";
+import { loadEngineConfig } from "@/lib/config";
+
+export const runtime = "nodejs";
+export const maxDuration = 60;
+
+// Same rationale as /api/vision: the client resizes to 512px max-dim webp
+// (~30-80KB base64 typical) before upload, so 3 MB is generous headroom
+// against malformed / abusive direct-API payloads.
+const MAX_IMAGE_BYTES = 3 * 1024 * 1024;
+
+const STYLE_EXTRACTION_PROMPT = `You are a senior concept artist helping describe an image's visual style so that a text-to-image diffusion model (FLUX) can reproduce the same aesthetic on different subjects.
+
+Look at the attached image and produce a single English style-prompt string that captures ONLY its visual style — NOT its subject matter. Focus on:
+- Medium / technique (e.g., watercolor, oil painting, cel-shaded anime, 3D render, pixel art)
+- Line work and rendering (sharp ink outlines, soft shading, painterly brushstrokes, flat colors)
+- Color palette and lighting (pastel, saturated, monochrome, warm golden-hour, cool neon, high contrast)
+- Mood and atmosphere (dreamy, melancholic, cinematic, nostalgic, gritty)
+- Any recognizable artistic influence (Ghibli, Makoto Shinkai, ukiyo-e, vaporwave, cyberpunk anime, etc.)
+
+Do NOT describe the characters, objects, or scene contents. Output exactly one JSON object:
+{"stylePrompt": "<comma-separated English visual-style attributes, ~30-60 words>"}`;
+
+export async function POST(req: Request) {
+  let body: ParseStyleImageRequest;
+  try {
+    body = (await req.json()) as ParseStyleImageRequest;
+  } catch {
+    return NextResponse.json({ error: "Invalid JSON" }, { status: 400 });
+  }
+
+  if (
+    typeof body.imageDataUrl !== "string" ||
+    !body.imageDataUrl.startsWith("data:image/")
+  ) {
+    return NextResponse.json(
+      { error: "imageDataUrl must be a data:image/... base64 URL" },
+      { status: 400 },
+    );
+  }
+  if (body.imageDataUrl.length > MAX_IMAGE_BYTES) {
+    return NextResponse.json(
+      { error: `imageDataUrl exceeds ${MAX_IMAGE_BYTES} bytes` },
+      { status: 413 },
+    );
+  }
+
+  try {
+    const config = loadEngineConfig();
+    const raw = await analyzeImageDataUrl(
+      config.vision,
+      body.imageDataUrl,
+      STYLE_EXTRACTION_PROMPT,
+      { responseFormat: "json_object" },
+    );
+
+    let parsed: { stylePrompt?: string };
+    try {
+      parsed = JSON.parse(raw);
+    } catch {
+      // Fall back: treat the raw response as the style prompt directly.
+      parsed = { stylePrompt: raw };
+    }
+    const stylePrompt = (parsed.stylePrompt ?? "").trim();
+    if (!stylePrompt) {
+      return NextResponse.json(
+        { error: "Vision model returned an empty stylePrompt" },
+        { status: 502 },
+      );
+    }
+
+    const payload: ParseStyleImageResponse = { stylePrompt };
+    return NextResponse.json(payload);
+  } catch (err) {
+    const message = err instanceof Error ? err.message : "Unknown error";
+    return NextResponse.json({ error: message }, { status: 500 });
+  }
+}
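Review note: the size guard in this route compares the data URL's string length against `MAX_IMAGE_BYTES`. For an ASCII base64 data URL the string length equals the bytes on the wire, and the decoded image is roughly three quarters of that, so the check is conservative. If a limit on decoded bytes were ever wanted, it could be estimated as below; `decodedByteLength` is a hypothetical helper and assumes a base64 payload.

```typescript
// Hypothetical helper: estimate the decoded size of a base64 data URL
// without actually decoding it. Assumes the payload is base64.
function decodedByteLength(dataUrl: string): number {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) return 0;
  const payload = dataUrl.slice(comma + 1);
  // Each trailing "=" stands for one byte of padding.
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  return Math.floor((payload.length * 3) / 4) - padding;
}
```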
@@ -6,6 +6,11 @@ import { loadEngineConfig } from "@/lib/config";
 export const runtime = "nodejs";
 export const maxDuration = 60;

+// Matches /api/vision and /api/parse-style-image — the user's resized 512px
+// webp is ~30-80 KB; this caps pathological direct-API payloads (which would
+// then ride along in every subsequent /api/scene request body via session).
+const MAX_STYLE_REF_BYTES = 3 * 1024 * 1024;
+
 export async function POST(req: Request) {
  let body: StartRequest;
  try {
@@ -20,6 +25,20 @@ export async function POST(req: Request) {
      { status: 400 },
    );
  }
+  if (typeof body.styleReferenceImage === "string") {
+    if (!body.styleReferenceImage.startsWith("data:image/")) {
+      return NextResponse.json(
+        { error: "styleReferenceImage must be a data:image/... base64 URL" },
+        { status: 400 },
+      );
+    }
+    if (body.styleReferenceImage.length > MAX_STYLE_REF_BYTES) {
+      return NextResponse.json(
+        { error: `styleReferenceImage exceeds ${MAX_STYLE_REF_BYTES} bytes` },
+        { status: 413 },
+      );
+    }
+  }

  try {
    const config = loadEngineConfig();
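Review note: this route and `/api/parse-style-image` repeat the same prefix and length checks. One possible way to share them is sketched below; `checkImageDataUrl` and the `DataUrlCheck` type are hypothetical names, not part of this PR.

```typescript
// Hypothetical shared validator mirroring the checks in both routes.
type DataUrlCheck =
  | { ok: true }
  | { ok: false; status: 400 | 413; error: string };

function checkImageDataUrl(
  value: unknown,
  field: string,
  maxLen: number,
): DataUrlCheck {
  if (typeof value !== "string" || !value.startsWith("data:image/")) {
    return {
      ok: false,
      status: 400,
      error: `${field} must be a data:image/... base64 URL`,
    };
  }
  // Same comparison as the routes: string length against the cap.
  if (value.length > maxLen) {
    return { ok: false, status: 413, error: `${field} exceeds ${maxLen} bytes` };
  }
  return { ok: true };
}
```

Because `styleReferenceImage` is optional here, a caller in this route would run the check only when the field is a string, as the hunk above does.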
@@ -2,6 +2,14 @@

 import { useRouter } from "next/navigation";
 import { useEffect, useRef, useState } from "react";
+import { track } from "@/lib/analytics";
+import {
+  ART_STYLES,
+  GENDERS,
+  PACINGS,
+  PLOT_STYLES,
+  type Gender,
+} from "@/lib/options";

 /* ============================================================================
   InfiPlot · Home page (editorial visual style · centered composition, echoing the low-fidelity prototype)
@@ -20,8 +28,6 @@ import { useEffect, useRef, useState } from "react";
   ========================================================================== */


-type Gender = "男性向" | "女性向";
-
 const EXAMPLE_PHRASES: Record<Gender, string[]> = {
  男性向: [
    "从小一起长大的青梅竹马，突然红着脸向我告白",
@@ -44,82 +50,30 @@ type Opt = {
 };

 const OPTS: Opt[] = [
-  { label: "性向", items: ["男性向", "女性向"] },
-  {
-    label: "绘画风格",
-    modal: true,
-    items: [
-      "自动",
-      "古典厚涂油画 (学术奇幻)",
-      "极简中国水墨 (Image 0参考升级版)",
-      "浮世绘木刻 (美人画升级)",
-      "莫高窟壁画风 (敦煌学)",
-      "细密画 (波斯/伊斯兰风)",
-      "镶嵌画 (拜占庭/马赛克)",
-      "彩绘玻璃 (哥特风)",
-      "吉卜力治愈手绘 (Image 4参考)",
-      "京阿尼细腻日常 (Image 5参考)",
-      "新海诚唯美光影 (Image 2参考)",
-      "赛博朋克 / 赛璐珞二次元",
-      "Galgame CG 梦幻光影",
-      "3D 动漫电影质感",
-      "蒸汽波 (Vaporwave) 赛璐珞",
-      "极简矢量插画 (Minimalist Vector)",
-      "低多边形 (Low Poly)",
-      "双重曝光 (Double Exposure)",
-      "波普艺术 (Pop Art)",
-      "故障艺术 (Glitch Art)",
-      "瑞士平面设计 (Typography-Centric)",
-      "剪纸艺术 (Papercut)",
-      "科幻：太阳朋克 (Solar Punk)",
-      "奇幻：爱手艺 (Lovecraftian Horror)",
-      "现代惊悚：霓虹剪影 (Urban Noir)",
-      "温馨推理：英式村庄 (Cozy Mystery)",
-      "哥特言情：庄园废墟 (Gothic Romance)",
-      "格林童话：暗黑森林 (Fairytale Noir)",
-      "废土科幻 (Post-Apocalyptic)",
-      "都市幻想：隐形世界 (Urban Fantasy)",
-      "文字与图形：抽象主义 (BookPosterLayout)"
-    ],
-  },
-  { label: "剧情风格", items: ["平铺直叙", "多线转折", "悬疑烧脑", "治愈日常"], defaultIndex: 1 },
+  { label: "性向", items: [...GENDERS] },
+  { label: "绘画风格", modal: true, items: [...ART_STYLES] },
+  { label: "剧情风格", items: [...PLOT_STYLES], defaultIndex: 1 },
  { label: "语音配音", items: ["关闭", "开启"], defaultIndex: 1 },
-  { label: "内容节奏", items: ["慢热细腻", "紧凑爽快"], defaultIndex: 1 },
+  { label: "内容节奏", items: [...PACINGS], defaultIndex: 1 },
 ];

 type StoryContent = { title: string; outline: string; style: string; tags: string[] };

 const STYLE_MAP: Record<string, string> = {
-  "古典厚涂油画 (学术奇幻)": "Dark fantasy oil painting style, grand clockwork steampunk city built into a mountain range at twilight, immense gothic spires with glowing green lamps, complex gears and platforms. Richly detailed, impasto texture, dramatic academic lighting. Horizontal cinematic composition.",
-  "极简中国水墨 (Image 0参考升级版)": "Minimalist Chinese ink wash style, vertical sea of clouds and distant jagged peaks. Ethereal, sparse composition with poetic brushstrokes, monochrome palette with subtle blue hints. Large blank mist area for copy space.",
-  "浮世绘木刻 (美人画升级)": "Ukiyo-e woodblock print style, majestic waves and Mount Fuji visible through cherry branches. Bold outlines, flat colors with paper texture, ancient and mystical atmosphere.",
-  "莫高窟壁画风 (敦煌学)": "Dunhuang fresco style, celestial patterns, stylized lotus flowers and floating geometric patterns on an aged stucco wall. Muted, oxidized mineral colors, delicate line art, historical and divine ambiance.",
-  "细密画 (波斯/伊斯兰风)": "Persian miniature style, ornate vertical tiled garden pavilion surrounded by tall cypress trees and complex geometric mosaics. High detail, jewel-like colors, flattened perspective, decorative borders.",
-  "镶嵌画 (拜占庭/马赛克)": "Byzantine mosaic style, highly detailed mosaic pattern of glittering gold and deep blue tiles, spiritual and ancient feel, flat decorative background.",
-  "彩绘玻璃 (哥特风)": "Stained glass style, tall gothic archways and trefoils. Vibrant, translucent jewel colors, bold black leading lines. The image looks like an ancient cathedral stained glass window panel.",
-  "吉卜力治愈手绘 (Image 4参考)": "Ghibli hand-painted watercolor style, a vast wildflower meadow hill under a bright blue sky with fluffy clouds, a fantastical airship flying in the distance. Natural daylight, soft washes, nostalgic feel.",
-  "京阿尼细腻日常 (Image 5参考)": "KyoAni anime style, fine line art, warm indoor lighting contrasting the cool moonlight outside, rain streaks on a tall window. Deep emotional atmosphere, delicate light and shadow reflections.",
-  "新海诚唯美光影 (Image 2参考)": "Makoto Shinkai anime style, hyper-detailed, towering dramatic night starry sky with a descending comet trail, glowing cherry tree branches in the foreground. Brilliant lighting effects, vivid colors.",
-  "赛博朋克 / 赛璐珞二次元": "Cyberpunk anime style, cel-shaded animation, rainy night streets of a dense neon-drenched futuristic megacity with towering skyscrapers. Hard edges, high saturation, sharp contrast.",
-  "Galgame CG 梦幻光影": "High-quality Galgame CG illustration, dreamlike beach scene at sunset with sparkling waves rolling in. Pastel colors, bloom lighting, clean composition, soft focus.",
-  "3D 动漫电影质感": "Cinematic 3D animated film style, a rustic wooden hangar at sunrise with volumetric lighting, warm golden hour colors, deep textures, cinematic composition.",
-  "蒸汽波 (Vaporwave) 赛璐珞": "Vaporwave aesthetic, anime style, a geometric pink grid floor leading to a palm tree silhouette, neon pink sunset over a purple ocean in the background. Glitch effects, retro pastel colors.",
-  "极简矢量插画 (Minimalist Vector)": "Minimalist vector illustration style, geometric dunes, flat warm colors, clean lines under a massive rising sun, elegant minimalist composition.",
-  "低多边形 (Low Poly)": "Low poly art style, crystalline formations on a high mountain ridge under a towering, faceted starry night sky. Sharp polygon edges, ambient cool colors.",
-  "双重曝光 (Double Exposure)": "Digital double exposure portrait style, forest trees and a cascading waterfall double exposed, high contrast black and white composition, elegant and moody atmosphere.",
-  "波普艺术 (Pop Art)": "Pop Art style illustration, bold comic book dot patterns, halftone screens, loud speech bubbles, bold black outlines, high-saturation contrasting colors.",
-  "故障艺术 (Glitch Art)": "Glitch art style, colorful data corruption, pixel sorting, and digital artifacts in cyan, magenta, and yellow. Cybernetic, high-tech and moody atmosphere.",
-  "瑞士平面设计 (Typography-Centric)": "Modern Swiss graphic design style, vertical minimalist composition, bold geometric grids, red, black, and white flat color blocks.",
-  "剪纸艺术 (Papercut)": "Multilayered papercut art style, 3D landscape of a deep forest and a fairytale castle, made of staggered paper layers with intricate cutouts. Backlighting, soft shadows.",
-  "科幻：太阳朋克 (Solar Punk)": "Solar Punk art style, a sustainable futuristic city integrated with vertical gardens and green balconies, clean solar panels and wind turbines, bright optimistic sunlight.",
-  "奇幻：爱手艺 (Lovecraftian Horror)": "Dark cosmic horror illustration, desolate rocky shore, towering ancient eldritch clouds descending from a stormy sky. Moody, muted cool colors, visible brushstrokes.",
-  "现代惊悚：霓虹剪影 (Urban Noir)": "Modern urban noir, wet narrow alleyway under a vertical buzzing neon sign, dark puddles, high contrast, cinematic noir lighting, deep shadows.",
-  "温馨推理：英式村庄 (Cozy Mystery)": "Cozy mystery book cover illustration, a charming, warm English village scene at night with thatched roofs, snow falling, and warm bookstore lights.",
-  "哥特言情：庄园废墟 (Gothic Romance)": "Gothic romance illustration, desolate moonlit ruins of a grand gothic manor on a foggy cliff, misty atmosphere, melancholic blue and grey tones.",
-  "格林童话：暗黑森林 (Fairytale Noir)": "Dark fairytale illustration, massive ancient forest with towering twisted claw-like trees. Grimm's style, classical woodcut illustration, mood of awe and dread.",
-  "废土科幻 (Post-Apocalyptic)": "Post-apocalyptic landscape, vast desert wasteland with the rusted remains of an overgrown highway and ruined skyscrapers under a dusty orange sunset sky.",
-  "都市幻想：隐形世界 (Urban Fantasy)": "Urban fantasy concept art, a hidden glowing magical pathway underneath a busy modern pedestrian bridge in a rain-streaked metropolitan city, magical blue sparks.",
-  "文字与图形：抽象主义 (BookPosterLayout)": "Abstract geometric poster layout, minimalist line-art integrated into a vertical arrangement of intersecting lines, circles, and curves in a gradient of emerald green and deep blue."
+  "古典厚涂油画": "Dark fantasy oil painting style, grand clockwork steampunk city built into a mountain range at twilight, immense gothic spires with glowing green lamps, complex gears and platforms. Richly detailed, impasto texture, dramatic academic lighting. Horizontal cinematic composition.",
+  "极简中国水墨": "Minimalist Chinese ink wash style, vertical sea of clouds and distant jagged peaks. Ethereal, sparse composition with poetic brushstrokes, monochrome palette with subtle blue hints. Large blank mist area for copy space.",
+  "浮世绘木刻": "Ukiyo-e woodblock print style, majestic waves and Mount Fuji visible through cherry branches. Bold outlines, flat colors with paper texture, ancient and mystical atmosphere.",
+  "莫高窟壁画": "Dunhuang fresco style, celestial patterns, stylized lotus flowers and floating geometric patterns on an aged stucco wall. Muted, oxidized mineral colors, delicate line art, historical and divine ambiance.",
+  "波斯细密画": "Persian miniature style, ornate vertical tiled garden pavilion surrounded by tall cypress trees and complex geometric mosaics. High detail, jewel-like colors, flattened perspective, decorative borders.",
+  "吉卜力治愈手绘": "Ghibli hand-painted watercolor style, a vast wildflower meadow hill under a bright blue sky with fluffy clouds, a fantastical airship flying in the distance. Natural daylight, soft washes, nostalgic feel.",
+  "京阿尼细腻日常": "KyoAni anime style, fine line art, warm indoor lighting contrasting the cool moonlight outside, rain streaks on a tall window. Deep emotional atmosphere, delicate light and shadow reflections.",
+  "新海诚唯美光影": "Makoto Shinkai anime style, hyper-detailed, towering dramatic night starry sky with a descending comet trail, glowing cherry tree branches in the foreground. Brilliant lighting effects, vivid colors.",
+  "Galgame CG": "High-quality Galgame CG illustration, dreamlike beach scene at sunset with sparkling waves rolling in. Pastel colors, bloom lighting, clean composition, soft focus.",
+  "3D 动漫电影": "Cinematic 3D animated film style, a rustic wooden hangar at sunrise with volumetric lighting, warm golden hour colors, deep textures, cinematic composition.",
+  "赛博朋克": "Cyberpunk anime style, cel-shaded animation, rainy night streets of a dense neon-drenched futuristic megacity with towering skyscrapers. Hard edges, high saturation, sharp contrast.",
+  "蒸汽波": "Vaporwave aesthetic, anime style, a geometric pink grid floor leading to a palm tree silhouette, neon pink sunset over a purple ocean in the background. Glitch effects, retro pastel colors.",
+  "哥特庄园": "Gothic romance illustration, desolate moonlit ruins of a grand gothic manor on a foggy cliff, misty atmosphere, melancholic blue and grey tones.",
+  "废土科幻": "Post-apocalyptic landscape, vast desert wasteland with the rusted remains of an overgrown highway and ruined skyscrapers under a dusty orange sunset sky.",
 };

 /* 24 preset storylines per audience orientation (matched one-to-one, by index, with the covers /home/{m|f}{i}.webp).
@@ -915,14 +869,34 @@ function StyleModal({
  value,
  onPick,
  onClose,
+  customStyleGuide,
+  setCustomStyleGuide,
+  styleOverrides,
+  setStyleOverrides,
+  customStyleRefImage,
+  setCustomStyleRefImage,
 }: {
  items: string[];
  value: number;
  onPick: (i: number) => void;
  onClose: () => void;
+  customStyleGuide: string;
+  setCustomStyleGuide: (s: string) => void;
+  styleOverrides: Record<string, string>;
+  setStyleOverrides: (o: Record<string, string>) => void;
+  customStyleRefImage: string;
+  setCustomStyleRefImage: (s: string) => void;
 }) {
  const [q, setQ] = useState("");
  const [shown, setShown] = useState(false);
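+  // Inline editing: when editingIdx === i, that card's prompt box becomes an editable textarea.
+  // The list stays in place (no navigation) and the other cards remain visible, so the user can cancel and move elsewhere at any time.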
+  const [editingIdx, setEditingIdx] = useState<number | null>(null);
+  const [draft, setDraft] = useState("");
+  // Transient state for uploading / parsing the reference image: error and in-progress hints are visible only while this modal is open.
+  const [parsing, setParsing] = useState(false);
+  const [parseError, setParseError] = useState<string | null>(null);
+  const fileInputRef = useRef<HTMLInputElement>(null);
  useEffect(() => {
    const id = requestAnimationFrame(() => setShown(true));
    return () => cancelAnimationFrame(id);
@@ -931,7 +905,130 @@ function StyleModal({
    setShown(false);
    setTimeout(onClose, 280);
  };
-  const list = items.map((name, i) => ({ name, i })).filter((x) => x.name.includes(q.trim()));
+  const startEditing = (i: number, currentPrompt: string) => {
+    setEditingIdx(i);
+    setDraft(currentPrompt);
+  };
+  const cancelEditing = () => {
+    setEditingIdx(null);
+    setDraft("");
+  };
+  const saveEditing = () => {
+    if (editingIdx === null) return;
+    const targetName = items[editingIdx];
+    const t = draft.trim();
+    if (!targetName || !t) return;
+    if (targetName === "自定义") {
+      setCustomStyleGuide(t);
+    } else {
+      // STYLE_MAP stays untouched as the source of truth; only write an entry into the in-memory overrides.
+      setStyleOverrides({ ...styleOverrides, [targetName]: t });
+    }
+    onPick(editingIdx);
+    setEditingIdx(null);
+    close();
+  };
+  const resetOverride = (name: string) => {
+    const next = { ...styleOverrides };
+    delete next[name];
+    setStyleOverrides(next);
+    setDraft(STYLE_MAP[name] ?? "");
+  };
+
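+  // The client downscales the upload to 512px on the long edge + webp(0.85); the base64 usually lands at 30-80KB.
+  // This has to happen client-side: (1) the upload and every later /api/scene request carry this string, so the payload must stay small;
+  //              (2) Runware referenceImages accepts base64, so no separate upload endpoint is needed.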
+  const resizeImageToDataUrl = async (file: File): Promise<string> => {
+    const dataUrl = await new Promise<string>((resolve, reject) => {
+      const r = new FileReader();
+      r.onload = () => resolve(String(r.result));
+      r.onerror = () => reject(new Error("读取文件失败"));
+      r.readAsDataURL(file);
+    });
+    const img = await new Promise<HTMLImageElement>((resolve, reject) => {
+      const i = new Image();
+      i.onload = () => resolve(i);
+      i.onerror = () => reject(new Error("无法解码图片"));
+      i.src = dataUrl;
+    });
+    const MAX_DIM = 512;
+    const scale = Math.min(1, MAX_DIM / Math.max(img.width, img.height));
+    const w = Math.round(img.width * scale);
+    const h = Math.round(img.height * scale);
+    const canvas = document.createElement("canvas");
+    canvas.width = w;
+    canvas.height = h;
+    const ctx = canvas.getContext("2d");
+    if (!ctx) throw new Error("Canvas 2D context unavailable");
+    ctx.drawImage(img, 0, 0, w, h);
+    // WebP is a bit smaller than JPEG. If the browser cannot encode WebP, toDataURL returns PNG instead, so fall back to JPEG.
+    let out = canvas.toDataURL("image/webp", 0.85);
+    if (!out.startsWith("data:image/webp")) {
+      out = canvas.toDataURL("image/jpeg", 0.85);
+    }
+    return out;
+  };
+
+  const handleUploadStyleImage = async (file: File) => {
+    setParseError(null);
+    if (!file.type.startsWith("image/")) {
+      setParseError("只支持图片文件");
+      return;
+    }
+    setParsing(true);
+    try {
+      const resized = await resizeImageToDataUrl(file);
+      const res = await fetch("/api/parse-style-image", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ imageDataUrl: resized }),
+      });
+      if (!res.ok) {
+        const j = (await res.json().catch(() => ({}))) as { error?: string };
+        throw new Error(j.error ?? `${res.status}`);
+      }
+      const data = (await res.json()) as { stylePrompt: string };
+      // Got the AI-parsed prompt → overwrite the draft being edited and store the reference image.
+      // The user can still hand-edit the draft afterwards (it is still a textarea).
+      setDraft(data.stylePrompt);
+      setCustomStyleRefImage(resized);
+      track("style_image_upload", { ok: true });
+    } catch (err) {
+      const msg = err instanceof Error ? err.message : "解析失败";
+      setParseError(msg);
+      track("style_image_upload", { ok: false });
+    } finally {
+      setParsing(false);
+    }
+  };
+
+  const removeStyleRefImage = () => {
+    setCustomStyleRefImage("");
+    setParseError(null);
+  };
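+  // The title is the "main name" with any bracketed suffix removed: footnotes such as the
+  // English gloss or an "Image N参考" (reference-image) note are too noisy in the title slot and
+  // are already covered by the prompt line below. Both bracket styles are handled (full-width "（）" and ASCII "()").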
+  const stripSuffix = (s: string) => s.replace(/\s*[（(].*?[)）]\s*$/, "");
+  const q2 = q.trim();
+  const list = items
+    .map((name, i) => {
+      const base = STYLE_MAP[name] ?? "";
+      const override = styleOverrides[name];
+      return {
+        name,
+        title: stripSuffix(name),
+        // The list shows the "effective prompt": the override wins, so users see their edited version
+        prompt: override ?? base,
+        hasOverride: typeof override === "string" && override.length > 0,
+        i,
+      };
+    })
+    .filter((x) => {
+      if (!q2) return true;
+      const hay = (x.title + " " + x.name + " " + x.prompt).toLowerCase();
+      return hay.includes(q2.toLowerCase());
+    });
  return (
    <div
      onMouseDown={close}
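Review note: two pieces of logic added in this hunk are pure and can be checked outside the browser: the downscale arithmetic in `resizeImageToDataUrl` and the `stripSuffix` title cleanup. The sketch below lifts them into standalone functions; `fitWithin` is a hypothetical name introduced for illustration, while the regular expression is copied from the hunk.

```typescript
// Hypothetical extraction of the resize math: scale so the long edge is
// at most maxDim, never upscaling (scale is capped at 1).
function fitWithin(
  width: number,
  height: number,
  maxDim: number,
): { w: number; h: number } {
  const scale = Math.min(1, maxDim / Math.max(width, height));
  return { w: Math.round(width * scale), h: Math.round(height * scale) };
}

// Same regular expression as in the hunk: drop one trailing bracketed
// suffix, accepting both full-width and ASCII brackets.
const stripSuffix = (s: string): string =>
  s.replace(/\s*[（(].*?[)）]\s*$/, "");
```

For a 1792×1024 source and a 512px cap this yields a 512×293 canvas, and an image already under the cap keeps its original size.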
@@ -943,7 +1040,7 @@ function StyleModal({
      <div
        onMouseDown={(e) => e.stopPropagation()}
        className={
-          "flex w-[1000px] max-w-[94vw] max-h-[86vh] flex-col overflow-hidden rounded-sm border border-clay-900/15 bg-cream-50 shadow-2xl shadow-clay-900/25 transition-all duration-300 " +
+          "flex w-[860px] max-w-[94vw] max-h-[86vh] flex-col overflow-hidden rounded-sm border border-clay-900/15 bg-cream-50 shadow-2xl shadow-clay-900/25 transition-all duration-300 " +
          (shown ? "opacity-100 scale-100" : "opacity-0 scale-95")
        }
      >
@@ -951,14 +1048,14 @@ function StyleModal({
          <div className="flex flex-col">
            <span className="font-serif text-xl md:text-2xl text-clay-900">选择绘画风格</span>
            <span className="text-[11px] text-clay-500 mt-1 tracking-wide">
-              默认「自动」· 由模型根据 prompt 判断风格
+              默认「自动」· 点 prompt 框旁的 ✎ 可在该风格基础上修改（默认 prompt 不会被覆盖）
            </span>
          </div>
          <div className="relative ml-auto w-[280px] max-w-[46vw]">
            <input
              value={q}
              onChange={(e) => setQ(e.target.value)}
-              placeholder="搜索风格…"
+              placeholder="搜索风格 / prompt…"
              autoFocus
              className="h-10 w-full rounded-sm border border-clay-900/15 bg-cream-100 pl-4 pr-10 font-sans text-sm text-clay-900 outline-none transition-colors focus:border-ember-500 placeholder:text-clay-400"
            />
@@ -973,27 +1070,296 @@ function StyleModal({
            <i className="fa-solid fa-xmark" />
          </button>
        </div>
-        <div className="grid grid-cols-2 gap-3 overflow-y-auto px-6 py-6 md:grid-cols-4 md:gap-4 md:px-8">
-          {list.map(({ name, i }) => (
-            <button
-              key={i}
-              type="button"
-              onClick={() => {
-                onPick(i);
-                close();
-              }}
-              className={
-                "flex h-20 items-center justify-center rounded-sm border px-3 text-center transition-all " +
-                (i === value
-                  ? "border-ember-500 bg-ember-500/5 text-ember-500"
-                  : "border-clay-900/12 text-clay-700 hover:border-clay-900/35 hover:bg-cream-100")
-              }
-            >
-              <span className="font-serif text-base md:text-lg">{name}</span>
-            </button>
-          ))}
+        <div className="flex flex-col gap-2 overflow-y-auto px-4 py-5 md:px-6 md:py-6">
+          {list.map(({ name, title, prompt, hasOverride, i }) => {
+            const isCustom = name === "自定义";
+            const selected = i === value;
+            const editable = isCustom || Boolean(STYLE_MAP[name]);
+            const isEditing = editingIdx === i;
+            return (
+              <div
+                key={i}
+                onClick={(e) => {
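+                  // While editing: let clicks land on the textarea/buttons; do not bubble into "select and close".
+                  // Otherwise: clicking the card selects this style (clicking the custom card goes straight into editing).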
+                  if (isEditing) return;
+                  const tag = (e.target as HTMLElement).tagName;
+                  if (tag === "BUTTON" || tag === "TEXTAREA" || tag === "I") return;
+                  if (isCustom) {
+                    startEditing(i, customStyleGuide);
+                  } else {
+                    onPick(i);
+                    close();
+                  }
+                }}
+                className={
+                  "flex items-start gap-4 rounded-sm border px-3 py-3 md:px-4 md:py-3.5 text-left transition-all " +
+                  (isEditing
+                    ? "border-ember-500 bg-cream-50 cursor-default"
+                    : selected
+                      ? "border-ember-500 bg-ember-500/5 cursor-pointer"
+                      : "border-clay-900/12 hover:border-clay-900/35 hover:bg-cream-100 cursor-pointer")
+                }
+              >
+                <span
+                  aria-hidden
+                  className={
+                    "flex h-12 w-12 shrink-0 items-center justify-center rounded-sm border text-base " +
+                    (isCustom
+                      ? "border-ember-500/40 bg-ember-500/10 text-ember-500"
+                      : "border-clay-900/10 bg-cream-100 text-clay-400")
+                  }
+                >
+                  <i
+                    className={
+                      isCustom ? "fa-solid fa-pen-to-square" : "fa-regular fa-image"
+                    }
+                  />
+                </span>
+                <div className="flex min-w-0 flex-1 flex-col">
+                  {/* Title (never editable) */}
+                  <span
+                    className={
+                      "font-serif text-base md:text-lg leading-snug flex items-center gap-2 " +
+                      (selected || isEditing ? "text-ember-500" : "text-clay-900")
+                    }
+                  >
+                    {isCustom ? "自定义 prompt" : title}
+                    {hasOverride && !isEditing && (
+                      <span
+                        className="rounded-sm border border-ember-500/40 bg-ember-500/10 px-1.5 py-0.5 font-sans text-[10px] tracking-wide text-ember-500"
+                        title="你修改过这条 prompt（仅本次会话生效，默认 prompt 不变）"
+                      >
+                        已改
+                      </span>
+                    )}
+                    {isCustom && customStyleRefImage && !isEditing && (
+                      <span
+                        className="inline-flex items-center gap-1 rounded-sm border border-ember-500/40 bg-ember-500/10 px-1.5 py-0.5 font-sans text-[10px] tracking-wide text-ember-500"
+                        title="参考图已附带——每一幕画师都会参考这张图"
+                      >
+                        <i className="fa-regular fa-image text-[9px]" />
+                        附参考图
+                      </span>
+                    )}
+                  </span>
+
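+                  {/* "自动" (Auto) means "let the AI pick the art style itself", so there is no prompt to show or edit;
+                      put a one-line explanation under the title instead of rendering an empty text box / pencil. */}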
+                  {name === "自动" ? (
+                    <span className="font-sans text-[12px] md:text-[13px] leading-relaxed text-clay-500 mt-1">
+                      由 AI 依据世界观自动选择合适画风（无需手动指定 prompt）
+                    </span>
+                  ) : /* Prompt area: a read-only row when not editing; a real textarea when editing */
+                  isEditing ? (
+                    <div className="mt-1.5 flex flex-col gap-2">
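+                      {/* Custom card only: upload a style reference image. After upload: (1) a vision LLM
+                          parses it into a prompt that overwrites the textarea below; (2) the image itself travels
+                          with the session to the painter and anchors the art style as a reference for every scene. */}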
+                      {isCustom && (
+                        <div
+                          onClick={(e) => e.stopPropagation()}
+                          className="flex flex-col gap-2"
+                        >
+                          <input
+                            ref={fileInputRef}
+                            type="file"
+                            accept="image/*"
+                            className="hidden"
+                            onChange={(e) => {
+                              const f = e.target.files?.[0];
+                              if (f) handleUploadStyleImage(f);
+                              // Reset the value so re-selecting the same file fires onChange again
+                              if (fileInputRef.current) fileInputRef.current.value = "";
+                            }}
+                          />
+                          {customStyleRefImage ? (
+                            <div className="flex items-center gap-3 rounded-sm border border-clay-900/12 bg-cream-100 px-3 py-2.5">
+                              {/* eslint-disable-next-line @next/next/no-img-element */}
+                              <img
+                                src={customStyleRefImage}
+                                alt="画风参考图"
+                                className="h-14 w-14 shrink-0 rounded-sm border border-clay-900/10 object-cover"
+                              />
+                              <div className="flex min-w-0 flex-1 flex-col">
+                                <span className="font-sans text-[12px] text-clay-900">
+                                  <i className="fa-solid fa-check mr-1.5 text-ember-500" />
+                                  参考图已上传
+                                </span>
+                                <span className="font-sans text-[11px] leading-snug text-clay-500">
+                                  AI 已解析为下方 prompt；每一幕画师都会参考这张图
+                                </span>
+                              </div>
+                              <div className="flex flex-col items-end gap-1">
+                                <button
+                                  type="button"
+                                  onClick={(e) => {
+                                    e.stopPropagation();
+                                    fileInputRef.current?.click();
+                                  }}
+                                  disabled={parsing}
+                                  className="font-sans text-[11px] text-clay-500 hover:text-ember-500 transition-colors disabled:opacity-50"
+                                >
+                                  换一张
+                                </button>
+                                <button
+                                  type="button"
+                                  onClick={(e) => {
+                                    e.stopPropagation();
+                                    removeStyleRefImage();
+                                  }}
+                                  className="font-sans text-[11px] text-clay-400 hover:text-clay-900 transition-colors"
+                                >
+                                  移除
+                                </button>
+                              </div>
+                            </div>
+                          ) : (
+                            <button
+                              type="button"
+                              onClick={(e) => {
+                                e.stopPropagation();
+                                fileInputRef.current?.click();
+                              }}
+                              disabled={parsing}
+                              className={
+                                "flex items-center justify-center gap-2 rounded-sm border border-dashed px-3 py-2.5 font-sans text-[12px] transition-colors " +
+                                (parsing
+                                  ? "border-clay-900/15 bg-cream-100 text-clay-400 cursor-wait"
+                                  : "border-clay-900/25 text-clay-700 hover:border-ember-500 hover:bg-ember-500/5 hover:text-ember-500")
+                              }
+                            >
+                              {parsing ? (
+                                <>
+                                  <i className="fa-solid fa-circle-notch fa-spin text-[11px]" />
+                                  AI 正在解析参考图…
+                                </>
+                              ) : (
+                                <>
+                                  <i className="fa-regular fa-image text-[13px]" />
+                                  上传画风参考图（可选）· AI 自动解析为 prompt
+                                </>
+                              )}
+                            </button>
+                          )}
+                          {parseError && (
+                            <span className="font-sans text-[11px] text-rose-500">
+                              <i className="fa-solid fa-circle-exclamation mr-1" />
+                              {parseError}
+                            </span>
+                          )}
+                        </div>
+                      )}
+                      <textarea
+                        value={draft}
+                        onChange={(e) => setDraft(e.target.value)}
+                        onClick={(e) => e.stopPropagation()}
+                        autoFocus
+                        rows={6}
+                        placeholder={
+                          isCustom
+                            ? "示例：\nA dreamy watercolor illustration, soft pastel washes, gentle line art, nostalgic atmosphere."
+                            : ""
+                        }
+                        className="w-full resize-y rounded-sm border border-ember-500 bg-cream-50 px-3 py-2.5 font-sans text-[13px] leading-relaxed text-clay-900 outline-none placeholder:text-clay-400"
+                      />
+                      <div className="flex items-center justify-between gap-3">
+                        {!isCustom && styleOverrides[name] ? (
+                          <button
+                            type="button"
+                            onClick={(e) => {
+                              e.stopPropagation();
+                              resetOverride(name);
+                            }}
+                            className="font-sans text-xs text-clay-500 hover:text-ember-500 transition-colors"
+                          >
+                            <i className="fa-solid fa-rotate-left mr-1.5" />
+                            还原默认 prompt
+                          </button>
+                        ) : (
+                          <span />
+                        )}
+                        <div className="flex items-center gap-2">
+                          <button
+                            type="button"
+                            onClick={(e) => {
+                              e.stopPropagation();
+                              cancelEditing();
+                            }}
+                            className="px-3 py-1.5 font-sans text-xs text-clay-500 hover:text-clay-900 transition-colors"
+                          >
+                            取消
+                          </button>
+                          <button
+                            type="button"
+                            disabled={!draft.trim()}
+                            onClick={(e) => {
+                              e.stopPropagation();
+                              saveEditing();
+                            }}
+                            className={
+                              "rounded-sm px-4 py-1.5 font-sans text-xs text-cream-50 transition-colors " +
+                              (draft.trim()
+                                ? "bg-clay-900 hover:bg-ember-500"
+                                : "bg-clay-300 cursor-not-allowed")
+                            }
+                          >
+                            保存并选用
+                          </button>
+                        </div>
+                      </div>
+                    </div>
+                  ) : (
+                    /* Read-only prompt row: borderless plain text; padding-right leaves room for the pencil */
+                    <div className="mt-1 relative">
+                      <div
+                        className={
+                          "pr-8 font-sans text-[12px] md:text-[13px] leading-relaxed line-clamp-2 " +
+                          (isCustom && !customStyleGuide
+                            ? "italic text-clay-400"
+                            : "text-clay-500")
+                        }
+                      >
+                        {isCustom
+                          ? customStyleGuide || "点击此卡片或铅笔编辑你自己的画风 prompt"
+                          : prompt || "（这个风格没有默认 prompt——点 ✎ 添加）"}
+                      </div>
+                      {editable && (
+                        <button
+                          type="button"
+                          onClick={(e) => {
+                            e.stopPropagation();
+                            startEditing(
+                              i,
+                              isCustom
+                                ? customStyleGuide
+                                : styleOverrides[name] ?? STYLE_MAP[name] ?? "",
+                            );
+                          }}
+                          title={
+                            hasOverride
+                              ? "再次编辑此 prompt"
+                              : "在此 prompt 基础上修改（默认 prompt 不会被覆盖）"
+                          }
+                          aria-label="编辑此风格 prompt"
+                          className={
+                            "absolute right-0 top-0 flex h-5 w-5 items-center justify-center rounded-sm text-[11px] transition-colors " +
+                            (hasOverride
+                              ? "text-ember-500 hover:bg-ember-500/10"
+                              : "text-clay-400 hover:bg-cream-100 hover:text-clay-700")
+                          }
+                        >
+                          <i className="fa-solid fa-pencil" />
+                        </button>
+                      )}
+                    </div>
+                  )}
+                </div>
+              </div>
+            );
+          })}
          {list.length === 0 && (
-            <div className="col-span-full py-12 text-center font-serif text-sm text-clay-400">
+            <div className="py-12 text-center font-serif text-sm text-clay-400">
              没有匹配的风格
            </div>
          )}
@@ -1012,6 +1378,17 @@ export default function HomePage() {
  const [open, setOpen] = useState<number>(-1);
  const [styleOpen, setStyleOpen] = useState(false);
  const [prompt, setPrompt] = useState("");
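+  // styleGuide text the user enters under the "自定义" (custom) entry (Chinese or English; passed to the LLM verbatim).
+  // Held in memory only, so it is lost on refresh, which fits the "one-off trial" semantics.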
+  const [customStyleGuide, setCustomStyleGuide] = useState("");
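+  // The user's rewrite of a preset's prompt: it applies only to this user's current session and never pollutes STYLE_MAP,
+  // the source of truth. Key: preset name (e.g. "京阿尼细腻日常"); value: the override prompt.
+  // Preset selected + override present → the override is passed to the painter as styleGuide.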
+  const [styleOverrides, setStyleOverrides] = useState<Record<string, string>>({});
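+  // Reference image the user uploaded under "自定义" (already downscaled client-side to 512px, WebP base64).
+  // It is also passed via sessionStorage through /play → /api/start → session → painter;
+  // every scene's painter uses it as reference slot 0 to anchor the art style for the whole run.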
+  const [customStyleRefImage, setCustomStyleRefImage] = useState<string>("");
  const inputRef = useRef<HTMLTextAreaElement>(null);

  // Top usage hint: shown by default; the user can click × to dismiss it permanently (localStorage:infiplot:hintClosed).
@@ -1081,10 +1458,10 @@ export default function HomePage() {
    // so the "press start → the story has nothing to do with the placeholder text" disconnect no longer occurs.
    const userPrompt =
      prompt.trim() || (phrases[phraseIdx] ?? "").trim();
-    const artStyle = OPTS[1]!.items[sel[1] ?? 0]!;
-    const plotStyle = OPTS[2]!.items[sel[2] ?? 1]!;
+    const artStyle = ART_STYLES[sel[1] ?? 0] ?? "自动";
+    const plotStyle = PLOT_STYLES[sel[2] ?? 1] ?? "多线转折";
    const voice = OPTS[3]!.items[sel[3] ?? 1]!;
-    const pace = OPTS[4]!.items[sel[4] ?? 1]!;
+    const pace = PACINGS[sel[4] ?? 1] ?? "紧凑爽快";

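    // The order within worldSetting matters: the player's input, if present, must come first, as its own paragraph,
    // wrapped in a strong directive; otherwise the model treats it as background reference buried in the style notes and expands it into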
@@ -1108,23 +1485,54 @@ export default function HomePage() {
    // "自动" (Auto) → fall back to Galgame CG (project default). Plain prompts like
    // "由模型自动判断画风" are not understood by FLUX — it just paints them
    // literally, so we'd rather lock in a sensible default.
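+    // "自定义" (custom) → use the raw styleGuide the user entered in the modal, passed to the LLM verbatim; when it is empty,
+    // fall back to the default (an empty string would make /api/start report a missing field).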
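    // TODO(auto-routing): implement a real "自动" later: have the model pick the best-fitting art style from the worldview / player prompt
    // and map it to that style's prompt, instead of always falling back to Galgame. At that point,
    // also update the style modal subtitle ("由模型根据 prompt 判断风格") so the copy matches the behavior.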
-    const DEFAULT_STYLE = "Galgame CG 梦幻光影";
-    const effectiveStyle = artStyle === "自动" ? DEFAULT_STYLE : artStyle;
-    const styleGuide = STYLE_MAP[effectiveStyle] ?? STYLE_MAP[DEFAULT_STYLE]!;
+    const DEFAULT_STYLE = "Galgame CG";
+    let styleGuide: string;
+    if (artStyle === "自定义" && customStyleGuide.trim()) {
+      styleGuide = customStyleGuide.trim();
+    } else if (styleOverrides[artStyle]?.trim()) {
+      // The user edited this preset's prompt: prefer the override; STYLE_MAP stays untouched.
+      styleGuide = styleOverrides[artStyle]!.trim();
+    } else {
+      const effectiveStyle =
+        artStyle === "自动" || artStyle === "自定义" ? DEFAULT_STYLE : artStyle;
+      styleGuide = STYLE_MAP[effectiveStyle] ?? STYLE_MAP[DEFAULT_STYLE]!;
+    }
    const audioEnabled = voice === "开启";

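+    // Pass the reference image through only when the "自定义" style is selected and an image was actually uploaded. Other presets
+    // have no need to occupy the reference slot (and when styleGuide is already a text preset, an unrelated
+    // reference image would only interfere with the painter).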
+    const styleReferenceImage =
+      artStyle === "自定义" && customStyleRefImage ? customStyleRefImage : undefined;
+
+    track("game_start", {
+      source: "prompt",
+      gender,
+      art_style: artStyle,
+      plot_style: plotStyle,
+      pacing: pace,
+      tts: audioEnabled,
+      has_prompt: prompt.trim().length > 0,
+      has_style_ref: Boolean(styleReferenceImage),
+    });
+
    sessionStorage.setItem(
      "infiplot:custom",
-      JSON.stringify({ worldSetting, styleGuide, audioEnabled }),
+      JSON.stringify({ worldSetting, styleGuide, audioEnabled, styleReferenceImage }),
    );
    router.push("/play?custom=1");
  };

  const stories = STORIES[galleryGender];
  const imgPrefix = galleryGender === "女性向" ? "f" : "m";
+  const analyticsOn = Boolean(
+    process.env.NEXT_PUBLIC_UMAMI_SRC && process.env.NEXT_PUBLIC_UMAMI_WEBSITE_ID,
+  );

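  // Clicking a card starts that card's story immediately, with zero wait: navigate to /play?card=m0/f0... and the /play
  // page loads the pre-baked first act from the static file /home/firstact/{name}.json (including scene /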
@@ -1139,6 +1547,12 @@ export default function HomePage() {
      "infiplot:custom",
      JSON.stringify({ worldSetting: "", styleGuide: "", audioEnabled }),
    );
+    track("game_start", {
+      source: "curated",
+      gender: galleryGender,
+      tts: audioEnabled,
+      card: `${imgPrefix}${idx}`,
+    });
    router.push(`/play?card=${imgPrefix}${idx}`);
  };

@@ -1370,8 +1784,21 @@ export default function HomePage() {
          目前，内测期间生成的内容不会被保存，如有需要，请通过录屏或截图等方式保存游玩体验，并记录下生成故事时的提示词与风格选项等。
          <br />
          AI 生成的内容不代表本团队立场。
-          <br />
-          本站使用开源的 Umami 进行隐私友好的匿名访问统计：不使用 Cookie、不收集个人信息、不做跨站追踪。
+          {analyticsOn && (
+            <>
+              <br />
+              本站使用开源的{" "}
+              <a
+                href="https://umami.is/"
+                target="_blank"
+                rel="noopener noreferrer"
+                className="underline decoration-clay-900/20 underline-offset-2 transition-colors hover:text-clay-700"
+              >
+                Umami
+              </a>{" "}
+              进行隐私友好的匿名访问与交互统计：不使用 Cookie、不收集个人信息、不发送任何您输入的内容、不做跨站追踪。
+            </>
+          )}
        </p>
      </section>

@@ -1386,8 +1813,17 @@ export default function HomePage() {
        <StyleModal
          items={OPTS[styleRow]!.items}
          value={sel[styleRow] ?? 0}
-          onPick={(i) => setSel((s) => s.map((v, j) => (j === styleRow ? i : v)))}
+          onPick={(i) => {
+            track("art_style_select", { style: ART_STYLES[i] ?? "自动" });
+            setSel((s) => s.map((v, j) => (j === styleRow ? i : v)));
+          }}
          onClose={() => setStyleOpen(false)}
+          customStyleGuide={customStyleGuide}
+          setCustomStyleGuide={setCustomStyleGuide}
+          styleOverrides={styleOverrides}
+          setStyleOverrides={setStyleOverrides}
+          customStyleRefImage={customStyleRefImage}
+          setCustomStyleRefImage={setCustomStyleRefImage}
        />
      )}
    </div>
@@ -26,45 +26,162 @@ import type {
  StartResponse,
  VisionResponse,
 } from "@infiplot/types";
+import { track } from "@/lib/analytics";

 const MUTED_STORAGE_KEY = "infiplot:muted";

 // Cap how long we wait for the browser to download + decode a scene image
-// before giving up and rendering anyway. Runware's CDN is normally <2s for a
-// 1792×1024 PNG; tolerate up to 8s before the typewriter starts so a slow
-// download can't strand the player on a blank screen forever.
-const IMAGE_PRELOAD_TIMEOUT_MS = 8000;
+// before giving up and rendering anyway. Runware's CDN is usually <2s for a
+// 1792×1024 PNG, but over slow links / VPN / strict corp networks the same
+// download can stretch to 10-20s. The previous 8s ceiling fired in that
+// window, and because the rendered <img> reserved no aspect-ratio box, the
+// layout collapsed to a one-pixel-tall sliver until the bytes actually
+// finished arriving: the "long wait → thin line → sudden image" report.
+// 20s + the <img> aspect-video fallback together remove that failure mode.
+const IMAGE_PRELOAD_TIMEOUT_MS = 20000;

 // ──────────────────────────────────────────────────────────────────────
-//  Image preload — decode the Runware URL in memory before committing to
-//  React state, so when the <img> mounts, the browser cache is warm and
-//  rendering is instant. Without this the user sees a blank canvas during
-//  the Runware-CDN download (~1-3s) after /api/scene returns.
+//  Two ways an <img> gets its pixels, picked per-URL by shouldProxy():
 //
-//  Data URIs (MOCK_IMAGE mode) and prefetched-then-cached real URLs both
-//  resolve fast / instantly. Errors and timeouts resolve quietly — better
-//  to render a broken-image than to hang the play loop indefinitely.
+//  1. DIRECT (default — no proxy configured): preload the URL with an
+//     Image() + decode() so the HTTP cache is warm and the bitmap decoded
+//     before React commits, then hand the ORIGINAL URL to <img>. This is the
+//     long-standing behavior; deployers who set no env var get exactly this
+//     and are completely unaffected by the proxy machinery below.
+//
+//  2. PROXY (opt-in — NEXT_PUBLIC_IMAGE_PROXY_URL set, host allow-listed):
+//     fetch the bytes through the Cloudflare Worker (which adds CORS and
+//     serves over stable HTTP/2), await the FULL body via .blob(), materialize
+//     a blob: URL over that local copy, and hand THAT to <img>. The <img>
+//     never sees a network-backed src, so there's no "bytes still in flight"
+//     middle state and no progressive paint.
+//     Why it matters: Chrome's direct fetch of im.runware.ai sometimes hits
+//     ERR_QUIC_PROTOCOL_ERROR mid-stream, leaving partial PNG bytes that
+//     paint row-by-row. The Worker re-fetches server-to-server (no QUIC
+//     fragility) and serves over HTTP/2 — atomic and reliable. Trade-off:
+//     callers MUST revoke the blob URL when swapping it out (revokeBlobUrlFor)
+//     or the bytes leak in the JS heap.
+//
+//  Data URIs (MOCK_IMAGE mode) are already local; passed through unchanged
+//  on both paths. blobUrlCache is keyed by the ORIGINAL URL either way.
 // ──────────────────────────────────────────────────────────────────────

+// Direct-path preload: decode the URL in memory before committing to React
+// state, so when the <img> mounts the cache is warm and first paint is
+// instant. Errors / timeouts resolve quietly — better a broken <img> than a
+// hung play loop. (im.runware.ai sends no CORS header, so we can't fetch()
+// its bytes here; warming + decoding is the most the direct path can do.)
 function preloadImage(url: string): Promise<void> {
  return new Promise<void>((resolve) => {
    const img = new Image();
-    const done = () => resolve();
-    const timer = setTimeout(done, IMAGE_PRELOAD_TIMEOUT_MS);
-    img.onload = () => {
+    let timer: ReturnType<typeof setTimeout>;
+    // Single exit: clear the timeout and resolve. resolve() is idempotent, so
+    // whichever path fires first (load+decode, error, timeout) wins.
+    const done = () => {
      clearTimeout(timer);
+      resolve();
+    };
+    // Armed across BOTH network load and decode, so a hung decode still
+    // resolves quietly — better a broken <img> than a stuck play loop.
+    timer = setTimeout(done, IMAGE_PRELOAD_TIMEOUT_MS);
+    img.onload = () => {
      // .decode() forces the bitmap to be fully decoded before we proceed —
      // without it, a slow decode could still cause a flash on first paint.
      img.decode().then(done, done);
    };
-    img.onerror = () => {
-      clearTimeout(timer);
-      done();
-    };
+    img.onerror = done;
    img.src = url;
  });
 }

+// Opt-in Cloudflare Workers proxy (deploy your own — see the link in README).
+// Inlined by Next.js at build time. Empty / unset → no proxy → every URL takes
+// the direct path above, exactly as if this feature didn't exist.
+const IMAGE_PROXY_BASE = (
+  process.env.NEXT_PUBLIC_IMAGE_PROXY_URL ?? ""
+).replace(/\/$/, "");
+
+// Hostnames eligible for the proxy. Default: Runware's CDN only. Deployers who
+// point IMAGE_BASE_URL at another provider can opt that provider's image host
+// in via NEXT_PUBLIC_IMAGE_PROXY_ALLOWED_HOSTS (comma-separated). Inlined at
+// build time. Anything not on this list stays on the direct path.
+const IMAGE_PROXY_ALLOWED_HOSTS = (
+  process.env.NEXT_PUBLIC_IMAGE_PROXY_ALLOWED_HOSTS ?? "im.runware.ai"
+)
+  .split(",")
+  .map((h) => h.trim().toLowerCase())
+  .filter(Boolean);
+
+// Route a URL through the proxy only when a proxy is configured AND it's a
+// remote http(s) image on an allow-listed host. data: URIs (MOCK_IMAGE) are
+// already local; malformed URLs and any other origin fall through to direct.
+function shouldProxy(originalUrl: string): boolean {
+  if (!IMAGE_PROXY_BASE) return false;
+  if (originalUrl.startsWith("data:")) return false;
+  try {
+    const { protocol, hostname } = new URL(originalUrl);
+    if (protocol !== "https:" && protocol !== "http:") return false;
+    return IMAGE_PROXY_ALLOWED_HOSTS.includes(hostname.toLowerCase());
+  } catch {
+    return false;
+  }
+}
+
+function proxiedImageUrl(originalUrl: string): string {
+  return `${IMAGE_PROXY_BASE}/?url=${encodeURIComponent(originalUrl)}`;
+}
+
+async function fetchImageAsBlobUrl(url: string): Promise<string> {
+  if (url.startsWith("data:")) return url;
+
+  // Direct path (default): warm the cache + decode, hand back the original
+  // URL. No fetch(): im.runware.ai sends no CORS headers, so fetch() would reject.
+  if (!shouldProxy(url)) {
+    await preloadImage(url);
+    return url;
+  }
+
+  // Proxy path (opt-in): fetch through the Worker and materialize a blob: URL.
+  // On error / timeout fall back to the original URL so <img> still tries
+  // (possible progressive paint — same as the direct path, never worse).
+  const ctrl = new AbortController();
+  const timer = setTimeout(() => ctrl.abort(), IMAGE_PRELOAD_TIMEOUT_MS);
+  try {
+    const r = await fetch(proxiedImageUrl(url), { signal: ctrl.signal });
+    if (!r.ok) return url;
+    const blob = await r.blob();
+    return URL.createObjectURL(blob);
+  } catch {
+    return url;
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+// Module-level cache so speculative prefetches and the eventual commit share
+// the same in-flight fetch — no double-download per scene. Keyed by the
+// ORIGINAL CDN URL (the blob: URL it resolves to is the value). Persists for
+// the page's lifetime; entries are explicitly revoked when the scene swaps.
+const blobUrlCache = new Map<string, Promise<string>>();
+
+function getOrCreateBlobUrl(originalUrl: string): Promise<string> {
+  let p = blobUrlCache.get(originalUrl);
+  if (!p) {
+    p = fetchImageAsBlobUrl(originalUrl);
+    blobUrlCache.set(originalUrl, p);
+  }
+  return p;
+}
+
+function revokeBlobUrlFor(originalUrl: string): void {
+  const p = blobUrlCache.get(originalUrl);
+  if (!p) return;
+  blobUrlCache.delete(originalUrl);
+  p.then((u) => {
+    if (u.startsWith("blob:")) URL.revokeObjectURL(u);
+  }).catch(() => {});
+}
+
 // ──────────────────────────────────────────────────────────────────────
 //  Prefetch pool — speculative SceneResponses keyed by choice path.
 //
@@ -160,11 +277,11 @@ function prefetchScenePath(
    }
    const data = (await res.json()) as SceneResponse;

-    // Warm the browser's HTTP + image-decode cache for this URL so when the
-    // player eventually picks this choice and we render the <img>, it's
-    // instant. Don't await — let the bytes stream in the background; the
-    // transition path will await its own preloadImage() before committing.
-    void preloadImage(data.imageUrl);
+    // Kick off the blob fetch for this URL so when the player eventually
+    // picks this choice, transitioning is a no-op cache lookup instead of a
+    // fresh CDN download. Don't await — let it run in the background; the
+    // transition path awaits the same cached promise via getOrCreateBlobUrl.
+    void getOrCreateBlobUrl(data.imageUrl);

    // Recursive: if the resulting scene has exactly one change-scene exit,
    // it is a must-pass node — prefetch its child too.
@@ -284,6 +401,10 @@ function PlayInner() {
  const currentSceneRef = useRef<Scene | null>(null);
  const currentBeatRef = useRef<Beat | null>(null);
  const visitedBeatsRef = useRef<string[]>([]);
+  // Original (CDN) URL of the currently-rendered scene image. Used as the key
+  // to revoke its blob: URL when the scene swaps. We track the ORIGINAL URL,
+  // not the blob URL, because blobUrlCache is keyed by original URL.
+  const lastImageOriginalUrlRef = useRef<string | null>(null);

  const currentBeat = useMemo<Beat | null>(() => {
    if (!currentScene || !currentBeatId) return null;
@@ -307,6 +428,21 @@ function PlayInner() {
    mutedRef.current = muted;
  }, [muted]);

+  // Coarse liveness ping for active-time analytics. /play is a single SPA
+  // route, so page views alone read as ~0 duration; a 30s heartbeat (only
+  // while the tab is visible) gives Umami the timestamps to derive real
+  // engaged time. Content-free — no payload. The interval is never even
+  // scheduled unless the tracker is configured, so it's zero work when off.
+  useEffect(() => {
+    if (!process.env.NEXT_PUBLIC_UMAMI_SRC || !process.env.NEXT_PUBLIC_UMAMI_WEBSITE_ID) {
+      return;
+    }
+    const id = window.setInterval(() => {
+      if (document.visibilityState === "visible") track("play_heartbeat");
+    }, 30_000);
+    return () => window.clearInterval(id);
+  }, []);
+
  // Whenever currentBeatId changes, append it to visited (skip consecutive dups)
  useEffect(() => {
    if (!currentBeatId) return;
@@ -406,6 +542,7 @@ function PlayInner() {

  // ── Mute persistence (read is via the useState lazy initializer above) ─
  const toggleMuted = useCallback(() => {
+    track("tts_toggle", { muted: !mutedRef.current });
    setMuted((prev) => {
      const next = !prev;
      try {
@@ -441,6 +578,7 @@ function PlayInner() {
  // ── Presentation mode toggle ─────────────────────────────────────────
  const togglePresentation = useCallback(async () => {
    const entering = !presentation;
+    track("fullscreen_toggle", { on: entering });
    if (entering) {
      try {
        if (!document.fullscreenElement) {
@@ -496,7 +634,11 @@ function PlayInner() {
    const presetId = params.get("preset");
    const isCustom = params.get("custom") === "1";

-    let livePayload: { worldSetting: string; styleGuide: string } | null = null;
+    let livePayload: {
+      worldSetting: string;
+      styleGuide: string;
+      styleReferenceImage?: string;
+    } | null = null;
    if (!cardName) {
      if (presetId) {
        const p = PRESETS.find((x) => x.id === presetId);
@@ -509,8 +651,13 @@ function PlayInner() {
              worldSetting: string;
              styleGuide: string;
              audioEnabled?: boolean;
+              styleReferenceImage?: string;
+            };
+            livePayload = {
+              worldSetting: parsed.worldSetting,
+              styleGuide: parsed.styleGuide,
+              styleReferenceImage: parsed.styleReferenceImage || undefined,
            };
-            livePayload = { worldSetting: parsed.worldSetting, styleGuide: parsed.styleGuide };
            // audioEnabled was already mapped (inverted) onto muted in the useState initializer; no need to store it again here.
          } catch {
            livePayload = null;
@@ -527,6 +674,11 @@ function PlayInner() {
    type PrebakedFirstAct = StartResponse & {
      worldSetting: string;
      styleGuide: string;
+      // Live /api/start path tags this on after the response (prebaked card
+      // JSONs never have one — they were rendered at build time without any
+      // user-uploaded reference). Carried into Session so /api/scene's painter
+      // anchors the same style image on every subsequent scene.
+      styleReferenceImage?: string;
      cardName?: string;
      cardTitle?: string;
      cardGender?: string;
@@ -550,15 +702,23 @@ function PlayInner() {
          }
          const data = (await r.json()) as StartResponse;
          // Live /api/start doesn't echo ws/sg back — splice in what we sent.
-          return { ...data, worldSetting: livePayload!.worldSetting, styleGuide: livePayload!.styleGuide };
+          // styleReferenceImage is similarly not in StartResponse; tag it on so
+          // the session we build below carries it for every /api/scene call.
+          return {
+            ...data,
+            worldSetting: livePayload!.worldSetting,
+            styleGuide: livePayload!.styleGuide,
+            styleReferenceImage: livePayload!.styleReferenceImage,
+          };
        });

    fetchStart
      .then(async (data) => {
-        // Decode the Runware image in memory before committing to state, so
-        // the <img> renders instantly when it mounts (same rationale as the
-        // performSceneTransition path).
-        await preloadImage(data.imageUrl);
+        // Resolve to a paintable src before committing to state. Proxy path:
+        // a fully-local blob: URL the browser paints atomically (no row-by-row
+        // "layer-by-layer" loading). Direct path (default): the preloaded original URL.
+        const blobUrl = await getOrCreateBlobUrl(data.imageUrl);
+        lastImageOriginalUrlRef.current = data.imageUrl;

        const initial: Session = {
          id: data.sessionId,
@@ -573,15 +733,17 @@ function PlayInner() {
          ],
          characters: data.characters,
          storyState: data.storyState,
+          styleReferenceImage: data.styleReferenceImage,
        };
        visitedBeatsRef.current = [data.scene.entryBeatId];
        setSession(initial);
        setCurrentScene(data.scene);
        setCurrentBeatId(data.scene.entryBeatId);
-        setImageUrl(data.imageUrl);
+        setImageUrl(blobUrl);
        // beatAudioMap is populated lazily by the per-beat fetch effect once
        // currentScene becomes non-null (see fetchBeatAudio).
        setPhase("ready");
+        track("scene_reached", { scene_index: initial.history.length });
      })
      .catch((e) => setError(String(e)));
  }, [params, router]);
@@ -613,6 +775,9 @@ function PlayInner() {
  // stop paying for background scene/image generation. Empty deps → fires only
  // on unmount; it must NOT run on scene transitions, which rely on
  // consumeChoice keeping the re-rooted survivor prefetches alive.
+  // Also revoke any surviving blob: URLs so their bytes can be GC'd — the
+  // module-level blobUrlCache outlives the component but its entries should
+  // not survive the page navigation that unmounts us.
  useEffect(() => {
    const pool = poolRef.current;
    const beatAborts = beatAudioAbortRef.current;
@@ -620,6 +785,9 @@ function PlayInner() {
      clearPool(pool);
      for (const c of beatAborts.values()) c.abort();
      beatAborts.clear();
+      for (const [originalUrl] of blobUrlCache) {
+        revokeBlobUrlFor(originalUrl);
+      }
    };
  }, []);

@@ -646,13 +814,21 @@ function PlayInner() {
      const base = sessionRef.current;
      if (!base) throw new Error("Session lost mid-transition");

-      // Wait for the browser to download + decode the Runware-hosted image
-      // BEFORE committing it to state, so the <img> renders instantly when it
-      // mounts. For prefetched scenes the preloadImage call inside
-      // prefetchScenePath has already warmed the cache, so this resolves
-      // almost immediately. For cold transitions we trade an extra ~1-3s of
-      // "transitioning" overlay for an image-pop-in-from-blank flash.
-      await preloadImage(result.imageUrl);
+      // Pull full image bytes into a local blob: URL before committing. For
+      // prefetched scenes the speculative getOrCreateBlobUrl in
+      // prefetchScenePath already has this in flight (often resolved), so
+      // this is a near-instant cache lookup. For cold transitions we eat the
+      // CDN download / preload time under the "transitioning" overlay. Proxy
+      // path: the <img> then gets a fully-local blob (no progressive paint);
+      // direct path (default): the preloaded original URL.
+      const blobUrl = await getOrCreateBlobUrl(result.imageUrl);
+      // Revoke the previous scene's blob (no longer rendered) to release JS
+      // heap. New scene's original URL takes its place as "current".
+      const priorOriginal = lastImageOriginalUrlRef.current;
+      if (priorOriginal && priorOriginal !== result.imageUrl) {
+        revokeBlobUrlFor(priorOriginal);
+      }
+      lastImageOriginalUrlRef.current = result.imageUrl;

      const closedHistory = base.history.map((h, i, arr) =>
        i === arr.length - 1
@@ -675,10 +851,11 @@ function PlayInner() {
      setSession(newSession);
      setCurrentScene(result.scene);
      setCurrentBeatId(result.scene.entryBeatId);
-      setImageUrl(result.imageUrl);
+      setImageUrl(blobUrl);
      // beatAudioMap reset + per-beat fetches kicked off by the scene effect.
      setLastExitLabel(exitLabel);
      setPhase("ready");
+      track("scene_reached", { scene_index: newSession.history.length });
    } catch (e) {
      if ((e as { name?: string }).name === "AbortError") {
        setPhase("ready");
@@ -692,6 +869,19 @@ function PlayInner() {
  function onSelectChoice(choice: BeatChoice) {
    if (phase !== "ready" || !session || !currentScene) return;

+    const beatNext = currentBeatRef.current?.next;
+    const choiceIndex =
+      beatNext?.type === "choice"
+        ? beatNext.choices.findIndex((c) => c.id === choice.id)
+        : -1;
+    if (choiceIndex >= 0) {
+      track("choice_select", {
+        scene_index: session.history.length,
+        choice_index: choiceIndex,
+        kind: choice.effect.kind,
+      });
+    }
+
    if (choice.effect.kind === "advance-beat") {
      // Pure local jump. No network. No pool changes.
      setCurrentBeatId(choice.effect.targetBeatId);
@@ -760,6 +950,7 @@ function PlayInner() {
        throw new Error(j.error ?? visionRes.statusText);
      }
      const decision = (await visionRes.json()) as VisionResponse;
+      track("vision_click", { result: decision.classify });

      if (decision.classify === "insert-beat") {
        setPhase("inserting-beat");
@@ -934,10 +1125,6 @@ function PlayInner() {
          <span>第 · {String(sceneCount).padStart(3, "0")} · 幕</span>
          <span className="text-clay-300">·</span>
          <span>{String(beatCount).padStart(3, "0")} · 拍</span>
-          <span className="text-clay-300">·</span>
-          <span className="hidden sm:inline truncate max-w-[180px]">
-            {session?.id.slice(2, 14) ?? "—"}
-          </span>
        </div>
      </header>

@@ -995,11 +1182,6 @@ function PlayInner() {
          )}
        </div>
      </main>
-
-      <footer className="px-5 md:px-12 pb-6 flex items-center justify-center">
-        {/* Presentation / mute controls moved to the top-left and top-right of the image; the footer keeps only the centered "Ⅰ · Ⅰ" mark */}
-        <div className="text-[9px] smallcaps text-clay-400 num">Ⅰ · Ⅰ</div>
-      </footer>
    </div>
  );
 }
@@ -1,16 +1,23 @@
 import Script from "next/script";

-// Privacy-friendly, cookieless page-view analytics (Umami). Both env vars
-// unset → render nothing, so local dev and forks never report to our instance.
+// Privacy-friendly, cookieless analytics (Umami). Both env vars unset →
+// render nothing, so local dev and forks never report to our instance.
+// - data-do-not-track: honour the visitor's browser Do Not Track setting.
+// - data-domains (NEXT_PUBLIC_UMAMI_DOMAINS): extra guard — the tracker only
+//   fires when the live hostname matches, so even a fork that copied our env
+//   vars stays silent on a different domain. Unset → run on all hosts.
 export function Analytics() {
  const src = process.env.NEXT_PUBLIC_UMAMI_SRC;
  const websiteId = process.env.NEXT_PUBLIC_UMAMI_WEBSITE_ID;
+  const domains = process.env.NEXT_PUBLIC_UMAMI_DOMAINS;
  if (!src || !websiteId) return null;

  return (
    <Script
      src={src}
      data-website-id={websiteId}
+      data-do-not-track="true"
+      {...(domains ? { "data-domains": domains } : {})}
      strategy="afterInteractive"
      defer
    />
@@ -2,6 +2,7 @@

 import { useRouter } from "next/navigation";
 import { useState } from "react";
+import { track } from "@/lib/analytics";

 export function CustomForm() {
  const router = useRouter();
@@ -22,6 +23,7 @@ export function CustomForm() {
      "infiplot:custom",
      JSON.stringify({ worldSetting, styleGuide }),
    );
+    track("game_start", { source: "custom" });
    router.push("/play?custom=1");
  }

@@ -276,6 +276,18 @@ export function PlayCanvas({
    });
  }

+  // Card swallows its own clicks so they never fall through to the image's
+  // vision (image-recognition) trigger: while typing, a click completes the
+  // text, a continue beat advances, and a choice beat stays inert (player must pick an option).
+  function handleCardClick() {
+    if (phase !== "ready" || !beat) return;
+    if (!typingDone) {
+      skipTypewriter();
+      return;
+    }
+    if (beat.next.type === "continue") onAdvance();
+  }
+
  const interactive = phase === "ready" && !!imageUrl;
  const dimmed = phase === "transitioning";

@@ -310,11 +322,19 @@ export function PlayCanvas({
          className="relative inline-block"
          style={{ boxShadow: fullViewport ? "none" : SHADOW }}
        >
-          {/* Background image — Runware CDN URL or data URI (mock mode) */}
+          {/* Background image — Runware CDN URL or data URI (mock mode).
+              The width/height attributes are NOT rendered dimensions (w-auto
+              h-auto + the maxWidth/maxHeight in sizeStyle still drive the
+              final layout); they give the browser an intrinsic aspect ratio
+              so that, while the bytes are still arriving from the CDN, the
+              <img> reserves a 1792:1024 box instead of collapsing to a
+              one-pixel sliver; this fixes the "long wait → thin line → image suddenly pops in" jank. */}
          <img
            key={imageUrl.slice(-48)}
            ref={imgRef}
            src={imageUrl}
+            width={1792}
+            height={1024}
            alt="Generated scene"
            onClick={handleImageClick}
            draggable={false}
@@ -358,7 +378,8 @@ export function PlayCanvas({

              {(beat.narration || beat.line) && (
                <div
-                  className="pointer-events-none mx-[2%] mb-[2%] px-[3%] py-[2.2%] relative"
+                  className="pointer-events-auto mx-[2%] mb-[2%] px-[3%] py-[2.2%] relative"
+                  onClick={handleCardClick}
                  style={{
                    background: "rgba(14, 10, 6, 0.72)",
                    border: "1.5px solid rgba(175, 138, 72, 0.60)",
@@ -6,10 +6,65 @@ export type ChatMessage = {
  content: string;
 };

+// Different providers expose prompt-cache stats under different keys. We probe
+// for the three forms we've seen in the wild and fall back to total tokens
+// when no cache field exists.
+//
+//   DeepSeek (v3+)    usage.prompt_cache_hit_tokens / prompt_cache_miss_tokens
+//   OpenAI / o-series usage.prompt_tokens_details.cached_tokens
+//   Anthropic / others  usage.cache_read_input_tokens / cache_creation_input_tokens
+//   No-cache (MiMo,
+//     local Ollama, …) only prompt_tokens / completion_tokens — print those
+//                       so we still get a rough cost baseline.
+type Usage = {
+  prompt_tokens?: number;
+  completion_tokens?: number;
+  prompt_cache_hit_tokens?: number;
+  prompt_cache_miss_tokens?: number;
+  prompt_tokens_details?: { cached_tokens?: number };
+  cache_read_input_tokens?: number;
+  cache_creation_input_tokens?: number;
+};
+
+function summarizeUsage(tag: string, usage: Usage | undefined): string {
+  if (!usage) return `[cache] ${tag} no-usage`;
+  const prompt = usage.prompt_tokens ?? 0;
+  const completion = usage.completion_tokens ?? 0;
+  // DeepSeek-style
+  if (typeof usage.prompt_cache_hit_tokens === "number") {
+    const hit = usage.prompt_cache_hit_tokens;
+    const miss = usage.prompt_cache_miss_tokens ?? Math.max(0, prompt - hit);
+    const denom = hit + miss;
+    const rate = denom > 0 ? `${((hit / denom) * 100).toFixed(1)}%` : "n/a";
+    return `[cache] ${tag} hit=${hit} miss=${miss} rate=${rate} completion=${completion}`;
+  }
+  // OpenAI-style
+  const oaiCached = usage.prompt_tokens_details?.cached_tokens;
+  if (typeof oaiCached === "number") {
+    const miss = Math.max(0, prompt - oaiCached);
+    const rate = prompt > 0 ? `${((oaiCached / prompt) * 100).toFixed(1)}%` : "n/a";
+    return `[cache] ${tag} hit=${oaiCached} miss=${miss} rate=${rate} completion=${completion}`;
+  }
+  // Anthropic-style
+  if (typeof usage.cache_read_input_tokens === "number") {
+    const hit = usage.cache_read_input_tokens;
+    const create = usage.cache_creation_input_tokens ?? 0;
+    const denom = hit + create + prompt;
+    const rate = denom > 0 ? `${((hit / denom) * 100).toFixed(1)}%` : "n/a";
+    return `[cache] ${tag} hit=${hit} create=${create} miss=${prompt} rate=${rate} completion=${completion}`;
+  }
+  // No cache field at all
+  return `[cache] ${tag} prompt=${prompt} completion=${completion} (provider didn't report cache stats)`;
+}
+
 export async function chat(
  config: ProviderConfig,
  messages: ChatMessage[],
-  opts?: { temperature?: number; responseFormat?: "json_object" | "text" },
+  opts?: {
+    temperature?: number;
+    responseFormat?: "json_object" | "text";
+    tag?: string;
+  },
 ): Promise<string> {
  const url = `${config.baseUrl.replace(/\/$/, "")}/chat/completions`;
  const body: Record<string, unknown> = {
@@ -35,7 +90,10 @@ export async function chat(
    throw new Error(`Chat API error ${res.status}: ${text}`);
  }

-  let json: { choices: { message: { content: string } }[] };
+  let json: {
+    choices: { message: { content: string } }[];
+    usage?: Usage;
+  };
  try {
    json = JSON.parse(text);
  } catch {
@@ -50,5 +108,7 @@ export async function chat(
    );
  }

+  console.log(summarizeUsage(opts?.tag ?? "chat", json.usage));
+
  return content;
 }
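
The provider probing in `summarizeUsage` can be read as "normalize three usage shapes into one hit/total pair, then format". The following is a minimal sketch of that idea, not the code from this commit: `UsageSketch`, `normalizeCacheStats` and `hitRate` are hypothetical names, and the field names are copied from the `Usage` type in the hunk above.

```typescript
// Hypothetical, trimmed-down mirror of the Usage type in the diff above.
type UsageSketch = {
  prompt_tokens?: number;
  prompt_cache_hit_tokens?: number;
  prompt_cache_miss_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

// null means "the provider reported no cache fields at all".
type CacheStats = { hit: number; total: number } | null;

function normalizeCacheStats(u: UsageSketch): CacheStats {
  const prompt = u.prompt_tokens ?? 0;
  // DeepSeek-style: explicit hit / miss counters.
  if (typeof u.prompt_cache_hit_tokens === "number") {
    const hit = u.prompt_cache_hit_tokens;
    const miss = u.prompt_cache_miss_tokens ?? Math.max(0, prompt - hit);
    return { hit, total: hit + miss };
  }
  // OpenAI-style: cached tokens are a subset of prompt_tokens.
  const cached = u.prompt_tokens_details?.cached_tokens;
  if (typeof cached === "number") return { hit: cached, total: prompt };
  // Anthropic-style: same denominator as the diff (read + creation + prompt).
  if (typeof u.cache_read_input_tokens === "number") {
    const hit = u.cache_read_input_tokens;
    return { hit, total: hit + (u.cache_creation_input_tokens ?? 0) + prompt };
  }
  return null;
}

// Formats a percentage with one decimal, or "n/a" when nothing can be computed.
function hitRate(stats: CacheStats): string {
  if (!stats || stats.total === 0) return "n/a";
  return `${((stats.hit / stats.total) * 100).toFixed(1)}%`;
}
```

Separating normalization from formatting keeps the per-provider branches free of string building, which may make it easier to add a fourth provider shape later.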
@@ -1,5 +1,5 @@
 export { chat } from "./chat";
 export { generateImage } from "./image";
 export type { GenerateImageOptions, GenerateImageResult } from "./image";
-export { interpretClick } from "./vision";
+export { interpretClick, analyzeImageDataUrl } from "./vision";
 export type { ChatMessage } from "./chat";
@@ -5,26 +5,46 @@ export async function interpretClick(
  config: ProviderConfig,
  imageBase64: string,
  prompt: string,
+): Promise<string> {
+  // Wrap the raw base64 in a PNG data URL — the Canvas annotator on the
+  // client encodes as PNG. analyzeImageDataUrl handles the actual request.
+  return analyzeImageDataUrl(
+    config,
+    `data:image/png;base64,${imageBase64}`,
+    prompt,
+    { responseFormat: "json_object" },
+  );
+}
+
+/**
+ * General single-image vision call. Accepts a complete data URL (preserves
+ * the source mime type, e.g. webp/jpeg) and lets the caller opt out of
+ * `response_format: json_object` for free-form text responses.
+ */
+export async function analyzeImageDataUrl(
+  config: ProviderConfig,
+  imageDataUrl: string,
+  prompt: string,
+  opts: { responseFormat?: "json_object" | "text" } = {},
 ): Promise<string> {
  const url = `${config.baseUrl.replace(/\/$/, "")}/chat/completions`;

-  const body = {
+  const body: Record<string, unknown> = {
    model: config.model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
-          {
-            type: "image_url",
-            image_url: { url: `data:image/png;base64,${imageBase64}` },
-          },
+          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
    temperature: 0.2,
-    response_format: { type: "json_object" },
  };
+  if (opts.responseFormat === "json_object") {
+    body.response_format = { type: "json_object" };
+  }

  const timeoutCtrl = new AbortController();
  const timeoutId = setTimeout(() => timeoutCtrl.abort(), 60_000);
@@ -0,0 +1,72 @@
+// Privacy-first analytics. Sends only content-free, categorical events to
+// Umami, and only when the tracker script is actually present (gated by the
+// NEXT_PUBLIC_UMAMI_* env vars in components/Analytics.tsx). With no script
+// loaded — local dev, forks, a non-matching data-domains host, or a visitor
+// with Do Not Track — `window.umami` is undefined and every call here is a
+// silent no-op: zero runtime impact, no errors.
+//
+// RULE: never pass free text (player prompts, custom world/style guides,
+// uploaded images, vision output) or any per-user identifier. Only enums,
+// indices, counts and booleans — that is what keeps these events as
+// privacy-friendly as the cookieless page-view baseline.
+
+import type { ArtStyle, Gender, Pacing, PlotStyle } from "./options";
+
+declare global {
+  interface Window {
+    umami?: {
+      track: (event: string, data?: Record<string, unknown>) => void;
+    };
+  }
+}
+
+// Per-event payload schema. Fixing each event's allowed fields turns the RULE
+// above into a compile-time guarantee: an event simply has no slot for a prompt,
+// world/style guide or vision string, so free text can't be attached by mistake
+// (a bare `Record<string, string>` would happily accept it). Every field is a
+// literal union (shared with the selector UI via ./options), index, count or
+// boolean — never a bare `string`. `never` marks events that carry no payload.
+type AnalyticsEventData = {
+  game_start:
+    | {
+        source: "prompt";
+        gender: Gender;
+        art_style: ArtStyle;
+        plot_style: PlotStyle;
+        pacing: Pacing;
+        tts: boolean;
+        has_prompt: boolean;
+        has_style_ref: boolean;
+      }
+    | { source: "curated"; gender: Gender; tts: boolean; card: `${"m" | "f"}${number}` }
+    | { source: "custom" };
+  art_style_select: { style: ArtStyle };
+  style_image_upload: { ok: boolean };
+  scene_reached: { scene_index: number };
+  choice_select: {
+    scene_index: number;
+    choice_index: number;
+    kind: "advance-beat" | "change-scene";
+  };
+  vision_click: { result: "insert-beat" | "change-scene" };
+  tts_toggle: { muted: boolean };
+  fullscreen_toggle: { on: boolean };
+  play_heartbeat: never;
+};
+
+export type AnalyticsEvent = keyof AnalyticsEventData;
+
+// Payload is required for events that define one and forbidden for those typed
+// `never` (the conditional rest tuple collapses to `[]`), so `track("game_start")`
+// without data and `track("play_heartbeat", {...})` with data are both errors.
+export function track<E extends AnalyticsEvent>(
+  event: E,
+  ...[data]: AnalyticsEventData[E] extends never ? [] : [AnalyticsEventData[E]]
+): void {
+  if (typeof window === "undefined") return;
+  try {
+    window.umami?.track(event, data as Record<string, unknown> | undefined);
+  } catch {
+    // Analytics must never throw into the app.
+  }
+}
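
To illustrate how the conditional rest tuple in `track` makes the payload required or forbidden per event, here is a reduced sketch. It is an assumption-laden stand-in rather than the module itself: `EventDataSketch`, `trackSketch` and the in-memory `sent` array are hypothetical, and the array replaces `window.umami` so the example runs outside a browser.

```typescript
// Hypothetical three-event subset of AnalyticsEventData.
type EventDataSketch = {
  scene_reached: { scene_index: number };
  tts_toggle: { muted: boolean };
  play_heartbeat: never; // carries no payload
};
type EventName = keyof EventDataSketch;

// Stand-in sink so the sketch does not depend on window.umami.
const sent: { event: string; data?: Record<string, unknown> }[] = [];

function trackSketch<E extends EventName>(
  event: E,
  // Collapses to [] for `never` events, otherwise to a one-element tuple.
  ...[data]: EventDataSketch[E] extends never ? [] : [EventDataSketch[E]]
): void {
  sent.push({ event, data: data as Record<string, unknown> | undefined });
}

trackSketch("scene_reached", { scene_index: 2 }); // payload required
trackSketch("play_heartbeat"); // payload forbidden

// Both of the following are expected to be rejected by the type checker,
// mirroring the comment above `track` in the diff:
//   trackSketch("scene_reached");
//   trackSketch("play_heartbeat", { anything: 1 });
```

Because every payload field is a number, boolean or literal union, there is no `string` slot into which a player prompt could be passed by accident.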
@@ -53,7 +53,7 @@ export async function runArchitect(
        { role: "system", content: ARCHITECT_SYSTEM },
        { role: "user", content: buildArchitectUserMessage(session) },
      ],
-      { temperature: 0.85, responseFormat: "json_object" },
+      { temperature: 0.85, responseFormat: "json_object", tag: "architect" },
    );

    const parsed = parseJsonLoose<RawStoryState>(raw);
@@ -56,7 +56,7 @@ async function runDesignLLM(
        content: buildCharacterDesignerUserMessage(charName, session),
      },
    ],
-    { temperature: 0.7, responseFormat: "json_object" },
+    { temperature: 0.7, responseFormat: "json_object", tag: "character-designer" },
  );
  return parseJsonLoose<CharacterDesignOutput>(raw);
 }
@@ -67,7 +67,7 @@ export async function runCinematographer(
        ),
      },
    ],
-    { temperature: 0.6, responseFormat: "json_object" },
+    { temperature: 0.6, responseFormat: "json_object", tag: "cinematographer" },
  );

  const parsed = parseJsonLoose<RawCinematographerOutput>(raw);
@@ -47,6 +47,13 @@ export type PainterInput = {
   * with character refs, capped at 4 total per Runware spec.
   */
  priorSceneImage?: string;
+  /**
+   * User-uploaded style reference (data URL base64). When set, it takes the
+   * highest-priority slot in referenceImages so the painting STYLE (brush /
+   * color / mood) of the user's image is anchored across every scene this
+   * session paints — even before any priorScene exists.
+   */
+  styleReferenceImage?: string;
 };

 // Pick the references we send to Runware as `referenceImages`. Priority:
@@ -59,14 +66,22 @@ export function collectReferenceImages(
  characters: Character[],
  entryBeat: Beat | undefined,
  priorSceneImage: string | undefined,
+  styleReferenceImage?: string,
 ): string[] {
  const refs: string[] = [];
  const seen = new Set<string>();

-  // Slot 0 — prior scene image for spatial continuity. Goes first because
-  // backdrop drift is the most jarring discontinuity across same-sceneKey
-  // scenes; character drift is partially masked by character archetype text
-  // in the prompt anyway.
+  // Slot 0 — user-uploaded style reference image, if any. Goes first because
+  // it anchors the whole-session painting STYLE (brush / color / mood) that
+  // the user explicitly chose. priorScene continuity comes second; character
+  // archetypes are partially covered by the prompt text anyway.
+  if (styleReferenceImage) {
+    refs.push(styleReferenceImage);
+  }
+
+  // Slot N — prior scene image for spatial continuity. Backdrop drift is the
+  // next-most jarring discontinuity across same-sceneKey scenes; character
+  // drift is partially masked by character archetype text in the prompt.
  if (priorSceneImage) {
    refs.push(priorSceneImage);
  }
@@ -140,6 +155,7 @@ export async function runPainter(
    input.onStageCharacters,
    entryBeat,
    input.priorSceneImage,
+    input.styleReferenceImage,
  );

  // Tier A — with referenceImages (priorSceneImage + character portraits).
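
The slot priority described in `collectReferenceImages` (style reference first, then prior scene, then character portraits, capped at 4) can be sketched as below. This is a simplified illustration under assumptions: `pickReferenceImages` and `MAX_REFS` are hypothetical names, portraits are passed as plain strings, and the sketch dedupes every slot, which the real function may or may not do for the style image.

```typescript
// Cap taken from the "capped at 4 total per Runware spec" comment in PainterInput.
const MAX_REFS = 4;

function pickReferenceImages(
  styleRef: string | undefined,
  priorScene: string | undefined,
  portraits: string[],
): string[] {
  const refs: string[] = [];
  const seen = new Set<string>();
  // Adds a reference unless it is missing, a duplicate, or over the cap.
  const add = (url: string | undefined): void => {
    if (!url || seen.has(url) || refs.length >= MAX_REFS) return;
    seen.add(url);
    refs.push(url);
  };
  add(styleRef); // highest priority: the user's chosen painting style
  add(priorScene); // next: spatial continuity with the previous scene
  for (const p of portraits) add(p); // remaining slots: character portraits
  return refs;
}
```

With a style image and a prior scene both present, only two portrait slots remain, so sessions with many on-stage characters lose some portrait references in exchange for style consistency.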
@@ -369,7 +369,7 @@ export async function runWriter(
      { role: "system", content: WRITER_SYSTEM },
      { role: "user", content: buildWriterUserMessage(session) },
    ],
-    { temperature: 0.9, responseFormat: "json_object" },
+    { temperature: 0.9, responseFormat: "json_object", tag: "writer" },
  );

  const parsed = parseJsonLoose<RawScene>(raw);
@@ -327,6 +327,7 @@ export async function directScene(
      styleGuide: session.styleGuide,
      onStageCharacters,
      priorSceneImage: priorSceneReference,
+      styleReferenceImage: session.styleReferenceImage,
    },
    entryBeat,
  );
@@ -405,7 +406,7 @@ export async function directInsertBeat(
        content: buildInsertBeatUserMessage(session, freeformAction),
      },
    ],
-    { temperature: 0.9, responseFormat: "json_object" },
+    { temperature: 0.9, responseFormat: "json_object", tag: "insert-beat" },
  );

  const parsed = parseJsonLoose<InsertBeatPartial>(raw);
@@ -47,6 +47,7 @@ export async function startSession(
    styleGuide: req.styleGuide.trim(),
    history: [],
    characters: [],
+    styleReferenceImage: req.styleReferenceImage?.trim() || undefined,
  };

  // Stage 0 — Architect: expand the terse world/style prompt into a story
@@ -28,22 +28,55 @@ import type {
 //  the bible looks identical to every agent that consumes it.
 // ──────────────────────────────────────────────────────────────────────

+// ── Story bible — split spine / dynamic for prefix-cache friendliness ──
+//
+// SPINE = Architect-set, never updated by Writer's storyStatePatch:
+//   logline / genreTags / protagonist / castNotes
+//   → goes in the STABLE PREFIX of every Writer user message
+//
+// DYNAMIC = patched every scene by the Writer:
+//   synopsis / relationships / openThreads / nextHook
+//   → goes in the DYNAMIC SUFFIX
+//
+// Keep both sections present even when empty (fixed sections) so position is
+// stable across calls; a missing section here would shift every byte after
+// it and invalidate the cached prefix.
+
+export function renderStoryStateSpine(s: StoryState | undefined): string {
+  const lines: string[] = ["【故事档案 · 主轴（不变）】"];
+  lines.push(`主线（中心钩子）：${s?.logline ?? "（未设定）"}`);
+  lines.push(`题材基调：${s?.genreTags ?? "（未设定）"}`);
+  lines.push(`主角「你」：${s?.protagonist ?? "（未设定）"}`);
+  lines.push(`核心配角：${s?.castNotes ?? "（未设定）"}`);
+  return lines.join("\n");
+}
+
+export function renderStoryStateDynamic(s: StoryState | undefined): string {
+  const lines: string[] = ["【故事档案 · 当前状态（每幕更新）】"];
+  lines.push(`已发生（梗概）：${s?.synopsis ?? "（暂无）"}`);
+  lines.push(
+    `当前关系/情绪：${
+      s?.relationships?.length
+        ? "\n" + s.relationships.map((r) => `- ${r}`).join("\n")
+        : "（暂无）"
+    }`,
+  );
+  lines.push(
+    `未收的悬念/伏笔：${
+      s?.openThreads?.length
+        ? "\n" + s.openThreads.map((t) => `- ${t}`).join("\n")
+        : "（暂无）"
+    }`,
+  );
+  lines.push(`接下来要往哪走（下一个钩子方向）：${s?.nextHook ?? "（暂无）"}`);
+  return lines.join("\n");
+}
+
+// Back-compat for the Architect's own user message (it sees the full bible
+// at session start, no caching concern there yet).
 export function renderStoryState(s: StoryState | undefined): string {
  if (!s) return "";
-  const lines: string[] = ["【故事档案 / 主线记忆】"];
-  if (s.logline) lines.push(`主线（中心钩子）：${s.logline}`);
-  if (s.genreTags) lines.push(`题材基调：${s.genreTags}`);
-  if (s.protagonist) lines.push(`主角「你」：${s.protagonist}`);
-  if (s.castNotes) lines.push(`核心配角：\n${s.castNotes}`);
-  if (s.synopsis) lines.push(`已发生（梗概）：${s.synopsis}`);
-  if (s.relationships?.length) {
-    lines.push(`当前关系/情绪：\n${s.relationships.map((r) => `- ${r}`).join("\n")}`);
-  }
-  if (s.openThreads?.length) {
-    lines.push(`未收的悬念/伏笔：\n${s.openThreads.map((t) => `- ${t}`).join("\n")}`);
-  }
-  if (s.nextHook) lines.push(`接下来要往哪走（下一个钩子方向）：${s.nextHook}`);
-  return lines.join("\n");
+  return renderStoryStateSpine(s) + "\n\n" + renderStoryStateDynamic(s);
 }

 // ──────────────────────────────────────────────────────────────────────
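
The reason for splitting the bible into a stable spine and a dynamic part can be seen by measuring how much of the prompt two consecutive calls share from the start. The sketch below is a character-level simplification (real provider caches work on tokens or token chunks), and `sharedPrefixLength`, `BibleSketch`, `stableFirst` and `dynamicFirst` are hypothetical names used only for illustration.

```typescript
// Length of the common leading substring of two prompts.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

type BibleSketch = { logline: string; synopsis: string };

// Stable content first, per-scene content last.
const stableFirst = (s: BibleSketch): string =>
  `SPINE: ${s.logline}\nDYNAMIC: ${s.synopsis}`;
// Same content, but the per-scene part leads.
const dynamicFirst = (s: BibleSketch): string =>
  `DYNAMIC: ${s.synopsis}\nSPINE: ${s.logline}`;

// Two consecutive calls: only the synopsis changed between them.
const call1: BibleSketch = {
  logline: "A lighthouse keeper hides a secret",
  synopsis: "Arrived at dusk",
};
const call2: BibleSketch = { ...call1, synopsis: "Found the locked cellar" };

const stableShared = sharedPrefixLength(stableFirst(call1), stableFirst(call2));
const dynamicShared = sharedPrefixLength(dynamicFirst(call1), dynamicFirst(call2));

// With the stable-first layout the whole spine stays in the shared prefix;
// with the dynamic-first layout the prompts diverge right after the header.
console.log({ stableShared, dynamicShared });
```

The same measurement could be applied to real prompts from two consecutive Writer calls as a quick check that a prompt change has not shortened the cacheable prefix.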
@@ -272,74 +305,127 @@ sceneKey 设计原则（重要 — 用于跨场景视觉一致性）：

 不要输出 JSON 以外的任何文本。`;

+// Render one history entry as a stable, position-independent block. Used by
+// the Writer to dump both "completed past" (stable prefix) and "the entry the
+// player just finished" (dynamic suffix) — same format, so the model sees a
+// uniform history surface.
+function renderHistoryEntry(
+  entry: Session["history"][number],
+  index: number,
+): string {
+  const lines: string[] = [`【场景 ${index}】`];
+  if (entry.scene.sceneKey) lines.push(`  sceneKey: ${entry.scene.sceneKey}`);
+
+  const visited = entry.visitedBeatIds.length
+    ? entry.visitedBeatIds
+    : [entry.scene.entryBeatId];
+  const beatById = new Map(entry.scene.beats.map((b) => [b.id, b]));
+  const visitedBeats = visited
+    .map((id) => beatById.get(id))
+    .filter((b): b is NonNullable<typeof b> => Boolean(b));
+
+  for (const b of visitedBeats) {
+    const fragments: string[] = [];
+    if (b.narration) fragments.push(`旁白：${b.narration}`);
+    if (b.line) fragments.push(`${b.speaker ?? "?"}：${b.line}`);
+    if (fragments.length) lines.push("  " + fragments.join(" / "));
+  }
+
+  if (entry.exit) {
+    if (entry.exit.kind === "choice") {
+      lines.push(
+        `  玩家最终选择：${entry.exit.label}（去往：${entry.exit.nextSceneSeed}）`,
+      );
+    } else {
+      lines.push(`  玩家自由动作：${entry.exit.action}`);
+    }
+  }
+  return lines.join("\n");
+}
+
 export function buildWriterUserMessage(session: Session): string {
+  // ─── STABLE PREFIX ────────────────────────────────────────────────────
+  // Everything in this section is invariant across consecutive Writer calls
+  // within the session (or monotonically grows in a way that keeps the
+  // earlier bytes byte-identical). Always emit every section header — even
+  // when empty — so positions don't shift between calls.
+  //
+  // Order optimized for DeepSeek/MiMo prefix caching (64-token chunks):
+  //   1. session-immutable scalars (world / style)
+  //   2. story bible spine (Architect-set, never patched)
+  //   3. monotonically-growing lists (characters, sceneKeys)
+  //   4. history entries 0..N-2 (the last entry is what THIS call must
+  //      react to, so it lives in the dynamic suffix instead)
+  //
+  // ─── DYNAMIC SUFFIX ───────────────────────────────────────────────────
+  // Everything below changes on (almost) every call:
+  //   5. story bible dynamic patch (synopsis/threads/relationships/nextHook)
+  //   6. the just-completed entry (history[-1]) — same render format as the
+  //      stable history blocks, just preceded by a "just completed" header
+  //   7. last-beat snippet (the exact emotional cliffhanger)
+  //   8. lastExit hint
+  //   9. format reminder tail
+
  const parts: string[] = [];

-  const bible = renderStoryState(session.storyState);
-  if (bible) {
-    parts.push(bible);
-    parts.push("");
-  }
-
+  // ── 1. session scalars ────────────────────────────────────────────────
  parts.push(`世界观：${session.worldSetting}`);
  parts.push(`画风：${session.styleGuide}`);
+  parts.push("");

-  if (session.characters.length > 0) {
-    parts.push("\n已登记角色（speaker 必须用这些名字之一，或本场景新引入）：");
-    for (const c of session.characters) {
-      parts.push(`- ${c.name}`);
-    }
-  }
+  // ── 2. story bible — spine only (stable) ──────────────────────────────
+  parts.push(renderStoryStateSpine(session.storyState));
+  parts.push("");

-  const priorKeys = collectPriorSceneKeys(session);
-  if (priorKeys.length > 0) {
-    parts.push("\n已使用的 sceneKey（同一物理空间请沿用，不要新造）：");
-    for (const k of priorKeys) parts.push(`- ${k}`);
-  }
+  // ── 3a. registered characters ─────────────────────────────────────────
+  // SENTINEL pattern: header + a constant "after this line, entries follow"
+  // marker, then the entries themselves. The marker is byte-identical even
+  // when the list is empty, so adding a character only ever APPENDS bytes
+  // — earlier bytes never shift. Crucial for prefix caching: a placeholder
+  // like "（暂无）" that gets replaced by entries breaks the prefix the
+  // moment the first character is registered.
+  parts.push("已登记角色（speaker 必须用这些名字之一，或本场景新引入）：");
+  parts.push("（以下每行一个已登记角色，开场前为空。）");
+  for (const c of session.characters) parts.push(`- ${c.name}`);
+  parts.push("");

-  if (session.history.length === 0) {
-    parts.push(
-      "\n这是故事的开场。请按【故事档案】里的 nextHook 把第一幕的冷开场写出来——开场即抓人，别花笔墨铺垫世界观。写完后更新 storyStatePatch。严格以 JSON 格式返回。",
-    );
-    return parts.join("\n");
-  }
+  // ── 3b. prior sceneKeys (sentinel pattern, same rationale) ────────────
+  parts.push("已使用的 sceneKey（同一物理空间请沿用，不要新造）：");
+  parts.push("（以下每行一个已用过的 sceneKey，开场前为空。）");
+  for (const k of collectPriorSceneKeys(session)) parts.push(`- ${k}`);
+  parts.push("");

-  parts.push("\n场景历史（按时间顺序）：");
-  session.history.forEach((entry, idx) => {
-    const lines: string[] = [`【场景 ${idx + 1}】`];
-    if (entry.scene.sceneKey) lines.push(`  sceneKey: ${entry.scene.sceneKey}`);
-
-    const visited = entry.visitedBeatIds.length
-      ? entry.visitedBeatIds
-      : [entry.scene.entryBeatId];
-    const beatById = new Map(entry.scene.beats.map((b) => [b.id, b]));
-    const visitedBeats = visited
-      .map((id) => beatById.get(id))
-      .filter((b): b is NonNullable<typeof b> => Boolean(b));
-
-    for (const b of visitedBeats) {
-      const fragments: string[] = [];
-      if (b.narration) fragments.push(`旁白：${b.narration}`);
-      if (b.line) fragments.push(`${b.speaker ?? "?"}：${b.line}`);
-      if (fragments.length) lines.push("  " + fragments.join(" / "));
-    }
-
-    if (entry.exit) {
-      if (entry.exit.kind === "choice") {
-        lines.push(
-          `  玩家最终选择：${entry.exit.label}（去往：${entry.exit.nextSceneSeed}）`,
-        );
-      } else {
-        lines.push(`  玩家自由动作：${entry.exit.action}`);
-      }
-    }
-    parts.push(lines.join("\n"));
+  // ── 4. history[0..N-2] — ARCHIVED entries (sentinel, append-only) ─────
+  // CRITICAL: only the ALREADY-ARCHIVED entries (i.e. everything except
+  // history[-1]) go in the stable prefix. The last entry is still "live":
+  // its visitedBeatIds keeps growing as the player walks more beats in the
+  // current scene, and speculative prefetch triggers Writer calls that
+  // observe different snapshots of history[-1] mid-scene. Putting the live
+  // entry in the stable prefix would corrupt every Writer call's cache.
+  //
+  // Archived entries (history[0..N-2]) are immutable — once a scene is
+  // exited, its visitedBeatIds + exit are frozen. Safe to cache.
+  const archivedHistory = session.history.slice(0, -1);
+  parts.push("场景历史（按时间顺序，已完结）：");
+  parts.push("（以下每段一幕已完结的场景，开场前为空。）");
+  archivedHistory.forEach((entry, idx) => {
+    parts.push(renderHistoryEntry(entry, idx + 1));
  });
+  parts.push("");

+  // ════════════════ DYNAMIC SUFFIX starts here ═══════════════════════════
+  // Roughly 95% of the prompt length above should already be stable and cacheable. Everything below changes on every call.
+
+  // ── 5. story bible — dynamic patch ────────────────────────────────────
+  parts.push(renderStoryStateDynamic(session.storyState));
+  parts.push("");
+
+  // ── 6. last-beat snippet (the exact emotional cliffhanger) ──
+  // The live last entry is deliberately kept out of the stable history block
+  // above; here we re-emit only its very last beat to sharply focus the Writer
+  // on the emotional moment to continue from. The duplicate full-entry render
+  // that was here previously is gone: it wasted ~200-500 tokens of dynamic suffix.
  const last = session.history.at(-1);
-
-  // The exact last moment the player stopped on — the new scene must continue
-  // seamlessly from this emotional beat, not reset to a neutral state.
  if (last) {
    const lastBeatId = last.visitedBeatIds.at(-1) ?? last.scene.entryBeatId;
    const lastBeat = last.scene.beats.find((b) => b.id === lastBeatId);
@@ -349,12 +435,20 @@ export function buildWriterUserMessage(session: Session): string {
      if (lastBeat.line) frag.push(`${lastBeat.speaker ?? "?"}：${lastBeat.line}`);
      if (frag.length) {
        parts.push(
-          `\n上一刻（玩家停留的最后一个画面，新场景要从这里的情绪无缝承接）：\n  ${frag.join(" / ")}`,
+          `上一刻（玩家停留的最后一个画面，新场景从这里的情绪无缝承接）：\n  ${frag.join(" / ")}`,
        );
      }
    }
  }

+  if (session.history.length === 0) {
+    parts.push(
+      "\n这是故事的开场。请按【故事档案】里的 nextHook 把第一幕的冷开场写出来——开场即抓人，别花笔墨铺垫世界观。写完后更新 storyStatePatch。严格以 JSON 格式返回。",
+    );
+    return parts.join("\n");
+  }
+
+  // ── 8. lastExit hint ──────────────────────────────────────────────────
  const lastExit = last?.exit;
  if (lastExit) {
    if (lastExit.kind === "choice") {
@@ -370,6 +464,7 @@ export function buildWriterUserMessage(session: Session): string {
    parts.push("\n无缝续写下一个场景，延续上一刻的情绪。");
  }

+  // ── 9. format reminder tail ───────────────────────────────────────────
  parts.push("写完后别忘了更新 storyStatePatch。严格以 JSON 格式返回。");
  return parts.join("\n");
 }
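The stable-prefix / dynamic-suffix split used by buildWriterUserMessage can be illustrated with a minimal sketch. The `Entry` shape and the `renderStablePrefix`, `renderDynamicSuffix` and `buildPrompt` helpers below are hypothetical simplifications for illustration, not code from this PR. The point it shows: because only `history.slice(0, -1)` feeds the prefix, two calls made at different moments of the same live scene should share a byte-identical prefix.

```typescript
// Hypothetical, simplified entry shape; the real Session type carries more fields.
type Entry = { sceneKey: string; visitedBeatIds: string[]; exit?: string };

// Stable prefix: renders archived entries only (everything except the live last one).
function renderStablePrefix(history: Entry[]): string {
  const parts: string[] = ["HISTORY (archived):"];
  history.slice(0, -1).forEach((e, i) => {
    parts.push(`[scene ${i + 1}] ${e.sceneKey}: ${e.visitedBeatIds.join(",")} -> ${e.exit ?? "?"}`);
  });
  parts.push("");
  return parts.join("\n");
}

// Dynamic suffix: the only place the live entry is allowed to appear.
function renderDynamicSuffix(history: Entry[]): string {
  const last = history.length > 0 ? history[history.length - 1] : undefined;
  if (!last) return "OPENING";
  const ids = last.visitedBeatIds;
  return `LAST BEAT: ${ids.length > 0 ? ids[ids.length - 1] : "entry"}`;
}

function buildPrompt(history: Entry[]): string {
  return renderStablePrefix(history) + "\n" + renderDynamicSuffix(history);
}

// Two snapshots of the same session, taken while the player walks the live scene.
const archived: Entry = { sceneKey: "cafe", visitedBeatIds: ["b1", "b2"], exit: "leave" };
const early = buildPrompt([archived, { sceneKey: "street", visitedBeatIds: ["b1"] }]);
const later = buildPrompt([archived, { sceneKey: "street", visitedBeatIds: ["b1", "b2", "b3"] }]);

// The prefix does not depend on the live entry at all.
const stable = renderStablePrefix([archived, { sceneKey: "street", visitedBeatIds: [] }]);
console.log(early.startsWith(stable) && later.startsWith(stable));
```

Had the live entry been rendered inside the prefix, `early` and `later` would diverge partway through the history block, so a provider-side prefix cache would presumably miss from that point on.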
@@ -518,6 +613,22 @@ export const CINEMATOGRAPHER_SYSTEM = `你是视觉小说的「分镜导演」

 不要输出 JSON 以外的任何文本。`;

+// Stable hint block — invariant across every Cinematographer call in a
+// session. Front-loading this (with the session-scoped styleGuide) gives the
+// prefix cache something substantial to anchor on; without it, the per-scene
+// `sceneSummary` would land in the first content chunk and force the whole
+// user message to miss. Long enough to land beyond the 64-token chunk
+// boundary that follows the system prompt.
+const CINE_STABLE_HINT = [
+  "",
+  "以下为本次场景的输入。请基于这些信息：",
+  "1. 选择最合适的 shotType（依据 system prompt 的动态镜头策略 + entryBeatSpeaker）。",
+  "2. 写一段**只用英文**的 integratedPrompt——纯环境 + 构图 + 角色姿态/位置；服饰由画师另外通过 referenceImages 锁定，你只描述能看到的样貌与镜头。",
+  "3. 若上一场与本场 sceneKey 相同，**强调连续性**（时段/情绪/构图微调），而不是重新设定空间。",
+  "4. 严格按 system prompt 要求的 JSON schema 输出。",
+  "",
+].join("\n");
+
 export function buildCinematographerUserMessage(
  sceneSummary: string,
  styleGuide: string,
@@ -527,38 +638,53 @@ export function buildCinematographerUserMessage(
  currentSceneKey: string | undefined,
 ): string {
  const parts: string[] = [];
-  parts.push(`全局美术画风：${styleGuide}`);
-  parts.push(`\n当前场景（来自编剧）：${sceneSummary}`);

+  // ─── STABLE PREFIX ──────────────────────────────────────────────────
+  // styleGuide is session-immutable; CINE_STABLE_HINT is a true constant.
+  // Together they're long enough to cross at least one 64-token chunk
+  // boundary, so every subsequent Cinematographer call in this session can
+  // cache-hit through this block.
+  parts.push(`全局美术画风：${styleGuide}`);
+  parts.push(CINE_STABLE_HINT);
+
+  // ─── DYNAMIC SUFFIX ─────────────────────────────────────────────────
+  // Always emit every section header — even when empty — so positions don't
+  // shift between calls. (Caching of the dynamic section itself isn't
+  // expected, but stable positioning helps when adjacent calls happen to
+  // share a sceneSummary prefix.)
+  parts.push(`当前场景（来自编剧）：${sceneSummary}`);
+  parts.push("");
+
+  parts.push("开场画面里的角色及其姿态：");
  if (entryBeatActive.length > 0) {
-    parts.push("\n开场画面里的角色及其姿态：");
    for (const c of entryBeatActive) {
      parts.push(`- ${c.name}：${c.pose ?? "（无具体姿态描述）"}`);
    }
  } else {
-    parts.push("\n开场画面里没有角色（纯环境）。");
+    parts.push("（无角色，纯环境）");
  }
+  parts.push("");

  // entryBeatSpeaker drives the dynamic camera policy (see CINEMATOGRAPHER_SYSTEM).
  // "你" means the player is speaking; an NPC name means an NPC is speaking;
  // empty means no dialog (pure environment / narration beat).
  if (entryBeatSpeaker === "你") {
    parts.push(
-      '\n开场 beat 是**玩家说话**（speaker = "你"）——按动态镜头策略：medium shot，NPC 居中、做听玩家说话的姿态、看向画面外。**绝不要画出玩家**。',
+      '开场 beat 是**玩家说话**（speaker = "你"）——按动态镜头策略：medium shot，NPC 居中、做听玩家说话的姿态、看向画面外。**绝不要画出玩家**。',
    );
  } else if (entryBeatSpeaker) {
    parts.push(
-      `\n开场 beat 是 **${entryBeatSpeaker} 在对玩家说话**（speaker = "${entryBeatSpeaker}"）——按动态镜头策略：close-up 或 medium close-up，${entryBeatSpeaker} 看向画面外（看玩家），眼神交流。`,
+      `开场 beat 是 **${entryBeatSpeaker} 在对玩家说话**（speaker = "${entryBeatSpeaker}"）——按动态镜头策略：close-up 或 medium close-up，${entryBeatSpeaker} 看向画面外（看玩家），眼神交流。`,
    );
  } else {
    parts.push(
-      "\n开场 beat 没有 speaker（纯旁白/环境）——按动态镜头策略：wide establishing shot 展现环境氛围。",
+      "开场 beat 没有 speaker（纯旁白/环境）——按动态镜头策略：wide establishing shot 展现环境氛围。",
    );
  }

  if (priorSceneKey && currentSceneKey && priorSceneKey === currentSceneKey) {
    parts.push(
-      `\n注意：上一场和本场 sceneKey 都是 "${currentSceneKey}"——画师会把上一张场景图作为 referenceImages 之一锚定同一空间。你的 integratedPrompt 应该**强调连续性**，描述时段/情绪/构图的细微变化，而不是完全重新设定空间。`,
+      `\n注意：上一场和本场 sceneKey 都是 "${currentSceneKey}"——画师会把上一张场景图作为 referenceImages 之一锚定同一空间。integratedPrompt 应强调连续性。`,
    );
  }

@@ -0,0 +1,37 @@
+// Single source of truth for the home-page selector option sets. Kept as
+// `as const` so each list also yields a literal-union type: the play-start
+// UI (app/page.tsx) renders from the arrays, and the analytics schema
+// (lib/analytics.ts) types its payload fields from the unions. That shared
+// origin is what keeps the "content-free" events honest — an event field can
+// only ever be one of these fixed labels, never free-form player text.
+
+export const GENDERS = ["男性向", "女性向"] as const;
+
+export const ART_STYLES = [
+  "自动",
+  "自定义",
+  "京阿尼细腻日常",
+  "新海诚唯美光影",
+  "Galgame CG",
+  "3D 动漫电影",
+  "赛博朋克",
+  "蒸汽波",
+  "吉卜力治愈手绘",
+  "哥特庄园",
+  "废土科幻",
+  // Niche / regional art styles below, kept as long-tail options
+  "古典厚涂油画",
+  "极简中国水墨",
+  "浮世绘木刻",
+  "莫高窟壁画",
+  "波斯细密画",
+] as const;
+
+export const PLOT_STYLES = ["平铺直叙", "多线转折", "悬疑烧脑", "治愈日常"] as const;
+
+export const PACINGS = ["慢热细腻", "紧凑爽快"] as const;
+
+export type Gender = (typeof GENDERS)[number];
+export type ArtStyle = (typeof ART_STYLES)[number];
+export type PlotStyle = (typeof PLOT_STYLES)[number];
+export type Pacing = (typeof PACINGS)[number];
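As a rough sketch of how an `as const` list can keep an analytics payload restricted to fixed labels: the `isPacing` guard, the `PlayStartEvent` shape and `toEvent` below are hypothetical names introduced for illustration only, and `PACINGS` is repeated from the file above so the snippet is self-contained.

```typescript
// Repeated from the constants file above so this sketch stands alone.
const PACINGS = ["慢热细腻", "紧凑爽快"] as const;
type Pacing = (typeof PACINGS)[number];

// Hypothetical type guard: narrows an arbitrary string to the literal union.
function isPacing(value: string): value is Pacing {
  return (PACINGS as readonly string[]).includes(value);
}

// Hypothetical event shape: the field is typed from the union, so free-form
// player text cannot be assigned to it without failing type checking.
type PlayStartEvent = { pacing: Pacing };

// Build an event only when the raw value is one of the fixed labels.
function toEvent(raw: string): PlayStartEvent | null {
  return isPacing(raw) ? { pacing: raw } : null;
}

console.log(toEvent("紧凑爽快"), toEvent("some free-form text"));
```

The runtime guard complements the compile-time union: values arriving from form state or a URL are plain strings, so a check like this is one way to reject anything outside the list before it reaches an event.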
@@ -206,6 +206,14 @@ export type Session = {
   * session payload created before this field existed.
   */
  storyState?: StoryState;
+  /**
+   * Optional user-uploaded style reference image (data URL — `data:image/...;base64,...`).
+   * When set, the Painter prepends it to `referenceImages` on every scene so the
+   * uploaded image anchors painting style (brush, color, mood) across the whole
+   * session. Resized client-side before upload (~512px max dim) to keep session
+   * payload small for /api/scene round-trips.
+   */
+  styleReferenceImage?: string;
 };

 // ──────────────────────────────────────────────────────────────────────
@@ -253,6 +261,21 @@ export type EngineConfig = {
 export type StartRequest = {
  worldSetting: string;
  styleGuide: string;
+  /** Optional user-uploaded style reference image — see Session.styleReferenceImage. */
+  styleReferenceImage?: string;
+};
+
+// /api/parse-style-image — vision LLM extracts a textual painting-style
+// prompt from a user-uploaded reference image. The response carries only the
+// prompt; the client keeps the data URL it sent and later passes it to /api/start.
+export type ParseStyleImageRequest = {
+  /** Data URL: `data:image/...;base64,...`. */
+  imageDataUrl: string;
+};
+
+export type ParseStyleImageResponse = {
+  /** English style prompt suitable as a styleGuide (FLUX-friendly attributes). */
+  stylePrompt: string;
 };

 export type StartResponse = {
@@ -6,6 +6,17 @@ const config: NextConfig = {
  turbopack: {
    root: __dirname,
  },
+  // /public defaults to `max-age=0, must-revalidate`; pin the stable /home/* covers + first-act JSON for 1y so browsers/CDN stop re-downloading them.
+  async headers() {
+    return [
+      {
+        source: "/home/:path*",
+        headers: [
+          { key: "Cache-Control", value: "public, max-age=31536000, immutable" },
+        ],
+      },
+    ];
+  },
 };

 export default config;