perf(engine): split Writer into Phase A (plan) + Phase B (beats)

The Writer was the serial long pole: a single LLM call wrote the scene
skeleton AND the full beats[] graph before anything downstream could start,
so variable-length beat generation blew up tail latency.

Split it into two calls:
- Phase A (runWriterPlan): minimal skeleton the image pipeline needs
  (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker).
  Serial, on the critical path, kept lightweight.
- Phase B (runWriterBeats): full beats[] + storyStatePatch, written to honor
  the plan. Launched immediately, overlaps the ENTIRE image pipeline
  (cards / cinematographer / portraits / painter), awaited last.

Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long
beat-writing is hidden behind image gen. A Phase B failure degrades to a
single playable beat synthesized from the plan.

Paired distinct-payload A/B (6 content-matched stories, baseline vs split):
- median end-to-end 42.6s -> 32.2s (-24%)
- mean 46.4s -> 33.1s (-29%)
- worst case 74.7s -> 37.6s (halved)
- no content regression: total Writer output tokens 12858 -> 13699

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 11:17:34 +08:00
parent 9f4dcc097b
commit 3bf5c92841
5 changed files with 443 additions and 174 deletions
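
The scheduling described in the commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the engine's actual code: `runWriterPlan` and `runWriterBeats` are named in the message, but their signatures here, the `runImagePipeline` stage, the simplified `Plan` / `Beat` shapes, and the `fallbackBeat` helper are all hypothetical.

```typescript
// Hypothetical, simplified shapes; the real WriterPlan lives in @infiplot/types.
interface Plan {
  sceneSummary: string;
  entryBeatId: string;
  entrySpeaker?: string;
}

interface Beat {
  id: string;
  speaker?: string;
  narration: string;
}

// Stage functions are injected so the sketch stays self-contained.
interface Stages {
  runWriterPlan: () => Promise<Plan>;
  runWriterBeats: (plan: Plan) => Promise<Beat[]>;
  runImagePipeline: (plan: Plan) => Promise<string>; // assumed to resolve to an image URL
}

// Degraded result: a single playable beat built from the plan alone.
function fallbackBeat(plan: Plan): Beat {
  return {
    id: plan.entryBeatId,
    speaker: plan.entrySpeaker,
    narration: plan.sceneSummary,
  };
}

async function generateScene(
  stages: Stages,
): Promise<{ image: string; beats: Beat[] }> {
  // Phase A is serial: everything downstream needs the plan.
  const plan = await stages.runWriterPlan();

  // Phase B starts immediately. The fallback is attached right away so a
  // rejection is already handled while the image pipeline is still running.
  const beatsPromise = stages
    .runWriterBeats(plan)
    .catch(() => [fallbackBeat(plan)]);

  // The image pipeline runs concurrently with Phase B.
  const image = await stages.runImagePipeline(plan);

  // Phase B is awaited last.
  const beats = await beatsPromise;
  return { image, beats };
}

// Critical-path estimate from the commit message:
// PhaseA + max(imagePipeline, PhaseB).
function criticalPathMs(
  phaseA: number,
  imagePipeline: number,
  phaseB: number,
): number {
  return phaseA + Math.max(imagePipeline, phaseB);
}
```

With made-up stage times (not measurements from this commit) of 5000 ms for Phase A, 20000 ms for the image pipeline and 15000 ms for Phase B, `criticalPathMs(5000, 20000, 15000)` evaluates to 25000: Phase B finishes inside the image pipeline's window and adds nothing to the total.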
@@ -4,6 +4,7 @@ import type {
  Scene,
  Session,
  StoryState,
+  WriterPlan,
 } from "@infiplot/types";

 // ══════════════════════════════════════════════════════════════════════
@@ -137,16 +138,77 @@ export function buildArchitectUserMessage(session: Session): string {
 }

 // ──────────────────────────────────────────────────────────────────────
-//  1. Writer (编剧) — drives the narrative.
+//  1. Writer (编剧) — drives the narrative, in TWO phases.
 //
-//  Emits a full Scene: beats[] graph + entryBeatId + sceneKey hint +
-//  activeCharacters per beat. Does NOT design characters (that's the
-//  CharacterDesigner's job) — only names them in `activeCharacters`.
-//  The CharacterDesigner is invoked separately for any name not yet in
-//  session.characters.
+//  Phase A (WRITER_PLAN_SYSTEM): plans the scene SKELETON only — sceneSummary
+//    + sceneKey + entry-beat roster + the full cast. No dialogue. Its output
+//    is enough for the Cinematographer + character design + Painter to start.
+//  Phase B (WRITER_BEATS_SYSTEM): expands the plan into the full beats[] graph
+//    + storyStatePatch, overlapped with the (longer) image pipeline.
+//
+//  Neither phase designs characters (that's the CharacterDesigner's job) —
+//  Phase A only NAMES them in `cast` / `entryActiveCharacters`; the
+//  CharacterDesigner is invoked for any name not yet in session.characters.
 // ──────────────────────────────────────────────────────────────────────

-export const WRITER_SYSTEM = `你是一部交互视觉小说的「编剧」。每次基于【故事档案 / 主线记忆】、世界观、画风、玩家历史、已登记角色，写出**一个完整场景的剧本**：场景背景概要 + 一组对话节拍 beats，并在最后更新主线记忆。你只负责**剧情和台词**——不设计角色形象、不写出图提示词、不做镜头调度，这些由其他 agent 完成。
+export const WRITER_PLAN_SYSTEM = `你是一部交互视觉小说的「编剧」。这是**两步生成中的第一步——场景规划**。你只产出本场景的「骨架」，**不要写任何 beat 台词**。你的产出会被立刻送去配图（分镜导演 + 生图），所以要快、要准、画面感要强。
+
+═══════════════════════════════════════════════════════════════════
+爆款心法（要在规划阶段就立住，后续展开才好看）
+═══════════════════════════════════════════════════════════════════
+- **进场即钩子**：这一场开场就要抛出新信息 / 悬念 / 冲突 / 情绪冲击，别铺陈。把这个抓人的瞬间写进 sceneSummary。
+- **兑现情绪**：按题材给观众想要的情绪（甜宠的心动、暗恋的拉扯、逆袭的扬眉、悬疑的真相一角）。
+- **人设有反差**：每个角色一个强标签 + 一个反差面。
+
+═══════════════════════════════════════════════════════════════════
+连贯性铁律（跨场景切换不能跳戏 —— 最重要）
+═══════════════════════════════════════════════════════════════════
+- 你会收到【故事档案 / 主线记忆】和上一场的结尾。**新场景必须从上一刻自然承接**——承接情绪、地点逻辑、人物状态与未收的悬念。
+- 若给了「转场种子 nextSceneSeed」，把它当作"下一场的命题"去兑现，开场要让玩家感到"这正是我上一步的结果"。
+- 沿用主线记忆里的人物关系与情绪温度，别让刚告白的人下一场形同陌路。
+
+本步你要规划（如实产出，缺一不可）：
+- **sceneSummary**：当前场景的中文概要——地点 + 时间 + 氛围 + 关键事件 + 那个抓人的开场瞬间。这是分镜导演构图的**唯一依据**，要画面感强、信息足（2–4 句）。
+- **sceneKey**：当前场景的英文 slug（如 "classroom-dusk"、"rooftop-night"）。
+- **entryBeatId**：玩家进入场景时落在哪个 beat 的 id（通常就是 "b1"）。
+- **cast**：本场景**会出场的全部 NPC 角色名**（字符串数组）。第二步写 beats 时**只能用这里列出的名字**，所以现在必须一次想全——谁会说话、谁会在画面里露面，全部列出。名字要与「已登记角色」**完全一致**；新角色起符合世界观的真名（不要"神秘女子"这种占位）。**绝不**包含玩家（你 / 我 / 主角 / protagonist / player / MC...）。
+- **entrySpeaker**：入口 beat 由谁开口 —— 取值只有三种：① 某个 NPC 真名（必须在 cast 里）② "你"（玩家本人开口）③ 留空（纯旁白 / 环境开场）。这决定镜头语言，要选准。
+- **entryActiveCharacters**：入口画面里**此刻出现的 NPC** 及其当下姿态 / 神情（中文 pose）。即使没人说话，画面里有谁也要列。**绝不**包含玩家。
+
+sceneKey 设计原则（用于跨场景视觉一致性）：
+- 同一物理空间 + 同一时段 → 必须沿用**完全相同**的英文 slug
+- 时段 / 空间变化时换 slug（"classroom-dusk" → "classroom-night" / "corridor-dusk"）
+- slug 规范：lowercase-with-dashes，2–4 个英文单词
+- 用户消息会列出已用过的 sceneKey，请优先**复用**这些已有 slug
+
+玩家视角硬规则（违反会破坏整个 galgame）：
+- 玩家是第二人称 POV，**永远不出现在任何画面里**——entryActiveCharacters 的 name **绝不允许**是「玩家 / 你 / 我 / 主角 / protagonist / player / Player / MC / I / me」任何变体。
+- entrySpeaker 只能是 NPC 真名 / "你" / 留空；其它 POV 变体一律视为错误。
+
+必须输出严格 JSON：
+{
+  "sceneSummary": "黄昏的天台，风很大。夏海背对你站在栏杆边，手里攥着一张揉皱的成绩单——她把你单独叫上来，却迟迟不开口。",
+  "sceneKey": "rooftop-dusk",
+  "entryBeatId": "b1",
+  "cast": ["夏海"],
+  "entrySpeaker": "夏海",
+  "entryActiveCharacters": [
+    { "name": "夏海", "pose": "背对你倚着栏杆，侧脸绷着，手里攥着揉皱的纸" }
+  ]
+}
+
+不要输出 JSON 以外的任何文本。`;
+
+// ──────────────────────────────────────────────────────────────────────
+//  Phase B — expands the plan into the full beats[] + storyStatePatch.
+// ──────────────────────────────────────────────────────────────────────
+
+export const WRITER_BEATS_SYSTEM = `你是一部交互视觉小说的「编剧」。这是**两步生成中的第二步——把已规划好的场景展开成完整剧本**。你会收到本场景的「规划」（场景概要 sceneSummary、sceneKey、入口 beat 的 id / speaker / 登场角色、以及本场景允许出场的角色名单 cast）。你的任务：基于规划写出玩家依次经历的对话节拍 beats，并在最后更新主线记忆。你只负责**剧情和台词**——不设计角色形象、不写出图提示词、不做镜头调度，这些由其他 agent 完成。
+
+你必须严格遵守收到的规划：
+- 必须存在一个 id 等于规划 entryBeatId 的 beat，作为玩家入口。
+- 该入口 beat 的 speaker 与登场角色（activeCharacters）要与规划一致（姿态措辞可微调，但**人物身份必须一致**）。
+- speaker 与 activeCharacters 里的 NPC 名字**只能来自规划的 cast**（或玩家 "你"）——**不要引入规划之外的新角色**。

 ═══════════════════════════════════════════════════════════════════
 爆款心法（番茄网文 / 红果短剧 / galgame 的叙事手感）—— 必须贯彻
@@ -167,11 +229,7 @@ export const WRITER_SYSTEM = `你是一部交互视觉小说的「编剧」。
 - 沿用主线记忆里的人物关系与情绪温度——别让刚告白的人下一场形同陌路，也别凭空遗忘已埋的伏笔。
 - 推进、但别重置：每一场都让主线问题往前走一点（关系变化 / 真相揭露一角 / 新悬念浮现）。

-一个场景包含：
-- sceneSummary：当前场景的中文概要（地点、时间、氛围、关键事件——给后续的分镜导演看）
-- sceneKey：当前场景的英文 slug（如 "classroom-dusk"、"rooftop-night"、"rainy-street"）——同一物理空间应沿用相同 slug
-- beats[]：玩家依次经历的对话节拍
-- entryBeatId：玩家进入场景时落在哪个 beat
+本步你只产出两样：**beats[]**（玩家依次经历的对话节拍）和 **storyStatePatch**（主线记忆更新）。sceneSummary / sceneKey / entryBeatId 已由规划给定，**不要再输出**它们。

 每个 beat 是玩家会看到的一段叙述 / 对话 / 选择。beat 之间通过 next 字段连接：
 - "continue"：玩家点击图片背景 / 按继续，自然推进到下一个 beat
@@ -183,6 +241,7 @@ choice 的 effect 有两种：

 设计原则：
 - 同场景内 beat 数自由发挥，按剧情节奏自然给出（通常 2–6 个，可以更多）
+- 入口 beat 的 id 必须等于规划给定的 entryBeatId；其余 beat id 依次自取且互不重复
 - 多用 continue，少用 choice — 选择只应出现在「真正的岔路口」
 - advance-beat 适合处理对话分支（同一场景里换个话题、追问、撒娇）
 - change-scene 适合空间/时间跳跃（出门、转身看窗外、第二天清晨）
@@ -192,12 +251,6 @@ choice 的 effect 有两种：
 - next.nextBeatId 引用的 beat 必须存在
 - choice 至少 2 个，至多 4 个，互不重复

-sceneKey 设计原则（重要 — 用于跨场景视觉一致性）：
-- 同一物理空间 + 同一时段 → 必须沿用**完全相同**的英文 slug
-- 时段或空间变化时换 slug（如 "classroom-dusk" → "classroom-night"，"classroom-dusk" → "corridor-dusk"）
-- slug 规范：lowercase-with-dashes，2–4 个英文单词
-- 已登记的历史场景 sceneKey 会在用户消息里列出，请优先**复用**这些已有 slug
-
 文本风格约束：
 - narration / line 用中文（**纯净可显示文本**，绝不要写 (叹气)(语速快) 这类标注 —— 那是给配音的，会被玩家看见）
 - sceneSummary / lineDelivery / activeCharacters[].pose 内的文字也用中文
@@ -243,11 +296,8 @@ sceneKey 设计原则（重要 — 用于跨场景视觉一致性）：
 - nextHook：基于这一场的结尾，下一场应往哪走（给"下一次的你"一个明确命题，接住本场留下的扣子）
 这些字段是写给"未来的你"的连贯性记忆，请认真写。

-必须输出严格 JSON，结构如下：
+必须输出严格 JSON，结构如下（**只含 beats 与 storyStatePatch**；sceneSummary / sceneKey / entryBeatId 由规划给定，不要输出。下例入口 beat 的 id "b1" 即规划的 entryBeatId）：
 {
-  "sceneSummary": "中文场景概要：地点+时间+氛围+关键事件",
-  "sceneKey": "classroom-dusk",
-  "entryBeatId": "b1",
  "beats": [
    {
      "id": "b1",
@@ -343,29 +393,28 @@ function renderHistoryEntry(
  return lines.join("\n");
 }

-export function buildWriterUserMessage(session: Session): string {
-  // ─── STABLE PREFIX ────────────────────────────────────────────────────
-  // Everything in this section is invariant across consecutive Writer calls
-  // within the session (or monotonically grows in a way that keeps the
-  // earlier bytes byte-identical). Always emit every section header — even
-  // when empty — so positions don't shift between calls.
-  //
-  // Order optimized for DeepSeek/MiMo prefix caching (64-token chunks):
-  //   1. session-immutable scalars (world / style)
-  //   2. story bible spine (Architect-set, never patched)
-  //   3. monotonically-growing lists (characters, sceneKeys)
-  //   4. history entries 0..N-2 (the last entry is what THIS call must
-  //      react to, so it lives in the dynamic suffix instead)
-  //
-  // ─── DYNAMIC SUFFIX ───────────────────────────────────────────────────
-  // Everything below changes on (almost) every call:
-  //   5. story bible dynamic patch (synopsis/threads/relationships/nextHook)
-  //   6. the just-completed entry (history[-1]) — same render format as the
-  //      stable history blocks, just preceded by a "just completed" header
-  //   7. last-beat snippet (the exact emotional cliffhanger)
-  //   8. lastExit hint
-  //   9. format reminder tail
-
+// Shared narrative context for BOTH Writer phases. Returns the message parts
+// from the cacheable STABLE PREFIX (sections 1-4) through the dynamic
+// transition hint (section 7), but WITHOUT the trailing phase-specific
+// instruction — each phase appends its own. Building this once and reusing it
+// keeps EACH phase's prompt prefix byte-stable across scenes for DeepSeek
+// prompt caching (Phase A and Phase B cache independently since their system
+// prompts differ, but each shares its own prefix across consecutive calls).
+//
+// ─── STABLE PREFIX ──────────────────────────────────────────────────────
+// Invariant across consecutive Writer calls within the session (or grows in a
+// way that keeps earlier bytes byte-identical). Always emit every section
+// header — even when empty — so positions don't shift between calls.
+//   1. session-immutable scalars (world / style)
+//   2. story bible spine (Architect-set, never patched)
+//   3. monotonically-growing lists (characters, sceneKeys)
+//   4. history entries 0..N-2 (the last entry is what THIS call must react
+//      to, so it lives in the dynamic suffix instead)
+// ─── DYNAMIC SUFFIX ─────────────────────────────────────────────────────
+//   5. story bible dynamic patch (synopsis/threads/relationships/nextHook)
+//   6. last-beat snippet (the exact emotional cliffhanger)
+//   7. transition hint (opening cold-open directive OR lastExit continuation)
+function buildWriterContextParts(session: Session): string[] {
  const parts: string[] = [];

  // ── 1. session scalars ────────────────────────────────────────────────
@@ -423,8 +472,7 @@ export function buildWriterUserMessage(session: Session): string {
  // ── 6. last-beat snippet (the exact emotional cliffhanger) ──
  // The full last entry is already in the stable history block above; here
  // we only re-emit the very last beat to sharply focus the Writer on the
-  // emotional moment to continue from. Skip the duplicate full-entry render
-  // that was here previously — it wasted ~200-500 tokens of dynamic suffix.
+  // emotional moment to continue from.
  const last = session.history.at(-1);
  if (last) {
    const lastBeatId = last.visitedBeatIds.at(-1) ?? last.scene.entryBeatId;
@@ -441,14 +489,14 @@ export function buildWriterUserMessage(session: Session): string {
    }
  }

+  // ── 7. transition hint ────────────────────────────────────────────────
  if (session.history.length === 0) {
    parts.push(
-      "\n这是故事的开场。请按【故事档案】里的 nextHook 把第一幕的冷开场写出来——开场即抓人，别花笔墨铺垫世界观。写完后更新 storyStatePatch。严格以 JSON 格式返回。",
+      "\n这是故事的开场。请按【故事档案】里的 nextHook 把第一幕的冷开场设计出来——开场即抓人，别花笔墨铺垫世界观。",
    );
-    return parts.join("\n");
+    return parts;
  }

-  // ── 8. lastExit hint ──────────────────────────────────────────────────
  const lastExit = last?.exit;
  if (lastExit) {
    if (lastExit.kind === "choice") {
@@ -464,8 +512,59 @@ export function buildWriterUserMessage(session: Session): string {
    parts.push("\n无缝续写下一个场景，延续上一刻的情绪。");
  }

-  // ── 9. format reminder tail ───────────────────────────────────────────
-  parts.push("写完后别忘了更新 storyStatePatch。严格以 JSON 格式返回。");
+  return parts;
+}
+
+// Phase A — plan the scene skeleton (no beats). Shares the cacheable context;
+// appends a plan-only instruction tail.
+export function buildWriterPlanUserMessage(session: Session): string {
+  const parts = buildWriterContextParts(session);
+  parts.push(
+    '\n现在**只规划本场景的骨架**（不要写 beats 台词）：给出 sceneSummary（画面感强、含开场钩子）、sceneKey、entryBeatId、本场景会出场的全部角色 cast、以及入口 beat 的 entrySpeaker 与 entryActiveCharacters。严格以 JSON 格式返回。',
+  );
+  return parts.join("\n");
+}
+
+// Phase B — expand the plan into full beats[] + storyStatePatch. The plan is
+// dynamic per scene, so it goes AFTER the cacheable context (keeping Phase B's
+// prefix stable across scenes).
+export function buildWriterBeatsUserMessage(
+  session: Session,
+  plan: WriterPlan,
+): string {
+  const parts = buildWriterContextParts(session);
+
+  parts.push("");
+  parts.push("━━━ 本场景规划（上一步已定，必须严格遵守）━━━");
+  parts.push(`场景概要 sceneSummary：${plan.sceneSummary}`);
+  if (plan.sceneKey) parts.push(`sceneKey：${plan.sceneKey}`);
+  parts.push(
+    `入口 beat 的 id（entryBeatId，必须有一个此 id 的 beat 作为入口）：${plan.entryBeatId}`,
+  );
+  parts.push(
+    `入口 beat 的 speaker：${plan.entrySpeaker ? plan.entrySpeaker : "（空 —— 纯旁白 / 环境开场）"}`,
+  );
+  parts.push("入口 beat 的登场角色 activeCharacters（人物身份须一致，姿态可微调）：");
+  if (plan.entryActiveCharacters.length === 0) {
+    parts.push("（无 —— 入口画面没有 NPC）");
+  } else {
+    for (const c of plan.entryActiveCharacters) {
+      parts.push(`- ${c.name}${c.pose ? `：${c.pose}` : ""}`);
+    }
+  }
+  parts.push(
+    '本场景允许出现的角色名 cast（speaker / activeCharacters 只能用这些名字或 "你"，不要新增角色）：',
+  );
+  if (plan.cast.length === 0) {
+    parts.push("（无 NPC —— 仅旁白与玩家）");
+  } else {
+    for (const n of plan.cast) parts.push(`- ${n}`);
+  }
+  parts.push("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━");
+
+  parts.push(
+    "\n把上面的规划展开成完整的 beats[]（入口 beat 用规划的 entryBeatId / speaker / 登场角色），写完后更新 storyStatePatch。严格以 JSON 格式返回。",
+  );
  return parts.join("\n");
 }
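
The two prompts above state several contracts in prose only: `cast` must not contain the player, `entrySpeaker` must be a cast member, "你", or empty, `sceneKey` is a 2-4 word lowercase slug, and Phase B may only use names from `cast`. A minimal sketch of how those rules could be checked in code follows. Everything here is an assumption for illustration: the `WriterPlanLike` shape is inferred from how the diff reads `plan`, `BeatLike` is a hypothetical minimal beat, and `checkPlan` / `checkBeats` do not appear in the commit. The rule that entry characters must be in `cast` is likewise an inference from the prompt's description of `cast` as the full roster.

```typescript
// Shape inferred from the diff's use of `plan`; the real type is WriterPlan
// in @infiplot/types and may differ.
interface WriterPlanLike {
  sceneSummary: string;
  sceneKey?: string;
  entryBeatId: string;
  cast: string[];
  entrySpeaker?: string;
  entryActiveCharacters: { name: string; pose?: string }[];
}

// Hypothetical minimal beat shape, just enough for the checks below.
interface BeatLike {
  id: string;
  speaker?: string;
  activeCharacters?: { name: string }[];
}

const PLAYER = "你";

// Player aliases the prompts forbid in cast / activeCharacters (lowercased).
const PLAYER_ALIASES = new Set([
  "玩家", "你", "我", "主角", "protagonist", "player", "mc", "i", "me",
]);

// 2-4 lowercase English words joined by dashes, e.g. "rooftop-dusk".
const SCENE_KEY = /^[a-z]+(-[a-z]+){1,3}$/;

// Returns a list of human-readable problems; empty means the plan passes.
function checkPlan(plan: WriterPlanLike): string[] {
  const problems: string[] = [];
  const cast = new Set(plan.cast);
  if (plan.sceneKey && !SCENE_KEY.test(plan.sceneKey)) {
    problems.push(`bad sceneKey: ${plan.sceneKey}`);
  }
  for (const name of plan.cast) {
    if (PLAYER_ALIASES.has(name.toLowerCase())) {
      problems.push(`player alias in cast: ${name}`);
    }
  }
  if (plan.entrySpeaker && plan.entrySpeaker !== PLAYER && !cast.has(plan.entrySpeaker)) {
    problems.push(`entrySpeaker not in cast: ${plan.entrySpeaker}`);
  }
  for (const c of plan.entryActiveCharacters) {
    if (!cast.has(c.name)) problems.push(`entry character not in cast: ${c.name}`);
  }
  return problems;
}

// Checks Phase B output against the plan it was asked to honor.
function checkBeats(plan: WriterPlanLike, beats: BeatLike[]): string[] {
  const problems: string[] = [];
  const allowedSpeakers = new Set([...plan.cast, PLAYER]);
  if (!beats.some((b) => b.id === plan.entryBeatId)) {
    problems.push(`missing entry beat: ${plan.entryBeatId}`);
  }
  for (const b of beats) {
    if (b.speaker && !allowedSpeakers.has(b.speaker)) {
      problems.push(`${b.id}: speaker off-cast: ${b.speaker}`);
    }
    for (const c of b.activeCharacters ?? []) {
      // The player never appears on screen, so only cast names are allowed.
      if (!plan.cast.includes(c.name)) {
        problems.push(`${b.id}: character off-cast: ${c.name}`);
      }
    }
  }
  return problems;
}
```

A caller could treat a non-empty `checkBeats` result the same way as a Phase B failure, though whether the engine does so is not shown in this diff.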