infiplot-web

Author	SHA1	Message	Date
yuanzonghao	3bf5c92841	perf(engine): split Writer into Phase A (plan) + Phase B (beats) The Writer was the serial long pole: a single LLM call wrote the scene skeleton AND the full beats[] graph before anything downstream could start, so variable-length beat generation blew up tail latency. Split it into two calls: - Phase A (runWriterPlan): minimal skeleton the image pipeline needs (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker). Serial, on the critical path, kept lightweight. - Phase B (runWriterBeats): full beats[] + storyStatePatch, written to honor the plan. Launched immediately, overlaps the ENTIRE image pipeline (cards / cinematographer / portraits / painter), awaited last. Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long beat-writing is hidden behind image gen. A Phase B failure degrades to a single playable beat synthesized from the plan. Paired distinct-payload A/B (6 content-matched stories, baseline vs split): - median end-to-end 42.6s -> 32.2s (-24%) - mean 46.4s -> 33.1s (-29%) - worst case 74.7s -> 37.6s (halved) - no content regression: total Writer output tokens 12858 -> 13699 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 11:17:34 +08:00
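$DESKTOP-I1T6TF3\Q$ DESKTOP-I1T6TF3\Q	347ab297d5	feat(web,engine): custom style - image upload, AI-extract prompt, painter ref Add an upload button to the custom-style entry point: the client downscales the image to a 512px webp (base64) and sends it to the new route /api/parse-style-image, where a vision LLM parses it into an English style prompt that is written back into the textarea. The image itself is passed through sessionStorage → /api/start → Session.styleReferenceImage; painter.collectReferenceImages places it in slot 0, so every scene of the session uses it as a reference image that anchors the art style (brush / color / mood), with higher priority than priorScene. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 19:15:19 +08:00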
$DESKTOP-I1T6TF3\Q$ DESKTOP-I1T6TF3\Q	298ecd4ec0	perf(engine): reorder Writer/Cinematographer prompts for prefix caching Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+ on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers match a stable byte-identical prefix from message[0]; once a single byte changes, everything after it misses, so the trick is to push every session-stable bit to the front and concentrate per-call churn in a short suffix. Four coordinated changes: 1. Split storyState rendering into spine + dynamic. renderStoryStateSpine: logline / genreTags / protagonist / castNotes. These are Architect-set fields that StoryStatePatch literally cannot touch (the type only declares the 4 volatile ones; coerce and apply both cherry-pick), so spine bytes are guaranteed stable for the entire session. Goes in the STABLE PREFIX. renderStoryStateDynamic: synopsis / openThreads / relationships / nextHook. The Writer rewrites these every scene via storyStatePatch. Goes in the DYNAMIC SUFFIX. renderStoryState is kept as a convenience wrapper that joins both, for anything that still wants the merged bible. 2. Rewrite buildWriterUserMessage with a stable/dynamic split. STABLE PREFIX (byte-identical or pure append across consecutive calls): - 世界观 / 画风 (world setting / art style; session-immutable scalars) - story bible spine - 已登记角色 (registered characters) [sentinel: "（以下每行一个已登记角色，开场前为空。）", i.e. "one registered character per line below; empty before the opening"] + entries - 已使用的 sceneKey (sceneKeys already used) [sentinel] + entries - 场景历史，已完结 (scene history, completed) [sentinel] + archivedHistory entries ↑ archivedHistory = history.slice(0, -1), NOT the full history: the live entry (history[-1]) keeps mutating mid-scene as the player walks new beats, and speculative prefetches snapshot it at different moments, so it MUST stay out of the stable prefix or the byte-monotonic invariant breaks. DYNAMIC SUFFIX: - storyState dynamic patch - last-beat snippet (the exact emotional cliffhanger to continue from) - lastExit hint - format reminder tail The previous structure put the full storyState (including patched fields) at the very top of the user message, so the very first byte of the user message changed every scene and the user-side cache hit rate was effectively 0% across the board. 3. Sentinel pattern for variable-length sections. Every list (characters / sceneKeys / archivedHistory) now emits a constant placeholder line after its header REGARDLESS of whether it has entries. With the old "if empty print '（暂无）' ('none yet') else print entries" pattern, adding the first item silently rewrites those placeholder bytes: the byte at offset N moves from a Chinese parenthesis to a dash, and the prefix cache is torched. The sentinel line is the same bytes whether the list has 0 or N items; new items are pure appends after it. 4. Rewrite buildCinematographerUserMessage. New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued right after the session-stable styleGuide line, so the stable prefix is long enough to cross at least one full 64-token chunk boundary beyond the system prompt. The per-scene inputs (sceneSummary, entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity hint) all moved into the dynamic suffix below. Verified (see [cache] / [debug-writer] logs from staging): hash of 500-byte slices of the user message is byte-identical across two same-historyLen Writer calls through the entire stable prefix; only the dynamic suffix slice differs. The remaining cache-hit gap under MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally jumps to 4096); on DeepSeek the same prefix should hit fully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 10:42:33 +08:00
$DESKTOP-I1T6TF3\Q$ DESKTOP-I1T6TF3\Q	37c911f510	chore(engine): log prompt-cache hit/miss per chat call Add a `tag` option to chat() and have it print one `[cache] <tag> hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are probed in order so the same logger works across providers: - DeepSeek (v3+): usage.prompt_cache_hit_tokens / _miss_tokens - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens - Anthropic: usage.cache_read_input_tokens / _creation_* When none of them are present (MiMo / local Ollama / others) we still print prompt + completion totals so the cost baseline is visible. Tag every callsite so the log is greppable: architect / writer / character-designer / cinematographer / insert-beat This is the prerequisite for the prefix-cache reordering work that follows — without per-agent visibility there's no way to tell if a prompt rearrangement actually moved the needle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 10:42:33 +08:00
$DESKTOP-I1T6TF3\Q$ DESKTOP-I1T6TF3\Q	cbabc54273	chore(engine): log worldSetting and storyBible at session start Two lines in startSession: the full worldSetting being fed to the Architect, and the resulting logline/genreTags/synopsis it produced. Cheap to keep — fires once per session — and makes it possible to tell at a glance whether a "story unrelated to my input" report is a frontend transport bug, a worldSetting layout problem, or the LLM ignoring the seed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 03:51:58 +08:00
$DESKTOP-I1T6TF3\Q$ DESKTOP-I1T6TF3\Q	bed4dc5a8f	feat(web): gender-differentiated 4:5 covers + per-card styleGuide prebake - Regenerate 60 covers (30 male + 30 female) via FLUX with story-specific prompts, replacing the prior gender-shared set - Crop covers to 4:5 (960×1200) via sharp attention cover; matches new homepage card aspectRatio - Persist all 60 prompts to public/home/prompts.json so the prebake step can reuse the cover's exact visual anchor (per-card styleGuide) and the first-act scene visually carries over from the poster the player clicked - Restore /play?card= prebaked instant-play path on homepage card click - Add OpenAI-compatible image route in ai-client for non-Runware endpoints - Hide Next.js dev indicators globally; tweak F-key fullscreen label Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 02:26:35 +08:00
Zonghao Yuan	dc5ecd60f6	refactor: flatten monorepo to single web package (#12) Flatten the pnpm monorepo (apps/web + packages/) into a single web package at the repo root. - Move app/lib/components/scripts/public to root; drop apps/web and packages/ wrappers - Rewrite tsconfig paths (@infiplot/) to ./lib/; turbopack.root = __dirname - Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths - Regenerate pnpm-lock.yaml to drop stale workspace importers - Bump engines.node to >=22 to match wrangler Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 00:55:45 +08:00
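
The Phase A / Phase B split described in commit 3bf5c92841 can be sketched roughly as below. This is a minimal illustration, not the engine's actual code: the `ScenePlan`, `Beat`, and `SceneDeps` shapes, the `generateScene` and `fallbackBeats` helpers, and the single-string image result are all assumptions made for the example. Only the function names `runWriterPlan` and `runWriterBeats` and the latency formula come from the commit message.

```typescript
// Hypothetical shapes; the real engine types are not shown in the log.
interface ScenePlan { sceneSummary: string; entryBeatId: string; }
interface Beat { id: string; text: string; }

interface SceneDeps {
  runWriterPlan: () => Promise<ScenePlan>;                // Phase A
  runWriterBeats: (plan: ScenePlan) => Promise<Beat[]>;   // Phase B
  runImagePipeline: (plan: ScenePlan) => Promise<string>; // assumed to return an image URL
}

// Assumed fallback: one playable beat synthesized from the plan.
function fallbackBeats(plan: ScenePlan): Beat[] {
  return [{ id: plan.entryBeatId, text: plan.sceneSummary }];
}

async function generateScene(deps: SceneDeps) {
  // Phase A is serial and sits on the critical path.
  const plan = await deps.runWriterPlan();
  // Launch Phase B immediately; attaching the fallback here means a
  // rejection is handled even while the image pipeline is still running.
  const beatsPromise = deps.runWriterBeats(plan).catch(() => fallbackBeats(plan));
  // The image pipeline overlaps Phase B.
  const image = await deps.runImagePipeline(plan);
  // Phase B is awaited last.
  const beats = await beatsPromise;
  return { plan, image, beats };
}

// Idealized latency model from the commit message:
// PhaseA + max(imagePipeline, PhaseB).
function criticalPathMs(phaseA: number, imagePipeline: number, phaseB: number): number {
  return phaseA + Math.max(imagePipeline, phaseB);
}
```

Under this model, beat writing only adds latency when it outlasts the image pipeline, which matches the commit's claim that the long call is hidden behind image generation.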

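The provider probing described in commit 37c911f510 might look something like the sketch below. The usage field names for DeepSeek and OpenAI are the ones listed in the commit message; `cache_creation_input_tokens` and `input_tokens` are assumed to be the Anthropic fields behind the abbreviated `_creation_*`. The helper names `cacheStats` and `formatCacheLine`, the miss computation for OpenAI and Anthropic, and the exact fallback line format are assumptions for illustration.

```typescript
type Usage = Record<string, unknown>;

function num(v: unknown): number | undefined {
  return typeof v === "number" ? v : undefined;
}

// Probe the three usage shapes in order; return undefined when none match.
function cacheStats(usage: Usage): { hit: number; miss: number } | undefined {
  // DeepSeek: explicit hit and miss counters.
  const dsHit = num(usage["prompt_cache_hit_tokens"]);
  if (dsHit !== undefined) {
    return { hit: dsHit, miss: num(usage["prompt_cache_miss_tokens"]) ?? 0 };
  }
  // OpenAI: cached_tokens is assumed to be a subset of prompt_tokens.
  const details = usage["prompt_tokens_details"];
  if (typeof details === "object" && details !== null) {
    const oaHit = num((details as Usage)["cached_tokens"]);
    if (oaHit !== undefined) {
      return { hit: oaHit, miss: (num(usage["prompt_tokens"]) ?? oaHit) - oaHit };
    }
  }
  // Anthropic: cache reads count as hits; uncached input plus cache
  // creation is treated as a miss here (an assumption of this sketch).
  const anHit = num(usage["cache_read_input_tokens"]);
  if (anHit !== undefined) {
    const miss = (num(usage["input_tokens"]) ?? 0) + (num(usage["cache_creation_input_tokens"]) ?? 0);
    return { hit: anHit, miss };
  }
  return undefined;
}

function formatCacheLine(tag: string, usage: Usage): string {
  const stats = cacheStats(usage);
  if (!stats) {
    // No cache fields: still report totals so the cost baseline is visible.
    const prompt = num(usage["prompt_tokens"]) ?? 0;
    const completion = num(usage["completion_tokens"]) ?? 0;
    return `[cache] ${tag} prompt=${prompt} completion=${completion}`;
  }
  const total = stats.hit + stats.miss;
  const rate = total === 0 ? 0 : Math.round((stats.hit / total) * 100);
  return `[cache] ${tag} hit=${stats.hit} miss=${stats.miss} rate=${rate}%`;
}
```

Because every line starts with `[cache] <tag>`, a per-agent hit rate can be pulled out of the logs with a simple grep on the tag.
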
7 Commits
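
The sentinel pattern from commit 298ecd4ec0 can be illustrated with a small sketch. The helper names (`renderListOld`, `renderListStable`, `isPrefixOf`) and the English placeholder strings are hypothetical; only the idea of emitting a constant line after each header, whether or not the list has entries, comes from the commit message.

```typescript
// Old pattern: the placeholder disappears once the first entry arrives,
// so the text right after the header changes.
function renderListOld(header: string, entries: string[]): string {
  const body = entries.length === 0 ? ["(none yet)"] : entries.map((e) => `- ${e}`);
  return [header, ...body].join("\n") + "\n";
}

// Sentinel pattern: the same line follows the header for 0 or N entries,
// so adding entries is a pure append.
function renderListStable(header: string, sentinel: string, entries: string[]): string {
  return [header, sentinel, ...entries.map((e) => `- ${e}`)].join("\n") + "\n";
}

// True when the earlier rendering is a string prefix of the later one,
// which is the property a prefix cache depends on.
function isPrefixOf(earlier: string, later: string): boolean {
  return later.startsWith(earlier);
}

const SENTINEL = "(one registered character per line below; empty before the opening)";

const stableEmpty = renderListStable("Characters", SENTINEL, []);
const stableOne = renderListStable("Characters", SENTINEL, ["Alice"]);
const oldEmpty = renderListOld("Characters", []);
const oldOne = renderListOld("Characters", ["Alice"]);

console.log(isPrefixOf(stableEmpty, stableOne)); // true
console.log(isPrefixOf(oldEmpty, oldOne));       // false
```

With the old pattern, the first registered character invalidates everything after the header; with the sentinel, the earlier rendering survives as a prefix, so any cached chunks covering it remain valid.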