- inferImageProtocol: match runware.ai by parsed hostname (exact match or
subdomain) instead of a bare substring, so notrunware.ai /
runware.ai.evil.com no longer misroute to the Runware protocol
- README: document the image-2-vip → OpenAI-compatible exception; correct the
Imagen wording (deprecated, EOL 2026-06-24 — not yet discontinued)
Addresses Copilot review on #30.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- TEXT/VISION: add native Anthropic & Google Gemini paths via Vercel AI SDK,
selectable through TEXT_PROVIDER / VISION_PROVIDER (default openai_compatible)
- IMAGE: expand to openai (gpt-image) / google (Nano Banana) via AI SDK
alongside the existing Runware task-array and OpenAI-compatible REST paths
- normalizeBaseUrl: tolerate URLs with/without /v1 (or /chat/completions);
append the per-protocol version segment only for bare hosts
- config: readProvider() reads *_PROVIDER; types: ProviderProtocol + provider?
- deps: @ai-sdk/anthropic, @ai-sdk/google; docs in .env.example + README
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Harden the BYO-mode signal at the API boundary (start/scene/insert-beat):
only clientTts === true drops server TTS, so a stray truthy non-boolean can't
silently disable it. Add a non-blocking prefix hint in TtsKeyModal that warns
when the pasted key prefix (tp-/sk-) mismatches the selected key type — a
mismatch hits the wrong endpoint and plays silently, the symptom BYO fixes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Public users share one server TTS key, so Xiaomi's per-key RPM/TPM limits
cause silent playback under concurrency. This adds an OPTIONAL path: a user
can store their own Xiaomi MiMo key in the browser and synthesize voice
client-side against Xiaomi's CORS-open endpoints. The key lives only in
localStorage and is never sent to or logged by our server; the shared server
key still serves everyone who does not opt in.
- components/TtsKeyModal.tsx: shared key modal (key-family + region picker),
reused by both the home and play pages
- app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens
in place instead of redirecting to the home page
- app/page.tsx: home page consumes the shared modal + readStoredTtsConfig
- lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets
- app/api/{start,scene,insert-beat}: thread per-request voice; lib/types update
- docs/xiaomi-tts-key.md + README note
Verified with tsc --noEmit (exit 0).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Painter composites exactly plan.entryActiveCharacters into the entry
frame (the same roster the Cinematographer framed). Phase B is told to
reuse that roster, but only the entry beat's id was code-enforced — so an
LLM slip could leave a character in the painted frame that the runtime
entry beat says isn't there. Pin activeCharacters onto the plan's entry
beat as a last line of defense, mirroring the existing id pin.
Speaker is intentionally left to the prompt: it's coupled to line/TTS, so
overwriting it could mis-attribute or orphan Phase B's dialogue.
Addresses Copilot review feedback on PR #27.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Writer was the serial long pole: a single LLM call wrote the scene
skeleton AND the full beats[] graph before anything downstream could
start, so variable-length beat generation blew up tail latency.
Split it into two calls:
- Phase A (runWriterPlan): minimal skeleton the image pipeline needs
(sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker).
Serial, on the critical path, kept lightweight.
- Phase B (runWriterBeats): full beats[] + storyStatePatch, written to
honor the plan. Launched immediately, overlaps the ENTIRE image pipeline
(cards / cinematographer / portraits / painter), awaited last.
Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long
beat-writing is hidden behind image gen. A Phase B failure degrades to a
single playable beat synthesized from the plan.
Paired distinct-payload A/B (6 content-matched stories, baseline vs split):
- median end-to-end 42.6s -> 32.2s (-24%)
- mean 46.4s -> 33.1s (-29%)
- worst case 74.7s -> 37.6s (halved)
- no content regression: total Writer output tokens 12858 -> 13699
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses two GitHub Copilot review comments on PR #24:
- preloadImage cleared the 20s timeout in onload, before awaiting
img.decode(), leaving the decode phase unguarded — a hung decode could
keep the promise pending forever and stall the play loop. Move
clearTimeout into a single idempotent done() so the timeout stays armed
through decode() too, matching the stated "timeouts resolve quietly"
intent.
- .env.example said to leave BOTH proxy vars blank, but shipped
NEXT_PUBLIC_IMAGE_PROXY_ALLOWED_HOSTS=im.runware.ai. Only
NEXT_PUBLIC_IMAGE_PROXY_URL gates the feature; the allowlist is inert
until the URL is set. Corrected the wording, kept the self-documenting
default value.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address the Copilot review on #26.
#1 The game_start / art_style_select payload fields were typed as bare
`string`, so free text could still slip through despite the "content-free
by construction" claim. Add lib/options.ts as the single source of truth
for the selector option sets (`as const` → literal-union types), have the
home OPTS render from those arrays, and type the analytics fields from the
derived unions (gender/art_style/plot_style/pacing/style) plus a template
type for `card`. Free text now fails to compile; no casts at call sites.
#2 The /play heartbeat scheduled its 30s interval unconditionally. Gate the
effect on the same NEXT_PUBLIC_UMAMI_* env used for script injection, so
nothing is scheduled when the tracker is off (visibility check kept — a
hidden tab still never emits).
#3 choice_select no longer emits a -1 choice_index: skip the event when the
index can't be resolved instead of polluting the index distribution.
Verified with tsc (exit 0) and a throwaway negative test: free text in any
of the six fields raises TS2322, valid enum/template values compile.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instrument the play flow with 9 content-free custom events (game_start,
art_style_select, style_image_upload, scene_reached, choice_select,
vision_click, tts_toggle, fullscreen_toggle, play_heartbeat) to measure
retention, engagement depth and session duration.
Privacy is enforced by construction, not convention:
- lib/analytics.ts types each event with a discriminated union, so a
payload has no slot for free text — prompts, world guides, uploaded
images and vision output can never reach analytics (compile-time
guarantee, not a comment).
- track() no-ops without window.umami and never throws into the app.
- coarse 30s heartbeat fires only while the tab is visible.
- script stays gated on NEXT_PUBLIC_UMAMI_* env (blank → no script),
honours Do-Not-Track, and locks to an exact data-domains allowlist.
- one-line on-site disclosure with a link, shown only when tracking is on.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Next.js serves /public files with `Cache-Control: public, max-age=0,
must-revalidate`, so the home covers + first-act JSON were re-fetched on
every visit. Verified against 30 days of Vercel metrics: /home/* alone was
~62% of Fast Data Transfer egress (5.42 GB) while the files total only
~31 MB — the same bytes re-downloaded hundreds of times.
Add a headers() rule scoping `public, max-age=31536000, immutable` to
/home/:path* only; other paths keep their defaults (verified /icon.svg
still returns no-cache). Filenames under /home are stable (covers fN/mN.webp,
first-act JSON by card name), so immutable is safe; if a first-act JSON is
ever re-baked under the same name, bump a query string or purge the cache.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
b805b1d routed every scene <img> through fetch → Blob → createObjectURL to
kill QUIC progressive-paint, but in doing so added an *unconditional*
dependency on a CORS-adding proxy. That breaks the default deployment:
im.runware.ai sends no Access-Control-Allow-Origin, so a direct
fetch().blob() throws and the scene image silently fails to load for anyone
who hasn't stood up the Cloudflare Worker.
Restore the pre-b805b1d behavior as the *default* and make the proxy
strictly opt-in:
- Direct path (no env set): preloadImage() warms the HTTP cache + decodes,
then <img> uses the original https://im.runware.ai URL — as before
b805b1d. No fetch().blob(), no CORS dependency: a fresh clone just works.
- Proxy path (NEXT_PUBLIC_IMAGE_PROXY_URL set): fetch the proxied URL →
Blob → createObjectURL, exactly as b805b1d, gaining the QUIC-immune
HTTP/2 edge + atomic paint.
shouldProxy(url) gates the two paths: proxy only when a base is configured
AND the host is in NEXT_PUBLIC_IMAGE_PROXY_ALLOWED_HOSTS (default
im.runware.ai). data: / non-http / unknown-host URLs always take the direct
path. blobUrlCache + revoke logic is unchanged and safe for both paths
(revoke is a no-op on non-blob: URLs).
The Cloudflare Worker moves out of this repo into a standalone, one-click-
deployable project (infiplot-image-proxy) so the optional infra isn't
carried by every clone; .env.example and the READMEs link to it.
restore: preloadImage() helper deleted by b805b1d
add: NEXT_PUBLIC_IMAGE_PROXY_ALLOWED_HOSTS (default im.runware.ai)
remove: worker/ (moved to standalone repo)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Card click flow now serves /home/firstscene/{name}.webp from Vercel static
hosting instead of fetching im.runware.ai/... — those URLs have a finite TTL
and would silently rot. Side benefit: backfilled the 18 stories that never had
a local webp (f14-f29, m14, m29), and refreshed the 44 stale webps left over
from a pre-prebake story batch so they actually match their cover art again.
Scope is scene.imageUrl only; characters[].basePortraitUrl still points at
Runware (painter consumes it server-side as referenceImages, where a local
public path won't resolve).
localize-firstact-images.mjs:
- skip the network when the local webp is already on disk (don't re-encode
what's already correct)
- read imageUrlRemote as a fallback URL when imageUrl is already localized,
so --force can refresh from the original Runware source
- also localize scene.imageUrl alongside the top-level imageUrl
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: on a choice beat, clicking the dialogue/narration card fired
the vision ("识图") flow instead of doing nothing. Picking an option with
fast clicks that landed on the card repeatedly kicked off the expensive
/api/vision → insert-beat/scene chain — janky and confusing.
Root cause: the story-card <div> had `pointer-events-none`, so clicks
passed through to the background <img> onClick (handleImageClick), which
on choice beats calls onBackgroundClick → vision.
Fix: the card now owns its clicks (`pointer-events-auto` + handleCardClick):
- mid-typing → completes the text (VN skip affordance, unchanged)
- continue beat → advances, as before
- choice beat → no-op (no vision)
Clicking the actual scene art still triggers vision; choice buttons
already had pointer-events-auto and are unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Symptom: in Chrome on certain networks the scene <img> renders row-by-row
from top to bottom — "层层加载" — instead of appearing atomically.
Root cause (confirmed via DevTools):
- Chrome opportunistically opens HTTP/3 (QUIC) to im.runware.ai.
- QUIC streams to Runware sometimes error mid-transfer:
net::ERR_QUIC_PROTOCOL_ERROR
HTTP-level status stays 200 (response headers received), but bytes are
truncated. The browser paints whatever PNG bytes it has so far → visible
row-by-row decode.
- The earlier preloadImage()+decode() trick can't fix this — neither
HTTP-cache reuse nor sync decode helps when the bytes themselves were
never fully delivered.
Two-tier fix:
1. Client: fetch → Blob → URL.createObjectURL() (app/play/page.tsx)
- <img src> only ever points to a blob: URL whose bytes are 100%
resident in the JS heap. No network-backed src = no possibility of
progressive paint.
- Module-level blobUrlCache keys by original URL so speculative
prefetch + the eventual commit share one fetch.
- Old blobs are URL.revokeObjectURL()'d on scene swap + unmount to
release memory.
2. Network: optional Cloudflare Worker proxy (worker/)
- Browser ↔ Worker is HTTP/2 over CF edge (extremely stable).
- Worker ↔ Runware is a server-to-server fetch (no QUIC fragility,
Cloudflare's backbone handles transit).
- Worker buffers the full upstream response → client never sees a
half-stream.
- Bonus: CF edge cache (cacheEverything, 1y TTL) on Runware UUIDs;
Access-Control-Allow-Origin: * so client fetch() can't hit CORS.
- Hardened: only proxies im.runware.ai, only GET/HEAD/OPTIONS, all
other hosts/methods → 403/405.
Wired via NEXT_PUBLIC_IMAGE_PROXY_URL (inlined at build). Empty → no proxy
→ direct fetch (which still uses the blob path, just exposed to QUIC).
──────────────────────────────────────────────────────────────────────
Deploy steps (one-time, do this AFTER pulling this commit):
1. Install wrangler globally:
npm i -g wrangler
2. Log in to Cloudflare (opens browser for OAuth):
wrangler login
3. From the worker/ directory, deploy:
cd worker
wrangler deploy
wrangler will print the deployed URL, e.g.
https://infiplot-image-proxy.<your-cf-username>.workers.dev
4. Paste that URL into .env.local for local dev:
NEXT_PUBLIC_IMAGE_PROXY_URL=https://infiplot-image-proxy.<...>.workers.dev
…and into Vercel project settings (Environment Variables) for prod.
NEXT_PUBLIC_ vars are inlined at build time, so the URL bakes into
the bundle on the next deploy/dev-server restart.
5. Restart dev server (pnpm dev) so the new env baked in. Generate a
scene; Network tab should show requests going to *.workers.dev
instead of im.runware.ai, no ERR_QUIC_PROTOCOL_ERROR, image renders
atomically.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+
on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers
match a stable byte-identical prefix from message[0]; once a single
byte changes everything after it misses, so the trick is to push every
session-stable bit to the front and concentrate per-call churn in a
short suffix.
Three coordinated changes:
1. Split storyState rendering into spine + dynamic.
renderStoryStateSpine: logline / genreTags / protagonist / castNotes
— Architect-set fields that StoryStatePatch literally cannot touch
(the type only declares the 4 volatile ones; coerce and apply both
cherry-pick), so spine bytes are guaranteed stable for the entire
session. Goes in the STABLE PREFIX.
renderStoryStateDynamic: synopsis / openThreads / relationships /
nextHook — the Writer rewrites these every scene via storyStatePatch.
Goes in the DYNAMIC SUFFIX.
renderStoryState kept as a convenience wrapper that joins both, for
anything that still wants the merged bible.
2. Rewrite buildWriterUserMessage with a stable/dynamic split.
STABLE PREFIX (byte-identical or pure append across consecutive calls):
- 世界观 / 画风 (session-immutable scalars)
- story bible spine
- 已登记角色 [sentinel: "(以下每行一个已登记角色,开场前为空。)"] + entries
- 已使用的 sceneKey [sentinel] + entries
- 场景历史,已完结 [sentinel] + archivedHistory entries
↑ archivedHistory = history.slice(0, -1), NOT the full history
— the live entry (history[-1]) keeps mutating mid-scene as the
player walks new beats and speculative prefetches snapshot it
at different moments, so it MUST stay out of the stable prefix
or the byte-monotonic invariant breaks.
DYNAMIC SUFFIX:
- storyState dynamic patch
- last-beat snippet (the exact emotional cliffhanger to continue from)
- lastExit hint
- format reminder tail
The previous structure put the full storyState (including patched
fields) at the very top of the user message, so the very first byte
of the user message changed every scene — user-side cache hit was
effectively 0% across the board.
3. Sentinel pattern for variable-length sections.
Every list (characters / sceneKeys / archivedHistory) now emits a
constant placeholder line after its header REGARDLESS of whether
it has entries. With the old "if empty print '(暂无)' else print
entries" pattern, adding the first item silently rewrites those
placeholder bytes — the byte at offset N moves from a Chinese
parenthesis to a dash, prefix cache torched. The sentinel line is
the same bytes whether the list has 0 or N items; new items are
pure appends after it.
4. Rewrite buildCinematographerUserMessage.
New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued
right after the session-stable styleGuide line, so the stable prefix
is long enough to cross at least one full 64-token chunk boundary
beyond the system prompt. The per-scene inputs (sceneSummary,
entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity
hint) all moved into the dynamic suffix below.
Verified (see [cache] / [debug-writer] logs from staging): hash of
500-byte slices of the user message is byte-identical across two
same-historyLen Writer calls through the entire stable prefix; only
the dynamic suffix slice differs. The remaining cache-hit gap under
MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally
jumps to 4096); on DeepSeek the same prefix should hit fully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a `tag` option to chat() and have it print one `[cache] <tag>
hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are
probed in order so the same logger works across providers:
- DeepSeek (v3+): usage.prompt_cache_hit_tokens / *_miss_tokens
- OpenAI / o-series: usage.prompt_tokens_details.cached_tokens
- Anthropic: usage.cache_read_input_tokens / *_creation_*
When none of them are present (MiMo / local Ollama / others) we still
print prompt + completion totals so the cost baseline is visible.
Tag every callsite so the log is greppable:
architect / writer / character-designer / cinematographer / insert-beat
This is the prerequisite for the prefix-cache reordering work that
follows — without per-agent visibility there's no way to tell if a
prompt rearrangement actually moved the needle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The session-id slice shown in the play header was an opaque timestamp
that reads as noise to players. The footer's "Ⅰ · Ⅰ" was a leftover
decorative mark after its sibling controls were moved above the canvas.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the Runware CDN download was slow (~10-20s over VPN / strict
networks, vs. the optimistic <2s the existing comment assumed), the
preload's 8s timeout fired and setImageUrl committed before the bytes
were actually decoded. The rendered <img> has w-auto h-auto and no
intrinsic aspect-ratio source — until the image loads the layout
collapses to roughly 1px tall, giving the "等了很久 → 一根线 → 突然
出图" jank.
Two compounding fixes:
app/play/page.tsx IMAGE_PRELOAD_TIMEOUT_MS 8000 → 20000.
Real CDN+decode usually finishes well before
this; pushing the ceiling out just stops the
window where we commit a half-loaded URL.
components/PlayCanvas.tsx Add width={1792} height={1024} HTML attrs
to the scene <img>. Doesn't affect rendered
size (still driven by w-auto h-auto and the
maxWidth/maxHeight in sizeStyle); the
browser uses them purely as an intrinsic
aspect-ratio source, so the placeholder box
reserves a 16:9-ish frame even mid-download.
Together: slow networks now mostly wait through preload; on the rare
genuine timeout the layout still holds shape instead of collapsing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coordinated additions to the 绘画风格 modal so the user can shape
the styleGuide that ultimately feeds every painter/director agent,
without ever mutating the source-of-truth STYLE_MAP:
1. New "自定义" entry sits right under "自动" — opens an inline
textarea where the user can write a free-form styleGuide (mix of
Chinese / English, sent verbatim to the image model). Stored as
in-memory state on HomePage (customStyleGuide), so refresh clears
it — fits the "one-shot session" semantics of this UI.
2. Every preset card now exposes a small pencil on the right of its
prompt area. Clicking it inlines a textarea pre-filled with the
current effective prompt (override if any, else STYLE_MAP value).
Saving writes to styleOverrides[name] — a separate in-memory
record keyed by preset name. STYLE_MAP is never written to.
start() selects the styleGuide with this priority:
customStyleGuide (when 自动→自定义)
> styleOverrides[artStyle]
> STYLE_MAP[artStyle]
> STYLE_MAP[DEFAULT_STYLE]
UX polish in the same change:
- 标题永远只读 (only the prompt is editable)
- 只读 prompt 行去掉边框/底色,回归纯文字 + 右上铅笔
- 「自动」项无 prompt 可编辑,标题下直接放一行说明
- 编辑态 textarea 用 ember 边框作为"正在编辑"视觉反馈
- 「保存并选用」一并 onPick + close;「还原默认」清除该预设的 override
- 搜索框同时匹配标题/原名/prompt 内容
- 移除「自由输入」标签 (now visually redundant with the pencil affordance)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the old 9-image 3x3 grid (4/5/a4/c3/c5/c7/d2/f2/f5.webp) and bring
in 14 new stills as 1.webp..14.webp, laid out as 7 rows of 2 columns at
width=420. Source PNGs (1920x1080 for 1-8, 1200x680 for 9-14) are
resized to fit inside 1200x680 and saved as q=85 WebP — 70-150KB each.
All three README locales (zh/en/ja) share the same paths so a single
asset swap refreshes every edition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
m14 (极简杀机) is currently a 14.7KB placeholder while m18 (数据幽灵)
got a real curated cover this round — promote 数据幽灵 into the front
row and demote 极简杀机 back to its original neighborhood so the visible
首屏 only shows finished art.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes so the user's preferred male cards (复古未来梦,
社团存亡日, 黄昏归途, 极简杀机, 辐射新娘, 霓虹义体, 月光下的约定,
花魁的刀) actually appear in the visual front row:
1. Add a DISPLAY_ORDER indirection. STORIES, covers (m{i}.webp),
prebaked first-acts (firstact/m{i}.json) and prompts.json are all
keyed on the original array index — renaming them would touch
dozens of static assets. DISPLAY_ORDER instead lets the homepage
iterate cards in a curated order while still resolving each card's
assets via its original index. Editing one line re-shuffles the
gallery.
2. Switch the gallery wrapper from CSS multi-column (columns-N) to
grid (grid-cols-N). columns fills column-first (top-of-col-1, then
bottom-of-col-1, then top-of-col-2...) so the first eight entries
of DISPLAY_ORDER ended up stacked down the leftmost column instead
of across the top row. Grid fills row-first, which is what "visual
front row" actually means. Cards are already fixed at aspect-ratio
4/5 so row heights stay uniform — no masonry effect lost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>