Insert-beat is a pure in-scene micro-interaction — adding choices that
lead to change-scene contradicted its purpose. Now insert-beat generates
1-3 richer beats then loops back to the original options, which is the
natural UX for "you glanced at something decorative."
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
User feedback: custom interactions rarely produce new story content because
the classifier heavily biased toward insert-beat (single reaction, no scene
change). Three changes to fix this:
1. Freeform text input now always triggers a full scene generation (skips
the classify step entirely) — users who type expect the story to advance.
2. Vision (background click) classifier de-biased: prompt now favors
change-scene when uncertain, and the code fallback flipped from
insert-beat to change-scene. insert-beat narrowed to pure observation.
3. Insert-beat enhanced: generates 1-3 beats (was 1) with follow-up
choices (was: loop back to original beat). Even when vision classifies
as insert-beat, the player gets richer content and new options.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrites the i18n system introduced in PR #94 to use Next.js App Router
[locale] dynamic segments with SSR-rendered translations and proper
middleware locale routing.
- Add middleware locale detection: / rewrites to /zh-CN/ internally,
/en and /ja pass through, /zh-CN/... redirects to bare path
- Move all 7 pages under app/[locale]/ with SSR translation injection
- Fix server→client serialization: pre-evaluate function-valued
translations (makeSerializable) to eliminate hydration flash
- Fix language switch key flash: use hard navigation with localStorage-
only persistence, avoiding React state update before page reload
- Add <link rel="alternate" hreflang> tags for multilingual SEO
- Fix Supabase setAll overwriting locale rewrite response
- Trim locales from 22 to 3 (zh-CN/en/ja), delete 19 incomplete files
- LLM-translate 240 firstact game preset JSONs (en + ja, landscape +
portrait) and story titles via gemini-3.5-flash
- Delete 11 one-off migration scripts and outdated i18n docs
- Add useLocalePath hook and navigation utilities
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Squash-merge the cloudflare-migration branch (7 commits by Kai ki) into
staging with conflict resolution, feature integration, and bug fixes.
Engine:
- Paradigm D: single-stream Writer replacing dual-phase Plan/Beats
- Delete Architect agent; story bible generated via Writer <plan> tag
- Modular prompt architecture (segments/registry/builder)
- StreamRouter for tagged stream splitting (<plan>/<story>/<choices>)
Infrastructure:
- Cloudflare Workers deployment (wrangler.jsonc, OpenNext adapter)
- D1 database schema + Drizzle ORM (scaffolded, not yet active)
- R2 storage helpers (scaffolded, not yet active)
- Story persistence API routes + client-side persistence
BYOK (Bring Your Own Key):
- /api/llm/user-proxy with SSRF-protected LLM proxy (+ requireUser auth)
- CORS-aware fetch in ai-client: auto-detect CORS failure, fallback to
server proxy transparently via OpenAI SDK custom fetch
- BYO config support added to classify-freeform and vision routes
- SettingsModal CORS privacy notice (keys never logged/stored)
SSE streaming:
- engineClient.ts: fetchSSE helper for progressive scene events
- startSession/requestScene accept optional emit callback
- Fix SSE error event field name (error → message) in scene/start routes
i18n integration:
- Wire buildLanguageDirective into paradigm D's prompt builder
- Update corsNotice i18n keys (zh-CN/en/ja) with CORS proxy privacy text
- Preserve Session.language + LanguageSwitcher from i18n commit
Co-authored-by: Kai ki <155355644+zbf1009@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- New client-side i18n via React Context (useI18n, tArray, I18nProvider)
- Catalog ships 21 locale stubs; only zh-CN/en/ja have reviewed translations
- Header language switcher (globe icon + short label) before settings gear
- All hardcoded Chinese UI text migrated to keys: typewriter, options,
hints (with embedded gear icon via dangerouslySetInnerHTML), settings
panel, footer/about, play page hints
- AI output language follows user-selected locale via trailing one-liner
directive appended to Architect/Writer/CharacterDesigner/InsertBeat
user messages (preserves system-prompt cacheability)
- Per-locale separator rule: zh uses middot between every glyph; en/ja
use plain spaces
- Option value → i18n key suffix maps preserve Chinese as the underlying
identifier so analytics unions and STYLE_MAP keys stay byte-stable
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Make homepage cards and live sessions produce sound when the server is
configured for StepFun TTS, instead of silently failing (the prebaked
Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in
Fast Origin Transfer).
Three coordinated changes:
1. CharacterDesigner now picks a StepFun preset voice id directly from the
32-entry catalog in the SAME LLM call that designs the character — zero
extra latency, LLM-grade match quality. The Xiaomi prompt path is
byte-identical to history (verified programmatically) so cache hit rate
and voice quality are preserved. pickStepfunVoiceId (keyword scorer)
remains the fallback for orphan speakers / invalid LLM picks.
2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the
single source of truth, shared by the scorer, the CharacterDesigner
prompt, /api/tts-provider, and the offline enrich script.
3. A new GET /api/tts-provider endpoint lets the client probe the server's
TTS provider at /play mount. fetchBeatAudio then shapes its request body:
on a StepFun server it sends the lightweight stepfunVoiceId /
voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving
~13MB per protagonist per session on prebaked cards). requestBeatAudio
re-provisions on a provider mismatch before synth, so audio never goes
silent on a cross-provider replay or mid-session provider flip.
New type fields are all optional and backward-compatible: Character.stepfunVoiceId,
BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made
optional. AGENTS.md updated for the new route, type fields, dependency map,
and StepFun voice-selection flow.
* feat(engine): tighten CharacterDesigner prompt to prevent look-alike characters
Expand the visualDescription rules into a 6-element mandatory checklist (hair
quad / eyes triad / face & build / outfit quad / personality-driven vibe /
silhouette tag) and add an explicit anti-collision rule comparing against the
existing cast across cross-color-family and cross-silhouette dimensions.
Also upgrade the user-message "已设定角色" block from soft hint to hard
constraint with an explicit pre-write scan step, nudging the LLM into chain-
of-thought differentiation before emitting tags.
All additions land in the session-stable system prefix, so prompt cache
absorbs the extra tokens — per-call billed token delta is ~0.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(engine): replace pose examples with aura descriptors in personality vibe
The PERSONALITY-DRIVEN VIBE element listed concrete poses (arms crossed,
chin tilted up, slight slouch) which contradicted the earlier rule
banning transient poses from visualDescription. Switch to pure
atmosphere/aura keywords so the character card stays pose-neutral.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: yuanzonghao <yuanzonghao123@gmail.com>
- Player name: stored in localStorage, injected into Architect/Writer/InsertBeat
prompts so NPCs address the player by name, displayed in dialogue UI
- Freeform input: compact button at choice nodes expands to text input, LLM
classifier routes to insert-beat (interactive NPC response) or change-scene
- SettingsModal: unified panel merging player name, voice toggle (with
collapsible TTS key section), replacing the old TtsKeyModal
- Insert-beat upgrade: prompt now requires NPC reaction when characters are
present, shared by both freeform and Vision paths
- IME guard: isComposing check on freeform input to prevent CJK mid-composition
submission
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thread orientation (portrait|landscape) from client through API, engine,
and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes;
desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image
coordinate mapping, session-locked orientation, a shared coerceOrientation
helper, and a choices overflow cap in portrait.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Writer was the serial long pole: a single LLM call wrote the scene
skeleton AND the full beats[] graph before anything downstream could
start, so variable-length beat generation blew up tail latency.
Split it into two calls:
- Phase A (runWriterPlan): minimal skeleton the image pipeline needs
(sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker).
Serial, on the critical path, kept lightweight.
- Phase B (runWriterBeats): full beats[] + storyStatePatch, written to
honor the plan. Launched immediately, overlaps the ENTIRE image pipeline
(cards / cinematographer / portraits / painter), awaited last.
Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long
beat-writing is hidden behind image gen. A Phase B failure degrades to a
single playable beat synthesized from the plan.
Paired distinct-payload A/B (6 content-matched stories, baseline vs split):
- median end-to-end 42.6s -> 32.2s (-24%)
- mean 46.4s -> 33.1s (-29%)
- worst case 74.7s -> 37.6s (halved)
- no content regression: total Writer output tokens 12858 -> 13699
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+
on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers
match a stable byte-identical prefix from message[0]; once a single
byte changes everything after it misses, so the trick is to push every
session-stable bit to the front and concentrate per-call churn in a
short suffix.
Three coordinated changes:
1. Split storyState rendering into spine + dynamic.
renderStoryStateSpine: logline / genreTags / protagonist / castNotes
— Architect-set fields that StoryStatePatch literally cannot touch
(the type only declares the 4 volatile ones; coerce and apply both
cherry-pick), so spine bytes are guaranteed stable for the entire
session. Goes in the STABLE PREFIX.
renderStoryStateDynamic: synopsis / openThreads / relationships /
nextHook — the Writer rewrites these every scene via storyStatePatch.
Goes in the DYNAMIC SUFFIX.
renderStoryState kept as a convenience wrapper that joins both, for
anything that still wants the merged bible.
2. Rewrite buildWriterUserMessage with a stable/dynamic split.
STABLE PREFIX (byte-identical or pure append across consecutive calls):
- 世界观 / 画风 (session-immutable scalars)
- story bible spine
- 已登记角色 [sentinel: "(以下每行一个已登记角色,开场前为空。)"] + entries
- 已使用的 sceneKey [sentinel] + entries
- 场景历史,已完结 [sentinel] + archivedHistory entries
↑ archivedHistory = history.slice(0, -1), NOT the full history
— the live entry (history[-1]) keeps mutating mid-scene as the
player walks new beats and speculative prefetches snapshot it
at different moments, so it MUST stay out of the stable prefix
or the byte-monotonic invariant breaks.
DYNAMIC SUFFIX:
- storyState dynamic patch
- last-beat snippet (the exact emotional cliffhanger to continue from)
- lastExit hint
- format reminder tail
The previous structure put the full storyState (including patched
fields) at the very top of the user message, so the very first byte
of the user message changed every scene — user-side cache hit was
effectively 0% across the board.
3. Sentinel pattern for variable-length sections.
Every list (characters / sceneKeys / archivedHistory) now emits a
constant placeholder line after its header REGARDLESS of whether
it has entries. With the old "if empty print '(暂无)' else print
entries" pattern, adding the first item silently rewrites those
placeholder bytes — the byte at offset N moves from a Chinese
parenthesis to a dash, prefix cache torched. The sentinel line is
the same bytes whether the list has 0 or N items; new items are
pure appends after it.
4. Rewrite buildCinematographerUserMessage.
New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued
right after the session-stable styleGuide line, so the stable prefix
is long enough to cross at least one full 64-token chunk boundary
beyond the system prompt. The per-scene inputs (sceneSummary,
entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity
hint) all moved into the dynamic suffix below.
Verified (see [cache] / [debug-writer] logs from staging): hash of
500-byte slices of the user message is byte-identical across two
same-historyLen Writer calls through the entire stable prefix; only
the dynamic suffix slice differs. The remaining cache-hit gap under
MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally
jumps to 4096); on DeepSeek the same prefix should hit fully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flatten the pnpm monorepo (apps/web + packages/*) into a single web package at the repo root.
- Move app/lib/components/scripts/public to root; drop apps/web and packages/* wrappers
- Rewrite tsconfig paths (@infiplot/*) to ./lib/*; turbopack.root = __dirname
- Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths
- Regenerate pnpm-lock.yaml to drop stale workspace importers
- Bump engines.node to >=22 to match wrangler
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>