Commit Graph

48 Commits

Author SHA1 Message Date
yuanzonghao 87a2f93edb feat(auth): add Supabase auth with Google, GitHub, and email OTP login
Introduce user registration/login gated behind optional NEXT_PUBLIC_SUPABASE_*
env vars (leave blank to disable — app behaves exactly as before). Adds
proxy.ts for automatic cookie session refresh, requireUser() API route
guards on all 7 compute-consuming routes, AuthModal (Google/GitHub OAuth +
6-digit email OTP), UserChip header component, and login_success analytics
event. Identity is fully decoupled from Session/engine — no type changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-13 17:33:55 +08:00
yuanzonghao e68e7e1690 feat(engine): add opt-in image timeout and scene-paint hedging
IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout);
IMAGE_HEDGE_MS races a second identical scene-paint request when the
first is still pending past the threshold. Both default to OFF when
unset, preserving historical behavior for self-hosted deploys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 11:21:47 +08:00
baizhi958216 c4ffc16498 Merge pull request #64 from zonghaoyuan/refactor/settings-modal
feat: add client-side model configuration and server fallback
2026-06-12 22:09:43 +08:00
baizhi958216 5608b0fdd0 fix(engine): tolerate duplicated JSON outputs 2026-06-11 16:11:52 +08:00
baizhi958216 ef3b57953b refactor(ai-client): replace AI SDK adapters with OpenAI SDK 2026-06-11 16:11:44 +08:00
baizhi958216 6cd7d88326 feat(web): fallback to server API routes when no client-side model config is set
When a user has not configured their own model keys in localStorage,
engine calls now automatically route through /api/* server routes
instead of throwing "模型配置未设置". This lets Vercel deploys with
server-side environment variables work out of the box.

- Add lib/engineClient.ts as a unified client-side routing layer:
  checks localStorage for BYO config, falls back to POST /api/start,
  /api/scene, /api/vision, /api/classify-freeform, /api/insert-beat
- Update app/play/page.tsx to use engineClient instead of direct
  engine imports; remove buildEngineConfig()
- Update app/page.tsx style-image parsing to also fall back to
  /api/parse-style-image when no local model config exists

Signed-off-by: zhi <zhi@peropero.net>
2026-06-11 12:15:14 +08:00
baizhi958216 94973bc6c6 fix(tts): add non-null assertion in stepfun array access
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 759319bf28 feat(config): extract STYLE_EXTRACTION_PROMPT to shared lib for client reuse
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
baizhi958216 a2dd5ad630 feat(config): add client-side model config storage and EngineConfig resolver
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
baizhi958216 2088bae311 fix(tts): replace Buffer.from with browser-compatible arrayBufferToBase64 in stepfun
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
DESKTOP-I1T6TF3\Q 621f83c47b feat(web): embed beat audio into gallery and infiplot exports
Walk every speaking beat at export time, reuse current scene's beatAudioMap,
and synth the rest via BYO TTS or /api/beat-audio with concurrency 4. Show a
progress toast on the play page while collecting.

Gallery export keeps audio in a sidecar localStorage key so the first paint
is not blocked by JSON.parse-ing several MB of base64; the gallery lazy-loads
it after the first scene image, then plays per-beat audio with a mute toggle
persisted to localStorage. .infiplot share files embed audioByBeatId in the
doc itself (v2); on import the data URIs survive scene swaps and feed back
into the per-beat audio map so replayers hear the original voices for free.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-11 09:29:16 +08:00
Zonghao Yuan d15d53ba65 Merge pull request #57 from zonghaoyuan/feat/tts-stepfun-provider
feat(tts): add StepFun preset-voice provider, route by URL + voice tag
2026-06-09 14:28:36 +08:00
yuanzonghao 1a6238f8b8 fix(tts): harden StepFun provider integration
- Validate voice.provider against known whitelist (xiaomi|stepfun) in
  beat-audio route to return a clear 400 instead of falling through
- Move single-char pronouns (他/她) to weak-signal fallback in
  detectGender to avoid false positives on compounds like 其他
- Update .env.example with StepFun configuration examples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-09 14:24:27 +08:00
DESKTOP-I1T6TF3\Q 04f22249c9 fix(tts): make stepfun preset pick case-stable and per-character
- Hash the lowercased description (matching the case-insensitive scoring)
  so the same archetype text picks the same preset regardless of case.
- Thread the character name through provisionVoice -> stepfunProvision as
  the hash salt, so two characters that share archetype keywords spread
  across the top-N candidate presets instead of collapsing on one voice.

Xiaomi path is unaffected (voicedesign mints a unique clip per call).
2026-06-09 09:14:44 +08:00
DESKTOP-I1T6TF3\Q 19bbee16fe feat(tts): add StepFun preset-voice provider, route by URL + voice tag
Add StepFun step-tts-mini / step-tts-2 / stepaudio-2.5-tts as an alternate
TTS provider alongside Xiaomi MiMo. Auto-detected from TTS_BASE_URL host
(contains `stepfun.com` → StepFun; otherwise → MiMo), mirroring how the
image client infers Runware from `*.runware.ai`.

CharacterVoice becomes a discriminated union on `provider`:
- xiaomi: { referenceAudioBase64, mimeType } — unchanged
- stepfun: { voiceId, model, mimeType } — preset voice ID + chosen model

Provision dispatches on the current cfg's base URL; synthesis dispatches
on the voice's own `provider` tag so a session with mixed voices (e.g. a
provider switch mid-development) routes each beat through the correct
protocol. xiaomiSynthesize now guards against being called with a non-
xiaomi voice, surfacing the bug as a clear runtime error instead of a
TypeScript narrow violation at the access site.

StepFun has no voicedesign equivalent — only preset voices + voice
cloning from a reference audio upload. Cloning would require an extra
asset per character, so v1 maps the LLM's Chinese voiceDescription to one
of the 32 published preset IDs via gender + age + tone keyword scoring,
with a deterministic hash spread across the top-3 candidates so multiple
characters with similar descriptions don't collapse onto the identical
preset. lineDelivery is accepted but not yet propagated to StepFun's
voice_label.emotion / .style fields — left as a follow-up.

beat-audio route validation relaxed from `voice.referenceAudioBase64`
(xiaomi-shaped) to `voice.provider` (shape-agnostic), so stepfun voices
pass the gate; provider-specific shape errors still surface from the
synth function.

Observed latency on InfiPlot's dev loop: StepFun step-tts-mini median
~2.3s per beat with 0% timeouts across the test session, vs MiMo's
median ~8s with the long tail tripping the existing 15s synth budget
on roughly 2 of 3 beats. Pricing: step-tts-mini ¥0.9/万字符 (~¥0.14
per typical 50-beat session) vs MiMo TTS currently free under the
Token Plan creator incentive.

AGENTS.md provider matrix updated to describe both providers and the
discriminated-union dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 17:15:02 +08:00
Qi Chen fc62c9edf5 feat(engine): tighten CharacterDesigner prompt to prevent look-alike … (#56)
* feat(engine): tighten CharacterDesigner prompt to prevent look-alike characters

Expand the visualDescription rules into a 6-element mandatory checklist (hair
quad / eyes triad / face & build / outfit quad / personality-driven vibe /
silhouette tag) and add an explicit anti-collision rule comparing against the
existing cast across cross-color-family and cross-silhouette dimensions.

Also upgrade the user-message "已设定角色" block from soft hint to hard
constraint with an explicit pre-write scan step, nudging the LLM into chain-
of-thought differentiation before emitting tags.

All additions land in the session-stable system prefix, so prompt cache
absorbs the extra tokens — per-call billed token delta is ~0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(engine): replace pose examples with aura descriptors in personality vibe

The PERSONALITY-DRIVEN VIBE element listed concrete poses (arms crossed,
chin tilted up, slight slouch) which contradicted the earlier rule
banning transient poses from visualDescription. Switch to pure
atmosphere/aura keywords so the character card stays pose-neutral.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: yuanzonghao <yuanzonghao123@gmail.com>
2026-06-08 16:27:15 +08:00
yuanzonghao 75548ce005 Merge pull request #52 from zonghaoyuan/feat/story-share
feat(play): add encrypted story sharing with replay
2026-06-08 09:57:16 +08:00
yuanzonghao 39a7269494 fix(share): harden story share and relocate import button
- Add Content-Length pre-check to story-pack and story-unpack routes
  to reject oversized payloads before buffering the body
- Suppress internal error details in story-unpack catch (was leaking
  e.message to the client)
- Strengthen sceneIndex validation: require non-negative integer
- Guard against undefined storyState when replaying shared stories
- Fix prefetch regression: remove currentBeat?.id from useEffect deps
  that was re-triggering all change-scene prefetches on every beat
- Fix double detach: use else-if so the second replay detach guard
  doesn't fire redundantly after the first already detached
- Align client file-size limit by format (.json 12MB, .infiplot 13MB)
- Move "载入剧情" import button next to "开始" with hover tooltip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-08 08:46:05 +08:00
yuanzonghao 867c52c24f fix(gallery): address review findings in zip download module
- Handle downloadImagesAsZip return value and surface errors to user
- Fix inferImageExtension garbage output for data URIs without semicolons
- Scale blob URL revocation delay for large zip files (>5MB → 60s)
- Cap uniqueZipPath dedup loop at 10k iterations with timestamp fallback
- Support relative URLs in inferImageExtension via base URL
- Handle svg+xml MIME subtype correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 22:32:23 +08:00
baizhi958216 0abd5f1525 feat(play): add encrypted story sharing 2026-06-07 17:13:27 +08:00
baizhi958216 7925e9c459 feat(gallery): download scene gallery as zip
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-07 15:45:46 +08:00
yuanzonghao 4972243a93 fix: address PR Agent review findings across 6 files
Restrict PR Agent workflow to trusted collaborators on PR comments only,
fix UTF-8 byte counting in gallery-pack, correct portrait-to-landscape
fallback orientation, track inserted freeform beats in visitedBeatIds,
allow clearing stored TTS key, and guard empty-string fuzzy match in
style selector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 14:40:37 +08:00
yuanzonghao 53868471c6 feat(web): add 14 new art styles with thumbnails and reorder style grid
Add 14 new painting styles sourced from preset story card generation
scripts: Dunhuang fresco, Persian miniature, Byzantine mosaic, stained
glass, vaporwave, vector illustration, low poly, pop art, glitch art,
papercut, steampunk, xianxia fantasy, dark fairytale, and urban fantasy.

Reorder all 36 styles into logical visual categories (anime → cinematic
→ Eastern traditional → Western traditional → genre → digital → handcraft)
for easier browsing. Update "auto" thumbnail to a 3×3 composite grid and
"custom" thumbnail to a paintbrush-on-canvas concept image.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 12:56:54 +08:00
yuanzonghao ae3dd17e6b feat(web): add player name, freeform input, and unified settings modal
- Player name: stored in localStorage, injected into Architect/Writer/InsertBeat
  prompts so NPCs address the player by name, displayed in dialogue UI
- Freeform input: compact button at choice nodes expands to text input, LLM
  classifier routes to insert-beat (interactive NPC response) or change-scene
- SettingsModal: unified panel merging player name, voice toggle (with
  collapsible TTS key section), replacing the old TtsKeyModal
- Insert-beat upgrade: prompt now requires NPC reaction when characters are
  present, shared by both freeform and Vision paths
- IME guard: isComposing check on freeform input to prevent CJK mid-composition
  submission

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 12:37:50 +08:00
DESKTOP-I1T6TF3\Q b0b5630a25 feat(web): export interactive gallery + encrypted share file
Adds a "导出图集" action at the bottom-right of the play canvas that
snapshots the current session into localStorage and opens
/gallery#id=<id> in a new tab — the original play page keeps running
untouched. In parallel, sends the doc to /api/gallery-pack and
downloads the result as a binary .infiplot file the player can send
to a friend.

The snapshot pulls in:
  - Every visited scene's image + beat graph + recorded visit trail
  - All AI-prefetched alternate scenes (a new resolvedPrefetchesRef in
    PlayInner captures each prefetch as it resolves, so abandoned
    branches the engine already paid to generate are kept)
  - Character names + basePortraitUrl (voice base64 / styleReference
    are stripped — they aren't needed for replay)

/gallery is a no-network interactive replay:
  - Per-beat advance and per-choice navigation. Picked choices are
    highlighted; unpicked choices are clickable when an alternate was
    prefetched, greyed otherwise.
  - Stack-based navigation for stepping into branches with one-tap
    "返回主线" to collapse back to the main path.
  - Top-bar batch download for scene images (including unique
    AI-prefetched branch scenes, deduped against the main path) and
    character portraits. Fetched with a per-file AbortController + 20s
    timeout in a small concurrency pool, then clicked serially.
    Prevents one slow CDN response from stranding the busy button.
  - In-progress hint banner reminding the player to allow the
    browser's "multiple downloads" prompt.
  - F-key fullscreen with a top toolbar that auto-retracts after the
    initial glance and pops back down on cursor approach.
  - Per-scene dialogue panel (fa-clock-rotate-left, matching the
    in-game history affordance).
  - "导入分享文件" entry on the empty/error state — accepts a friend's
    .infiplot, posts to /api/gallery-unpack, renders the decrypted doc.

Share-file format (.infiplot):
  - AES-256-GCM via Web Crypto (portable to Cloudflare Workers).
  - Layout: 4-byte magic "IFPL" + 1-byte version + 12-byte nonce +
    ciphertext (includes 16-byte auth tag).
  - Key derived from GALLERY_SECRET via SHA-256.
  - GCM's auth tag gives tamper-detection for free; any flip in the
    ciphertext/nonce surfaces as "文件校验失败" — same error as wrong-key,
    so the distinction can't leak server config.
  - Stateless: server keeps no record of issued files.
  - GALLERY_SECRET unset → /api/gallery-pack returns 503, the play page
    silently skips the share-file download, local view still works.
    Rotating the secret invalidates every previously-issued file.

Retention: trimGalleryExports keeps only the 2 most recent localStorage
docs; older ones are evicted before each write so quota stays flat
regardless of how many times the player exports. Share files live on
the player's own disk — no retention concern.

Adds 'gallery_export' to the analytics event schema (scene_count only —
no free text).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 12:08:37 +08:00
yuanzonghao f4aca0b59c refactor(ai-client): extract shared createLanguageModel helper
De-duplicate the provider switch logic that was identical in chat.ts
and vision.ts into a shared model.ts module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 11:55:55 +08:00
yuanzonghao 57bc6556ab refactor(ai-client): unify OpenAI-compatible path to AI SDK generateText
Eliminate the dual code path (raw fetch vs AI SDK) for text and vision.
All providers now go through createLanguageModel() + generateText(),
removing chatOpenAiCompatible/analyzeOpenAiCompatible, the manual Usage
type, summarizeUsage, and responseFormat plumbing from 8 call sites.

Key fix: @ai-sdk/openai v3 defaults to the Responses API (/responses);
DeepSeek only supports Chat Completions, so we use .chat() explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 00:31:36 +08:00
yuanzonghao 165dcbc5e6 fix(engine): prevent Architect from seeing literal "auto" styleGuide
Replace session.styleGuide with a descriptive placeholder before the
Architect runs, so its prompt reads a natural sentence instead of the
raw "auto" marker. Also wrap selectStyle in a try-catch so a transient
LLM failure falls back to 吉卜力 instead of crashing session start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-06 22:28:44 +08:00
yuanzonghao 585f302908 feat(engine): auto-select art style via parallel LLM call
When user picks "自动", the client sends styleGuide="auto" to the
server. The orchestrator then runs a lightweight style-selector LLM
call in parallel with the Architect — both only depend on worldSetting,
so there is zero added latency. The selector picks the best-matching
preset from STYLE_MAP based on genre, mood, and setting.

Also moves STYLE_MAP from page.tsx to lib/options.ts so it can be
shared between client and server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-06 22:08:08 +08:00
yuanzonghao 31ce3f1d40 feat(web): revamp style modal UI with grid cards, thumbnails, and dual-view
Redesign the painting-style picker inspired by Pollo AI: widen modal to
1400px, show styles as square thumbnail cards in a 4-column grid with
name labels below, add ember glow hover effect, and split custom-style
editing into its own view. Simplify style names (e.g. "京阿尼细腻日常" →
"京阿尼"), add 22 .webp preview thumbnails, and remove the per-preset
override mechanism in favor of a cleaner grid + custom flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-06 20:45:08 +08:00
yuanzonghao d646ce8db8 refactor(web): remove client-side BYO API key feature
The BYO (Bring Your Own) API key configuration for LLM and image
generation will be re-implemented via Cloudflare Workers. Remove
the client-side implementation to prepare for that migration.

TTS (text-to-speech) BYO key support is intentionally preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-06 17:42:00 +08:00
Zonghao Yuan c30d11d60b fix(security): harden BYO API header against SSRF and input abuse (#33)
* fix(security): harden BYO API header against SSRF and input abuse

- Add lib/validateUrl.ts with HTTPS-only + public-IP enforcement,
  provider allowlist, IPv6 rejection, and userinfo-in-URL blocking.
- Add lib/byoHeaders.ts — single source of truth for client-side BYO
  header construction (deduplicates app/page.tsx & app/play/page.tsx).
- config.ts: validate BYO endpoints via isPublicUrl(), cap header at
  2 KB, truncate apiKey/model strings, sanitize log output.
- fetchWithRetry: default redirect to "manual" to block 302-to-intranet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address Copilot review — trim endpoint, strip control chars, drop unused import

- safeEndpoint: trim whitespace before URL validation
- safeString: strip ASCII control characters to prevent header injection
- play/page.tsx: remove unused BYO_STORAGE_KEY import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-05 00:23:35 +08:00
yuanzonghao 9fc83de276 feat(web,engine): portrait-orientation scene images for mobile full-bleed
Thread orientation (portrait|landscape) from client through API, engine,
and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes;
desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image
coordinate mapping, session-locked orientation, a shared coerceOrientation
helper, and a choices overflow cap in portrait.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:30:54 +08:00
yuanzonghao 865bf322e9 fix(ai-client): parse Runware host by hostname; doc nits
- inferImageProtocol: match runware.ai by parsed hostname (exact match or
  subdomain) instead of a bare substring, so notrunware.ai /
  runware.ai.evil.com no longer misroute to the Runware protocol
- README: document the image-2-vip → OpenAI-compatible exception; correct the
  Imagen wording (deprecated, EOL 2026-06-24 — not yet discontinued)

Addresses Copilot review on #30.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:09:05 +08:00
yuanzonghao 83fd5717e7 feat(ai-client): multi-provider compat — native Anthropic/Google + URL tolerance
- TEXT/VISION: add native Anthropic & Google Gemini paths via Vercel AI SDK,
  selectable through TEXT_PROVIDER / VISION_PROVIDER (default openai_compatible)
- IMAGE: expand to openai (gpt-image) / google (Nano Banana) via AI SDK
  alongside the existing Runware task-array and OpenAI-compatible REST paths
- normalizeBaseUrl: tolerate URLs with/without /v1 (or /chat/completions);
  append the per-protocol version segment only for bare hosts
- config: readProvider() reads *_PROVIDER; types: ProviderProtocol + provider?
- deps: @ai-sdk/anthropic, @ai-sdk/google; docs in .env.example + README

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:09:05 +08:00
yuanzonghao b0b2e922d3 feat(web): optional bring-your-own Xiaomi MiMo TTS key (browser-side synthesis)
Public users share one server TTS key, so Xiaomi's per-key RPM/TPM limits
cause silent playback under concurrency. This adds an OPTIONAL path: a user
can store their own Xiaomi MiMo key in the browser and synthesize voice
client-side against Xiaomi's CORS-open endpoints. The key lives only in
localStorage and is never sent to or logged by our server; the shared server
key still serves everyone who does not opt in.

- components/TtsKeyModal.tsx: shared key modal (key-family + region picker),
  reused by both the home and play pages
- app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens
  in place instead of redirecting to the home page
- app/page.tsx: home page consumes the shared modal + readStoredTtsConfig
- lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets
- app/api/{start,scene,insert-beat}: thread per-request voice; lib/types update
- docs/xiaomi-tts-key.md + README note

Verified with tsc --noEmit (exit 0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 16:58:55 +08:00
Zonghao Yuan 24b674d792 Merge pull request #27 from zonghaoyuan/perf/writer-split
perf(engine): split Writer into Phase A (plan) + Phase B (beats)
2026-06-04 16:53:21 +08:00
yuanzonghao efe021d886 fix(engine): pin entry-beat roster to the plan in Phase B
The Painter composites exactly plan.entryActiveCharacters into the entry
frame (the same roster the Cinematographer framed). Phase B is told to
reuse that roster, but only the entry beat's id was code-enforced — so an
LLM slip could leave a character in the painted frame that the runtime
entry beat says isn't there. Pin activeCharacters onto the plan's entry
beat as a last line of defense, mirroring the existing id pin.

Speaker is intentionally left to the prompt: it's coupled to line/TTS, so
overwriting it could mis-attribute or orphan Phase B's dialogue.

Addresses Copilot review feedback on PR #27.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 15:48:14 +08:00
DESKTOP-I1T6TF3\Q e04c51e875 feat(api): support custom BYO API header override on client fetches and backend config 2026-06-04 13:49:46 +08:00
yuanzonghao 3bf5c92841 perf(engine): split Writer into Phase A (plan) + Phase B (beats)
The Writer was the serial long pole: a single LLM call wrote the scene
skeleton AND the full beats[] graph before anything downstream could
start, so variable-length beat generation blew up tail latency.

Split it into two calls:
- Phase A (runWriterPlan): minimal skeleton the image pipeline needs
  (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker).
  Serial, on the critical path, kept lightweight.
- Phase B (runWriterBeats): full beats[] + storyStatePatch, written to
  honor the plan. Launched immediately, overlaps the ENTIRE image pipeline
  (cards / cinematographer / portraits / painter), awaited last.

Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long
beat-writing is hidden behind image gen. A Phase B failure degrades to a
single playable beat synthesized from the plan.

Paired distinct-payload A/B (6 content-matched stories, baseline vs split):
- median end-to-end 42.6s -> 32.2s (-24%)
- mean 46.4s -> 33.1s (-29%)
- worst case 74.7s -> 37.6s (halved)
- no content regression: total Writer output tokens 12858 -> 13699

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 11:17:34 +08:00
yuanzonghao e095650944 refactor(web): enforce content-free Umami fields at compile time
Address the Copilot review on #26.

#1 The game_start / art_style_select payload fields were typed as bare
   `string`, so free text could still slip through despite the "content-free
   by construction" claim. Add lib/options.ts as the single source of truth
   for the selector option sets (`as const` → literal-union types), have the
   home OPTS render from those arrays, and type the analytics fields from the
   derived unions (gender/art_style/plot_style/pacing/style) plus a template
   type for `card`. Free text now fails to compile; no casts at call sites.

#2 The /play heartbeat scheduled its 30s interval unconditionally. Gate the
   effect on the same NEXT_PUBLIC_UMAMI_* env used for script injection, so
   nothing is scheduled when the tracker is off (visibility check kept — a
   hidden tab still never emits).

#3 choice_select no longer emits a -1 choice_index: skip the event when the
   index can't be resolved instead of polluting the index distribution.

Verified with tsc (exit 0) and a throwaway negative test: free text in any
of the six fields raises TS2322, valid enum/template values compile.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 10:59:31 +08:00
yuanzonghao 4bf05f6784 feat(web): add privacy-friendly Umami custom events
Instrument the play flow with 9 content-free custom events (game_start,
art_style_select, style_image_upload, scene_reached, choice_select,
vision_click, tts_toggle, fullscreen_toggle, play_heartbeat) to measure
retention, engagement depth and session duration.

Privacy is enforced by construction, not convention:
- lib/analytics.ts types each event with a discriminated union, so a
  payload has no slot for free text — prompts, world guides, uploaded
  images and vision output can never reach analytics (compile-time
  guarantee, not a comment).
- track() no-ops without window.umami and never throws into the app.
- coarse 30s heartbeat fires only while the tab is visible.
- script stays gated on NEXT_PUBLIC_UMAMI_* env (blank → no script),
  honours Do-Not-Track, and locks to an exact data-domains allowlist.
- one-line on-site disclosure with a link, shown only when tracking is on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 10:14:08 +08:00
DESKTOP-I1T6TF3\Q 347ab297d5 feat(web,engine): custom style — image upload, AI-extract prompt, painter ref
自定义画风入口里加上传按钮:客户端把图缩到 512px webp(base64),传到新
路由 /api/parse-style-image,vision LLM 解析成英文 style prompt 回填 textarea;
图本身随 sessionStorage → /api/start → Session.styleReferenceImage 透传,
painter.collectReferenceImages 把它置于 slot 0,整局每一幕都作为 reference
图锚定画风(brush / color / mood),比 priorScene 优先级更高。

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 19:15:19 +08:00
DESKTOP-I1T6TF3\Q 298ecd4ec0 perf(engine): reorder Writer/Cinematographer prompts for prefix caching
Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+
on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers
match a stable byte-identical prefix from message[0]; once a single
byte changes everything after it misses, so the trick is to push every
session-stable bit to the front and concentrate per-call churn in a
short suffix.

Three coordinated changes:

1. Split storyState rendering into spine + dynamic.

   renderStoryStateSpine: logline / genreTags / protagonist / castNotes
   — Architect-set fields that StoryStatePatch literally cannot touch
     (the type only declares the 4 volatile ones; coerce and apply both
     cherry-pick), so spine bytes are guaranteed stable for the entire
     session. Goes in the STABLE PREFIX.

   renderStoryStateDynamic: synopsis / openThreads / relationships /
     nextHook — the Writer rewrites these every scene via storyStatePatch.
     Goes in the DYNAMIC SUFFIX.

   renderStoryState kept as a convenience wrapper that joins both, for
   anything that still wants the merged bible.

2. Rewrite buildWriterUserMessage with a stable/dynamic split.

   STABLE PREFIX (byte-identical or pure append across consecutive calls):
     - 世界观 / 画风 (session-immutable scalars)
     - story bible spine
     - 已登记角色  [sentinel: "(以下每行一个已登记角色,开场前为空。)"] + entries
     - 已使用的 sceneKey  [sentinel] + entries
     - 场景历史,已完结 [sentinel] + archivedHistory entries
        ↑ archivedHistory = history.slice(0, -1), NOT the full history
        — the live entry (history[-1]) keeps mutating mid-scene as the
          player walks new beats and speculative prefetches snapshot it
          at different moments, so it MUST stay out of the stable prefix
          or the byte-monotonic invariant breaks.

   DYNAMIC SUFFIX:
     - storyState dynamic patch
     - last-beat snippet (the exact emotional cliffhanger to continue from)
     - lastExit hint
     - format reminder tail

   The previous structure put the full storyState (including patched
   fields) at the very top of the user message, so the very first byte
   of the user message changed every scene — user-side cache hit was
   effectively 0% across the board.

3. Sentinel pattern for variable-length sections.

   Every list (characters / sceneKeys / archivedHistory) now emits a
   constant placeholder line after its header REGARDLESS of whether
   it has entries. With the old "if empty print '(暂无)' else print
   entries" pattern, adding the first item silently rewrites those
   placeholder bytes — the byte at offset N moves from a Chinese
   parenthesis to a dash, prefix cache torched. The sentinel line is
   the same bytes whether the list has 0 or N items; new items are
   pure appends after it.

4. Rewrite buildCinematographerUserMessage.

   New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued
   right after the session-stable styleGuide line, so the stable prefix
   is long enough to cross at least one full 64-token chunk boundary
   beyond the system prompt. The per-scene inputs (sceneSummary,
   entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity
   hint) all moved into the dynamic suffix below.

Verified (see [cache] / [debug-writer] logs from staging): hash of
500-byte slices of the user message is byte-identical across two
same-historyLen Writer calls through the entire stable prefix; only
the dynamic suffix slice differs. The remaining cache-hit gap under
MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally
jumps to 4096); on DeepSeek the same prefix should hit fully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q 37c911f510 chore(engine): log prompt-cache hit/miss per chat call
Add a `tag` option to chat() and have it print one `[cache] <tag>
hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are
probed in order so the same logger works across providers:

  - DeepSeek (v3+):  usage.prompt_cache_hit_tokens / *_miss_tokens
  - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens
  - Anthropic:        usage.cache_read_input_tokens / *_creation_*

When none of them are present (MiMo / local Ollama / others) we still
print prompt + completion totals so the cost baseline is visible.

Tag every callsite so the log is greppable:
  architect / writer / character-designer / cinematographer / insert-beat

This is the prerequisite for the prefix-cache reordering work that
follows — without per-agent visibility there's no way to tell if a
prompt rearrangement actually moved the needle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q cbabc54273 chore(engine): log worldSetting and storyBible at session start
Two lines in startSession: the full worldSetting being fed to the
Architect, and the resulting logline/genreTags/synopsis it produced.
Cheap to keep — fires once per session — and makes it possible to tell
at a glance whether a "story unrelated to my input" report is a frontend
transport bug, a worldSetting layout problem, or the LLM ignoring the
seed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 03:51:58 +08:00
DESKTOP-I1T6TF3\Q bed4dc5a8f feat(web): gender-differentiated 4:5 covers + per-card styleGuide prebake
- Regenerate 60 covers (30 male + 30 female) via FLUX with story-specific
  prompts, replacing the prior gender-shared set
- Crop covers to 4:5 (960×1200) via sharp attention cover; matches new
  homepage card aspectRatio
- Persist all 60 prompts to public/home/prompts.json so the prebake step
  can reuse the cover's exact visual anchor (per-card styleGuide) and the
  first-act scene visually carries over from the poster the player clicked
- Restore /play?card= prebaked instant-play path on homepage card click
- Add OpenAI-compatible image route in ai-client for non-Runware endpoints
- Hide Next.js dev indicators globally; tweak F-key fullscreen label

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 02:26:35 +08:00
Zonghao Yuan dc5ecd60f6 refactor: flatten monorepo to single web package (#12)
Flatten the pnpm monorepo (apps/web + packages/*) into a single web package at the repo root.

- Move app/lib/components/scripts/public to root; drop apps/web and packages/* wrappers
- Rewrite tsconfig paths (@infiplot/*) to ./lib/*; turbopack.root = __dirname
- Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths
- Regenerate pnpm-lock.yaml to drop stale workspace importers
- Bump engines.node to >=22 to match wrangler

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 00:55:45 +08:00