Commit Graph

67 Commits

Author SHA1 Message Date
Zonghao Yuan c30d11d60b fix(security): harden BYO API header against SSRF and input abuse (#33)
* fix(security): harden BYO API header against SSRF and input abuse

- Add lib/validateUrl.ts with HTTPS-only + public-IP enforcement,
  provider allowlist, IPv6 rejection, and userinfo-in-URL blocking.
- Add lib/byoHeaders.ts — single source of truth for client-side BYO
  header construction (deduplicates app/page.tsx & app/play/page.tsx).
- config.ts: validate BYO endpoints via isPublicUrl(), cap header at
  2 KB, truncate apiKey/model strings, sanitize log output.
- fetchWithRetry: default redirect to "manual" to block 302-to-intranet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address Copilot review — trim endpoint, strip control chars, drop unused import

- safeEndpoint: trim whitespace before URL validation
- safeString: strip ASCII control characters to prevent header injection
- play/page.tsx: remove unused BYO_STORAGE_KEY import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-05 00:23:35 +08:00
yuanzonghao 9fc83de276 feat(web,engine): portrait-orientation scene images for mobile full-bleed
Thread orientation (portrait|landscape) from client through API, engine,
and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes;
desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image
coordinate mapping, session-locked orientation, a shared coerceOrientation
helper, and a choices overflow cap in portrait.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:30:54 +08:00
yuanzonghao 865bf322e9 fix(ai-client): parse Runware host by hostname; doc nits
- inferImageProtocol: match runware.ai by parsed hostname (exact match or
  subdomain) instead of a bare substring, so notrunware.ai /
  runware.ai.evil.com no longer misroute to the Runware protocol
- README: document the image-2-vip → OpenAI-compatible exception; correct the
  Imagen wording (deprecated, EOL 2026-06-24 — not yet discontinued)

Addresses Copilot review on #30.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:09:05 +08:00
yuanzonghao 83fd5717e7 feat(ai-client): multi-provider compat — native Anthropic/Google + URL tolerance
- TEXT/VISION: add native Anthropic & Google Gemini paths via Vercel AI SDK,
  selectable through TEXT_PROVIDER / VISION_PROVIDER (default openai_compatible)
- IMAGE: expand to openai (gpt-image) / google (Nano Banana) via AI SDK
  alongside the existing Runware task-array and OpenAI-compatible REST paths
- normalizeBaseUrl: tolerate URLs with/without /v1 (or /chat/completions);
  append the per-protocol version segment only for bare hosts
- config: readProvider() reads *_PROVIDER; types: ProviderProtocol + provider?
- deps: @ai-sdk/anthropic, @ai-sdk/google; docs in .env.example + README

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:09:05 +08:00
yuanzonghao b0b2e922d3 feat(web): optional bring-your-own Xiaomi MiMo TTS key (browser-side synthesis)
Public users share one server TTS key, so Xiaomi's per-key RPM/TPM limits
cause silent playback under concurrency. This adds an OPTIONAL path: a user
can store their own Xiaomi MiMo key in the browser and synthesize voice
client-side against Xiaomi's CORS-open endpoints. The key lives only in
localStorage and is never sent to or logged by our server; the shared server
key still serves everyone who does not opt in.

- components/TtsKeyModal.tsx: shared key modal (key-family + region picker),
  reused by both the home and play pages
- app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens
  in place instead of redirecting to the home page
- app/page.tsx: home page consumes the shared modal + readStoredTtsConfig
- lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets
- app/api/{start,scene,insert-beat}: thread per-request voice; lib/types update
- docs/xiaomi-tts-key.md + README note

Verified with tsc --noEmit (exit 0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 16:58:55 +08:00
Zonghao Yuan 24b674d792 Merge pull request #27 from zonghaoyuan/perf/writer-split
perf(engine): split Writer into Phase A (plan) + Phase B (beats)
2026-06-04 16:53:21 +08:00
yuanzonghao efe021d886 fix(engine): pin entry-beat roster to the plan in Phase B
The Painter composites exactly plan.entryActiveCharacters into the entry
frame (the same roster the Cinematographer framed). Phase B is told to
reuse that roster, but only the entry beat's id was code-enforced — so an
LLM slip could leave a character in the painted frame that the runtime
entry beat says isn't there. Pin activeCharacters onto the plan's entry
beat as a last line of defense, mirroring the existing id pin.

Speaker is intentionally left to the prompt: it's coupled to line/TTS, so
overwriting it could mis-attribute or orphan Phase B's dialogue.

Addresses Copilot review feedback on PR #27.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 15:48:14 +08:00
DESKTOP-I1T6TF3\Q e04c51e875 feat(api): support custom BYO API header override on client fetches and backend config 2026-06-04 13:49:46 +08:00
yuanzonghao 3bf5c92841 perf(engine): split Writer into Phase A (plan) + Phase B (beats)
The Writer was the serial long pole: a single LLM call wrote the scene
skeleton AND the full beats[] graph before anything downstream could
start, so variable-length beat generation blew up tail latency.

Split it into two calls:
- Phase A (runWriterPlan): minimal skeleton the image pipeline needs
  (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker).
  Serial, on the critical path, kept lightweight.
- Phase B (runWriterBeats): full beats[] + storyStatePatch, written to
  honor the plan. Launched immediately, overlaps the ENTIRE image pipeline
  (cards / cinematographer / portraits / painter), awaited last.

Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long
beat-writing is hidden behind image gen. A Phase B failure degrades to a
single playable beat synthesized from the plan.

Paired distinct-payload A/B (6 content-matched stories, baseline vs split):
- median end-to-end 42.6s -> 32.2s (-24%)
- mean 46.4s -> 33.1s (-29%)
- worst case 74.7s -> 37.6s (halved)
- no content regression: total Writer output tokens 12858 -> 13699

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 11:17:34 +08:00
yuanzonghao e095650944 refactor(web): enforce content-free Umami fields at compile time
Address the Copilot review on #26.

#1 The game_start / art_style_select payload fields were typed as bare
   `string`, so free text could still slip through despite the "content-free
   by construction" claim. Add lib/options.ts as the single source of truth
   for the selector option sets (`as const` → literal-union types), have the
   home OPTS render from those arrays, and type the analytics fields from the
   derived unions (gender/art_style/plot_style/pacing/style) plus a template
   type for `card`. Free text now fails to compile; no casts at call sites.

#2 The /play heartbeat scheduled its 30s interval unconditionally. Gate the
   effect on the same NEXT_PUBLIC_UMAMI_* env used for script injection, so
   nothing is scheduled when the tracker is off (visibility check kept — a
   hidden tab still never emits).

#3 choice_select no longer emits a -1 choice_index: skip the event when the
   index can't be resolved instead of polluting the index distribution.

Verified with tsc (exit 0) and a throwaway negative test: free text in any
of the six fields raises TS2322, valid enum/template values compile.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 10:59:31 +08:00
yuanzonghao 4bf05f6784 feat(web): add privacy-friendly Umami custom events
Instrument the play flow with 9 content-free custom events (game_start,
art_style_select, style_image_upload, scene_reached, choice_select,
vision_click, tts_toggle, fullscreen_toggle, play_heartbeat) to measure
retention, engagement depth and session duration.

Privacy is enforced by construction, not convention:
- lib/analytics.ts types each event with a discriminated union, so a
  payload has no slot for free text — prompts, world guides, uploaded
  images and vision output can never reach analytics (compile-time
  guarantee, not a comment).
- track() no-ops without window.umami and never throws into the app.
- coarse 30s heartbeat fires only while the tab is visible.
- script stays gated on NEXT_PUBLIC_UMAMI_* env (blank → no script),
  honours Do-Not-Track, and locks to an exact data-domains allowlist.
- one-line on-site disclosure with a link, shown only when tracking is on.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 10:14:08 +08:00
DESKTOP-I1T6TF3\Q 347ab297d5 feat(web,engine): custom style — image upload, AI-extract prompt, painter ref
自定义画风入口里加上传按钮:客户端把图缩到 512px webp(base64),传到新
路由 /api/parse-style-image,vision LLM 解析成英文 style prompt 回填 textarea;
图本身随 sessionStorage → /api/start → Session.styleReferenceImage 透传,
painter.collectReferenceImages 把它置于 slot 0,整局每一幕都作为 reference
图锚定画风(brush / color / mood),比 priorScene 优先级更高。

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 19:15:19 +08:00
DESKTOP-I1T6TF3\Q 298ecd4ec0 perf(engine): reorder Writer/Cinematographer prompts for prefix caching
Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+
on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers
match a stable byte-identical prefix from message[0]; once a single
byte changes everything after it misses, so the trick is to push every
session-stable bit to the front and concentrate per-call churn in a
short suffix.

Three coordinated changes:

1. Split storyState rendering into spine + dynamic.

   renderStoryStateSpine: logline / genreTags / protagonist / castNotes
   — Architect-set fields that StoryStatePatch literally cannot touch
     (the type only declares the 4 volatile ones; coerce and apply both
     cherry-pick), so spine bytes are guaranteed stable for the entire
     session. Goes in the STABLE PREFIX.

   renderStoryStateDynamic: synopsis / openThreads / relationships /
     nextHook — the Writer rewrites these every scene via storyStatePatch.
     Goes in the DYNAMIC SUFFIX.

   renderStoryState kept as a convenience wrapper that joins both, for
   anything that still wants the merged bible.

2. Rewrite buildWriterUserMessage with a stable/dynamic split.

   STABLE PREFIX (byte-identical or pure append across consecutive calls):
     - 世界观 / 画风 (session-immutable scalars)
     - story bible spine
     - 已登记角色  [sentinel: "(以下每行一个已登记角色,开场前为空。)"] + entries
     - 已使用的 sceneKey  [sentinel] + entries
     - 场景历史,已完结 [sentinel] + archivedHistory entries
        ↑ archivedHistory = history.slice(0, -1), NOT the full history
        — the live entry (history[-1]) keeps mutating mid-scene as the
          player walks new beats and speculative prefetches snapshot it
          at different moments, so it MUST stay out of the stable prefix
          or the byte-monotonic invariant breaks.

   DYNAMIC SUFFIX:
     - storyState dynamic patch
     - last-beat snippet (the exact emotional cliffhanger to continue from)
     - lastExit hint
     - format reminder tail

   The previous structure put the full storyState (including patched
   fields) at the very top of the user message, so the very first byte
   of the user message changed every scene — user-side cache hit was
   effectively 0% across the board.

3. Sentinel pattern for variable-length sections.

   Every list (characters / sceneKeys / archivedHistory) now emits a
   constant placeholder line after its header REGARDLESS of whether
   it has entries. With the old "if empty print '(暂无)' else print
   entries" pattern, adding the first item silently rewrites those
   placeholder bytes — the byte at offset N moves from a Chinese
   parenthesis to a dash, prefix cache torched. The sentinel line is
   the same bytes whether the list has 0 or N items; new items are
   pure appends after it.

4. Rewrite buildCinematographerUserMessage.

   New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued
   right after the session-stable styleGuide line, so the stable prefix
   is long enough to cross at least one full 64-token chunk boundary
   beyond the system prompt. The per-scene inputs (sceneSummary,
   entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity
   hint) all moved into the dynamic suffix below.

Verified (see [cache] / [debug-writer] logs from staging): hash of
500-byte slices of the user message is byte-identical across two
same-historyLen Writer calls through the entire stable prefix; only
the dynamic suffix slice differs. The remaining cache-hit gap under
MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally
jumps to 4096); on DeepSeek the same prefix should hit fully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q 37c911f510 chore(engine): log prompt-cache hit/miss per chat call
Add a `tag` option to chat() and have it print one `[cache] <tag>
hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are
probed in order so the same logger works across providers:

  - DeepSeek (v3+):  usage.prompt_cache_hit_tokens / *_miss_tokens
  - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens
  - Anthropic:        usage.cache_read_input_tokens / *_creation_*

When none of them are present (MiMo / local Ollama / others) we still
print prompt + completion totals so the cost baseline is visible.

Tag every callsite so the log is greppable:
  architect / writer / character-designer / cinematographer / insert-beat

This is the prerequisite for the prefix-cache reordering work that
follows — without per-agent visibility there's no way to tell if a
prompt rearrangement actually moved the needle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q cbabc54273 chore(engine): log worldSetting and storyBible at session start
Two lines in startSession: the full worldSetting being fed to the
Architect, and the resulting logline/genreTags/synopsis it produced.
Cheap to keep — fires once per session — and makes it possible to tell
at a glance whether a "story unrelated to my input" report is a frontend
transport bug, a worldSetting layout problem, or the LLM ignoring the
seed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 03:51:58 +08:00
DESKTOP-I1T6TF3\Q bed4dc5a8f feat(web): gender-differentiated 4:5 covers + per-card styleGuide prebake
- Regenerate 60 covers (30 male + 30 female) via FLUX with story-specific
  prompts, replacing the prior gender-shared set
- Crop covers to 4:5 (960×1200) via sharp attention cover; matches new
  homepage card aspectRatio
- Persist all 60 prompts to public/home/prompts.json so the prebake step
  can reuse the cover's exact visual anchor (per-card styleGuide) and the
  first-act scene visually carries over from the poster the player clicked
- Restore /play?card= prebaked instant-play path on homepage card click
- Add OpenAI-compatible image route in ai-client for non-Runware endpoints
- Hide Next.js dev indicators globally; tweak F-key fullscreen label

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 02:26:35 +08:00
Zonghao Yuan dc5ecd60f6 refactor: flatten monorepo to single web package (#12)
Flatten the pnpm monorepo (apps/web + packages/*) into a single web package at the repo root.

- Move app/lib/components/scripts/public to root; drop apps/web and packages/* wrappers
- Rewrite tsconfig paths (@infiplot/*) to ./lib/*; turbopack.root = __dirname
- Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths
- Regenerate pnpm-lock.yaml to drop stale workspace importers
- Bump engines.node to >=22 to match wrangler

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 00:55:45 +08:00