Commit Graph

16 Commits

Author SHA1 Message Date
Zonghao Yuan d15d53ba65 Merge pull request #57 from zonghaoyuan/feat/tts-stepfun-provider
feat(tts): add StepFun preset-voice provider, route by URL + voice tag
2026-06-09 14:28:36 +08:00
DESKTOP-I1T6TF3\Q 04f22249c9 fix(tts): make stepfun preset pick case-stable and per-character
- Hash the lowercased description (matching the case-insensitive scoring)
  so the same archetype text picks the same preset regardless of case.
- Thread the character name through provisionVoice -> stepfunProvision as
  the hash salt, so two characters that share archetype keywords spread
  across the top-N candidate presets instead of collapsing on one voice.

Xiaomi path is unaffected (voicedesign mints a unique clip per call).
2026-06-09 09:14:44 +08:00
Qi Chen fc62c9edf5 feat(engine): tighten CharacterDesigner prompt to prevent look-alike … (#56)
* feat(engine): tighten CharacterDesigner prompt to prevent look-alike characters

Expand the visualDescription rules into a 6-element mandatory checklist (hair
quad / eyes triad / face & build / outfit quad / personality-driven vibe /
silhouette tag) and add an explicit anti-collision rule comparing against the
existing cast across cross-color-family and cross-silhouette dimensions.

Also upgrade the user-message "已设定角色" block from soft hint to hard
constraint with an explicit pre-write scan step, nudging the LLM into chain-
of-thought differentiation before emitting tags.

All additions land in the session-stable system prefix, so prompt cache
absorbs the extra tokens — per-call billed token delta is ~0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(engine): replace pose examples with aura descriptors in personality vibe

The PERSONALITY-DRIVEN VIBE element listed concrete poses (arms crossed,
chin tilted up, slight slouch) which contradicted the earlier rule
banning transient poses from visualDescription. Switch to pure
atmosphere/aura keywords so the character card stays pose-neutral.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: yuanzonghao <yuanzonghao123@gmail.com>
2026-06-08 16:27:15 +08:00
yuanzonghao 4972243a93 fix: address PR Agent review findings across 6 files
Restrict PR Agent workflow to trusted collaborators on PR comments only,
fix UTF-8 byte counting in gallery-pack, correct portrait-to-landscape
fallback orientation, track inserted freeform beats in visitedBeatIds,
allow clearing stored TTS key, and guard empty-string fuzzy match in
style selector.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 14:40:37 +08:00
yuanzonghao ae3dd17e6b feat(web): add player name, freeform input, and unified settings modal
- Player name: stored in localStorage, injected into Architect/Writer/InsertBeat
  prompts so NPCs address the player by name, displayed in dialogue UI
- Freeform input: compact button at choice nodes expands to text input, LLM
  classifier routes to insert-beat (interactive NPC response) or change-scene
- SettingsModal: unified panel merging player name, voice toggle (with
  collapsible TTS key section), replacing the old TtsKeyModal
- Insert-beat upgrade: prompt now requires NPC reaction when characters are
  present, shared by both freeform and Vision paths
- IME guard: isComposing check on freeform input to prevent CJK mid-composition
  submission

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 12:37:50 +08:00
yuanzonghao 57bc6556ab refactor(ai-client): unify OpenAI-compatible path to AI SDK generateText
Eliminate the dual code path (raw fetch vs AI SDK) for text and vision.
All providers now go through createLanguageModel() + generateText(),
removing chatOpenAiCompatible/analyzeOpenAiCompatible, the manual Usage
type, summarizeUsage, and responseFormat plumbing from 8 call sites.

Key fix: @ai-sdk/openai v3 defaults to the Responses API (/responses);
DeepSeek only supports Chat Completions, so we use .chat() explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 00:31:36 +08:00
yuanzonghao 165dcbc5e6 fix(engine): prevent Architect from seeing literal "auto" styleGuide
Replace session.styleGuide with a descriptive placeholder before the
Architect runs, so its prompt reads a natural sentence instead of the
raw "auto" marker. Also wrap selectStyle in a try-catch so a transient
LLM failure falls back to 吉卜力 instead of crashing session start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-06 22:28:44 +08:00
yuanzonghao 585f302908 feat(engine): auto-select art style via parallel LLM call
When user picks "自动", the client sends styleGuide="auto" to the
server. The orchestrator then runs a lightweight style-selector LLM
call in parallel with the Architect — both only depend on worldSetting,
so there is zero added latency. The selector picks the best-matching
preset from STYLE_MAP based on genre, mood, and setting.

Also moves STYLE_MAP from page.tsx to lib/options.ts so it can be
shared between client and server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-06 22:08:08 +08:00
yuanzonghao 9fc83de276 feat(web,engine): portrait-orientation scene images for mobile full-bleed
Thread orientation (portrait|landscape) from client through API, engine,
and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes;
desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image
coordinate mapping, session-locked orientation, a shared coerceOrientation
helper, and a choices overflow cap in portrait.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 17:30:54 +08:00
yuanzonghao efe021d886 fix(engine): pin entry-beat roster to the plan in Phase B
The Painter composites exactly plan.entryActiveCharacters into the entry
frame (the same roster the Cinematographer framed). Phase B is told to
reuse that roster, but only the entry beat's id was code-enforced — so an
LLM slip could leave a character in the painted frame that the runtime
entry beat says isn't there. Pin activeCharacters onto the plan's entry
beat as a last line of defense, mirroring the existing id pin.

Speaker is intentionally left to the prompt: it's coupled to line/TTS, so
overwriting it could mis-attribute or orphan Phase B's dialogue.

Addresses Copilot review feedback on PR #27.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 15:48:14 +08:00
yuanzonghao 3bf5c92841 perf(engine): split Writer into Phase A (plan) + Phase B (beats)
The Writer was the serial long pole: a single LLM call wrote the scene
skeleton AND the full beats[] graph before anything downstream could
start, so variable-length beat generation blew up tail latency.

Split it into two calls:
- Phase A (runWriterPlan): minimal skeleton the image pipeline needs
  (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker).
  Serial, on the critical path, kept lightweight.
- Phase B (runWriterBeats): full beats[] + storyStatePatch, written to
  honor the plan. Launched immediately, overlaps the ENTIRE image pipeline
  (cards / cinematographer / portraits / painter), awaited last.

Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long
beat-writing is hidden behind image gen. A Phase B failure degrades to a
single playable beat synthesized from the plan.

Paired distinct-payload A/B (6 content-matched stories, baseline vs split):
- median end-to-end 42.6s -> 32.2s (-24%)
- mean 46.4s -> 33.1s (-29%)
- worst case 74.7s -> 37.6s (halved)
- no content regression: total Writer output tokens 12858 -> 13699

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-04 11:17:34 +08:00
DESKTOP-I1T6TF3\Q 347ab297d5 feat(web,engine): custom style — image upload, AI-extract prompt, painter ref
自定义画风入口里加上传按钮:客户端把图缩到 512px webp(base64),传到新
路由 /api/parse-style-image,vision LLM 解析成英文 style prompt 回填 textarea;
图本身随 sessionStorage → /api/start → Session.styleReferenceImage 透传,
painter.collectReferenceImages 把它置于 slot 0,整局每一幕都作为 reference
图锚定画风(brush / color / mood),比 priorScene 优先级更高。

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 19:15:19 +08:00
DESKTOP-I1T6TF3\Q 298ecd4ec0 perf(engine): reorder Writer/Cinematographer prompts for prefix caching
Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+
on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers
match a stable byte-identical prefix from message[0]; once a single
byte changes everything after it misses, so the trick is to push every
session-stable bit to the front and concentrate per-call churn in a
short suffix.

Three coordinated changes:

1. Split storyState rendering into spine + dynamic.

   renderStoryStateSpine: logline / genreTags / protagonist / castNotes
   — Architect-set fields that StoryStatePatch literally cannot touch
     (the type only declares the 4 volatile ones; coerce and apply both
     cherry-pick), so spine bytes are guaranteed stable for the entire
     session. Goes in the STABLE PREFIX.

   renderStoryStateDynamic: synopsis / openThreads / relationships /
     nextHook — the Writer rewrites these every scene via storyStatePatch.
     Goes in the DYNAMIC SUFFIX.

   renderStoryState kept as a convenience wrapper that joins both, for
   anything that still wants the merged bible.

2. Rewrite buildWriterUserMessage with a stable/dynamic split.

   STABLE PREFIX (byte-identical or pure append across consecutive calls):
     - 世界观 / 画风 (session-immutable scalars)
     - story bible spine
     - 已登记角色  [sentinel: "(以下每行一个已登记角色,开场前为空。)"] + entries
     - 已使用的 sceneKey  [sentinel] + entries
     - 场景历史,已完结 [sentinel] + archivedHistory entries
        ↑ archivedHistory = history.slice(0, -1), NOT the full history
        — the live entry (history[-1]) keeps mutating mid-scene as the
          player walks new beats and speculative prefetches snapshot it
          at different moments, so it MUST stay out of the stable prefix
          or the byte-monotonic invariant breaks.

   DYNAMIC SUFFIX:
     - storyState dynamic patch
     - last-beat snippet (the exact emotional cliffhanger to continue from)
     - lastExit hint
     - format reminder tail

   The previous structure put the full storyState (including patched
   fields) at the very top of the user message, so the very first byte
   of the user message changed every scene — user-side cache hit was
   effectively 0% across the board.

3. Sentinel pattern for variable-length sections.

   Every list (characters / sceneKeys / archivedHistory) now emits a
   constant placeholder line after its header REGARDLESS of whether
   it has entries. With the old "if empty print '(暂无)' else print
   entries" pattern, adding the first item silently rewrites those
   placeholder bytes — the byte at offset N moves from a Chinese
   parenthesis to a dash, prefix cache torched. The sentinel line is
   the same bytes whether the list has 0 or N items; new items are
   pure appends after it.

4. Rewrite buildCinematographerUserMessage.

   New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued
   right after the session-stable styleGuide line, so the stable prefix
   is long enough to cross at least one full 64-token chunk boundary
   beyond the system prompt. The per-scene inputs (sceneSummary,
   entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity
   hint) all moved into the dynamic suffix below.

Verified (see [cache] / [debug-writer] logs from staging): hash of
500-byte slices of the user message is byte-identical across two
same-historyLen Writer calls through the entire stable prefix; only
the dynamic suffix slice differs. The remaining cache-hit gap under
MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally
jumps to 4096); on DeepSeek the same prefix should hit fully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q 37c911f510 chore(engine): log prompt-cache hit/miss per chat call
Add a `tag` option to chat() and have it print one `[cache] <tag>
hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are
probed in order so the same logger works across providers:

  - DeepSeek (v3+):  usage.prompt_cache_hit_tokens / *_miss_tokens
  - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens
  - Anthropic:        usage.cache_read_input_tokens / *_creation_*

When none of them are present (MiMo / local Ollama / others) we still
print prompt + completion totals so the cost baseline is visible.

Tag every callsite so the log is greppable:
  architect / writer / character-designer / cinematographer / insert-beat

This is the prerequisite for the prefix-cache reordering work that
follows — without per-agent visibility there's no way to tell if a
prompt rearrangement actually moved the needle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q cbabc54273 chore(engine): log worldSetting and storyBible at session start
Two lines in startSession: the full worldSetting being fed to the
Architect, and the resulting logline/genreTags/synopsis it produced.
Cheap to keep — fires once per session — and makes it possible to tell
at a glance whether a "story unrelated to my input" report is a frontend
transport bug, a worldSetting layout problem, or the LLM ignoring the
seed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 03:51:58 +08:00
Zonghao Yuan dc5ecd60f6 refactor: flatten monorepo to single web package (#12)
Flatten the pnpm monorepo (apps/web + packages/*) into a single web package at the repo root.

- Move app/lib/components/scripts/public to root; drop apps/web and packages/* wrappers
- Rewrite tsconfig paths (@infiplot/*) to ./lib/*; turbopack.root = __dirname
- Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths
- Regenerate pnpm-lock.yaml to drop stale workspace importers
- Bump engines.node to >=22 to match wrangler

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 00:55:45 +08:00