Commit Graph

286 Commits

Author SHA1 Message Date
DESKTOP-I1T6TF3\Q 3e132ce28b docs: drop the click-to-enlarge caption from README screenshots
GitHub markdown can't host an in-page lightbox or prev/next carousel —
all <script> is stripped server-side, so a clickable thumbnail can only
ever open the raw image in a new page. The hint line was misleading
about that interaction, so just remove it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 17:31:32 +08:00
DESKTOP-I1T6TF3\Q bce686a7eb docs: make README screenshots clickable to view full size
Wraps each <img> in an <a href="..."> linking to the same path so GitHub
opens the full-resolution image on click instead of just showing the
inline thumbnail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 17:24:44 +08:00
DESKTOP-I1T6TF3\Q d93c16d836 feat(web): 红果-style homepage + instant-play prebaked first acts
Rewrites all 64 homepage cards (32 男性向 + 32 女性向) as short-drama hook
stories (战神归来 / 重生分手前夜 / 系统选妃 / 穿成乙游男配 / 末世异能 / 民国
谍战 / 修真渡劫 …) and regenerates each cover via FLUX in its assigned art
style (12 styles spread across 64 cards) at 832×1024 ≈4:5.

Click-to-play path: cards now jump straight to /play?card=<name> and hydrate
Session from /home/firstact/<name>.json — the engine pipeline (Architect +
Writer + CharacterDesigner + Painter) has been pre-run for 44/64 cards. The
remaining 20 (m14/m29/f14..f31) are pending an LLM credit top-up; their
clicks fall through to live /api/start for now.

Runware-hosted first-scene images are downloaded into /home/firstscene/
and the JSONs are rewritten to point at the local webp, so click → first
image is bounded by local-disk decode (~100ms) instead of CDN round-trip.

Scripts:
- scripts/generate-home-images.mjs  — rewrites all 64 cover prompts, per-card
  styles baked into prompts, 832×1024 dims to match StoryCard aspect
- scripts/prebake-firstacts.mjs     — POST /api/start × 64 with concurrency
  4, saves StartResponse to public/home/firstact/<name>.json
- scripts/localize-firstact-images.mjs — downloads each prebaked imageUrl
  to public/home/firstscene/<name>.webp (q80, ≤1600px) and rewrites JSON

README: adds Screenshots section (3×3 gallery) to README.md / README.zh-CN.md,
9 in-game shots compressed to docs/screenshots/*.webp (7.5MB → 680KB).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 17:20:34 +08:00
DESKTOP-I1T6TF3\Q 9ae91dd3ed feat(play): hug-canvas action buttons, unified mute, enlarged back-link
UI layout (PlayCanvas + play/page.tsx):
- "F · 全 · 屏" button (renamed from 演 · 示 to match what users
  actually mean by F) floats above the canvas, right-aligned, via a
  new `aboveCanvas` ReactNode slot that lives on the relative
  inline-block image wrapper at `bottom-full right-0`. It hugs the
  actual image right edge regardless of aspect ratio.
- "有 · 声 / 静 · 音" button mirrors that on the left via a new
  `aboveCanvasLeft` slot.
- Both slots also render inside the loading placeholder so the two
  controls appear from frame one, before the scene image arrives.
- InfiPlot back-link grows from 15px to 22/26px (mobile/desktop) with
  a slightly larger arrow, matching the brand'\''s presence on the
  homepage hero.
- Canvas-bottom metadata row (image dims on left, tutorial hint on
  right) dropped. The "—" placeholder and "···" loading state looked
  like stray punctuation; users found them noisy.
- Footer collapses to a single centered "Ⅰ · Ⅰ" mark.

Audio gating logic (play/page.tsx):
- Collapse the two-flag audio gate into one source of truth. The
  homepage "语音配音" choice no longer lives in a separate
  `audioEnabledRef` flag that gates `fetchBeatAudio` independently
  of the in-page mute state. Instead the `muted` useState lazy
  initializer reads `sessionStorage["infiplot:custom"].audioEnabled`
  and projects it inversely (audioEnabled=false → muted=true) so
  the 静音/有声 button correctly reflects the homepage selection
  from the first frame. The in-page toggle remains the source of
  truth from then on (persisted to localStorage:infiplot:muted).
- This fixes a visible disconnect where picking "关闭" on the
  homepage left the play page showing 有声 because the in-page
  state had no link to the homepage choice.
- The sessionStorage read uses the renamed key "infiplot:custom"
  (the infiplot rename PR changed it from yume:custom on the home
  side but the play side hadn'\''t been updated to match).

No new TTS quota is ever burned while muted: fetchBeatAudio'\''s
mutedRef.current early-return is the only path to /api/beat-audio
and is checked before the fetch fires; mute transitions also abort
in-flight requests.
2026-06-02 15:42:49 +08:00
Zonghao Yuan cffe4da4ca docs: streamline 3 READMEs and fix EN language switcher (#6)
Slim overview across EN/zh/JA, drop badges/blockquote/contributing, trim LICENSE header; fix the English switcher to point at the repo homepage instead of the GitHub site root.
2026-06-02 15:33:08 +08:00
DESKTOP-I1T6TF3\Q 6da87df73a feat(web): home story-card polish + play page back-link rebrand
Home (apps/web/app/page.tsx):
- StoryCard locked to uniform aspectRatio "4 / 5". The previous
  "placeholder 4/5 → naturalRatio after onLoad" flow coupled card
  height to lazy-load order: cards still below the fold sat at the
  placeholder ratio while above-the-fold cards snapped to their
  image's actual ratio (1.6 landscape vs 0.75 portrait vs 1.23
  squarish), so the gallery looked inconsistent until a hard refresh
  re-decoded everything from cache synchronously. Fixed ratio +
  object-cover removes the coupling.
- StoryCard hover overlay collapsed from two sibling layers
  (backdrop-blur + mask-image + dark gradient sibling) into one
  element with a pure rgba(0,0,0,…) linear-gradient and an opacity
  transition. Chromium does not animate backdrop-filter cleanly when
  combined with mask-image on an empty element — the first hover
  frame shows a full rectangular blur before the mask kicks in, then
  snaps to the feathered shape ("矩形磨砂 → 渐变磨砂"). One layer,
  one transitioning property, no compositing race.

Play (apps/web/app/play/page.tsx):
- Header back-link "云梦" → "InfiPlot" using the same serif + italic
  ember "Plot" treatment as the homepage wordmark. Resolved against
  the parallel plain-text rebrand already on infiplot/staging by
  keeping the styled version for brand consistency.
2026-06-02 14:42:26 +08:00
Zonghao Yuan 588b668d14 Merge staging into main (#3)
* feat(engine): Architect agent + cross-scene StoryState coherence

Add a dedicated Architect LLM call at session start that expands the terse
world/style prompt into a persistent story bible (logline, genre, second-
person protagonist, cast, engineered opening hook). The bible seeds a
StoryState the Writer reads and patches every scene, carried + merged
across cuts (applyStoryStatePatch) so the story keeps a spine from beat
one instead of jumping between scenes.

- prompts: inject web-novel / short-drama / galgame craft into Writer +
  Architect; Writer emits storyStatePatch to update the running bible
- director: parallelize voice + non-entry portraits with the Painter
  (only entry-beat portraits block paint) to offset Architect latency
- architect: chat/parse guarded so a malformed response never aborts start
- types: StoryState / StoryStatePatch; required on Start/SceneResponse

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix (#2)

* docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix

- LICENSE: add GNU AGPL v3 with InfiPlot copyright notice
- README.md: rewrite for open-source project, fix TTS description
  (TTS uses MiMo's own protocol, not OpenAI-compatible)
- README.zh-CN.md: add Simplified Chinese translation
- README.ja.md: add Japanese translation
- package.json: change license from UNLICENSED to AGPL-3.0-only

* fix: address Copilot review — .env.example TTS comment, zh-CN formatting

- .env.example: clarify TTS uses MiMo's own protocol, not OpenAI-compatible
- README.md: 'land paper after paper' → 'publish paper after paper'
- README.zh-CN.md: add spaces around '5 月', fix code formatting
  for model names (deepseek-v4-flash)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 13:41:37 +08:00
Zonghao Yuan 9a3511f220 docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix (#2)
* docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix

- LICENSE: add GNU AGPL v3 with InfiPlot copyright notice
- README.md: rewrite for open-source project, fix TTS description
  (TTS uses MiMo's own protocol, not OpenAI-compatible)
- README.zh-CN.md: add Simplified Chinese translation
- README.ja.md: add Japanese translation
- package.json: change license from UNLICENSED to AGPL-3.0-only

* fix: address Copilot review — .env.example TTS comment, zh-CN formatting

- .env.example: clarify TTS uses MiMo's own protocol, not OpenAI-compatible
- README.md: 'land paper after paper' → 'publish paper after paper'
- README.zh-CN.md: add spaces around '5 月', fix code formatting
  for model names (deepseek-v4-flash)
2026-06-02 13:39:54 +08:00
Zonghao Yuan 70d0927a3e Merge pull request #1 from zonghaoyuan/feature/story-harness-opt
feat(engine): Architect agent + cross-scene StoryState coherence
2026-06-02 13:24:46 +08:00
yuanzonghao 15ce03a912 feat(engine): Architect agent + cross-scene StoryState coherence
Add a dedicated Architect LLM call at session start that expands the terse
world/style prompt into a persistent story bible (logline, genre, second-
person protagonist, cast, engineered opening hook). The bible seeds a
StoryState the Writer reads and patches every scene, carried + merged
across cuts (applyStoryStatePatch) so the story keeps a spine from beat
one instead of jumping between scenes.

- prompts: inject web-novel / short-drama / galgame craft into Writer +
  Architect; Writer emits storyStatePatch to update the running bible
- director: parallelize voice + non-entry portraits with the Painter
  (only entry-beat portraits block paint) to offset Architect latency
- architect: chat/parse guarded so a malformed response never aborts start
- types: StoryState / StoryStatePatch; required on Start/SceneResponse

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 11:44:55 +08:00
Zonghao Yuan 16707cc255 Merge pull request #10 from zonghaoyuan/rename/infiplot
chore: complete @yume → @infiplot rename (post-PR#9)
2026-06-02 09:30:23 +08:00
yuanzonghao 8eda27f241 chore: complete @yume → @infiplot rename (post-PR#9)
PR #9 已完成首页和 layout 的视觉品牌迁移,此 commit 补齐剩余的
技术性改名 —— workspace 包名、source import、localStorage 键、
CSS keyframe、内部 header logo、.env.example、README。

- @yume/* → @infiplot/* (6 package.json + 17 imports + lockfile)
- localStorage/sessionStorage: yume:* → infiplot:*
  (含 PR #9 新增的 yume:hintClosed)
- CSS keyframe yume-ripple → infiplot-ripple
- new/play 页面 header logo "云梦" → "InfiPlot"
- 代码注释中的「云梦」style 形容词删除(layout.tsx, page.tsx)
- 根 package.json name + description(描述跟齐 staging
  "AI 实时交互剧情游戏")
- README: tagline / Vercel deploy URL / 目录树 / engine 描述

保留:prompts.ts 的 LLM 体裁术语「视觉小说/galgame」、CustomForm
placeholder 的「视觉小说画风」(图像模型识别的风格名词)。

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-02 09:27:00 +08:00
Zonghao Yuan 5d0a5bb756 Merge pull request #9 from zonghaoyuan/feature/infiplot-low-fi-homepage
feat(web): InfiPlot low-fi homepage with AI-generated cards + gender-…
2026-06-02 01:30:29 +08:00
yuanzonghao 54fda1914c fix(web): gate play-page un-mute prefetch on actual mute transitions
On mount the mute effect fired alongside the scene effect (both call
prefetchSceneAudio), so the initial /api/beat-audio batch was dispatched
twice — the first set aborted mid-flight. Track the previous muted value
in a ref and only re-prefetch on a real transition, leaving the mount-time
synthesis to the scene effect. Addresses Copilot review on PR #9.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 01:29:33 +08:00
yuanzonghao 8f94d3aa65 feat(web): editorial homepage rework — flat 2×30 stories, autosize input
Revises the InfiPlot homepage from the initial prototype pass.

Stories data model
- Replaces the artificial 7-hero + 16-gallery split with a flat
  per-gender model: 30 preset stories each for 男性向 / 女性向.
- Renames assets hero*/gallery* → m{0..29} / f{0..29}; same index
  shares aspect ratio across genders so the gender crossfade never
  jumps card height.
- Fills in the missing 女性向 set and expands both genders to 30.

Cards
- StoryCard measures aspect ratio at runtime from the loaded image
  (onLoad → naturalWidth/Height), fixing the frosted-caption band
  reflow on lazy image load. Drops ready/fallback props; single
  masonry map over STORIES[gender].

Hero input
- Single-line <input> → auto-growing <textarea> (rows=1, resize-none)
  so long prompts and long card seeds are fully visible. Enter submits,
  Shift+Enter inserts a newline.
- lining-nums on the input so digits sit on the baseline instead of
  Cormorant's default old-style figures.

Typography / styles
- layout.tsx: editorial fonts (Cormorant Garamond + Inter via
  --font-serif / --font-sans) + Font Awesome; drops Patrick Hand /
  Noto Sans SC and the hand-drawn SVG jitter filters.
- globals.css trimmed to the editorial base (paper grain, hairline,
  num, ripple); play/page.tsx font/style follow-up.

Scripts
- generate-home-images.mjs reworked into a flat 2×30 idempotent
  Runware FLUX.2 generator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 01:14:49 +08:00
DESKTOP-I1T6TF3\Q 136ceff69f feat(web): InfiPlot low-fi homepage with AI-generated cards + gender-reactive hero + audio toggle fix
Rebuilds the landing page from the prototype: 1900px scale-to-fit hero with
hand-drawn SVG-jitter frames, typewriter input + start button, 5 horizontal
collapsible category selectors (with style-picker modal), 7 scattered hero
cards over a 16-card masonry gallery, and project intro panel.

Each card is filled with a Runware FLUX.2 image, pre-generated and stored as
WebP (~2 MB total for 30 cards). Hero card content + image switches by
性向 (男性向 / 女性向); gallery stays shared.

Hover overlay on every card shows title + outline in a bottom-up dark
gradient, matching the prior homepage's interaction style.

Bug fixes uncovered by tracing the form-state → engine pipeline:
- 「语音配音:关闭」was previously stuffed into styleGuide (consumed only by
  FLUX, ignored by TTS). Now serialized as audioEnabled boolean in the
  sessionStorage payload; play page's fetchBeatAudio early-returns when
  false, so no /api/beat-audio request fires.
- 「绘画风格:自动」used to pass the literal Chinese phrase "由模型根据
  prompt 自动判断画风" to FLUX, which painted it as text. Now maps to the
  二次元/galgame default prompt.

Adds reusable scripts under apps/web/scripts/:
- generate-home-images.mjs — Runware FLUX.2 idempotent batch generator
- optimize-home-images.mjs — sharp WebP downscale + recompress

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 17:08:55 +08:00
Zonghao Yuan 774f3734fd Merge pull request #8 from zonghaoyuan/feature/vercel-deploy-prep
feat: Vercel Hobby deploy readiness — image URLs, jsonrepair, DeepSeek
2026-06-01 16:31:54 +08:00
yuanzonghao 42a09c42f8 fix: address Copilot review — SSRF validation + log truncation
- annotate.ts: add assertSafeUrl() to reject non-https/data URLs and
  private/reserved IPs (SSRF prevention); cap response body to 10 MB
- jsonParser.ts: truncate raw model output in error log to first 800
  chars to avoid flooding logs / leaking sensitive content
2026-06-01 16:29:08 +08:00
yuanzonghao addbede929 feat: Vercel Hobby deploy readiness — image URLs, jsonrepair, DeepSeek
- Move vercel.json to apps/web/ with correct route paths; cap scene route
  maxDuration 120→60s for Hobby. Root vercel.json removed. Vercel project's
  Root Directory must be set to apps/web (Deploy button URL passes this).
- Switch image transport from base64-in-JSON to Runware-hosted URLs:
  generateImage now uses outputType=URL and returns {imageUrl, imageUuid};
  StartResponse/SceneResponse carry imageUrl; VisionRequest carries
  prevImageUrl (server re-fetches the bytes for click annotation). This
  eliminates the 4.5MB serverless body-size risk.
- Painter and director prefer URL over UUID for referenceImages — the UUID
  returned by Runware imageInference isn't always recognized in the refs
  pipeline (surfaces as `failedToTransferImage`).
- Client preloads scene images via `new Image().decode()` before committing
  to React state, so URL transitions render instantly; prefetched scenes
  also warm the HTTP cache.
- jsonParser uses the jsonrepair package (replaces hand-rolled repair) and
  adds a targeted preRepair regex for the missing-key-close-quote pattern
  that jsonrepair couldn't disambiguate. Full raw model output dumped on
  failure for diagnostic visibility.
- Default text provider switched to DeepSeek v4-flash via direct API
  (significantly more stable JSON than MiMo v2.5-pro). VISION/TTS stay on
  MiMo (DeepSeek has no multimodal / TTS offerings).
- next.config: drop dead experimental.serverActions.bodySizeLimit (no
  server actions used).
- README: real Deploy button URL (zonghaoyuan/yume + root-directory=apps/web
  + TTS/MOCK_IMAGE in env list); refreshed env vars table with optional
  TTS section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-01 16:04:13 +08:00
Zonghao Yuan a426b82275 Merge pull request #7 from zonghaoyuan/fix/chat-error-handling
fix(ai-client): improve error handling in chat and vision functions
2026-05-31 13:07:39 +08:00
yuanzonghao b44f52de5d Merge fix/copilot-review: address Copilot review comments 2026-05-31 12:57:26 +08:00
yuanzonghao 4eb6c69af9 fix: address Copilot review comments
- Read response.text() before JSON.parse to avoid double serialization
- Use text.slice(0, 500) instead of JSON.stringify(json).slice(0, 500)
- Add typeof content !== 'string' check for stronger type validation
- Add explicit JSON parse error handling with try/catch
2026-05-31 12:57:22 +08:00
yuanzonghao 10359d1a01 fix(ai-client): improve error handling in chat and vision functions
- Add explicit check for empty choices array in both chat.ts and vision.ts
- Add optional chaining for message property access
- Throw descriptive error when API returns no content
- Use English comments consistent with project style
- Fixes debugging issues when upstream returns empty responses

Related to: chat.ts and vision.ts silent empty string return on malformed responses
2026-05-31 12:41:56 +08:00
yuanzonghao c610efcb26 fix(ai-client): improve error handling in chat function
- Add explicit check for empty choices array
- Add optional chaining for message property access
- Throw descriptive error when API returns no content
- Fixes debugging issues when upstream returns empty responses

Resolves: chat.ts silent empty string return on malformed responses
2026-05-31 12:38:55 +08:00
Zonghao Yuan def1b25bd9 feat(engine): multi-agent character consistency pipeline (#6)
* feat(types): Character.voiceDescription rename + visual fields + Scene.sceneKey

Prepares the type surface for the multi-agent scene pipeline:

- Character.description → voiceDescription (clearer pairing with new visualDescription)
- Character gains visualDescription (English appearance card for Painter) +
  basePortraitBase64 + basePortraitUuid (for Runware referenceImages reuse)
- Scene gains sceneKey (English slug for cross-scene img2img continuity) +
  imageUuid (Runware UUID of the scene's rendered image for cheap seedImage
  reuse on subsequent same-sceneKey calls)
- Beat gains activeCharacters[] so the Cinematographer can read which
  characters are on-screen + their poses when composing the establishing shot

Co-Authored-By: QiChen88 <2291969160@qq.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai-client): generateImage img2img + multi-reference options + uploadImage

Extends the Runware adapter to support the two anchoring mechanisms FLUX.2
[klein] 9B KV needs for character + scene visual consistency:

- generateImage gains optional { seedImage, referenceImages, strength }:
  seedImage drives img2img (single starting image, sceneKey continuity),
  referenceImages drives multi-reference anchoring (up to 4 character
  portraits, capped per Runware spec). Default strength 0.85 — FLUX
  ignores strength < 0.8.
- uploadImage POSTs a base64 to Runware's imageUpload taskType and
  returns the UUID, so portraits/scene snapshots can be referenced by
  UUID on subsequent calls instead of resending base64 every scene.

Co-Authored-By: QiChen88 <2291969160@qq.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(engine): multi-agent scene pipeline (Writer→CharDesigner+Cinematographer→Painter)

Replaces the single-LLM directScene with a four-agent pipeline that
specializes each concern and parallelizes the slow parts. Adopts the
core idea from #4 (multi-agent dispatch + character visual consistency)
and grafts it onto the Scene/Beat architecture introduced in #2.

Pipeline per Scene (~9-12s critical path with parallelization):

  Writer LLM (序列, ~3s)
    │ outputs: sceneSummary + sceneKey + beats[] (each beat carries
    │           activeCharacters[] with poses)
    │
    ├─ CharacterDesigner LLM × N new chars (并行)
    │     │ outputs: { visualDescription (英文外貌卡), voiceDescription (中文音色卡) }
    │     ├─ FLUX portrait gen → upload → UUID (并行 within agent)
    │     └─ Xiaomi MiMo voicedesign provision (并行 within agent)
    │
    └─ Cinematographer LLM (并行 with CharacterDesigner)
          outputs: { shotType, integratedPrompt (英文构图+机位+人物站位) }

  Painter (FLUX img2img + referenceImages, ~1-3s)
    inputs: integratedPrompt + onStageCharacters' archetype block
            + (optional) prior sceneKey-hit scene as seedImage
            + (optional) character portrait UUIDs as referenceImages
    fallback chain: A) both anchors → B) refs only (保角色) →
                    C) seed only (保背景) → D) pure t2i
    output uploaded → Scene.imageUuid for the next sceneKey hop

Why this carving:
- Writer focuses purely on narrative (drops the voice-design duty
  staging's DIRECTOR_SYSTEM was carrying as a side concern).
- CharacterDesigner bundles visual + voice so the agent that thinks
  "who is this character" produces internally-consistent appearance +
  vocal personality (split agents tend to diverge).
- Cinematographer doesn't need character visualDescriptions —
  Painter appends archetypes after — so it parallelizes with
  CharacterDesigner.
- sceneKey enables cross-scene backdrop continuity that Scene/Beat
  doesn't cover (Scene/Beat only reuses backdrop WITHIN a scene's
  beats; sceneKey reuses across scenes that share a location).

Other changes:
- voice.ts loses provisionVoicesForScene (moved into CharacterDesigner);
  keeps synthesizeBeat for the lazy per-beat /api/beat-audio path.
- renderer.ts deleted (replaced by agents/painter.ts).
- directInsertBeat (vision-driven in-scene exploration) stays single-
  LLM — it forbids new characters and produces no image, so multi-
  agent doesn't apply.

apps/web is unchanged: orchestrator.ts keeps the same exports
(startSession / requestScene / visionDecide / requestInsertBeat /
requestBeatAudio) with identical request/response shapes.

Co-Authored-By: QiChen88 <2291969160@qq.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engine): Pattern B player POV + JSON repair + drop seedImage tier

Three hotfixes surfaced by manual end-to-end testing of the multi-agent
pipeline.

F1 — Player viewpoint (galgame Pattern B):
  - Writer accepts speaker="你" for player dialog (renders in dialog box,
    never TTS'd because no Character record exists for "你"). Filter POV
    variants (玩家/我/主角/protagonist/player/I/me/...) from
    activeCharacters so CharacterDesigner never wastes API calls on the
    player. Two-layer defense: explicit prompt rule in WRITER_SYSTEM +
    code normalization (POV_VARIANTS set, isPovName, normalizeSpeakerName).
  - Cinematographer and Painter prompts gain "player never in frame" rule
    so the player never appears in any rendered scene.
  - Cinematographer gains dynamic camera policy driven by the entry beat's
    speaker: NPC-speaker → close-up looking toward camera; "你"-speaker →
    medium shot of attentive NPC; no speaker → wide establishing shot.
  - director.ts filters POV from orphanSpeakers so provisionVoiceForName
    never fires for "你".

F2 — JSON parsing robustness:
  - parseJsonLoose gains a 4th repair tier: strip JS-style comments, strip
    trailing commas, insert missing commas between adjacent objects /
    arrays / quoted values. Logs the first 800 chars of raw LLM output
    when all repair attempts fail, so we can see what the model emitted.

F3 — Drop seedImage, use referenceImages for prior scene:
  - FLUX.2 [klein] 9B KV does not support seedImage (img2img). Removed
    Tier A (seedImage+refs) and Tier C (seedImage only) from the Painter
    degradation chain. New layout: prior scene's image slots into
    referenceImages[0] for spatial continuity, character portraits fill
    slots 1-3 (Runware caps at 4 total). Cinematographer instructed to
    emphasize continuity when sceneKey matches a prior scene.

All five package typechecks pass.

Co-Authored-By: QiChen88 <2291969160@qq.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engine): address Copilot review feedback on #6

Three targeted fixes from PR #6 Copilot review.

F4 — Stale seedImage/img2img docstrings
  Four locations still referenced the original img2img design after F3
  switched to referenceImages-based spatial continuity:
  - types/index.ts:57   Scene.sceneKey docstring
  - types/index.ts:63   Scene.imageUuid docstring
  - director.ts:34      pipeline diagram in module block comment
  - director.ts:128     directScene JSDoc
  Doc-only changes; misleading wording corrected to mention referenceImages.
  (The design-rationale comment in pickPriorSceneReference is kept — it
  explains WHY we don't use seedImage and is load-bearing context.)

F5 — Remove JS-comment stripping from JSON repair pass
  parseJsonLoose's repair tier previously stripped `// ...` and
  `/* ... */` across the entire text, which would corrupt JSON string
  values containing URLs (e.g. "https://example.com" → "https:"). Since
  LLMs in `responseFormat: "json_object"` mode essentially never emit
  comments, dropping the comment-stripping step is a net win for safety.
  Trailing-comma and missing-comma repair (the high-frequency failures)
  are kept.

F6 — Pattern B parity on the insert-beat path
  Previously: directInsertBeat's INSERT_BEAT_SYSTEM forbade any speaker
  not in session.characters, and the orchestrator's unregistered-speaker
  guard demoted such lines to narration. This meant the player could not
  speak via speaker="你" in transient in-scene beats — inconsistent with
  the Writer path.
  Fix:
  - INSERT_BEAT_SYSTEM prompt now allows speaker="你" (NPC name OR "你")
    and rejects other POV variants
  - directInsertBeat applies normalizeSpeakerName to the LLM output, same
    as the Writer path, so POV variants collapse to "你"
  - lineDelivery is dropped when speaker="你" (no TTS for player)
  - orchestrator's unregistered-speaker guard adds a `speaker !== "你"`
    exception so Pattern B player dialog passes through

Co-Authored-By: QiChen88 <2291969160@qq.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(engine): drop "JS-style comments" from parseJsonLoose header

The function header listed JS-style comments as a step-4 repair, but F5
already removed comment stripping from `repairJsonString` because the
regex would corrupt URLs inside JSON string values. The inner function's
comment was updated then; this header was missed.

Doc-only sync from second-round Copilot review on #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: QiChen88 <2291969160@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 13:30:24 +08:00
Zonghao Yuan e261f4a346 feat: Runware FLUX.2 image + lazy per-beat TTS (#5)
Reduce median scene-load latency from ~30-80s to ~17-25s by switching image generation to Runware FLUX.2 [klein] 9B KV and moving per-beat TTS synthesis off the scene response into a new lazy /api/beat-audio endpoint with hard timeout + abort support.

- feat(image): migrate to Runware FLUX.2 [klein] 9B KV — task-array API, $0.001/image, sub-second inference.
- feat(tts): split /api/scene into directScene + image + voicedesign-provisioning; lazily synth per beat via /api/beat-audio with 15s hard timeout + AbortSignal threaded to MiMo so timed-out calls don't keep burning sockets/quota; client fans out per-beat fetches on scene-id change with abort + identity-check finally to prevent cross-scene beat-id collisions.
- refactor(tts): slim BeatAudioRequest to { beat, voice } — ~800KB per-beat upload dropped to ~160KB by sending only the speaker's voice instead of the full session.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-28 23:43:51 +08:00
Zonghao Yuan fcd4e6c1ab feat(tts): Xiaomi MiMo per-beat voice + MOCK_IMAGE testing aid (#3)
Adds optional Xiaomi MiMo TTS layer on top of the scene/beat engine and a MOCK_IMAGE flag for cheap local TTS iteration.

- Per-character voice provisioning via MiMo voice design → clone, reference audio persisted in session
- Per-line free-form delivery direction (Director writes "鼓起勇气又害羞,声音发颤" style instructions; sent to MiMo's director channel, never read aloud)
- Per-beat audio served with the scene response; frontend plays via hidden <audio> with typewriter synced to audio duration; mute toggle persisted via localStorage lazy initializer
- Graceful degradation: any TTS step failing → silent beat, game continues
- MOCK_IMAGE=true returns a sharp-generated placeholder PNG so local TTS iteration doesn't burn image tokens
- Recommended config in .env.example: MiMo Token Plan covers TEXT/VISION/TTS with one key (mimo-v2.5-pro for text, mimo-v2.5 omni for vision, mimo-v2.5-tts for TTS)

Squashed from #3:
- feat(tts): 小米 MiMo 逐 beat 配音 + 按 session 角色音色 + 自由文本配音指导
- feat(engine): MOCK_IMAGE 占位图便于本地测试
- fix(tts): address Copilot review on PR #3
- fix(tts): Copilot round-2 review feedback

Known limitation: Session.characters carries the full WAV reference audio (~200-300KB/character base64) and round-trips through every /api/scene, /api/vision, /api/insert-beat request. This is intrinsic to MiMo's design→clone model (voice identity IS the audio, no server-side voiceId). Fixing requires server-side storage which is out of scope; documented for future hardening.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-28 20:45:21 +08:00
Zonghao Yuan d1f13d51a3 feat: scene/beat architecture — decouple dialogue from image generation (#2)
Replace the one-image-per-interaction model with scenes that hold multiple
dialogue beats. The image regenerates only on scene-change actions; tapping
through beats and in-scene choices are instant and zero-network.

Squashed from #2:
- feat: scene/beat architecture — decouple dialogue from image generation
- fix: harden LLM-output parsing, prefetch lifecycle, and typewriter (PR review)
- fix: dedupe beat ids; fallback narration on empty insert-beat (PR review #2)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-28 15:20:12 +08:00
Qi Chen d116c2e3b5 feat: separate UI choices from AI image (bypass vision)
HTML choice buttons now call /api/interact directly, bypassing the ~4s Vision roundtrip. Free-form background clicks still go through Vision as before.
2026-05-25 20:47:33 +08:00
yuanzonghao bf8f356e37 feat: 16:9 landscape canvas + F-key presentation mode
- image prompt: vertical 9:16 → landscape 16:9 cinematic, scene fills
  canvas with bottom dialogue band and horizontal choice row
- image-client: pass size=1792x1024 hint (provider honors it → output is
  now exact 16:9 instead of the model's default 1.75:1)
- PlayCanvas: drop 560px cap, use object-contain into available space,
  add fullViewport prop for chrome-less presentation rendering
- play page: F / Esc shortcuts + Fullscreen API + fullscreenchange
  sync; chrome-less black-letterbox overlay (bg-black) suited for
  screen recording

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 10:07:13 +08:00
yuanzonghao d81f4ab2f1 refactor(web): self-host fonts via next/font/google
Replace external <link> to fonts.googleapis.com with next/font/google
for Cormorant Garamond and Inter. Fonts are now built-time downloaded
and served from /_next/static/media, exposed via --font-serif and
--font-sans CSS variables that Tailwind's fontFamily reads.

Eliminates runtime dependency on Google Fonts CDN (helpful for offline
or region-restricted deploys), avoids FOUT through next/font's
size-adjusted fallback, and removes two render-blocking external
stylesheet requests on first load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:20:15 +08:00
yuanzonghao 2793c06278 refactor: rename project DADA → 云梦 (slug: yume)
- 所有 workspace 包 @dada/* → @yume/*,根包 dada → yume
- 全部导入路径同步更新
- 内部 ID 对齐:dada-ripple → yume-ripple,dada:custom → yume:custom
- 首页 / new / play 用户文案整段中文化,保留 smallcaps + 衬线 + 罗马数字排版语汇
- README 标题改为 "# 云梦",部署链接与目录树 slug 改为 yume
- 重新生成 pnpm-lock.yaml

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 10:14:14 +08:00
yuanzonghao d0f2868834 chore: drop MIT license and open-source framing
Project is now private; remove LICENSE file, README license
section, and "MIT · MMXXVI" footer tags. Root package.json
license set to UNLICENSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 13:18:07 +08:00
yuanzonghao 9cedfa66e4 feat: prefetch, vision split, provider adapter, UI polish
Engine
- Split /api/vision out from /api/interact so client can drive
  prefetch + cache lookup independently of click interpretation
- Image client switched to chat-completions+modalities API (OpenRouter/
  provider style), supporting markdown image URL responses
- annotateClick now resizes to 768w before composite to keep vision
  payloads small and avoid CDN timeouts
- Prompts updated to mention "JSON" in user messages (required by
  Gemini's strict JSON mode)
- Shared fetchWithRetry helper: 2 retries for chat/image, 0 for vision
  (with 60s hard timeout)

Client
- Parallel prefetch of all three choice branches on each new frame
- Effect deliberately excludes phase from deps so user-click doesn't
  abort in-flight prefetches
- Cache hit/miss/free-form fallback handled in handleClick
- PlayCanvas reads img naturalWidth/Height and adapts container to
  whatever aspect AI returns (no more cropped third choice)
- max-width raised to 560px, max-height calc(100dvh - 200px)

Misc
- README env-path corrected to apps/web/.env.local
- users.md: BGM/TTS idea note
- .env.example moved into apps/web alongside next config

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:38:03 +08:00
yuanzonghao ad4b09c744 fix(web): tame Next.js 16 dev server CPU runaway
- Disable typed routes (default-on in Next 16, loops infinitely
  with transpilePackages workspace setup, holding 500%+ CPU at idle)
- Pin turbopack.root to monorepo root so a stray ~/pnpm-lock.yaml
  cannot misinfer the workspace boundary
- Commit pnpm-lock.yaml; ignore .claude/ local plugin state

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 10:12:54 +08:00
yuanzonghao cbd95bbea2 Initial commit: AI-driven visual novel scaffold
- Monorepo (pnpm workspace): apps/web + packages/{types,ai-client,engine}
- Next.js 16 web app with three-stage AI orchestration
- Three independently configurable providers: text LLM, image generator, vision model
- Warm minimalist editorial UI design
- One-click Vercel deploy ready

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 13:29:58 +08:00