Commit Graph

310 Commits

Author SHA1 Message Date
Zonghao Yuan fcd4e6c1ab feat(tts): Xiaomi MiMo per-beat voice + MOCK_IMAGE testing aid (#3)
Adds an optional Xiaomi MiMo TTS layer on top of the scene/beat engine and a MOCK_IMAGE flag for cheap local TTS iteration.

- Per-character voice provisioning via MiMo voice design → clone, with the reference audio persisted in the session
- Per-line free-form delivery direction (the Director writes instructions such as "鼓起勇气又害羞,声音发颤", i.e. "mustering courage yet shy, voice trembling"; these are sent to MiMo's director channel and never read aloud)
- Per-beat audio served with the scene response; the frontend plays it through a hidden <audio> element, with the typewriter synced to the audio duration; the mute toggle is persisted in localStorage via a lazy initializer
- Graceful degradation: any TTS step failing → silent beat, game continues
- MOCK_IMAGE=true returns a sharp-generated placeholder PNG so local TTS iteration doesn't burn image tokens
- Recommended config in .env.example: MiMo Token Plan covers TEXT/VISION/TTS with one key (mimo-v2.5-pro for text, mimo-v2.5 omni for vision, mimo-v2.5-tts for TTS)
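The degradation rule and the audio-synced typewriter can be illustrated with a minimal sketch. It assumes the TTS call is wrapped in a promise-returning function. `synthesizeOrSilent`, `typewriterIntervalMs`, the `Beat` shape, and the 40 ms fallback speed are hypothetical names and values, not the project's actual code.

```typescript
// Hypothetical sketch: per-beat TTS that degrades to a silent beat, plus a
// typewriter speed derived from the audio duration.

interface Beat {
  id: string;
  text: string;       // line shown to the player
  direction?: string; // free-form delivery note: sent to TTS, never displayed
}

interface BeatAudio {
  beatId: string;
  audioBase64: string | null; // null means "silent beat"
}

type Synthesize = (text: string, direction?: string) => Promise<string>;

// Any failure in the TTS step yields a silent beat instead of an error,
// so the scene response (and the game) still goes through.
async function synthesizeOrSilent(beat: Beat, synthesize: Synthesize): Promise<BeatAudio> {
  try {
    const audioBase64 = await synthesize(beat.text, beat.direction);
    return { beatId: beat.id, audioBase64 };
  } catch {
    return { beatId: beat.id, audioBase64: null };
  }
}

// Spread the reveal of `text` across the audio's duration. When there is no
// audio (a silent beat, or the caller passes null because the player muted),
// fall back to a fixed per-character speed.
function typewriterIntervalMs(
  text: string,
  audioDurationSec: number | null,
  fallbackMs = 40,
): number {
  const chars = Array.from(text).length; // count code points, not UTF-16 units
  if (chars === 0 || audioDurationSec === null || !(audioDurationSec > 0)) {
    return fallbackMs;
  }
  return (audioDurationSec * 1000) / chars;
}
```

Counting code points with `Array.from` keeps the per-character timing even for CJK text and emoji.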

Squashed from #3:
- feat(tts): Xiaomi MiMo per-beat voice-over + per-session character voices + free-text delivery direction
- feat(engine): MOCK_IMAGE placeholder image for easier local testing
- fix(tts): address Copilot review on PR #3
- fix(tts): Copilot round-2 review feedback

Known limitation: Session.characters carries the full WAV reference audio (~200-300 KB of base64 per character), which round-trips through every /api/scene, /api/vision, and /api/insert-beat request. This is intrinsic to MiMo's design→clone model (the voice identity IS the audio; there is no server-side voiceId). Fixing it requires server-side storage, which is out of scope; documented here for future hardening.
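The per-request cost can be estimated from base64's ratio of 4 output characters per 3 input bytes. This is a rough sketch. The 44-byte plain RIFF header and the sample rate used in the example are assumptions, not MiMo's documented output format.

```typescript
// Base64 emits 4 characters per 3 input bytes (the last group is padded),
// so the encoded length is 4 * ceil(n / 3).
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Size of a PCM WAV file, assuming a plain 44-byte RIFF/WAVE header with no
// extra chunks (an assumption for illustration).
function wavBytes(seconds: number, sampleRate: number, bytesPerSample: number, channels = 1): number {
  return 44 + Math.round(seconds * sampleRate) * bytesPerSample * channels;
}

// Every /api/scene, /api/vision and /api/insert-beat call re-sends all clips.
function referenceAudioPerRequest(clipBytes: number[]): number {
  return clipBytes.reduce((sum, n) => sum + base64Length(n), 0);
}
```

Under these assumptions, a 5 s clip at 16 kHz, 16-bit mono is 160,044 bytes of WAV and 213,392 base64 characters (about 213 KB). That is within the quoted range, so a cast of three characters adds roughly 640 KB to every request.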

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-28 20:45:21 +08:00
Zonghao Yuan d1f13d51a3 feat: scene/beat architecture — decouple dialogue from image generation (#2)
Replace the one-image-per-interaction model with scenes that hold multiple
dialogue beats. The image regenerates only on scene-change actions; tapping
through beats and in-scene choices are instant and zero-network.
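The following is a sketch of the scene/beat split. It assumes that in-scene choice replies arrive together with the scene, so they can be spliced in locally. All type names and fields are hypothetical. Only a scene-change action is routed to the server for a new image.

```typescript
// Hypothetical sketch of the scene/beat model.
interface Beat {
  id: string;
  speaker?: string;
  text: string;
}

interface Scene {
  imageUrl: string; // generated once per scene
  beats: Beat[];    // dialogue shown over the same image
}

interface PlayState {
  scene: Scene;
  beatIndex: number;
}

type Action =
  | { kind: "next-beat" }                      // tap through dialogue
  | { kind: "in-scene-choice"; reply: Beat[] } // reply assumed pre-generated
  | { kind: "scene-change"; prompt: string };  // the only action needing a new image

function needsNewImage(action: Action): boolean {
  return action.kind === "scene-change";
}

// Local, zero-network transitions.
function applyLocal(state: PlayState, action: Action): PlayState {
  switch (action.kind) {
    case "next-beat":
      return {
        ...state,
        beatIndex: Math.min(state.beatIndex + 1, state.scene.beats.length - 1),
      };
    case "in-scene-choice": {
      if (action.reply.length === 0) return state;
      // Insert the reply right after the current beat and move onto it.
      const beats = [...state.scene.beats];
      beats.splice(state.beatIndex + 1, 0, ...action.reply);
      return { scene: { ...state.scene, beats }, beatIndex: state.beatIndex + 1 };
    }
    case "scene-change":
      throw new Error("scene-change must go through the server");
  }
}
```

Because `imageUrl` lives on the scene rather than on each beat, advancing or inserting beats never touches the image.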

Squashed from #2:
- feat: scene/beat architecture — decouple dialogue from image generation
- fix: harden LLM-output parsing, prefetch lifecycle, and typewriter (PR review)
- fix: dedupe beat ids; fallback narration on empty insert-beat (PR review #2)
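One possible way to dedupe beat ids is shown below. This is a hypothetical sketch, not the fix that was merged. It keeps the first occurrence of each id and adds a numeric suffix to later ones until the id is unique.

```typescript
// Hypothetical sketch: LLM output may repeat ids ("b1", "b1"), which breaks
// any lookup or rendering keyed by beat id.
interface Beat {
  id: string;
  text: string;
}

function dedupeBeatIds(beats: Beat[]): Beat[] {
  const seen = new Set<string>();
  return beats.map((beat) => {
    let id = beat.id;
    let n = 2;
    while (seen.has(id)) {
      id = `${beat.id}-${n++}`; // b1 → b1-2 → b1-3 ...
    }
    seen.add(id);
    return id === beat.id ? beat : { ...beat, id };
  });
}
```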

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-28 15:20:12 +08:00
Qi Chen d116c2e3b5 feat: separate UI choices from AI image (bypass vision)
HTML choice buttons now call /api/interact directly, bypassing the ~4s Vision roundtrip. Free-form background clicks still go through Vision as before.
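The client-side split comes down to a routing decision. This is a minimal sketch: the `Click` shape, the request bodies, and the normalized coordinates are assumptions. The two endpoint paths come from the commit.

```typescript
// Hypothetical sketch: a choice button already knows which choice it is, so
// it skips Vision; only free-form clicks on the image need interpretation.
type Click =
  | { kind: "choice"; choiceId: string }
  | { kind: "free-form"; x: number; y: number }; // normalized 0..1 (assumed)

interface Route {
  endpoint: "/api/interact" | "/api/vision";
  body: Record<string, unknown>;
}

function routeClick(click: Click): Route {
  if (click.kind === "choice") {
    return { endpoint: "/api/interact", body: { choiceId: click.choiceId } };
  }
  return { endpoint: "/api/vision", body: { x: click.x, y: click.y } };
}
```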
2026-05-25 20:47:33 +08:00
yuanzonghao bf8f356e37 feat: 16:9 landscape canvas + F-key presentation mode
- image prompt: vertical 9:16 → landscape 16:9 cinematic, scene fills
  canvas with bottom dialogue band and horizontal choice row
- image-client: pass size=1792x1024 hint (provider honors it → output is
  now exact 16:9 instead of the model's default 1.75:1)
- PlayCanvas: drop 560px cap, use object-contain into available space,
  add fullViewport prop for chrome-less presentation rendering
- play page: F / Esc shortcuts + Fullscreen API + fullscreenchange
  sync; chrome-less black-letterbox overlay (bg-black) suited for
  screen recording
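What `object-contain` does to the 16:9 frame can be computed by hand. That is useful, for example, when the frame's on-screen rectangle is needed for click mapping. This is a minimal sketch. `containBox` is a hypothetical helper, and the aspect ratio is passed as integers to keep the arithmetic exact.

```typescript
// Largest aspectW:aspectH box that fits the container, centred, with bars
// (black letterbox) on whichever axis has leftover space.
interface Box {
  width: number;
  height: number;
}

interface Placed extends Box {
  offsetX: number;
  offsetY: number;
}

function containBox(container: Box, aspectW = 16, aspectH = 9): Placed {
  let width = container.width;
  let height = (width * aspectH) / aspectW;
  if (height > container.height) {
    // Too tall: fit to height instead (pillarbox).
    height = container.height;
    width = (height * aspectW) / aspectH;
  }
  return {
    width,
    height,
    offsetX: (container.width - width) / 2,
    offsetY: (container.height - height) / 2,
  };
}
```

A 1920×1200 viewport yields a 1920×1080 frame with 60 px bars above and below. A 3440×1440 ultrawide yields 2560×1440 with 440 px bars on each side.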

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 10:07:13 +08:00
yuanzonghao d81f4ab2f1 refactor(web): self-host fonts via next/font/google
Replace external <link> to fonts.googleapis.com with next/font/google
for Cormorant Garamond and Inter. Fonts are now downloaded at build time
and served from /_next/static/media, exposed via --font-serif and
--font-sans CSS variables that Tailwind's fontFamily reads.

Eliminates runtime dependency on Google Fonts CDN (helpful for offline
or region-restricted deploys), minimizes layout shift during the font
swap through next/font's size-adjusted fallback, and removes two render-blocking external
stylesheet requests on first load.
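The setup described above can be sketched as a config fragment. The file name `app/fonts.ts`, the listed weights, and the fallback stacks are assumptions. The Tailwind part assumes a v3-style JS config where `fontFamily` is set in the theme.

```typescript
// app/fonts.ts (hypothetical file); weights and fallbacks are assumptions.
import { Cormorant_Garamond, Inter } from "next/font/google";

// Both fonts are fetched at build time and self-hosted by Next.
export const serif = Cormorant_Garamond({
  subsets: ["latin"],
  weight: ["400", "500", "600"], // weights listed explicitly (assumed set)
  variable: "--font-serif",
  display: "swap",
});

export const sans = Inter({
  subsets: ["latin"],
  variable: "--font-sans",
  display: "swap",
});

// app/layout.tsx then puts both CSS variables on <html>:
//   <html className={`${serif.variable} ${sans.variable}`}>
// and the Tailwind theme points fontFamily at them:
//   fontFamily: {
//     serif: ["var(--font-serif)", "serif"],
//     sans: ["var(--font-sans)", "sans-serif"],
//   }
```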

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:20:15 +08:00
yuanzonghao 2793c06278 refactor: rename project DADA → 云梦 (slug: yume)
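- All workspace packages @dada/* → @yume/*; root package dada → yume
- All import paths updated accordingly
- Internal IDs aligned: dada-ripple → yume-ripple, dada:custom → yume:custom
- User-facing copy on the home / new / play pages fully localized into Chinese, keeping the small-caps + serif + Roman-numeral typographic vocabulary
- README title changed to "# 云梦"; deploy link and directory-tree slug changed to yume
- Regenerated pnpm-lock.yaml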

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 10:14:14 +08:00
yuanzonghao d0f2868834 chore: drop MIT license and open-source framing
Project is now private; remove LICENSE file, README license
section, and "MIT · MMXXVI" footer tags. Root package.json
license set to UNLICENSED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 13:18:07 +08:00
yuanzonghao 9cedfa66e4 feat: prefetch, vision split, provider adapter, UI polish
Engine
- Split /api/vision out from /api/interact so client can drive
  prefetch + cache lookup independently of click interpretation
- Image client switched to chat-completions+modalities API (OpenRouter/
  provider style), supporting markdown image URL responses
- annotateClick now resizes to 768w before composite to keep vision
  payloads small and avoid CDN timeouts
- Prompts updated to mention "JSON" in user messages (required by
  Gemini's strict JSON mode)
- Shared fetchWithRetry helper: 2 retries for chat/image, 0 for vision
  (with 60s hard timeout)
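A shared retry helper along these lines is shown below. This is a minimal sketch. The commit only specifies the retry counts and the 60 s vision timeout. The backoff schedule, the choice to retry on 429/5xx, and the injectable `fetchImpl` parameter are assumptions.

```typescript
// Hypothetical sketch of fetchWithRetry: `retries` extra attempts after the
// first, exponential backoff between attempts, optional hard timeout per attempt.
interface RetryOptions {
  retries: number;          // e.g. 2 for chat/image, 0 for vision
  timeoutMs?: number;       // e.g. 60_000 for vision
  baseDelayMs?: number;
  fetchImpl?: typeof fetch; // injectable for tests
}

async function fetchWithRetry(url: string, init: RequestInit, opts: RetryOptions): Promise<Response> {
  const { retries, timeoutMs, baseDelayMs = 500, fetchImpl = fetch } = opts;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer =
      timeoutMs === undefined ? undefined : setTimeout(() => controller.abort(), timeoutMs);
    try {
      // Note: this replaces any signal the caller put in `init`.
      const res = await fetchImpl(url, { ...init, signal: controller.signal });
      // Retry on 429 and 5xx; any other status goes back to the caller as-is.
      if (res.status !== 429 && res.status < 500) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network error, or abort from the hard timeout
    } finally {
      if (timer !== undefined) clearTimeout(timer);
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

With `retries: 0`, a vision call makes exactly one attempt, and the timeout still bounds how long that attempt can hang.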

Client
- Parallel prefetch of all three choice branches on each new frame
- Effect deliberately excludes phase from deps so user-click doesn't
  abort in-flight prefetches
- Cache hit/miss/free-form fallback handled in handleClick
- PlayCanvas reads img naturalWidth/Height and adapts container to
  whatever aspect AI returns (no more cropped third choice)
- max-width raised to 560px, max-height calc(100dvh - 200px)
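Branch prefetching with a hit/miss fallback can be sketched as a cache of promises keyed by choice. This is a minimal sketch. `PrefetchCache`, `Frame`, and `Generate` are hypothetical names. In keeping with the note above, a click only reads from the cache and never cancels a running prefetch.

```typescript
// Hypothetical sketch: start every branch in parallel on a new frame and keep
// the promises, so a click can await a prefetch that is still in flight.
interface Frame {
  imageUrl: string;
  choices: string[];
}

type Generate = (choice: string) => Promise<Frame>;

class PrefetchCache {
  private readonly pending = new Map<string, Promise<Frame>>();
  private readonly generate: Generate;

  constructor(generate: Generate) {
    this.generate = generate;
  }

  // Called once per new frame: replaces the previous frame's branches.
  prefetchAll(choices: string[]): void {
    this.pending.clear();
    for (const choice of choices) {
      const p = this.generate(choice);
      p.catch(() => {}); // failures are handled in resolve()
      this.pending.set(choice, p);
    }
  }

  // Hit: reuse the (possibly still running) prefetch.
  // Miss, failed prefetch, or free-form input: generate on demand.
  async resolve(choice: string): Promise<Frame> {
    const hit = this.pending.get(choice);
    if (hit) {
      try {
        return await hit;
      } catch {
        // fall through to a fresh request
      }
    }
    return this.generate(choice);
  }
}
```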

Misc
- README env-path corrected to apps/web/.env.local
- users.md: BGM/TTS idea note
- .env.example moved into apps/web alongside next config

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 19:38:03 +08:00
yuanzonghao ad4b09c744 fix(web): tame Next.js 16 dev server CPU runaway
- Disable typed routes (default-on in Next 16, loops infinitely
  with transpilePackages workspace setup, holding 500%+ CPU at idle)
- Pin turbopack.root to the monorepo root so a stray ~/pnpm-lock.yaml
  cannot make Next misinfer the workspace boundary
- Commit pnpm-lock.yaml; ignore .claude/ local plugin state
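The two settings described above might look roughly like the following config fragment. The package names are illustrative, based on the `packages/{types,ai-client,engine}` layout and the `@dada/*` scope used at the time. The sketch assumes a Next version in which `typedRoutes` and `turbopack.root` are top-level config keys.

```typescript
// next.config.ts (sketch); package names are illustrative.
import path from "node:path";
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Workspace packages compiled by Next rather than consumed prebuilt.
  transpilePackages: ["@dada/types", "@dada/ai-client", "@dada/engine"],
  // Typed routes turned off explicitly (reported above to loop with this setup).
  typedRoutes: false,
  // apps/web → repo root, so a lockfile in a parent directory is never
  // picked up as the workspace root.
  turbopack: {
    root: path.join(__dirname, "..", ".."),
  },
};

export default nextConfig;
```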

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 10:12:54 +08:00
yuanzonghao cbd95bbea2 Initial commit: AI-driven visual novel scaffold
- Monorepo (pnpm workspace): apps/web + packages/{types,ai-client,engine}
- Next.js 16 web app with three-stage AI orchestration
- Three independently configurable providers: text LLM, image generator, vision model
- Warm minimalist editorial UI design
- One-click Vercel deploy ready
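"Independently configurable" can be read as each role carrying its own endpoint, key, and model. This is a minimal sketch. The environment variable names (`TEXT_BASE_URL`, `IMAGE_API_KEY`, and so on) and the `loadProviders` helper are hypothetical, not the scaffold's actual configuration.

```typescript
// Hypothetical sketch: one base URL / key / model per role, so text, image and
// vision can each point at a different provider.
type Role = "TEXT" | "IMAGE" | "VISION";

interface ProviderConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

type Env = Record<string, string | undefined>;

function loadProvider(env: Env, role: Role): ProviderConfig {
  const read = (name: string): string => {
    const value = env[`${role}_${name}`];
    if (!value) throw new Error(`missing ${role}_${name}`);
    return value;
  };
  return { baseUrl: read("BASE_URL"), apiKey: read("API_KEY"), model: read("MODEL") };
}

function loadProviders(env: Env): Record<Role, ProviderConfig> {
  return {
    TEXT: loadProvider(env, "TEXT"),
    IMAGE: loadProvider(env, "IMAGE"),
    VISION: loadProvider(env, "VISION"),
  };
}
```

In a Next.js route this would be called as `loadProviders(process.env)`. Missing variables fail fast with the variable name in the error message.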

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 13:29:58 +08:00