Commit Graph

231 Commits

Author SHA1 Message Date
yuanzonghao 0166c5e0a9 chore(home): re-enrich firstact voiceIds with gemini model
Re-ran scripts/enrich-firstacts-stepfun.mjs with gemini-3.1-flash-lite-preview
as the TEXT model (was deepseek-v4-flash). The new picks better match the
mysterious / cool / melancholic archetypes common in the curated cards
(e.g. 夜煌, described as 清冷空灵+悲怆决绝 (cool and ethereal + tragic and resolute), now gets lengyanyujie 冷艳御姐 (aloof mature woman) instead of youyanvsheng).

Only stepfunVoiceId values changed across 52 cards; voice (Xiaomi) /
imageUrl / scene untouched. 0 failures across the 147-character run.

run: pnpm enrich:firstacts [--force] [--portrait]
2026-06-15 14:03:34 +08:00
yuanzonghao 17341cbd4a feat(play): remove hardcoded 1.2x speech playback speed
The SPEECH_RATE=1.2 constant was added to speed up the somewhat slow MiMo
voicedesign voice. With StepFun preset voices (whose tempo is already
appropriate) and no per-provider logic, a global 1.2x is no longer the
right default. Remove the constant and all 4 of its uses:

- the constant declaration + comment
- two `el.playbackRate = SPEECH_RATE` assignments (audio now plays at 1.0)
- the typewriter pacing divisor (`/ SPEECH_RATE`) — audio and text both
  return to original duration, staying in lockstep

A future user-facing speech-speed setting (UI control + persisted pref)
would be a separate feature with a different shape; no placeholder kept.
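A minimal sketch of the lockstep pacing after this change, assuming the typewriter spreads a line's characters evenly across the clip's duration. `typewriterDelayMs` is a hypothetical helper name, not the project's function. With no rate constant, the per-character delay comes from the 1.0x audio duration alone.

```typescript
// Hypothetical helper: per-character typewriter delay in milliseconds, chosen
// so the text finishes revealing exactly when the audio clip ends. Before this
// commit the duration was also divided by SPEECH_RATE (1.2); with audio back
// at playbackRate 1.0 there is no divisor.
function typewriterDelayMs(audioDurationSec: number, text: string): number {
  const chars = Array.from(text).length; // count code points, not UTF-16 units
  if (chars === 0 || !Number.isFinite(audioDurationSec) || audioDurationSec <= 0) {
    return 0; // nothing to pace, or no usable duration
  }
  return (audioDurationSec * 1000) / chars;
}

// A 3-second clip for a 30-character line reveals one character per 100 ms.
console.log(typewriterDelayMs(3, "a".repeat(30))); // 100
```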
2026-06-15 14:03:20 +08:00
yuanzonghao 375f401c8f fix(tts): persist stepfunVoiceId on Character + harden probe race
Two follow-ups from pr-agent review of #79:

1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId, so
   on a StepFun server the client (which omits the voice payload to save
   Fast Origin Transfer, FOT) echoed back only voiceDescription, and the server re-scored via
   pickStepfunVoiceId every beat instead of honoring the LLM pick. The
   whole "CharacterDesigner picks a preset id" mechanism was effectively
   bypassed on live StepFun sessions (it only worked for prebaked cards,
   which carry stepfunVoiceId in their JSON). Persist stepfunVoiceId onto
   the Character so the client→server round-trip keeps the LLM selection.

2. fetchBeatAudio's null-provider branch (probe pending) required
   speaker.voice and silently dropped a stepfun-only speaker. Accept any
   synthesizable source (voice | stepfunVoiceId | voiceDescription) so a
   slow getTtsProvider probe can't drop audio during the first scene's
   fetch window. The server resolveVoice normalizes regardless of which
   fields arrive.
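A minimal sketch of the relaxed guard, assuming a speaker object carries the three optional sources named above. The `Speaker` interface, the `voice` payload shape, and `hasSynthesizableSource` are illustrative names, not the project's actual types.

```typescript
// Illustrative speaker shape: only the three voice sources named in the commit.
interface Speaker {
  voice?: { referenceAudioBase64?: string }; // Xiaomi reference audio (shape assumed)
  stepfunVoiceId?: string; // StepFun preset id
  voiceDescription?: string; // free text the server can score
}

// While the provider probe is pending (provider === null), accept any source
// the server's resolveVoice can turn into audio, instead of requiring `voice`
// and silently dropping StepFun-only speakers.
function hasSynthesizableSource(speaker: Speaker): boolean {
  return Boolean(
    speaker.voice ||
      speaker.stepfunVoiceId?.trim() ||
      speaker.voiceDescription?.trim(),
  );
}

console.log(hasSynthesizableSource({ stepfunVoiceId: "lengyanyujie" })); // true
console.log(hasSynthesizableSource({})); // false
```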
2026-06-15 13:05:36 +08:00
yuanzonghao ff03f3c085 chore(home): enrich firstact JSONs with StepFun voiceId
Add characters[i].stepfunVoiceId to all 106 prebaked homepage first-act
JSONs (firstact/ + firstact-portrait/) so cards produce sound when the
server runs StepFun. Generated by scripts/enrich-firstacts-stepfun.mjs —
one TEXT-provider LLM call per character picking from the 32-preset
catalog. voice (Xiaomi reference audio), imageUrl, and scene are untouched;
only the new stepfunVoiceId field is appended.

All 147 characters across both orientation sets are enriched (0 failures).
2026-06-15 12:50:09 +08:00
yuanzonghao ca73a41a0b feat(tts): StepFun voice selection via CharacterDesigner + provider-aware beat-audio
Make homepage cards and live sessions produce sound when the server is
configured for StepFun TTS, instead of silently failing (the prebaked
Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in
Fast Origin Transfer).

Three coordinated changes:

1. CharacterDesigner now picks a StepFun preset voice id directly from the
   32-entry catalog in the SAME LLM call that designs the character — zero
   extra latency, LLM-grade match quality. The Xiaomi prompt path is
   byte-identical to history (verified programmatically) so cache hit rate
   and voice quality are preserved. pickStepfunVoiceId (keyword scorer)
   remains the fallback for orphan speakers / invalid LLM picks.

2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the
   single source of truth, shared by the scorer, the CharacterDesigner
   prompt, /api/tts-provider, and the offline enrich script.

3. A new GET /api/tts-provider endpoint lets the client probe the server's
   TTS provider at /play mount. fetchBeatAudio then shapes its request body:
   on a StepFun server it sends the lightweight stepfunVoiceId /
   voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving
   ~13MB per protagonist per session on prebaked cards). requestBeatAudio
   re-provisions on a provider mismatch before synth, so audio never goes
   silent on a cross-provider replay or mid-session provider flip.
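A minimal sketch of the provider-aware body shaping from item 3 and the fallback rule from item 1. The fields `stepfunVoiceId`, `voiceDescription`, and `characterName` come from this commit; the `voice` payload shape, the `TtsProvider` labels, `shapeBeatAudioRequest`, `resolveStepfunVoiceId`, and the injected scorer (standing in for pickStepfunVoiceId) are assumptions for illustration.

```typescript
type TtsProvider = "xiaomi" | "stepfun"; // assumed labels for the probe result

interface Character {
  name: string;
  voice?: { referenceAudioBase64: string }; // Xiaomi reference audio (shape assumed)
  stepfunVoiceId?: string;
  voiceDescription?: string;
}

interface BeatAudioRequest {
  text: string;
  characterName: string;
  voice?: Character["voice"];
  stepfunVoiceId?: string;
  voiceDescription?: string;
}

// Shape the /api/beat-audio body for the probed provider: a StepFun server
// gets only the lightweight fields, never the ~220KB reference audio.
function shapeBeatAudioRequest(provider: TtsProvider, text: string, c: Character): BeatAudioRequest {
  if (provider === "stepfun") {
    return {
      text,
      characterName: c.name,
      stepfunVoiceId: c.stepfunVoiceId,
      voiceDescription: c.voiceDescription,
    };
  }
  return { text, characterName: c.name, voice: c.voice };
}

// Honor the LLM's pick only if it is in the 32-preset catalog; otherwise fall
// back to the keyword scorer (injected here as a stand-in parameter).
function resolveStepfunVoiceId(
  c: Character,
  catalogIds: ReadonlySet<string>,
  keywordScorer: (description: string) => string,
): string {
  if (c.stepfunVoiceId && catalogIds.has(c.stepfunVoiceId)) return c.stepfunVoiceId;
  return keywordScorer(c.voiceDescription ?? "");
}

const catalog = new Set(["lengyanyujie", "youyanvsheng"]); // two of the 32 ids
const yehuang: Character = { name: "夜煌", stepfunVoiceId: "lengyanyujie" };
console.log(resolveStepfunVoiceId(yehuang, catalog, () => "youyanvsheng")); // lengyanyujie
console.log("voice" in shapeBeatAudioRequest("stepfun", "line", yehuang)); // false
```

Keeping the scorer as a parameter makes the fallback explicit: an invalid or missing LLM pick never reaches the synth call.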

New type fields are all optional and backward-compatible: Character.stepfunVoiceId,
BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made
optional. AGENTS.md updated for the new route, type fields, dependency map,
and StepFun voice-selection flow.
2026-06-15 12:49:25 +08:00
Zonghao Yuan da191dd7a2 fix(play): render AuthModal in immersive branch (#78)
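Mobile portrait (orientation === 'portrait') and desktop fullscreen via the
F key (presentation) both go through PlayInner's immersive render branch,
but when that branch was added it brought along only SettingsModal and
missed AuthModal. As a result, on either path, if the API returns 401 and
triggers setAuthModalOpen(true), the login modal is never mounted and the
user cannot log in to continue playing.

The preset story card entry point (onCardClick) does not check login before
navigating, so a logged-out user who enters /play and taps a choice
triggers a 401, which reproduces the bug on mobile.

Add an AuthModal block identical to the one in the non-immersive branch,
reusing the existing authResolveRef retry mechanism so the intercepted
request is replayed automatically after a successful login.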
2026-06-14 23:26:52 +08:00
Zonghao Yuan 74e87673d1 Merge pull request #77 from zonghaoyuan/feat/legal-pages
feat(web): add privacy & terms pages for Google OAuth verification
2026-06-14 23:06:54 +08:00
yuanzonghao d813d3dccf fix(web): clarify data transmission vs storage in legal pages
Distinguish between temporary server-side processing and persistent
storage to accurately reflect the actual data flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-14 23:04:22 +08:00
yuanzonghao b7ff39d467 feat(web): add privacy policy & terms pages, update homepage copy
Add /privacy and /terms pages for Google OAuth brand verification.
Update homepage: 内测→公测 (closed beta → public beta), remove sponsor text, refresh save tip,
simplify load button label, add footer legal links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-14 22:49:02 +08:00
Zonghao Yuan 4812d5b0b7 Merge pull request #75 from zonghaoyuan/worktree-update-roadmap
docs: update Roadmap with completed milestones and new directions
2026-06-14 22:47:24 +08:00
yuanzonghao f8c1d4a8f5 docs: use inline code formatting for .infiplot extension
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-14 17:55:49 +08:00
yuanzonghao 989f2a7872 docs: update Roadmap with completed milestones and new directions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-14 17:45:27 +08:00
Zonghao Yuan 0dea2f8e36 fix(ai-client): clean up regressions from OpenAI SDK migration and canvas frame fix (#74)
Three follow-ups to ef3b579 (OpenAI SDK migration) and ebe39ef (canvas frame):

- .env.example / config.ts / AGENTS.md: anthropic & google native protocols
  were removed with the Vercel AI SDK, but .env.example and AGENTS.md still
  advertised them. Rewrite the docs to point Claude/Gemini at their
  OpenAI-compatible endpoints (api.anthropic.com/v1,
  generativelanguage.googleapis.com/v1beta/openai), drop the dead Gemini
  "Nano Banana" image example, sync AGENTS.md (text/vision protocol list,
  image protocol list, the "OpenAI/Gemini via AI SDK" reference note), and
  append a short hint to the readProvider() error message that points
  anthropic/google users to openai_compatible instead of a bare rejection.

- chat.ts: drop the unsafe `as { prompt_tokens_details?: ... }` cast; read
  cached_tokens straight off the SDK's CompletionUsage type. Add a comment
  noting the OpenAI usage object reports cache reads only (no cache-write
  count), so the create cost the old AI SDK path logged is unrecoverable.

- PlayCanvas.tsx: revert <img key={imageUrl}> to key={imageUrl.slice(-48)}.
  The gpt-image/mock paths emit multi-MB data URIs; using the full string as
  React's reconciliation key adds avoidable diff overhead during the frequent
  re-renders. Matches the existing <audio> element's key convention.

Validation: pnpm typecheck passes. (pnpm lint fails on a pre-existing Next 16
`next lint` CLI issue, identical on staging — unrelated to this change.)
2026-06-14 13:36:19 +08:00
Zonghao Yuan 9157454b46 Merge pull request #73 from zonghaoyuan/fix/restore-server-tts-and-fot
fix(play): restore server TTS, FOT strip/merge, nudge, and blob cleanup
2026-06-14 13:11:58 +08:00
yuanzonghao 2f6e67bd80 fix(play): restore server TTS, FOT strip/merge, nudge, and blob cleanup
Reverts the regressions from b63b694 on the server-fallback path:

P0 — fetchBeatAudio non-BYO branch was a bare return; every non-BYO
user got silent playback regardless of server TTS config. Re-connect
to /api/beat-audio with the beatAudioAbortRef signal, count 204/!ok
as silence strikes, create a blob URL on success.

P1 — stripVoicesForTransport + mergeCharactersPreserveVoice were
deleted, so the server-fallback path re-sent ~160KB
referenceAudioBase64 per character on every request AND lost voices
for already-known characters after scene 1. Re-add both, applied
ONLY on the server-fallback branches in engineClient.ts (BYO
client-direct path untouched).
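A minimal sketch of the two helpers, assuming characters are matched by `name` and carry the reference audio under `voice.referenceAudioBase64`. The signatures are guesses at the shape, not the actual code in engineClient.ts.

```typescript
interface Voice {
  referenceAudioBase64: string; // ~160KB per character on the wire
}

interface Character {
  name: string;
  description?: string;
  voice?: Voice;
}

// Copy each character without its voice before sending it to /api/*.
function stripVoicesForTransport(characters: readonly Character[]): Character[] {
  return characters.map((c) => {
    const copy: Character = { ...c };
    delete copy.voice; // the caller's objects keep their voices
    return copy;
  });
}

// The server echoes stripped characters, so re-attach any voice the client
// already holds for the same name; new characters keep whatever they arrived with.
function mergeCharactersPreserveVoice(
  known: readonly Character[],
  incoming: readonly Character[],
): Character[] {
  const voiceByName = new Map<string, Voice>();
  for (const c of known) {
    if (c.voice) voiceByName.set(c.name, c.voice);
  }
  return incoming.map((c) => {
    const voice = c.voice ?? voiceByName.get(c.name);
    return voice ? { ...c, voice } : { ...c };
  });
}
```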

P3 — the aborted-before-store blob URL race had no revoke, leaking
one blob URL per cancelled synth. Re-add the else-if revoke.
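A minimal sketch of the store step, assuming synth delivers a `Blob`, cancellation arrives through the same AbortSignal the commit wires to `beatAudioAbortRef`, and some async work (the hypothetical `preload`) sits between creating the URL and storing it. `storeBeatAudio` and the map are hypothetical names. A URL created before an abort that lands in that gap has no consumer, so it must be revoked on the spot.

```typescript
// Hypothetical store step for one beat's synthesized audio.
async function storeBeatAudio(
  beatId: string,
  blob: Blob,
  signal: AbortSignal,
  beatAudioMap: Map<string, string>,
  preload: (url: string) => Promise<void> = async () => {},
): Promise<string | null> {
  const url = URL.createObjectURL(blob);
  await preload(url); // e.g. warm an <audio> element; the abort can land here
  if (signal.aborted) {
    // Aborted before store: nothing will ever play or release this URL, so
    // revoke it now instead of leaking one blob URL per cancelled synth.
    URL.revokeObjectURL(url);
    return null;
  }
  beatAudioMap.set(beatId, url);
  return url;
}
```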

P2 — handleSettingsSaved ignored ttsConfigured, so a BYO key entered
mid-session only took effect after a page reload. Re-add the ref/state
refresh + audio re-prefetch. Also restore the silence-nudge UI
(silenceStrikes counter, SILENCE_NUDGE_THRESHOLD, dismissible pill
beside the mute toggle) that surfaces BYO-key guidance when the
shared server key is being rate-limited.
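A minimal sketch of the nudge bookkeeping, assuming one strike per 204 or non-OK `/api/beat-audio` response. The threshold value 3 and the reset-on-success rule are illustrative assumptions, not values taken from the code. Note that `fetch` treats 204 as `ok`, so it is checked separately.

```typescript
// Illustrative value; the real SILENCE_NUDGE_THRESHOLD is not stated here.
const SILENCE_NUDGE_THRESHOLD = 3;

interface NudgeState {
  silenceStrikes: number;
  dismissed: boolean; // the pill is dismissible
}

// Record one /api/beat-audio outcome and report whether to show the BYO-key pill.
function recordBeatAudioOutcome(
  state: NudgeState,
  status: number,
): { next: NudgeState; showNudge: boolean } {
  const silent = status === 204 || status < 200 || status >= 300;
  const silenceStrikes = silent ? state.silenceStrikes + 1 : 0; // assumed reset on success
  const next = { ...state, silenceStrikes };
  return { next, showNudge: !next.dismissed && silenceStrikes >= SILENCE_NUDGE_THRESHOLD };
}
```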

Verified live: /api/beat-audio now returns 200 (was 0 calls under
the bug); audio plays after synth completes.
2026-06-14 13:09:09 +08:00
Zonghao Yuan 5a966627a6 Merge pull request #72 from zonghaoyuan/fix/settings-ui-polish
fix(web): unify settings model sections and refine home hint
2026-06-14 12:44:51 +08:00
yuanzonghao 54a0083e23 fix(web): unify settings model sections and refine home hint
- Rename "自带配音 Key" → "配音模型", drop the section-level "可选" badge,
  and switch its icon to fa-volume-high to match the other model sections
- Drop redundant manual letter-spacing and "·" separators from settings
  field labels (let .smallcaps tracking handle spacing)
- Move the CORS endpoint note to the top of the Models tab
- Home hint: reword to "输入想法" (enter an idea), mention text/image/vision
  models + voice key, and add an AUTH_ENABLED-gated "测试期间,登录即可免费畅玩"
  (free to play with a login during the test period) line

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 11:24:22 +08:00
Zonghao Yuan c8ffd6443b feat(home): localize character portrait URLs in prebaked first-act JSONs (#71)
* feat(home): localize character portrait URLs in prebaked first-act JSONs

Runware CDN URLs expire, breaking character portraits in prebaked story
cards. Download all 144 portraits as static WebP assets and rewrite
first-act JSONs to reference local paths instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(scripts): add fetch timeout and simplify resize logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-14 11:23:17 +08:00
yuanzonghao 0c83f5f2a8 chore: gitignore local-only OpenDeploy and pitch files
- Dockerfile.opendeploy: local OpenDeploy build that hardcodes the public
  image-proxy URL; kept out of the repo so a public fork doesn't route image
  traffic through our Cloudflare Worker.
- .opendeploy: OpenDeploy CLI local context/credentials dir.
- pitch/: local pitch materials.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 00:56:54 +08:00
Zonghao Yuan d5ae45b943 Merge pull request #68 from zonghaoyuan/feat/supabase-auth
feat(auth): add Supabase auth with Google, GitHub, and email OTP login
2026-06-13 23:49:15 +08:00
yuanzonghao cb830f023d Merge origin/staging into feat/supabase-auth
Resolve conflicts: keep login_success alongside the new play_error /
play_visibility_lost analytics events; fold auth retry into the play-page
catch blocks so 401s open the login modal and are NOT tracked as play_error.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 23:44:23 +08:00
yuanzonghao 11f5ca83ec fix(auth): reject control chars in OAuth callback next param
Defense-in-depth against header injection if the post-login redirect
target ever reaches a context that doesn't re-encode it.
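A minimal sketch combining this check with the relative-path rule added in 89a5c54065, assuming the callback falls back to `/` when `next` is rejected. `safeNextPath` is a hypothetical name.

```typescript
// Hypothetical sanitizer for the OAuth callback's `next` query parameter.
function safeNextPath(next: string | null, fallback = "/"): string {
  if (!next) return fallback;
  // Same-origin relative paths only: "/play" passes, but "//evil.com" and
  // "/\evil.com" are treated as protocol-relative by many parsers, and
  // "https://..." is absolute.
  if (!next.startsWith("/") || next.startsWith("//") || next.startsWith("/\\")) {
    return fallback;
  }
  // Reject ASCII control characters (including CR/LF) so the value can never
  // split a header in a context that does not re-encode it.
  if (/[\u0000-\u001f\u007f]/.test(next)) return fallback;
  return next;
}
```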

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 23:19:44 +08:00
yuanzonghao 89a5c54065 fix(auth): address PR review and OAuth state-loss bugs
- proxy: await getUser() so refreshed session cookies land on the response
- callback: gate on AUTH_ENABLED, reject non-relative next (open redirect)
- page: snapshot + resume form and style image across the OAuth redirect;
  require login before the style-image vision parse
- play: wire authResolveRef so login retries the action that hit 401;
  dismissing the modal no longer re-fires it
- server: wrap cookie setAll in try/catch for read-only contexts
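The authResolveRef flow can be sketched as a retry wrapper, assuming errors carry a numeric `status` and the modal exposes a promise that resolves `true` on login and `false` on dismiss. `withAuthRetry`, `isUnauthorized`, and `openLoginModal` are hypothetical names standing in for the actual ref wiring.

```typescript
// True for errors carrying an HTTP 401 status (error shape assumed).
function isUnauthorized(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 401;
}

// Run `action`; on a 401, open the login modal and wait for the user.
// Logging in retries the same action once; dismissing rethrows the original
// error instead of reopening the modal.
async function withAuthRetry<T>(
  action: () => Promise<T>,
  openLoginModal: () => Promise<boolean>, // true on login, false on dismiss
): Promise<T> {
  try {
    return await action();
  } catch (err) {
    if (!isUnauthorized(err)) throw err;
    const loggedIn = await openLoginModal();
    if (!loggedIn) throw err;
    return action();
  }
}
```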

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:27:51 +08:00
Zonghao Yuan e328d209e0 Merge pull request #69 from zonghaoyuan/feat/play-error-analytics
feat(play): add error observability analytics for mobile diagnostics
2026-06-13 19:27:31 +08:00
yuanzonghao ccdb4780d6 fix(play): throw AbortError on cancelled prefetch to avoid false analytics
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:09:04 +08:00
yuanzonghao 0998f7c46a feat(play): add error observability analytics for mobile diagnostics
Track play_error and play_visibility_lost events via Umami to
distinguish mobile vs desktop failure modes. Each error event
captures orientation, connection type, visibility state, elapsed
time bucket, and error classification — all categorical, no free
text. Includes postJson "HTTP \d+" status parsing for the new
engineClient dual-path architecture.
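A minimal sketch of categorical classification, assuming failures surface as `Error`s whose message contains `HTTP <status>` (the postJson format this commit parses). The category names and elapsed-time bucket edges are illustrative assumptions. The `AbortError` branch reflects the intent of ccdb4780d6: a cancelled prefetch yields `null` and is never reported.

```typescript
type ErrorClass = "http_4xx" | "http_5xx" | "network" | "timeout" | "unknown";

// Map an error to a fixed category so no free text reaches analytics.
function classifyPlayError(err: unknown): ErrorClass | null {
  if (!(err instanceof Error)) return "unknown";
  if (err.name === "AbortError") return null; // cancelled on purpose: don't report
  if (err.name === "TimeoutError") return "timeout"; // e.g. AbortSignal.timeout
  const m = /HTTP (\d+)/.exec(err.message); // postJson-style "HTTP 503"
  if (m) {
    const status = Number(m[1]);
    if (status >= 500) return "http_5xx";
    if (status >= 400) return "http_4xx";
  }
  if (err instanceof TypeError) return "network"; // fetch rejects with TypeError on network failure
  return "unknown";
}

// Illustrative buckets for time elapsed since the request started.
function elapsedBucket(ms: number): string {
  if (ms < 5000) return "<5s";
  if (ms < 30000) return "5-30s";
  if (ms < 120000) return "30-120s";
  return ">120s";
}
```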

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 18:57:38 +08:00
yuanzonghao 87a2f93edb feat(auth): add Supabase auth with Google, GitHub, and email OTP login
Introduce user registration/login gated behind optional NEXT_PUBLIC_SUPABASE_*
env vars (leave blank to disable — app behaves exactly as before). Adds
proxy.ts for automatic cookie session refresh, requireUser() API route
guards on all 7 compute-consuming routes, AuthModal (Google/GitHub OAuth +
6-digit email OTP), UserChip header component, and login_success analytics
event. Identity is fully decoupled from Session/engine — no type changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-13 17:33:55 +08:00
Zonghao Yuan b069313014 Merge pull request #67 from zonghaoyuan/fix/image-ready-gate
fix(play): gate scene transition on image decode
2026-06-13 17:32:01 +08:00
yuanzonghao a1b6848688 fix(play): guard decode callback against stale img ref
Verify imgRef.current === el before firing onImageReady, so a
late-resolving decode from a prior <img> element cannot trigger
the gate prematurely.
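A minimal sketch of the guard, assuming a React-style ref object and an element exposing `decode()` as HTMLImageElement does. `attachDecodeGate` is a hypothetical name, and treating a failed decode like a finished one is an assumption of this sketch. Because `decode()` settles asynchronously, the identity check runs after it, when a newer image may already own the ref.

```typescript
interface Decodable {
  decode(): Promise<void>;
}

interface Ref<T> {
  current: T | null;
}

// Fire onImageReady only if `el` is still the element the ref points at when
// its decode settles; a late decode from a replaced <img> is ignored.
function attachDecodeGate<T extends Decodable>(
  el: T,
  imgRef: Ref<T>,
  onImageReady: () => void,
): Promise<void> {
  const fire = () => {
    if (imgRef.current === el) onImageReady();
  };
  // Assumed: a failed decode also releases the gate, so it cannot hang.
  return el.decode().then(fire, fire);
}
```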

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-13 11:51:15 +08:00
Zonghao Yuan 2a2d58a64f Merge pull request #66 from zonghaoyuan/feat/painter-hedged-retry
feat(engine): add opt-in image timeout and scene-paint hedging
2026-06-13 11:44:41 +08:00
yuanzonghao e3ee3547e5 fix(play): gate scene transition on image decode
Keep the "transitioning" overlay visible until the <img> element's
bitmap is fully decoded, so the user never sees progressive paint
or a blank flash between scenes.

- Add onImageReady callback to PlayCanvas (<img onLoad> + decode())
- Delay setPhase("ready") until decode resolves (3s timeout fallback)
- Applied to all 4 scene entry paths: prebaked card, live /api/start,
  performSceneTransition, and recorded replay transition
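A minimal sketch of the gate's wait, assuming the element exposes `decode()` as HTMLImageElement does. `waitForImageReady` and its result labels are hypothetical. It settles when decoding finishes or after the 3 s fallback, whichever comes first, so a stuck decode cannot pin the transition overlay.

```typescript
type ReadyResult = "decoded" | "failed" | "timeout";

// Wait until `el` has a decoded bitmap, but never longer than timeoutMs.
// A rejected decode also ends the wait, so the overlay lifts either way.
function waitForImageReady(
  el: { decode(): Promise<void> },
  timeoutMs = 3000,
): Promise<ReadyResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ReadyResult>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  const decoded = el.decode().then(
    (): ReadyResult => "decoded",
    (): ReadyResult => "failed",
  );
  return Promise.race([decoded, timeout]).finally(() => clearTimeout(timer));
}
```

A caller would then run something like `waitForImageReady(img).then(() => setPhase("ready"))` on each of the four entry paths.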

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-13 11:43:35 +08:00
yuanzonghao e68e7e1690 feat(engine): add opt-in image timeout and scene-paint hedging
IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout);
IMAGE_HEDGE_MS races a second identical scene-paint request when the
first is still pending past the threshold. Both default to OFF when
unset, preserving historical behavior for self-hosted deploys.
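A minimal sketch of the two knobs, assuming each paint attempt is a function that accepts an AbortSignal. `paintWithHedge`, `runAttempt`, and `HedgeOptions` are hypothetical names. The commit uses `AbortSignal.timeout` for the deadline; this sketch swaps in a `setTimeout`-driven AbortController so one controller per attempt serves both the deadline and cancelling the losing hedge. Both knobs stay off when undefined, as in the commit.

```typescript
interface HedgeOptions {
  timeoutMs?: number; // like IMAGE_TIMEOUT_MS: per-attempt deadline, off when undefined
  hedgeMs?: number; // like IMAGE_HEDGE_MS: hedge delay, off when undefined
}

// One attempt with an optional hard deadline that aborts its signal and rejects.
function runAttempt<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number | undefined,
  controllers: AbortController[],
): Promise<T> {
  const ac = new AbortController();
  controllers.push(ac);
  return new Promise<T>((resolve, reject) => {
    const timer =
      timeoutMs === undefined
        ? undefined
        : setTimeout(() => {
            ac.abort();
            reject(new Error(`image attempt timed out after ${timeoutMs}ms`));
          }, timeoutMs);
    attempt(ac.signal).then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// If the first attempt is still pending after hedgeMs, start an identical
// second one. The first success wins and all controllers are aborted so the
// loser stops; the call fails only once every started attempt has failed.
function paintWithHedge<T>(
  attempt: (signal: AbortSignal) => Promise<T>,
  { timeoutMs, hedgeMs }: HedgeOptions = {},
): Promise<T> {
  const controllers: AbortController[] = [];
  if (hedgeMs === undefined) return runAttempt(attempt, timeoutMs, controllers);

  return new Promise<T>((resolve, reject) => {
    let settled = false;
    let launched = 0;
    let failed = 0;
    let hedgeTimer: ReturnType<typeof setTimeout> | undefined;
    const finish = () => {
      settled = true;
      clearTimeout(hedgeTimer);
      controllers.forEach((c) => c.abort()); // harmless for the winner
    };
    const launch = () => {
      launched++;
      runAttempt(attempt, timeoutMs, controllers).then(
        (value) => {
          if (settled) return;
          finish();
          resolve(value);
        },
        (err) => {
          failed++;
          if (settled || failed < launched) return; // another attempt is still live
          finish();
          reject(err);
        },
      );
    };
    hedgeTimer = setTimeout(() => {
      if (!settled) launch();
    }, hedgeMs);
    launch();
  });
}
```

If the first attempt fails before the hedge delay elapses, the call rejects immediately rather than waiting for the hedge, which keeps hedging distinct from retrying.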

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 11:21:47 +08:00
baizhi958216 c4ffc16498 Merge pull request #64 from zonghaoyuan/refactor/settings-modal
feat: add client-side model configuration and server fallback
2026-06-12 22:09:43 +08:00
baizhi958216 e6004020b5 Merge pull request #65 from zonghaoyuan/fix/play-canvas-stable-frame
fix(play): stabilize canvas frame during image swaps
2026-06-12 22:09:13 +08:00
baizhi958216 ebe39efcac fix(play): stabilize canvas frame during image swaps
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-12 22:02:49 +08:00
baizhi958216 299df0d098 feat(web): remove unused openai native adapter 2026-06-11 16:56:11 +08:00
baizhi958216 5608b0fdd0 fix(engine): tolerate duplicated JSON outputs 2026-06-11 16:11:52 +08:00
baizhi958216 ef3b57953b refactor(ai-client): replace AI SDK adapters with OpenAI SDK 2026-06-11 16:11:44 +08:00
baizhi958216 6cd7d88326 feat(web): fallback to server API routes when no client-side model config is set
When a user has not configured their own model keys in localStorage,
engine calls now automatically route through /api/* server routes
instead of throwing "模型配置未设置". This lets Vercel deploys with
server-side environment variables work out of the box.

- Add lib/engineClient.ts as a unified client-side routing layer:
  checks localStorage for BYO config, falls back to POST /api/start,
  /api/scene, /api/vision, /api/classify-freeform, /api/insert-beat
- Update app/play/page.tsx to use engineClient instead of direct
  engine imports; remove buildEngineConfig()
- Update app/page.tsx style-image parsing to also fall back to
  /api/parse-style-image when no local model config exists
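A minimal sketch of the routing decision, assuming BYO config is stored as JSON in localStorage. The route names come from this commit; the storage key, the `ModelConfig` shape, `callEngine`, and the injected `storage` and `fetchImpl` parameters (injected only so the sketch runs outside a browser) are hypothetical.

```typescript
interface ModelConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

interface KeyValueStore {
  getItem(key: string): string | null; // localStorage satisfies this
}

const CONFIG_KEY = "model-config"; // hypothetical storage key

function readByoConfig(storage: KeyValueStore): ModelConfig | null {
  try {
    const raw = storage.getItem(CONFIG_KEY);
    if (!raw) return null;
    const parsed = JSON.parse(raw) as Partial<ModelConfig>;
    return parsed.baseUrl && parsed.apiKey && parsed.model ? (parsed as ModelConfig) : null;
  } catch {
    return null; // corrupt JSON counts as "not configured"
  }
}

// Run client-side with the user's own keys when configured; otherwise POST to
// the matching server route (/api/start, /api/scene, ...).
async function callEngine<T>(
  route: string,
  body: unknown,
  runDirect: (config: ModelConfig) => Promise<T>,
  storage: KeyValueStore,
  fetchImpl: typeof fetch = fetch,
): Promise<T> {
  const config = readByoConfig(storage);
  if (config) return runDirect(config);
  const res = await fetchImpl(route, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return (await res.json()) as T;
}
```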

Signed-off-by: zhi <zhi@peropero.net>
2026-06-11 12:15:14 +08:00
baizhi958216 0f8e641c4c feat(web): merge SettingsModal and ModelSettingsModal with tab navigation
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 94973bc6c6 fix(tts): add non-null assertion in stepfun array access
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 b63b694940 refactor(play): use client-side engine API instead of direct fetch
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 ab2f42bc42 feat(web): merge TTS settings into ModelSettingsModal, remove from SettingsModal
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 6b11a225cd feat(web): add model settings button, modal, and client-side style image parsing
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 71216e1602 feat(ui): add ModelSettingsModal for configuring text/image/vision providers
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 759319bf28 feat(config): extract STYLE_EXTRACTION_PROMPT to shared lib for client reuse
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
baizhi958216 a2dd5ad630 feat(config): add client-side model config storage and EngineConfig resolver
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
baizhi958216 2088bae311 fix(tts): replace Buffer.from with browser-compatible arrayBufferToBase64 in stepfun
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
Qi Chen e34306997a Merge pull request #63 from zonghaoyuan/feat/export-with-audio
feat(web): embed beat audio into gallery and infiplot exports
2026-06-11 09:36:42 +08:00
DESKTOP-I1T6TF3\Q 621f83c47b feat(web): embed beat audio into gallery and infiplot exports
Walk every speaking beat at export time, reuse current scene's beatAudioMap,
and synth the rest via BYO TTS or /api/beat-audio with concurrency 4. Show a
progress toast on the play page while collecting.
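The concurrency-4 collection step can be sketched as a small worker pool, assuming each beat's synth is an async function. `mapWithConcurrency` is a hypothetical helper, not the project's code. Results keep input order regardless of completion order, and the optional callback is where a progress toast would update.

```typescript
// Map `fn` over `items` with at most `limit` calls in flight, keeping input
// order in the result. A rejection from `fn` rejects the returned promise.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
  onProgress?: (done: number, total: number) => void,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  let done = 0;
  // Each worker claims the next unclaimed index; claiming is race-free because
  // the check and `next++` run synchronously on one thread.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
      done++;
      onProgress?.(done, items.length);
    }
  };
  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

For the export, `items` would be the speaking beats not already covered by the current scene's beatAudioMap, with `limit` set to 4.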

Gallery export keeps audio in a sidecar localStorage key so the first paint
is not blocked by JSON.parse-ing several MB of base64; the gallery lazy-loads
it after the first scene image, then plays per-beat audio with a mute toggle
persisted to localStorage. .infiplot share files embed audioByBeatId in the
doc itself (v2); on import the data URIs survive scene swaps and feed back
into the per-beat audio map so replayers hear the original voices for free.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-11 09:29:16 +08:00