Commit Graph

80 Commits

Author SHA1 Message Date
Kai ki 6ba5307c6c fix(persistence): address PR #117 review feedback
Adopt 8 PR-agent (Qodo) findings; 4 declined (concurrency already guarded by
the putSyncedRecord/markRecordSynced guards + RPC optimistic concurrency;
SQL-injection / won-equality / microtask-race are false positives — see PR reply).

- markRecordSynced: guard on updatedAt too — softDeleteStory doesn't bump rev,
  so a same-rev newer local tombstone must not be marked synced by an older
  push's ack (symmetric with putSyncedRecord's guard)
- recordToEnvelope: fallback timestamps to 0 not Date.now() (a corrupt record
  should lose LWW, not win as "now")
- push/delete routes: validate rev/updatedAt as finite -> 400 (was silent 200);
  push: Content-Length pre-check before buffering the body
- pushDeletion: idbGet a single record instead of a full-store scan
- manifest: Cache-Control private,no-store + client fetch cache:no-store
- cloudSyncClient: Array.isArray narrowing on items/blobs
- RPC: `if found` instead of `v_row.id is not null` after RETURNING INTO

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 11:52:09 +08:00
Kai ki 739af60848 Merge remote-tracking branch 'origin/staging' into cloudflare-migration 2026-06-28 11:21:20 +08:00
Kai ki ff12b2759f feat(persistence): bidirectional local/cloud story sync (Supabase)
Connect the previously-skeleton cloudStore to the client with a full
bidirectional reconcile engine. Commercial build (AUTH_ENABLED) only; the
open-source build is byte-for-byte unchanged — all cloud paths short-circuit
when AUTH_ENABLED is false.

- cloudSync.ts: reconcile engine — decideAction (pure, LWW rev->updatedAt with
  tombstone priority) + syncOnLogin/pushOnSave/pushDeletion (best-effort,
  serialized, isAuthed-gated)
- cloudSyncClient.ts: browser fetch bridge (short-circuit + fault-tolerant)
- /api/stories/{manifest,pull,push,delete}: RLS-guarded sync endpoints
- upsert_story_if_newer RPC: optimistic concurrency (SECURITY INVOKER,
  auth.uid() injection, rev->updated_at guard, revoked from public)
- cloudStore: +manifest/pullBlobs, save->RPC {stored,won}, softDelete w/ rev
- localStore: +listAllRecordsForSync/putSyncedRecord/markRecordSynced
  (concurrency-guarded sync writes); types: +StorySyncMeta/StorySyncEnvelope
- facade + UserChip: inject pushOnSave/pushDeletion + login-triggered reconcile

Sync model: full reconcile on login + background push on save (no Realtime;
eventual consistency). Conflict resolution: last-write-wins.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 11:20:47 +08:00
yuanzonghao cf6e08aec4 fix(persistence): harden coerceEpoch null guard + autosave catch rollback
- coerceEpoch: early-return fallback on null/undefined input (was falling
  through to new Date(null) → epoch 0, surfacing as 1970-01-01 in the
  stories list); also unify the tail branch to Number.isFinite for
  consistency with the number-type fast path
- autosave .catch: roll back lastSavedFingerprintRef on unexpected throw
  so the next session change retries instead of silently marking the
  content as saved

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-27 18:36:26 +08:00
Kai ki da74e3e763 fix(persistence): address PR review feedback (4 low-cost improvements)
From PR #114 external review agent — adopted the real, low-cost findings;
remaining items (false positives / design trade-offs) explained in PR replies:

- coerceEpoch: !Number.isNaN → Number.isFinite — reject ±Infinity, which
  previously slipped through and produced Invalid Date via new Date(Infinity)
- enforceRetentionCap pass2/pass3: decrement overflow only when idbDelete
  actually succeeds — a failed best-effort delete no longer under-evicts
- cloudListStories: explicit column list instead of select() — avoid pulling
  the bulky session_jsonb when only metadata is needed
- Supabase stories: composite primary key (user_id, id) + onConflict user_id,id
  — avoid a cross-user Session.id collision rejecting the second user's save
  (skeleton not yet deployed, so the migration is edited in place)

typecheck + build:cf green.
2026-06-25 19:20:55 +08:00
Kai ki 610dba78b7 feat(persistence): local-first story persistence (IndexedDB + Supabase skeleton)
Remove Cloudflare D1 entirely (4 API routes, lib/db/, Drizzle config/migrations,
drizzle-orm/drizzle-kit deps, wrangler D1/R2/KV bindings) and replace with
browser-local-first architecture:

Open-source build (IndexedDB, no auth):
- lib/persistence/ 5-file module: types, idb adapter (zero-dep, fault-tolerant,
  post-open invalidation retry), localStore (CRUD + sync-reserved metadata +
  slim/rebuild + retention-cap eviction with tombstone reap + sync-state
  protection + last-resort bounded fallback), sessionSlim (voice strip +
  styleRef absent-delete), cloudStore (Supabase skeleton, server-only)
- Autosave: persistence fingerprint (history.length:lastBeatCount:playerName),
  serial saveChain, failure rollback retry, replaySourceRef guard (prevents
  replayed shared stories from clobbering user saves)
- clientStoryPersistence.ts: thin facade (SaveResult discriminated union)
- Stories page: /[locale]/stories with 3-language i18n (zh-CN/en/ja)
- Homepage: book icon entry point in header

Commercial build (Supabase, skeleton only):
- Single table public.stories (JSONB + RLS 4 policies on auth.uid()=user_id)
- supabase/migrations/ DDL (idempotent)
- cloudStore.ts server-only repository, AUTH_ENABLED short-circuit
- Not wired to client this phase

Featured stories: pure fallback (buildFallbackCards + localizeCards), no D1

Includes fixes from 3 rounds of subagent code-review (tasks 16-30):
- CR1: autosave restructure, coerceOrientation, D1 comment cleanup
- CR2: fingerprint+serial+rollback+replay guard, idb post-open retry,
  enforceRetentionCap latent defense, sessionSlim absent invariant
- CR3: single-scene share guard (replaySourceRef), insert-beat fingerprint
  (beats.length), pass3 overflow double-count fix, detach gate unification
2026-06-25 18:19:08 +08:00
Kai ki be39fcc77e Merge remote-tracking branch 'origin/staging' into cloudflare-migration 2026-06-25 18:08:46 +08:00
Zonghao Yuan 0643f185ac feat(web): move feedback entry from floating button to about section (#112)
Replace the fixed bottom-right ember-500 floating button with a
dedicated fourth column in the about grid, matching existing visual
language (smallcaps heading + icon link). Adds submitFeedback and
feedbackDescription i18n keys for zh-CN, en, ja.

Grid uses md:grid-cols-2 lg:grid-cols-4 for better mid-screen
responsiveness.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-24 22:10:43 +08:00
Zonghao Yuan df3b1d3307 Merge pull request #106 from zonghaoyuan/feat/freeform-always-new-scene
feat(play): freeform input always generates new scene + enhanced insert-beat
2026-06-24 19:40:04 +08:00
yuanzonghao d5b4a02cb3 refactor(engine): remove follow-up choices from insert-beat, keep multi-beat only
Insert-beat is a pure in-scene micro-interaction — adding choices that
lead to change-scene contradicted its purpose. Now insert-beat generates
1-3 richer beats then loops back to the original options, which is the
natural UX for "you glanced at something decorative."

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-24 19:09:09 +08:00
yuanzonghao 8964155b26 feat(web): add floating feedback FAB on homepage linking to Tally form
Collect beta user feedback via Tally.so — fixed-position button in the
bottom-right corner opens the form in a new tab. Ember-orange default,
clay-dark on hover, i18n labels for zh-CN / en / ja.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-24 18:59:08 +08:00
yuanzonghao b5f5ebc353 fix(engine): filter invalid choices before slicing to preserve valid ones
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-24 18:51:23 +08:00
yuanzonghao 6f8125570a feat(play): always generate new scene for freeform text input + enhance insert-beat
User feedback: custom interactions rarely produce new story content because
the classifier heavily biased toward insert-beat (single reaction, no scene
change). Three changes to fix this:

1. Freeform text input now always triggers a full scene generation (skips
   the classify step entirely) — users who type expect the story to advance.

2. Vision (background click) classifier de-biased: prompt now favors
   change-scene when uncertain, and the code fallback flipped from
   insert-beat to change-scene. insert-beat narrowed to pure observation.

3. Insert-beat enhanced: generates 1-3 beats (was 1) with follow-up
   choices (was: loop back to original beat). Even when vision classifies
   as insert-beat, the player gets richer content and new options.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-24 18:36:35 +08:00
Kai ki e31bd16b15 fix(engine): prevent directScene hang + enforce segment ID uniqueness in prod
Two defensive fixes surfaced by the PR #95 review (PR-Agent), applied on
top of the staging sync:

1. directScene: routeTaggedStream rejecting BEFORE onPlan fires would leave
   planPromise unsettled, hanging `await planPromise` — and thus the whole
   /api/start and /api/scene request — forever. Add a .catch that settles
   the plan with a minimal fallback and resolves routing to a degraded
   result, so the pipeline produces a playable fallback scene (graceful
   degradation) instead of hanging.

2. prompts/registry: the duplicate-segment-ID guard only ran under
   NODE_ENV=development, so a bad merge introducing a duplicate ID would
   silently shadow a segment in production. Run the check in all
   environments (once at module load; negligible cost).
2026-06-23 19:06:19 +08:00
yuanzonghao 0a7076d5b9 fix(i18n): overhaul i18n with [locale] routing, SSR translations, and hreflang SEO
Rewrites the i18n system introduced in PR #94 to use Next.js App Router
[locale] dynamic segments with SSR-rendered translations and proper
middleware locale routing.

- Add middleware locale detection: / rewrites to /zh-CN/ internally,
  /en and /ja pass through, /zh-CN/... redirects to bare path
- Move all 7 pages under app/[locale]/ with SSR translation injection
- Fix server→client serialization: pre-evaluate function-valued
  translations (makeSerializable) to eliminate hydration flash
- Fix language switch key flash: use hard navigation with localStorage-
  only persistence, avoiding React state update before page reload
- Add <link rel="alternate" hreflang> tags for multilingual SEO
- Fix Supabase setAll overwriting locale rewrite response
- Trim locales from 22 to 3 (zh-CN/en/ja), delete 19 incomplete files
- LLM-translate 240 firstact game preset JSONs (en + ja, landscape +
  portrait) and story titles via gemini-3.5-flash
- Delete 11 one-off migration scripts and outdated i18n docs
- Add useLocalePath hook and navigation utilities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-18 23:16:17 +08:00
yuanzonghao 941b54c3f8 fix(share): require at least 1 byte of content in .infiplot files
Reject header-only files (37 bytes, empty plaintext) at the unpack
boundary instead of letting them through as empty strings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-18 21:56:38 +08:00
yuanzonghao 64cf9c330d refactor(share): remove GALLERY_SECRET, use plaintext + SHA-256 integrity for .infiplot files
The encrypted .infiplot format (AES-256-GCM via GALLERY_SECRET) provided no
meaningful security — the payload is AI-generated story content with no
credentials or PII, and the project is open source. Replace with plaintext +
SHA-256 integrity check (format v2). Story share is now always enabled without
requiring a server secret.

- galleryCrypto.ts: AES-256-GCM → plaintext + SHA-256 hash; remove secret param
- 4 API routes: remove GALLERY_SECRET guard and 503 fallback
- story-unpack: forward specific error messages (v1 compat, hash mismatch)
- gallery/page.tsx: remove stale AES-GCM comment
- AGENTS.md: document gallery-pack/gallery-unpack routes
- .env.example, wrangler.jsonc: remove GALLERY_SECRET references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-18 21:41:56 +08:00
Zonghao Yuan 0e4c2ebef4 feat(engine): merge cloudflare-migration — paradigm D engine, BYOK proxy, story persistence (#95)
Squash-merge the cloudflare-migration branch (7 commits by Kai ki) into
staging with conflict resolution, feature integration, and bug fixes.

Engine:
- Paradigm D: single-stream Writer replacing dual-phase Plan/Beats
- Delete Architect agent; story bible generated via Writer <plan> tag
- Modular prompt architecture (segments/registry/builder)
- StreamRouter for tagged stream splitting (<plan>/<story>/<choices>)

Infrastructure:
- Cloudflare Workers deployment (wrangler.jsonc, OpenNext adapter)
- D1 database schema + Drizzle ORM (scaffolded, not yet active)
- R2 storage helpers (scaffolded, not yet active)
- Story persistence API routes + client-side persistence

BYOK (Bring Your Own Key):
- /api/llm/user-proxy with SSRF-protected LLM proxy (+ requireUser auth)
- CORS-aware fetch in ai-client: auto-detect CORS failure, fallback to
  server proxy transparently via OpenAI SDK custom fetch
- BYO config support added to classify-freeform and vision routes
- SettingsModal CORS privacy notice (keys never logged/stored)

SSE streaming:
- engineClient.ts: fetchSSE helper for progressive scene events
- startSession/requestScene accept optional emit callback
- Fix SSE error event field name (error → message) in scene/start routes

i18n integration:
- Wire buildLanguageDirective into paradigm D's prompt builder
- Update corsNotice i18n keys (zh-CN/en/ja) with CORS proxy privacy text
- Preserve Session.language + LanguageSwitcher from i18n commit

Co-authored-by: Kai ki <155355644+zbf1009@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-18 18:05:38 +08:00
DESKTOP-I1T6TF3\Q 2d35c1d9de feat(i18n): add language switcher with en/ja translations
- New client-side i18n via React Context (useI18n, tArray, I18nProvider)
- Catalog ships 21 locale stubs; only zh-CN/en/ja have reviewed translations
- Header language switcher (globe icon + short label) before settings gear
- All hardcoded Chinese UI text migrated to keys: typewriter, options,
  hints (with embedded gear icon via dangerouslySetInnerHTML), settings
  panel, footer/about, play page hints
- AI output language follows user-selected locale via trailing one-liner
  directive appended to Architect/Writer/CharacterDesigner/InsertBeat
  user messages (preserves system-prompt cacheability)
- Per-locale separator rule: zh uses middot between every glyph; en/ja
  use plain spaces
- Option value → i18n key suffix maps preserve Chinese as the underlying
  identifier so analytics unions and STYLE_MAP keys stay byte-stable

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-18 16:54:35 +08:00
DESKTOP-I1T6TF3\Q f1fe7964a2 feat(options): add third gender option "X" for universal gender
- Add "X" to GENDERS array in lib/options.ts
- Add example phrases for "X" gender (sci-fi themed)
- Make "X" use same preset cards as male gender
- Map "X" to "通用性别" when transmitting to AI
- Add "X" to DISPLAY_ORDER (same as male)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-18 09:18:50 +08:00
yuanzonghao d0faa06cc1 perf(image): switch Runware output format from PNG to WEBP
WEBP produces ~90% smaller files than PNG at visually identical quality
(tested: 5.4MB → ~550KB per 1792×1024 image), significantly reducing
client download time for users on slower connections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-15 18:04:31 +08:00
yuanzonghao ba9f9c1342 Merge PR #79: feat(tts): StepFun voice selection via CharacterDesigner + provider-aware beat-audio
- StepFun voice selection: CharacterDesigner picks a preset voiceId from the
  32-entry catalog (zero extra LLM call); pickStepfunVoiceId remains as fallback.
- Prebaked homepage cards enriched with stepfunVoiceId (147 characters, gemini model).
- /api/tts-provider endpoint + client probe: skip the ~220KB Xiaomi reference
  audio when the server runs StepFun (saves Fast Origin Transfer bandwidth).
- Server-side resolveVoice normalization: re-provisions on provider mismatch.
- Removed hardcoded 1.2x speech playback speed (was for slow MiMo voice).
- Hardened voice-provider validation per PR-agent review.

Xiaomi path prompt is byte-identical to history (prompt-cache-preserving).
2026-06-15 15:08:21 +08:00
yuanzonghao 65b7daff0b fix(beat-audio): harden voice-provider validation and resolveVoice fast path
Address PR-agent review findings:

- resolveVoice fast path: replace ambiguous boolean comparison
  (voiceProvider === "stepfun") === serverStepfun with explicit
  per-provider equality checks. Prevents an undefined or unknown
  provider from matching the non-stepfun (xiaomi) branch by accident.

- /api/beat-audio route: reject requests whose voice.provider is present
  but not in the VALID_TTS_PROVIDERS whitelist (e.g. "azure"). Previously
  such a request would pass validation when fallback fields were also
  present, and resolveVoice might use the invalid voice directly instead
  of falling back to reprovision — producing a silent beat instead of a
  voiced one.
2026-06-15 14:33:46 +08:00
yuanzonghao 3a012d46bf fix(auth): harden snapshot paths per PR agent review
Address two suggestions from the PR agent review:

1. lib/authResume.ts — catch isAuthed() exceptions in
   consumeResumeSnapshot. The network/timeout path now returns
   null (snapshot already removed earlier to prevent the play-page
   bootstrap's retryBootstrap loop from re-entering this path).
   Document the intentional removeItem-before-isAuthed ordering.

2. components/AuthModal.tsx — wrap onBeforeOAuth in try-catch so
   a snapshot failure (e.g. sessionStorage blocked in privacy mode)
   does not abort the OAuth flow and leave the UI stuck in loading.
2026-06-15 14:32:04 +08:00
yuanzonghao 8cdeb1592f refactor(auth): share OAuth-resume plumbing between home and play pages
Extract the page-agnostic resume primitives into lib/authResume.ts:
- isAuthed() — single login check (was duplicated in app/page.tsx)
- writeResumeSnapshot(key, primary, fallbacks) — quota-safe sessionStorage
  write with ordered lighter-payload fallbacks (was hand-rolledTry/catch
  in both pages)
- consumeResumeSnapshot<T>(key) — consume-once resume gate that verifies
  the user is signed in before returning the snapshot, else clears it

Both pages now share this plumbing while keeping their own snapshot shapes
and restore side effects (home: form fields + start(); play: Session +
restorePlayResume + deferred action replay).

Unify the persist trigger: home previously snapshotted eagerly inside
start() before opening the modal, while play snapshotted in
AuthModal.onBeforeOAuth at redirect time. Move home to the same
onBeforeOAuth trigger so both pages persist at the single OAuth-redirect
instant — the eager-snapshot special case is gone, and OTP (no redirect)
keeps its in-place onSuccess resume on both pages.

Net: -21 lines. Behavior preserved for OTP; OAuth resume now consistent.
2026-06-15 14:03:14 +08:00
yuanzonghao 375f401c8f fix(tts): persist stepfunVoiceId on Character + harden probe race
Two follow-ups from pr-agent review of #79:

1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId, so
   on a StepFun server the client (which omits the voice payload to save
   FOT) echoed back only voiceDescription — and the server re-scored via
   pickStepfunVoiceId every beat instead of honoring the LLM pick. The
   whole "CharacterDesigner picks a preset id" mechanism was effectively
   bypassed on live StepFun sessions (it only worked for prebaked cards,
   which carry stepfunVoiceId in their JSON). Persist stepfunVoiceId onto
   the Character so the client→server round-trip keeps the LLM selection.

2. fetchBeatAudio's null-provider branch (probe pending) required
   speaker.voice and silently dropped a stepfun-only speaker. Accept any
   synthesizable source (voice | stepfunVoiceId | voiceDescription) so a
   slow getTtsProvider probe can't drop audio during the first scene's
   fetch window. The server resolveVoice normalizes regardless of which
   fields arrive.
2026-06-15 13:05:36 +08:00
yuanzonghao ca73a41a0b feat(tts): StepFun voice selection via CharacterDesigner + provider-aware beat-audio
Make homepage cards and live sessions produce sound when the server is
configured for StepFun TTS, instead of silently failing (the prebaked
Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in
Fast Origin Transfer).

Three coordinated changes:

1. CharacterDesigner now picks a StepFun preset voice id directly from the
   32-entry catalog in the SAME LLM call that designs the character — zero
   extra latency, LLM-grade match quality. The Xiaomi prompt path is
   byte-identical to history (verified programmatically) so cache hit rate
   and voice quality are preserved. pickStepfunVoiceId (keyword scorer)
   remains the fallback for orphan speakers / invalid LLM picks.

2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the
   single source of truth, shared by the scorer, the CharacterDesigner
   prompt, /api/tts-provider, and the offline enrich script.

3. A new GET /api/tts-provider endpoint lets the client probe the server's
   TTS provider at /play mount. fetchBeatAudio then shapes its request body:
   on a StepFun server it sends the lightweight stepfunVoiceId /
   voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving
   ~13MB per protagonist per session on prebaked cards). requestBeatAudio
   re-provisions on a provider mismatch before synth, so audio never goes
   silent on a cross-provider replay or mid-session provider flip.

New type fields are all optional and backward-compatible: Character.stepfunVoiceId,
BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made
optional. AGENTS.md updated for the new route, type fields, dependency map,
and StepFun voice-selection flow.
2026-06-15 12:49:25 +08:00
Zonghao Yuan 0dea2f8e36 fix(ai-client): clean up regressions from OpenAI SDK migration and canvas frame fix (#74)
Three follow-ups to ef3b579 (OpenAI SDK migration) and ebe39ef (canvas frame):

- .env.example / config.ts / AGENTS.md: anthropic & google native protocols
  were removed with the Vercel AI SDK, but .env.example and AGENTS.md still
  advertised them. Rewrite the docs to point Claude/Gemini at their
  OpenAI-compatible endpoints (api.anthropic.com/v1,
  generativelanguage.googleapis.com/v1beta/openai), drop the dead Gemini
  "Nano Banana" image example, sync AGENTS.md (text/vision protocol list,
  image protocol list, the "OpenAI/Gemini via AI SDK" reference note), and
  append a short hint in readProvider() error message guiding
  anthropic/google users to openai_compatible instead of a bare rejection.

- chat.ts: drop the unsafe `as { prompt_tokens_details?: ... }` cast; read
  cached_tokens straight off the SDK's CompletionUsage type. Add a comment
  noting the OpenAI usage object reports cache reads only (no cache-write
  count), so the create cost the old AI SDK path logged is unrecoverable.

- PlayCanvas.tsx: revert <img key={imageUrl}> to key={imageUrl.slice(-48)}.
  The gpt-image/mock paths emit multi-MB data URIs; using the full string as
  React's reconciliation key adds avoidable diff overhead during the frequent
  re-renders. Matches the existing <audio> element's key convention.

Validation: pnpm typecheck passes. (pnpm lint fails on a pre-existing Next 16
`next lint` CLI issue, identical on staging — unrelated to this change.)
2026-06-14 13:36:19 +08:00
yuanzonghao 2f6e67bd80 fix(play): restore server TTS, FOT strip/merge, nudge, and blob cleanup
Reverts the regressions from b63b694 on the server-fallback path:

P0 — fetchBeatAudio non-BYO branch was a bare return; every non-BYO
user got silent playback regardless of server TTS config. Re-connect
to /api/beat-audio with the beatAudioAbortRef signal, count 204/!ok
as silence strikes, create a blob URL on success.

P1 — stripVoicesForTransport + mergeCharactersPreserveVoice were
deleted, so the server-fallback path re-sent ~160KB
referenceAudioBase64 per character on every request AND lost voices
for already-known characters after scene 1. Re-add both, applied
ONLY on the server-fallback branches in engineClient.ts (BYO
client-direct path untouched).

P3 — the aborted-before-store blob URL race had no revoke, leaking
one blob URL per cancelled synth. Re-add the else-if revoke.

P2 — handleSettingsSaved ignored ttsConfigured, so a BYO key entered
mid-session only took effect after a page reload. Re-add the ref/state
refresh + audio re-prefetch. Also restore the silence-nudge UI
(silenceStrikes counter, SILENCE_NUDGE_THRESHOLD, dismissible pill
beside the mute toggle) that surfaces BYO-key guidance when the
shared server key is being rate-limited.

Verified live: /api/beat-audio now returns 200 (was 0 calls under
the bug); audio plays after synth completes.
2026-06-14 13:09:09 +08:00
yuanzonghao cb830f023d Merge origin/staging into feat/supabase-auth
Resolve conflicts: keep login_success alongside the new play_error /
play_visibility_lost analytics events; fold auth retry into the play-page
catch blocks so 401s open the login modal and are NOT tracked as play_error.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 23:44:23 +08:00
yuanzonghao 89a5c54065 fix(auth): address PR review and OAuth state-loss bugs
- proxy: await getUser() so refreshed session cookies land on the response
- callback: gate on AUTH_ENABLED, reject non-relative next (open redirect)
- page: snapshot + resume form and style image across the OAuth redirect;
  require login before the style-image vision parse
- play: wire authResolveRef so login retries the action that hit 401;
  dismissing the modal no longer re-fires it
- server: wrap cookie setAll in try/catch for read-only contexts

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:27:51 +08:00
yuanzonghao 0998f7c46a feat(play): add error observability analytics for mobile diagnostics
Track play_error and play_visibility_lost events via Umami to
distinguish mobile vs desktop failure modes. Each error event
captures orientation, connection type, visibility state, elapsed
time bucket, and error classification — all categorical, no free
text. Includes postJson "HTTP \d+" status parsing for the new
engineClient dual-path architecture.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 18:57:38 +08:00
yuanzonghao 87a2f93edb feat(auth): add Supabase auth with Google, GitHub, and email OTP login
Introduce user registration/login gated behind optional NEXT_PUBLIC_SUPABASE_*
env vars (leave blank to disable — app behaves exactly as before). Adds
proxy.ts for automatic cookie session refresh, requireUser() API route
guards on all 7 compute-consuming routes, AuthModal (Google/GitHub OAuth +
6-digit email OTP), UserChip header component, and login_success analytics
event. Identity is fully decoupled from Session/engine — no type changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-13 17:33:55 +08:00
yuanzonghao e68e7e1690 feat(engine): add opt-in image timeout and scene-paint hedging
IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout);
IMAGE_HEDGE_MS races a second identical scene-paint request when the
first is still pending past the threshold. Both default to OFF when
unset, preserving historical behavior for self-hosted deploys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-13 11:21:47 +08:00
baizhi958216 c4ffc16498 Merge pull request #64 from zonghaoyuan/refactor/settings-modal
feat: add client-side model configuration and server fallback
2026-06-12 22:09:43 +08:00
baizhi958216 5608b0fdd0 fix(engine): tolerate duplicated JSON outputs 2026-06-11 16:11:52 +08:00
baizhi958216 ef3b57953b refactor(ai-client): replace AI SDK adapters with OpenAI SDK 2026-06-11 16:11:44 +08:00
baizhi958216 6cd7d88326 feat(web): fallback to server API routes when no client-side model config is set
When a user has not configured their own model keys in localStorage,
engine calls now automatically route through /api/* server routes
instead of throwing "模型配置未设置". This lets Vercel deploys with
server-side environment variables work out of the box.

- Add lib/engineClient.ts as a unified client-side routing layer:
  checks localStorage for BYO config, falls back to POST /api/start,
  /api/scene, /api/vision, /api/classify-freeform, /api/insert-beat
- Update app/play/page.tsx to use engineClient instead of direct
  engine imports; remove buildEngineConfig()
- Update app/page.tsx style-image parsing to also fall back to
  /api/parse-style-image when no local model config exists

Signed-off-by: zhi <zhi@peropero.net>
2026-06-11 12:15:14 +08:00
baizhi958216 94973bc6c6 fix(tts): add non-null assertion in stepfun array access
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:14 +08:00
baizhi958216 759319bf28 feat(config): extract STYLE_EXTRACTION_PROMPT to shared lib for client reuse
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
baizhi958216 a2dd5ad630 feat(config): add client-side model config storage and EngineConfig resolver
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
baizhi958216 2088bae311 fix(tts): replace Buffer.from with browser-compatible arrayBufferToBase64 in stepfun
Signed-off-by: baizhi958216 <1475289190@qq.com>
2026-06-11 12:15:13 +08:00
DESKTOP-I1T6TF3\Q 621f83c47b feat(web): embed beat audio into gallery and infiplot exports
Walk every speaking beat at export time, reuse current scene's beatAudioMap,
and synth the rest via BYO TTS or /api/beat-audio with concurrency 4. Show a
progress toast on the play page while collecting.

Gallery export keeps audio in a sidecar localStorage key so the first paint
is not blocked by JSON.parse-ing several MB of base64; the gallery lazy-loads
it after the first scene image, then plays per-beat audio with a mute toggle
persisted to localStorage. .infiplot share files embed audioByBeatId in the
doc itself (v2); on import the data URIs survive scene swaps and feed back
into the per-beat audio map so replayers hear the original voices for free.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-11 09:29:16 +08:00
Zonghao Yuan d15d53ba65 Merge pull request #57 from zonghaoyuan/feat/tts-stepfun-provider
feat(tts): add StepFun preset-voice provider, route by URL + voice tag
2026-06-09 14:28:36 +08:00
yuanzonghao 1a6238f8b8 fix(tts): harden StepFun provider integration
- Validate voice.provider against known whitelist (xiaomi|stepfun) in
  beat-audio route to return a clear 400 instead of falling through
- Move single-char pronouns (他/她) to weak-signal fallback in
  detectGender to avoid false positives on compounds like 其他
- Update .env.example with StepFun configuration examples

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-09 14:24:27 +08:00
DESKTOP-I1T6TF3\Q 04f22249c9 fix(tts): make stepfun preset pick case-stable and per-character
- Hash the lowercased description (matching the case-insensitive scoring)
  so the same archetype text picks the same preset regardless of case.
- Thread the character name through provisionVoice -> stepfunProvision as
  the hash salt, so two characters that share archetype keywords spread
  across the top-N candidate presets instead of collapsing on one voice.

Xiaomi path is unaffected (voicedesign mints a unique clip per call).
2026-06-09 09:14:44 +08:00
DESKTOP-I1T6TF3\Q 19bbee16fe feat(tts): add StepFun preset-voice provider, route by URL + voice tag
Add StepFun step-tts-mini / step-tts-2 / stepaudio-2.5-tts as an alternate
TTS provider alongside Xiaomi MiMo. Auto-detected from TTS_BASE_URL host
(contains `stepfun.com` → StepFun; otherwise → MiMo), mirroring how the
image client infers Runware from `*.runware.ai`.

CharacterVoice becomes a discriminated union on `provider`:
- xiaomi: { referenceAudioBase64, mimeType } — unchanged
- stepfun: { voiceId, model, mimeType } — preset voice ID + chosen model

Provision dispatches on the current cfg's base URL; synthesis dispatches
on the voice's own `provider` tag so a session with mixed voices (e.g. a
provider switch mid-development) routes each beat through the correct
protocol. xiaomiSynthesize now guards against being called with a non-
xiaomi voice, surfacing the bug as a clear runtime error instead of a
TypeScript narrow violation at the access site.

StepFun has no voicedesign equivalent — only preset voices + voice
cloning from a reference audio upload. Cloning would require an extra
asset per character, so v1 maps the LLM's Chinese voiceDescription to one
of the 32 published preset IDs via gender + age + tone keyword scoring,
with a deterministic hash spread across the top-3 candidates so multiple
characters with similar descriptions don't collapse onto the identical
preset. lineDelivery is accepted but not yet propagated to StepFun's
voice_label.emotion / .style fields — left as a follow-up.

beat-audio route validation relaxed from `voice.referenceAudioBase64`
(xiaomi-shaped) to `voice.provider` (shape-agnostic), so stepfun voices
pass the gate; provider-specific shape errors still surface from the
synth function.

Observed latency on InfiPlot's dev loop: StepFun step-tts-mini median
~2.3s per beat with 0% timeouts across the test session, vs MiMo's
median ~8s with the long tail tripping the existing 15s synth budget
on roughly 2 of 3 beats. Pricing: step-tts-mini ¥0.9/万字符 (~¥0.14
per typical 50-beat session) vs MiMo TTS currently free under the
Token Plan creator incentive.

AGENTS.md provider matrix updated to describe both providers and the
discriminated-union dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-08 17:15:02 +08:00
Qi Chen fc62c9edf5 feat(engine): tighten CharacterDesigner prompt to prevent look-alike … (#56)
* feat(engine): tighten CharacterDesigner prompt to prevent look-alike characters

Expand the visualDescription rules into a 6-element mandatory checklist (hair
quad / eyes triad / face & build / outfit quad / personality-driven vibe /
silhouette tag) and add an explicit anti-collision rule comparing against the
existing cast across cross-color-family and cross-silhouette dimensions.

Also upgrade the user-message "已设定角色" block from soft hint to hard
constraint with an explicit pre-write scan step, nudging the LLM into chain-
of-thought differentiation before emitting tags.

All additions land in the session-stable system prefix, so prompt cache
absorbs the extra tokens — per-call billed token delta is ~0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(engine): replace pose examples with aura descriptors in personality vibe

The PERSONALITY-DRIVEN VIBE element listed concrete poses (arms crossed,
chin tilted up, slight slouch) which contradicted the earlier rule
banning transient poses from visualDescription. Switch to pure
atmosphere/aura keywords so the character card stays pose-neutral.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: yuanzonghao <yuanzonghao123@gmail.com>
2026-06-08 16:27:15 +08:00
yuanzonghao 75548ce005 Merge pull request #52 from zonghaoyuan/feat/story-share
feat(play): add encrypted story sharing with replay
2026-06-08 09:57:16 +08:00
yuanzonghao 39a7269494 fix(share): harden story share and relocate import button
- Add Content-Length pre-check to story-pack and story-unpack routes
  to reject oversized payloads before buffering the body
- Suppress internal error details in story-unpack catch (was leaking
  e.message to the client)
- Strengthen sceneIndex validation: require non-negative integer
- Guard against undefined storyState when replaying shared stories
- Fix prefetch regression: remove currentBeat?.id from useEffect deps
  that was re-triggering all change-scene prefetches on every beat
- Fix double detach: use else-if so the second replay detach guard
  doesn't fire redundantly after the first already detached
- Align client file-size limit by format (.json 12MB, .infiplot 13MB)
- Move "载入剧情" import button next to "开始" with hover tooltip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-08 08:46:05 +08:00