Squash-merge the cloudflare-migration branch (7 commits by Kai ki) into
staging with conflict resolution, feature integration, and bug fixes.
Engine:
- Paradigm D: single-stream Writer replacing dual-phase Plan/Beats
- Delete Architect agent; story bible generated via Writer <plan> tag
- Modular prompt architecture (segments/registry/builder)
- StreamRouter for tagged stream splitting (<plan>/<story>/<choices>)
Infrastructure:
- Cloudflare Workers deployment (wrangler.jsonc, OpenNext adapter)
- D1 database schema + Drizzle ORM (scaffolded, not yet active)
- R2 storage helpers (scaffolded, not yet active)
- Story persistence API routes + client-side persistence
BYOK (Bring Your Own Key):
- /api/llm/user-proxy with SSRF-protected LLM proxy (+ requireUser auth)
- CORS-aware fetch in ai-client: auto-detect CORS failure, fallback to
server proxy transparently via OpenAI SDK custom fetch
- BYO config support added to classify-freeform and vision routes
- SettingsModal CORS privacy notice (keys never logged/stored)
SSE streaming:
- engineClient.ts: fetchSSE helper for progressive scene events
- startSession/requestScene accept optional emit callback
- Fix SSE error event field name (error → message) in scene/start routes
i18n integration:
- Wire buildLanguageDirective into paradigm D's prompt builder
- Update corsNotice i18n keys (zh-CN/en/ja) with CORS proxy privacy text
- Preserve Session.language + LanguageSwitcher from i18n commit
Co-authored-by: Kai ki <155355644+zbf1009@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- New client-side i18n via React Context (useI18n, tArray, I18nProvider)
- Catalog ships 21 locale stubs; only zh-CN/en/ja have reviewed translations
- Header language switcher (globe icon + short label) before settings gear
- All hardcoded Chinese UI text migrated to keys: typewriter, options,
hints (with embedded gear icon via dangerouslySetInnerHTML), settings
panel, footer/about, play page hints
- AI output language follows user-selected locale via trailing one-liner
directive appended to Architect/Writer/CharacterDesigner/InsertBeat
user messages (preserves system-prompt cacheability)
- Per-locale separator rule: zh uses middot between every glyph; en/ja
use plain spaces
- Option value → i18n key suffix maps preserve Chinese as the underlying
identifier so analytics unions and STYLE_MAP keys stay byte-stable
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add "X" to GENDERS array in lib/options.ts
- Add example phrases for "X" gender (sci-fi themed)
- Make "X" use same preset cards as male gender
- Map "X" to "通用性别" when transmitting to AI
- Add "X" to DISPLAY_ORDER (same as male)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduce a Contributor License Agreement (CLA) so external contributions
can be licensed under AGPL-3.0 and any other terms (incl. closed-source),
keeping the AGPL-3.0 codebase usable in closed-source projects.
- CLA.md: authoritative English CLA (ICLA + employer authorization, v1.0)
- CLA.zh.md: non-binding Chinese reference translation
- CONTRIBUTING.md: bilingual contributing guide, points to CLA
- .github/workflows/cla.yml: self-hosted cla-assistant-action that records
signatures into cla-signatures/version-1.json; exempts maintainers & bots
via allowlist; skips when CLA_BOT_TOKEN is unset
- .github/PULL_REQUEST_TEMPLATE.md: guides contributors to sign
- README.{md,en.md,ja.md}: add License & contributing footer
- app/terms: note CLA requirement in the IP section
Enforcement requires repo-level setup (PAT secret + branch protection)
documented in cla.yml; not covered by this commit.
- StepFun voice selection: CharacterDesigner picks a preset voiceId from the
32-entry catalog (zero extra LLM call); pickStepfunVoiceId remains as fallback.
- Prebaked homepage cards enriched with stepfunVoiceId (147 characters, gemini model).
- /api/tts-provider endpoint + client probe: skip the ~220KB Xiaomi reference
audio when the server runs StepFun (saves Fast Origin Transfer bandwidth).
- Server-side resolveVoice normalization: re-provisions on provider mismatch.
- Removed hardcoded 1.2x speech playback speed (was for slow MiMo voice).
- Hardened voice-provider validation per PR-agent review.
Xiaomi path prompt is byte-identical to history (prompt-cache-preserving).
Address PR-agent review findings:
- resolveVoice fast path: replace ambiguous boolean comparison
(voiceProvider === "stepfun") === serverStepfun with explicit
per-provider equality checks. Prevents an undefined or unknown
provider from matching the non-stepfun (xiaomi) branch by accident.
- /api/beat-audio route: reject requests whose voice.provider is present
but not in the VALID_TTS_PROVIDERS whitelist (e.g. "azure"). Previously
such a request would pass validation when fallback fields were also
present, and resolveVoice might use the invalid voice directly instead
of falling back to reprovision — producing a silent beat instead of a
voiced one.
Critical: play-page bootstrap infinite loop when AUTH_ENABLED and no
resume snapshot. The refactor changed the gate from
`if (AUTH_ENABLED && hasSnapshot)` to `if (AUTH_ENABLED)`, so any
snapshot-less /play entry (the common case — normal card/preset/custom
start) entered the async branch, got null from consumeResumeSnapshot,
bumped retryBootstrap, and re-ran the effect forever. Restored the
peek-before-await: only enter the async resume branch when a snapshot
actually exists; otherwise fall straight through to normal bootstrap.
Verified via control-flow simulation across all three paths (no
snapshot / snapshot + signed in / snapshot + not signed in).
Major: homepage auto-started a game after a bare OAuth login. Routing
persistPendingStart through AuthModal.onBeforeOAuth fired it for every
OAuth redirect, including bare logins via UserChip / StyleModal
onRequireAuth (where pendingAction is null and the user only wanted to
sign in). Guarded the snapshot on `pendingAction === "start"` so only
the mid-start flow persists; bare logins no longer resurrect the form
and auto-start on return.
Extract the page-agnostic resume primitives into lib/authResume.ts:
- isAuthed() — single login check (was duplicated in app/page.tsx)
- writeResumeSnapshot(key, primary, fallbacks) — quota-safe sessionStorage
write with ordered lighter-payload fallbacks (was hand-rolledTry/catch
in both pages)
- consumeResumeSnapshot<T>(key) — consume-once resume gate that verifies
the user is signed in before returning the snapshot, else clears it
Both pages now share this plumbing while keeping their own snapshot shapes
and restore side effects (home: form fields + start(); play: Session +
restorePlayResume + deferred action replay).
Unify the persist trigger: home previously snapshotted eagerly inside
start() before opening the modal, while play snapshotted in
AuthModal.onBeforeOAuth at redirect time. Move home to the same
onBeforeOAuth trigger so both pages persist at the single OAuth-redirect
instant — the eager-snapshot special case is gone, and OTP (no redirect)
keeps its in-place onSuccess resume on both pages.
Net: -21 lines. Behavior preserved for OTP; OAuth resume now consistent.
Google/GitHub OAuth is a full-page round-trip that unmounts the app and
destroys the in-memory Session (the server is stateless). Returning to
/play?card=m0 re-bootstrapped from the first-act JSON, restarting the
story from scene 1 — the user lost all progress. OTP login kept state
in-memory (no redirect) and was unaffected.
Mirror the homepage 89a5c54 OAuth state-loss fix: snapshot the exact
scene/beat/visited-beats/orientation/image into sessionStorage just
before the redirect, then restore it on mount after the round-trip
(verified signed in). Re-resolve the remote image URL to a fresh blob
(blob: URLs are revoked on unmount). The pending action that hit the
401 (choice / freeform / background-click) is replayed once the restored
state commits, so the player lands exactly where they were headed.
Quota fallback drops the user-uploaded style-reference image (~100KB)
and retries; voices are kept (continuity over rare quota miss). Failure
to restore (corrupt snapshot / not signed in) relinquishes the bootstrap
slot and falls back to normal card/preset/custom start instead of a
blank loading screen.
AuthModal gains an optional onBeforeOAuth callback fired synchronously
before signInWithOAuth navigates away (sessionStorage.setItem is sync).
Two follow-ups from pr-agent review of #79:
1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId, so
on a StepFun server the client (which omits the voice payload to save
FOT) echoed back only voiceDescription — and the server re-scored via
pickStepfunVoiceId every beat instead of honoring the LLM pick. The
whole "CharacterDesigner picks a preset id" mechanism was effectively
bypassed on live StepFun sessions (it only worked for prebaked cards,
which carry stepfunVoiceId in their JSON). Persist stepfunVoiceId onto
the Character so the client→server round-trip keeps the LLM selection.
2. fetchBeatAudio's null-provider branch (probe pending) required
speaker.voice and silently dropped a stepfun-only speaker. Accept any
synthesizable source (voice | stepfunVoiceId | voiceDescription) so a
slow getTtsProvider probe can't drop audio during the first scene's
fetch window. The server resolveVoice normalizes regardless of which
fields arrive.
Make homepage cards and live sessions produce sound when the server is
configured for StepFun TTS, instead of silently failing (the prebaked
Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in
Fast Origin Transfer).
Three coordinated changes:
1. CharacterDesigner now picks a StepFun preset voice id directly from the
32-entry catalog in the SAME LLM call that designs the character — zero
extra latency, LLM-grade match quality. The Xiaomi prompt path is
byte-identical to history (verified programmatically) so cache hit rate
and voice quality are preserved. pickStepfunVoiceId (keyword scorer)
remains the fallback for orphan speakers / invalid LLM picks.
2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the
single source of truth, shared by the scorer, the CharacterDesigner
prompt, /api/tts-provider, and the offline enrich script.
3. A new GET /api/tts-provider endpoint lets the client probe the server's
TTS provider at /play mount. fetchBeatAudio then shapes its request body:
on a StepFun server it sends the lightweight stepfunVoiceId /
voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving
~13MB per protagonist per session on prebaked cards). requestBeatAudio
re-provisions on a provider mismatch before synth, so audio never goes
silent on a cross-provider replay or mid-session provider flip.
New type fields are all optional and backward-compatible: Character.stepfunVoiceId,
BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made
optional. AGENTS.md updated for the new route, type fields, dependency map,
and StepFun voice-selection flow.
Distinguish between temporary server-side processing and persistent
storage to accurately reflect the actual data flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverts the regressions from b63b694 on the server-fallback path:
P0 — fetchBeatAudio non-BYO branch was a bare return; every non-BYO
user got silent playback regardless of server TTS config. Re-connect
to /api/beat-audio with the beatAudioAbortRef signal, count 204/!ok
as silence strikes, create a blob URL on success.
P1 — stripVoicesForTransport + mergeCharactersPreserveVoice were
deleted, so the server-fallback path re-sent ~160KB
referenceAudioBase64 per character on every request AND lost voices
for already-known characters after scene 1. Re-add both, applied
ONLY on the server-fallback branches in engineClient.ts (BYO
client-direct path untouched).
P3 — the aborted-before-store blob URL race had no revoke, leaking
one blob URL per cancelled synth. Re-add the else-if revoke.
P2 — handleSettingsSaved ignored ttsConfigured, so a BYO key entered
mid-session only took effect after a page reload. Re-add the ref/state
refresh + audio re-prefetch. Also restore the silence-nudge UI
(silenceStrikes counter, SILENCE_NUDGE_THRESHOLD, dismissible pill
beside the mute toggle) that surfaces BYO-key guidance when the
shared server key is being rate-limited.
Verified live: /api/beat-audio now returns 200 (was 0 calls under
the bug); audio plays after synth completes.
- Rename "自带配音 Key" → "配音模型", drop the section-level "可选" badge,
and switch its icon to fa-volume-high to match the other model sections
- Drop redundant manual letter-spacing and "·" separators from settings
field labels (let .smallcaps tracking handle spacing)
- Move the CORS endpoint note to the top of the Models tab
- Home hint: reword to "输入想法", mention text/image/vision models + voice
key, and add an AUTH_ENABLED-gated "测试期间,登录即可免费畅玩" line
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve conflicts: keep login_success alongside the new play_error /
play_visibility_lost analytics events; fold auth retry into the play-page
catch blocks so 401s open the login modal and are NOT tracked as play_error.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Defense-in-depth against header injection if the post-login redirect
target ever reaches a context that doesn't re-encode it.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- proxy: await getUser() so refreshed session cookies land on the response
- callback: gate on AUTH_ENABLED, reject non-relative next (open redirect)
- page: snapshot + resume form and style image across the OAuth redirect;
require login before the style-image vision parse
- play: wire authResolveRef so login retries the action that hit 401;
dismissing the modal no longer re-fires it
- server: wrap cookie setAll in try/catch for read-only contexts
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Track play_error and play_visibility_lost events via Umami to
distinguish mobile vs desktop failure modes. Each error event
captures orientation, connection type, visibility state, elapsed
time bucket, and error classification — all categorical, no free
text. Includes postJson "HTTP \d+" status parsing for the new
engineClient dual-path architecture.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Introduce user registration/login gated behind optional NEXT_PUBLIC_SUPABASE_*
env vars (leave blank to disable — app behaves exactly as before). Adds
proxy.ts for automatic cookie session refresh, requireUser() API route
guards on all 7 compute-consuming routes, AuthModal (Google/GitHub OAuth +
6-digit email OTP), UserChip header component, and login_success analytics
event. Identity is fully decoupled from Session/engine — no type changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep the "transitioning" overlay visible until the <img> element's
bitmap is fully decoded, so the user never sees progressive paint
or a blank flash between scenes.
- Add onImageReady callback to PlayCanvas (<img onLoad> + decode())
- Delay setPhase("ready") until decode resolves (3s timeout fallback)
- Applied to all 4 scene entry paths: prebaked card, live /api/start,
performSceneTransition, and recorded replay transition
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a user has not configured their own model keys in localStorage,
engine calls now automatically route through /api/* server routes
instead of throwing "模型配置未设置". This lets Vercel deploys with
server-side environment variables work out of the box.
- Add lib/engineClient.ts as a unified client-side routing layer:
checks localStorage for BYO config, falls back to POST /api/start,
/api/scene, /api/vision, /api/classify-freeform, /api/insert-beat
- Update app/play/page.tsx to use engineClient instead of direct
engine imports; remove buildEngineConfig()
- Update app/page.tsx style-image parsing to also fall back to
/api/parse-style-image when no local model config exists
Signed-off-by: zhi <zhi@peropero.net>
Walk every speaking beat at export time, reuse current scene's beatAudioMap,
and synth the rest via BYO TTS or /api/beat-audio with concurrency 4. Show a
progress toast on the play page while collecting.
Gallery export keeps audio in a sidecar localStorage key so the first paint
is not blocked by JSON.parse-ing several MB of base64; the gallery lazy-loads
it after the first scene image, then plays per-beat audio with a mute toggle
persisted to localStorage. .infiplot share files embed audioByBeatId in the
doc itself (v2); on import the data URIs survive scene swaps and feed back
into the per-beat audio map so replayers hear the original voices for free.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the auto-generated kyoani / shinkai style thumbnails with hand-picked
reference frames. Source PNGs were center-cropped to square and re-encoded as
512x512 WEBP (~41KB each) to match the existing thumbnail format. Bumps the
shared cache-buster from v5 to v6 so existing browsers fetch the new files.
The home-page file-import button accepts .infiplot story files. The
tooltip now spells out the file type so users distinguish it from
"开始剧情"/"载入预设" affordances on the same screen.
- Validate voice.provider against known whitelist (xiaomi|stepfun) in
beat-audio route to return a clear 400 instead of falling through
- Move single-char pronouns (他/她) to weak-signal fallback in
detectGender to avoid false positives on compounds like 其他
- Update .env.example with StepFun configuration examples
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Hash the lowercased description (matching the case-insensitive scoring)
so the same archetype text picks the same preset regardless of case.
- Thread the character name through provisionVoice -> stepfunProvision as
the hash salt, so two characters that share archetype keywords spread
across the top-N candidate presets instead of collapsing on one voice.
Xiaomi path is unaffected (voicedesign mints a unique clip per call).
Add StepFun step-tts-mini / step-tts-2 / stepaudio-2.5-tts as an alternate
TTS provider alongside Xiaomi MiMo. Auto-detected from TTS_BASE_URL host
(contains `stepfun.com` → StepFun; otherwise → MiMo), mirroring how the
image client infers Runware from `*.runware.ai`.
CharacterVoice becomes a discriminated union on `provider`:
- xiaomi: { referenceAudioBase64, mimeType } — unchanged
- stepfun: { voiceId, model, mimeType } — preset voice ID + chosen model
Provision dispatches on the current cfg's base URL; synthesis dispatches
on the voice's own `provider` tag so a session with mixed voices (e.g. a
provider switch mid-development) routes each beat through the correct
protocol. xiaomiSynthesize now guards against being called with a non-
xiaomi voice, surfacing the bug as a clear runtime error instead of a
TypeScript narrow violation at the access site.
StepFun has no voicedesign equivalent — only preset voices + voice
cloning from a reference audio upload. Cloning would require an extra
asset per character, so v1 maps the LLM's Chinese voiceDescription to one
of the 32 published preset IDs via gender + age + tone keyword scoring,
with a deterministic hash spread across the top-3 candidates so multiple
characters with similar descriptions don't collapse onto the identical
preset. lineDelivery is accepted but not yet propagated to StepFun's
voice_label.emotion / .style fields — left as a follow-up.
beat-audio route validation relaxed from `voice.referenceAudioBase64`
(xiaomi-shaped) to `voice.provider` (shape-agnostic), so stepfun voices
pass the gate; provider-specific shape errors still surface from the
synth function.
Observed latency on InfiPlot's dev loop: StepFun step-tts-mini median
~2.3s per beat with 0% timeouts across the test session, vs MiMo's
median ~8s with the long tail tripping the existing 15s synth budget
on roughly 2 of 3 beats. Pricing: step-tts-mini ¥0.9/万字符 (~¥0.14
per typical 50-beat session) vs MiMo TTS currently free under the
Token Plan creator incentive.
AGENTS.md provider matrix updated to describe both providers and the
discriminated-union dispatch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a ref-based mutex so concurrent /api/story-pack requests and
duplicate file downloads cannot be triggered by rapid clicking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Content-Length pre-check to story-pack and story-unpack routes
to reject oversized payloads before buffering the body
- Suppress internal error details in story-unpack catch (was leaking
e.message to the client)
- Strengthen sceneIndex validation: require non-negative integer
- Guard against undefined storyState when replaying shared stories
- Fix prefetch regression: remove currentBeat?.id from useEffect deps
that was re-triggering all change-scene prefetches on every beat
- Fix double detach: use else-if so the second replay detach guard
doesn't fire redundantly after the first already detached
- Align client file-size limit by format (.json 12MB, .infiplot 13MB)
- Move "载入剧情" import button next to "开始" with hover tooltip
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Handle downloadImagesAsZip return value and surface errors to user
- Fix inferImageExtension garbage output for data URIs without semicolons
- Scale blob URL revocation delay for large zip files (>5MB → 60s)
- Cap uniqueZipPath dedup loop at 10k iterations with timestamp fallback
- Support relative URLs in inferImageExtension via base URL
- Handle svg+xml MIME subtype correctly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>