infiplot-web

Author	SHA1	Message	Date
Kai ki	be39fcc77e	Merge remote-tracking branch 'origin/staging' into cloudflare-migration	2026-06-25 18:08:46 +08:00
yuanzonghao	d5b4a02cb3	refactor(engine): remove follow-up choices from insert-beat, keep multi-beat only Insert-beat is a pure in-scene micro-interaction — adding choices that lead to change-scene contradicted its purpose. Now insert-beat generates 1-3 richer beats then loops back to the original options, which is the natural UX for "you glanced at something decorative." Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-24 19:09:09 +08:00
yuanzonghao	b5f5ebc353	fix(engine): filter invalid choices before slicing to preserve valid ones Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-24 18:51:23 +08:00
yuanzonghao	6f8125570a	feat(play): always generate new scene for freeform text input + enhance insert-beat User feedback: custom interactions rarely produce new story content because the classifier heavily biased toward insert-beat (single reaction, no scene change). Three changes to fix this: 1. Freeform text input now always triggers a full scene generation (skips the classify step entirely) — users who type expect the story to advance. 2. Vision (background click) classifier de-biased: prompt now favors change-scene when uncertain, and the code fallback flipped from insert-beat to change-scene. insert-beat narrowed to pure observation. 3. Insert-beat enhanced: generates 1-3 beats (was 1) with follow-up choices (was: loop back to original beat). Even when vision classifies as insert-beat, the player gets richer content and new options. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-24 18:36:35 +08:00
Kai ki	e31bd16b15	fix(engine): prevent directScene hang + enforce segment ID uniqueness in prod Two defensive fixes surfaced by the PR #95 review (PR-Agent), applied on top of the staging sync: 1. directScene: routeTaggedStream rejecting BEFORE onPlan fires would leave planPromise unsettled, hanging `await planPromise` — and thus the whole /api/start and /api/scene request — forever. Add a .catch that settles the plan with a minimal fallback and resolves routing to a degraded result, so the pipeline produces a playable fallback scene (graceful degradation) instead of hanging. 2. prompts/registry: the duplicate-segment-ID guard only ran under NODE_ENV=development, so a bad merge introducing a duplicate ID would silently shadow a segment in production. Run the check in all environments (once at module load; negligible cost).	2026-06-23 19:06:19 +08:00
yuanzonghao	0a7076d5b9	fix(i18n): overhaul i18n with [locale] routing, SSR translations, and hreflang SEO Rewrites the i18n system introduced in PR #94 to use Next.js App Router [locale] dynamic segments with SSR-rendered translations and proper middleware locale routing. - Add middleware locale detection: / rewrites to /zh-CN/ internally, /en and /ja pass through, /zh-CN/... redirects to bare path - Move all 7 pages under app/[locale]/ with SSR translation injection - Fix server→client serialization: pre-evaluate function-valued translations (makeSerializable) to eliminate hydration flash - Fix language switch key flash: use hard navigation with localStorage-only persistence, avoiding React state update before page reload - Add <link rel="alternate" hreflang> tags for multilingual SEO - Fix Supabase setAll overwriting locale rewrite response - Trim locales from 22 to 3 (zh-CN/en/ja), delete 19 incomplete files - LLM-translate 240 firstact game preset JSONs (en + ja, landscape + portrait) and story titles via gemini-3.5-flash - Delete 11 one-off migration scripts and outdated i18n docs - Add useLocalePath hook and navigation utilities Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-18 23:16:17 +08:00
Zonghao Yuan	0e4c2ebef4	feat(engine): merge cloudflare-migration — paradigm D engine, BYOK proxy, story persistence (#95) Squash-merge the cloudflare-migration branch (7 commits by Kai ki) into staging with conflict resolution, feature integration, and bug fixes. Engine: - Paradigm D: single-stream Writer replacing dual-phase Plan/Beats - Delete Architect agent; story bible generated via Writer <plan> tag - Modular prompt architecture (segments/registry/builder) - StreamRouter for tagged stream splitting (<plan>/<story>/<choices>) Infrastructure: - Cloudflare Workers deployment (wrangler.jsonc, OpenNext adapter) - D1 database schema + Drizzle ORM (scaffolded, not yet active) - R2 storage helpers (scaffolded, not yet active) - Story persistence API routes + client-side persistence BYOK (Bring Your Own Key): - /api/llm/user-proxy with SSRF-protected LLM proxy (+ requireUser auth) - CORS-aware fetch in ai-client: auto-detect CORS failure, fallback to server proxy transparently via OpenAI SDK custom fetch - BYO config support added to classify-freeform and vision routes - SettingsModal CORS privacy notice (keys never logged/stored) SSE streaming: - engineClient.ts: fetchSSE helper for progressive scene events - startSession/requestScene accept optional emit callback - Fix SSE error event field name (error → message) in scene/start routes i18n integration: - Wire buildLanguageDirective into paradigm D's prompt builder - Update corsNotice i18n keys (zh-CN/en/ja) with CORS proxy privacy text - Preserve Session.language + LanguageSwitcher from i18n commit Co-authored-by: Kai ki <155355644+zbf1009@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-18 18:05:38 +08:00
DESKTOP-I1T6TF3\Q	2d35c1d9de	feat(i18n): add language switcher with en/ja translations - New client-side i18n via React Context (useI18n, tArray, I18nProvider) - Catalog ships 21 locale stubs; only zh-CN/en/ja have reviewed translations - Header language switcher (globe icon + short label) before settings gear - All hardcoded Chinese UI text migrated to keys: typewriter, options, hints (with embedded gear icon via dangerouslySetInnerHTML), settings panel, footer/about, play page hints - AI output language follows user-selected locale via trailing one-liner directive appended to Architect/Writer/CharacterDesigner/InsertBeat user messages (preserves system-prompt cacheability) - Per-locale separator rule: zh uses middot between every glyph; en/ja use plain spaces - Option value → i18n key suffix maps preserve Chinese as the underlying identifier so analytics unions and STYLE_MAP keys stay byte-stable Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-18 16:54:35 +08:00
yuanzonghao	65b7daff0b	fix(beat-audio): harden voice-provider validation and resolveVoice fast path Address PR-agent review findings: - resolveVoice fast path: replace ambiguous boolean comparison (voiceProvider === "stepfun") === serverStepfun with explicit per-provider equality checks. Prevents an undefined or unknown provider from matching the non-stepfun (xiaomi) branch by accident. - /api/beat-audio route: reject requests whose voice.provider is present but not in the VALID_TTS_PROVIDERS whitelist (e.g. "azure"). Previously such a request would pass validation when fallback fields were also present, and resolveVoice might use the invalid voice directly instead of falling back to reprovision — producing a silent beat instead of a voiced one.	2026-06-15 14:33:46 +08:00
yuanzonghao	375f401c8f	fix(tts): persist stepfunVoiceId on Character + harden probe race Two follow-ups from pr-agent review of #79: 1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId, so on a StepFun server the client (which omits the voice payload to save FOT) echoed back only voiceDescription — and the server re-scored via pickStepfunVoiceId every beat instead of honoring the LLM pick. The whole "CharacterDesigner picks a preset id" mechanism was effectively bypassed on live StepFun sessions (it only worked for prebaked cards, which carry stepfunVoiceId in their JSON). Persist stepfunVoiceId onto the Character so the client→server round-trip keeps the LLM selection. 2. fetchBeatAudio's null-provider branch (probe pending) required speaker.voice and silently dropped a stepfun-only speaker. Accept any synthesizable source (voice | stepfunVoiceId | voiceDescription) so a slow getTtsProvider probe can't drop audio during the first scene's fetch window. The server resolveVoice normalizes regardless of which fields arrive.	2026-06-15 13:05:36 +08:00
yuanzonghao	ca73a41a0b	feat(tts): StepFun voice selection via CharacterDesigner + provider-aware beat-audio Make homepage cards and live sessions produce sound when the server is configured for StepFun TTS, instead of silently failing (the prebaked Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in Fast Origin Transfer). Three coordinated changes: 1. CharacterDesigner now picks a StepFun preset voice id directly from the 32-entry catalog in the SAME LLM call that designs the character — zero extra latency, LLM-grade match quality. The Xiaomi prompt path is byte-identical to history (verified programmatically) so cache hit rate and voice quality are preserved. pickStepfunVoiceId (keyword scorer) remains the fallback for orphan speakers / invalid LLM picks. 2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the single source of truth, shared by the scorer, the CharacterDesigner prompt, /api/tts-provider, and the offline enrich script. 3. A new GET /api/tts-provider endpoint lets the client probe the server's TTS provider at /play mount. fetchBeatAudio then shapes its request body: on a StepFun server it sends the lightweight stepfunVoiceId / voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving ~13MB per protagonist per session on prebaked cards). requestBeatAudio re-provisions on a provider mismatch before synth, so audio never goes silent on a cross-provider replay or mid-session provider flip. New type fields are all optional and backward-compatible: Character.stepfunVoiceId, BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made optional. AGENTS.md updated for the new route, type fields, dependency map, and StepFun voice-selection flow.	2026-06-15 12:49:25 +08:00
yuanzonghao	e68e7e1690	feat(engine): add opt-in image timeout and scene-paint hedging IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout); IMAGE_HEDGE_MS races a second identical scene-paint request when the first is still pending past the threshold. Both default to OFF when unset, preserving historical behavior for self-hosted deploys. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-13 11:21:47 +08:00
baizhi958216	5608b0fdd0	fix(engine): tolerate duplicated JSON outputs	2026-06-11 16:11:52 +08:00
Zonghao Yuan	d15d53ba65	Merge pull request #57 from zonghaoyuan/feat/tts-stepfun-provider feat(tts): add StepFun preset-voice provider, route by URL + voice tag	2026-06-09 14:28:36 +08:00
DESKTOP-I1T6TF3\Q	04f22249c9	fix(tts): make stepfun preset pick case-stable and per-character - Hash the lowercased description (matching the case-insensitive scoring) so the same archetype text picks the same preset regardless of case. - Thread the character name through provisionVoice -> stepfunProvision as the hash salt, so two characters that share archetype keywords spread across the top-N candidate presets instead of collapsing on one voice. Xiaomi path is unaffected (voicedesign mints a unique clip per call).	2026-06-09 09:14:44 +08:00
Qi Chen	fc62c9edf5	feat(engine): tighten CharacterDesigner prompt to prevent look-alike … (#56) * feat(engine): tighten CharacterDesigner prompt to prevent look-alike characters Expand the visualDescription rules into a 6-element mandatory checklist (hair quad / eyes triad / face & build / outfit quad / personality-driven vibe / silhouette tag) and add an explicit anti-collision rule comparing against the existing cast across cross-color-family and cross-silhouette dimensions. Also upgrade the user-message "已设定角色" block from soft hint to hard constraint with an explicit pre-write scan step, nudging the LLM into chain-of-thought differentiation before emitting tags. All additions land in the session-stable system prefix, so prompt cache absorbs the extra tokens — per-call billed token delta is ~0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(engine): replace pose examples with aura descriptors in personality vibe The PERSONALITY-DRIVEN VIBE element listed concrete poses (arms crossed, chin tilted up, slight slouch) which contradicted the earlier rule banning transient poses from visualDescription. Switch to pure atmosphere/aura keywords so the character card stays pose-neutral. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: yuanzonghao <yuanzonghao123@gmail.com>	2026-06-08 16:27:15 +08:00
yuanzonghao	4972243a93	fix: address PR Agent review findings across 6 files Restrict PR Agent workflow to trusted collaborators on PR comments only, fix UTF-8 byte counting in gallery-pack, correct portrait-to-landscape fallback orientation, track inserted freeform beats in visitedBeatIds, allow clearing stored TTS key, and guard empty-string fuzzy match in style selector. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-07 14:40:37 +08:00
yuanzonghao	ae3dd17e6b	feat(web): add player name, freeform input, and unified settings modal - Player name: stored in localStorage, injected into Architect/Writer/InsertBeat prompts so NPCs address the player by name, displayed in dialogue UI - Freeform input: compact button at choice nodes expands to text input, LLM classifier routes to insert-beat (interactive NPC response) or change-scene - SettingsModal: unified panel merging player name, voice toggle (with collapsible TTS key section), replacing the old TtsKeyModal - Insert-beat upgrade: prompt now requires NPC reaction when characters are present, shared by both freeform and Vision paths - IME guard: isComposing check on freeform input to prevent CJK mid-composition submission Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-07 12:37:50 +08:00
yuanzonghao	57bc6556ab	refactor(ai-client): unify OpenAI-compatible path to AI SDK generateText Eliminate the dual code path (raw fetch vs AI SDK) for text and vision. All providers now go through createLanguageModel() + generateText(), removing chatOpenAiCompatible/analyzeOpenAiCompatible, the manual Usage type, summarizeUsage, and responseFormat plumbing from 8 call sites. Key fix: @ai-sdk/openai v3 defaults to the Responses API (/responses); DeepSeek only supports Chat Completions, so we use .chat() explicitly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-07 00:31:36 +08:00
yuanzonghao	165dcbc5e6	fix(engine): prevent Architect from seeing literal "auto" styleGuide Replace session.styleGuide with a descriptive placeholder before the Architect runs, so its prompt reads a natural sentence instead of the raw "auto" marker. Also wrap selectStyle in a try-catch so a transient LLM failure falls back to 吉卜力 instead of crashing session start. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-06 22:28:44 +08:00
yuanzonghao	585f302908	feat(engine): auto-select art style via parallel LLM call When user picks "自动", the client sends styleGuide="auto" to the server. The orchestrator then runs a lightweight style-selector LLM call in parallel with the Architect — both only depend on worldSetting, so there is zero added latency. The selector picks the best-matching preset from STYLE_MAP based on genre, mood, and setting. Also moves STYLE_MAP from page.tsx to lib/options.ts so it can be shared between client and server. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-06 22:08:08 +08:00
yuanzonghao	9fc83de276	feat(web,engine): portrait-orientation scene images for mobile full-bleed Thread orientation (portrait|landscape) from client through API, engine, and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes; desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image coordinate mapping, session-locked orientation, a shared coerceOrientation helper, and a choices overflow cap in portrait. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 17:30:54 +08:00
yuanzonghao	efe021d886	fix(engine): pin entry-beat roster to the plan in Phase B The Painter composites exactly plan.entryActiveCharacters into the entry frame (the same roster the Cinematographer framed). Phase B is told to reuse that roster, but only the entry beat's id was code-enforced — so an LLM slip could leave a character in the painted frame that the runtime entry beat says isn't there. Pin activeCharacters onto the plan's entry beat as a last line of defense, mirroring the existing id pin. Speaker is intentionally left to the prompt: it's coupled to line/TTS, so overwriting it could mis-attribute or orphan Phase B's dialogue. Addresses Copilot review feedback on PR #27. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 15:48:14 +08:00
yuanzonghao	3bf5c92841	perf(engine): split Writer into Phase A (plan) + Phase B (beats) The Writer was the serial long pole: a single LLM call wrote the scene skeleton AND the full beats[] graph before anything downstream could start, so variable-length beat generation blew up tail latency. Split it into two calls: - Phase A (runWriterPlan): minimal skeleton the image pipeline needs (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker). Serial, on the critical path, kept lightweight. - Phase B (runWriterBeats): full beats[] + storyStatePatch, written to honor the plan. Launched immediately, overlaps the ENTIRE image pipeline (cards / cinematographer / portraits / painter), awaited last. Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long beat-writing is hidden behind image gen. A Phase B failure degrades to a single playable beat synthesized from the plan. Paired distinct-payload A/B (6 content-matched stories, baseline vs split): - median end-to-end 42.6s -> 32.2s (-24%) - mean 46.4s -> 33.1s (-29%) - worst case 74.7s -> 37.6s (halved) - no content regression: total Writer output tokens 12858 -> 13699 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 11:17:34 +08:00
DESKTOP-I1T6TF3\Q	347ab297d5	feat(web,engine): custom style — image upload, AI-extract prompt, painter ref Add an upload button to the custom-style entry: the client downscales the image to a 512px webp (base64) and sends it to the new /api/parse-style-image route, where a vision LLM parses it into an English style prompt that is filled back into the textarea. The image itself is passed through sessionStorage → /api/start → Session.styleReferenceImage, and painter.collectReferenceImages places it in slot 0, so every scene in the session uses it as a reference image that anchors the art style (brush / color / mood), with higher priority than priorScene. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 19:15:19 +08:00
DESKTOP-I1T6TF3\Q	298ecd4ec0	perf(engine): reorder Writer/Cinematographer prompts for prefix caching Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+ on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers match a stable byte-identical prefix from message[0]; once a single byte changes everything after it misses, so the trick is to push every session-stable bit to the front and concentrate per-call churn in a short suffix. Four coordinated changes: 1. Split storyState rendering into spine + dynamic. renderStoryStateSpine: logline / genreTags / protagonist / castNotes — Architect-set fields that StoryStatePatch literally cannot touch (the type only declares the 4 volatile ones; coerce and apply both cherry-pick), so spine bytes are guaranteed stable for the entire session. Goes in the STABLE PREFIX. renderStoryStateDynamic: synopsis / openThreads / relationships / nextHook — the Writer rewrites these every scene via storyStatePatch. Goes in the DYNAMIC SUFFIX. renderStoryState kept as a convenience wrapper that joins both, for anything that still wants the merged bible. 2. Rewrite buildWriterUserMessage with a stable/dynamic split. STABLE PREFIX (byte-identical or pure append across consecutive calls): - 世界观 / 画风 (session-immutable scalars) - story bible spine - 已登记角色 [sentinel: "（以下每行一个已登记角色，开场前为空。）"] + entries - 已使用的 sceneKey [sentinel] + entries - 场景历史，已完结 [sentinel] + archivedHistory entries ↑ archivedHistory = history.slice(0, -1), NOT the full history — the live entry (history[-1]) keeps mutating mid-scene as the player walks new beats and speculative prefetches snapshot it at different moments, so it MUST stay out of the stable prefix or the byte-monotonic invariant breaks. DYNAMIC SUFFIX: - storyState dynamic patch - last-beat snippet (the exact emotional cliffhanger to continue from) - lastExit hint - format reminder tail The previous structure put the full storyState (including patched fields) at the very top of the user message, so the very first byte of the user message changed every scene — user-side cache hit was effectively 0% across the board. 3. Sentinel pattern for variable-length sections. Every list (characters / sceneKeys / archivedHistory) now emits a constant placeholder line after its header REGARDLESS of whether it has entries. With the old "if empty print '（暂无）' else print entries" pattern, adding the first item silently rewrites those placeholder bytes — the byte at offset N moves from a Chinese parenthesis to a dash, prefix cache torched. The sentinel line is the same bytes whether the list has 0 or N items; new items are pure appends after it. 4. Rewrite buildCinematographerUserMessage. New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued right after the session-stable styleGuide line, so the stable prefix is long enough to cross at least one full 64-token chunk boundary beyond the system prompt. The per-scene inputs (sceneSummary, entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity hint) all moved into the dynamic suffix below. Verified (see [cache] / [debug-writer] logs from staging): hash of 500-byte slices of the user message is byte-identical across two same-historyLen Writer calls through the entire stable prefix; only the dynamic suffix slice differs. The remaining cache-hit gap under MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally jumps to 4096); on DeepSeek the same prefix should hit fully. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q	37c911f510	chore(engine): log prompt-cache hit/miss per chat call Add a `tag` option to chat() and have it print one `[cache] <tag> hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are probed in order so the same logger works across providers: - DeepSeek (v3+): usage.prompt_cache_hit_tokens / _miss_tokens - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens - Anthropic: usage.cache_read_input_tokens / _creation_* When none of them are present (MiMo / local Ollama / others) we still print prompt + completion totals so the cost baseline is visible. Tag every callsite so the log is greppable: architect / writer / character-designer / cinematographer / insert-beat This is the prerequisite for the prefix-cache reordering work that follows — without per-agent visibility there's no way to tell if a prompt rearrangement actually moved the needle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 10:42:33 +08:00
DESKTOP-I1T6TF3\Q	cbabc54273	chore(engine): log worldSetting and storyBible at session start Two lines in startSession: the full worldSetting being fed to the Architect, and the resulting logline/genreTags/synopsis it produced. Cheap to keep — fires once per session — and makes it possible to tell at a glance whether a "story unrelated to my input" report is a frontend transport bug, a worldSetting layout problem, or the LLM ignoring the seed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-03 03:51:58 +08:00
Zonghao Yuan	dc5ecd60f6	refactor: flatten monorepo to single web package (#12) Flatten the pnpm monorepo (apps/web + packages/) into a single web package at the repo root. - Move app/lib/components/scripts/public to root; drop apps/web and packages/ wrappers - Rewrite tsconfig paths (@infiplot/) to ./lib/; turbopack.root = __dirname - Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths - Regenerate pnpm-lock.yaml to drop stale workspace importers - Bump engines.node to >=22 to match wrangler Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 00:55:45 +08:00

29 Commits
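
The sentinel pattern described in commit 298ecd4ec0 can be illustrated with a minimal sketch. This is not the repository's code: the function names (`renderListOld`, `renderListWithSentinel`, `commonPrefixLength`) and the sample entry are hypothetical, and only the two placeholder strings and the section header are quoted from the commit message. The sketch compares string prefixes, whereas the real provider caches match on token chunks, so it only shows the general idea: with the old pattern the shared prefix ends right after the header once the first item is added, while with a constant sentinel line the empty rendering stays a strict prefix of every later rendering.

```typescript
// Hypothetical sketch of the sentinel pattern from commit 298ecd4ec0.
// None of these function names are taken from the repository.

// Placeholder line quoted in the commit message.
const SENTINEL = "（以下每行一个已登记角色，开场前为空。）";

// Old pattern: the placeholder vanishes once the list has entries,
// so the text right after the header changes when the first item arrives.
function renderListOld(header: string, items: string[]): string {
  const body =
    items.length === 0 ? "（暂无）" : items.map((i) => `- ${i}`).join("\n");
  return `${header}\n${body}`;
}

// Sentinel pattern: the same placeholder line is always emitted,
// and entries are appended after it.
function renderListWithSentinel(header: string, items: string[]): string {
  return [header, SENTINEL, ...items.map((i) => `- ${i}`)].join("\n");
}

// Length of the shared leading substring, in UTF-16 code units.
function commonPrefixLength(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

const header = "已登记角色";
const oldEmpty = renderListOld(header, []);
const oldOne = renderListOld(header, ["Alice"]);
const newEmpty = renderListWithSentinel(header, []);
const newOne = renderListWithSentinel(header, ["Alice"]);

// Old pattern: only the header and its newline are shared.
console.log(commonPrefixLength(oldEmpty, oldOne) === header.length + 1); // true
// Sentinel pattern: the empty rendering is a prefix of the one-item rendering.
console.log(newOne.startsWith(newEmpty)); // true
```

The same check could be applied to any variable-length section (characters, sceneKeys, archived history) by rendering it before and after an append and asserting that the earlier output is a prefix of the later one.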