infiplot-web

Author	SHA1	Message	Date
yuanzonghao	ca73a41a0b	feat(tts): StepFun voice selection via CharacterDesigner + provider-aware beat-audio Make homepage cards and live sessions produce sound when the server is configured for StepFun TTS, instead of silently failing (the prebaked Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in Fast Origin Transfer). Three coordinated changes: 1. CharacterDesigner now picks a StepFun preset voice id directly from the 32-entry catalog in the SAME LLM call that designs the character — zero extra latency, LLM-grade match quality. The Xiaomi prompt path is byte-identical to history (verified programmatically) so cache hit rate and voice quality are preserved. pickStepfunVoiceId (keyword scorer) remains the fallback for orphan speakers / invalid LLM picks. 2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the single source of truth, shared by the scorer, the CharacterDesigner prompt, /api/tts-provider, and the offline enrich script. 3. A new GET /api/tts-provider endpoint lets the client probe the server's TTS provider at /play mount. fetchBeatAudio then shapes its request body: on a StepFun server it sends the lightweight stepfunVoiceId / voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving ~13MB per protagonist per session on prebaked cards). requestBeatAudio re-provisions on a provider mismatch before synth, so audio never goes silent on a cross-provider replay or mid-session provider flip. New type fields are all optional and backward-compatible: Character.stepfunVoiceId, BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made optional. AGENTS.md updated for the new route, type fields, dependency map, and StepFun voice-selection flow.	2026-06-15 12:49:25 +08:00
yuanzonghao	e68e7e1690	feat(engine): add opt-in image timeout and scene-paint hedging IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout); IMAGE_HEDGE_MS races a second identical scene-paint request when the first is still pending past the threshold. Both default to OFF when unset, preserving historical behavior for self-hosted deploys. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-13 11:21:47 +08:00
baizhi958216	ef3b57953b	refactor(ai-client): replace AI SDK adapters with OpenAI SDK	2026-06-11 16:11:44 +08:00
DESKTOP-I1T6TF3\Q	19bbee16fe	feat(tts): add StepFun preset-voice provider, route by URL + voice tag Add StepFun step-tts-mini / step-tts-2 / stepaudio-2.5-tts as an alternate TTS provider alongside Xiaomi MiMo. Auto-detected from TTS_BASE_URL host (contains `stepfun.com` → StepFun; otherwise → MiMo), mirroring how the image client infers Runware from `*.runware.ai`. CharacterVoice becomes a discriminated union on `provider`: - xiaomi: { referenceAudioBase64, mimeType } (unchanged) - stepfun: { voiceId, model, mimeType } (preset voice ID + chosen model) Provision dispatches on the current cfg's base URL; synthesis dispatches on the voice's own `provider` tag so a session with mixed voices (e.g. a provider switch mid-development) routes each beat through the correct protocol. xiaomiSynthesize now guards against being called with a non-xiaomi voice, surfacing the bug as a clear runtime error instead of a TypeScript narrowing violation at the access site. StepFun has no voicedesign equivalent, only preset voices + voice cloning from a reference audio upload. Cloning would require an extra asset per character, so v1 maps the LLM's Chinese voiceDescription to one of the 32 published preset IDs via gender + age + tone keyword scoring, with a deterministic hash spread across the top-3 candidates so multiple characters with similar descriptions don't collapse onto the same preset. lineDelivery is accepted but not yet propagated to StepFun's voice_label.emotion / .style fields (left as a follow-up). beat-audio route validation relaxed from `voice.referenceAudioBase64` (xiaomi-shaped) to `voice.provider` (shape-agnostic), so stepfun voices pass the gate; provider-specific shape errors still surface from the synth function. Observed latency on InfiPlot's dev loop: StepFun step-tts-mini median ~2.3s per beat with 0% timeouts across the test session, vs MiMo's median ~8s with the long tail tripping the existing 15s synth budget on roughly 2 of 3 beats. Pricing: step-tts-mini ¥0.9 per 10,000 characters (~¥0.14 per typical 50-beat session) vs MiMo TTS currently free under the Token Plan creator incentive. AGENTS.md provider matrix updated to describe both providers and the discriminated-union dispatch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-08 17:15:02 +08:00
baizhi958216	0abd5f1525	feat(play): add encrypted story sharing	2026-06-07 17:13:27 +08:00
yuanzonghao	ae3dd17e6b	feat(web): add player name, freeform input, and unified settings modal - Player name: stored in localStorage, injected into Architect/Writer/InsertBeat prompts so NPCs address the player by name, displayed in dialogue UI - Freeform input: compact button at choice nodes expands to text input, LLM classifier routes to insert-beat (interactive NPC response) or change-scene - SettingsModal: unified panel merging player name, voice toggle (with collapsible TTS key section), replacing the old TtsKeyModal - Insert-beat upgrade: prompt now requires NPC reaction when characters are present, shared by both freeform and Vision paths - IME guard: isComposing check on freeform input to prevent CJK mid-composition submission Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-07 12:37:50 +08:00
yuanzonghao	9fc83de276	feat(web,engine): portrait-orientation scene images for mobile full-bleed Thread orientation (portrait|landscape) from client through API, engine, and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes; desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image coordinate mapping, session-locked orientation, a shared coerceOrientation helper, and a choices overflow cap in portrait. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 17:30:54 +08:00
yuanzonghao	83fd5717e7	feat(ai-client): multi-provider compat — native Anthropic/Google + URL tolerance - TEXT/VISION: add native Anthropic & Google Gemini paths via Vercel AI SDK, selectable through TEXT_PROVIDER / VISION_PROVIDER (default openai_compatible) - IMAGE: expand to openai (gpt-image) / google (Nano Banana) via AI SDK alongside the existing Runware task-array and OpenAI-compatible REST paths - normalizeBaseUrl: tolerate URLs with/without /v1 (or /chat/completions); append the per-protocol version segment only for bare hosts - config: readProvider() reads *_PROVIDER; types: ProviderProtocol + provider? - deps: @ai-sdk/anthropic, @ai-sdk/google; docs in .env.example + README Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 17:09:05 +08:00
yuanzonghao	b0b2e922d3	feat(web): optional bring-your-own Xiaomi MiMo TTS key (browser-side synthesis) Public users share one server TTS key, so Xiaomi's per-key RPM/TPM limits cause silent playback under concurrency. This adds an OPTIONAL path: a user can store their own Xiaomi MiMo key in the browser and synthesize voice client-side against Xiaomi's CORS-open endpoints. The key lives only in localStorage and is never sent to or logged by our server; the shared server key still serves everyone who does not opt in. - components/TtsKeyModal.tsx: shared key modal (key-family + region picker), reused by both the home and play pages - app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens in place instead of redirecting to the home page - app/page.tsx: home page consumes the shared modal + readStoredTtsConfig - lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets - app/api/{start,scene,insert-beat}: thread per-request voice; lib/types update - docs/xiaomi-tts-key.md + README note Verified with tsc --noEmit (exit 0). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 16:58:55 +08:00
yuanzonghao	3bf5c92841	perf(engine): split Writer into Phase A (plan) + Phase B (beats) The Writer was the serial long pole: a single LLM call wrote the scene skeleton AND the full beats[] graph before anything downstream could start, so variable-length beat generation blew up tail latency. Split it into two calls: - Phase A (runWriterPlan): minimal skeleton the image pipeline needs (sceneSummary, sceneKey, entryBeatId, cast, entry roster, entry speaker). Serial, on the critical path, kept lightweight. - Phase B (runWriterBeats): full beats[] + storyStatePatch, written to honor the plan. Launched immediately, overlaps the ENTIRE image pipeline (cards / cinematographer / portraits / painter), awaited last. Critical path becomes PhaseA + max(imagePipeline, PhaseB), so the long beat-writing is hidden behind image gen. A Phase B failure degrades to a single playable beat synthesized from the plan. Paired distinct-payload A/B (6 content-matched stories, baseline vs split): - median end-to-end 42.6s -> 32.2s (-24%) - mean 46.4s -> 33.1s (-29%) - worst case 74.7s -> 37.6s (halved) - no content regression: total Writer output tokens 12858 -> 13699 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-04 11:17:34 +08:00
DESKTOP-I1T6TF3\Q	347ab297d5	feat(web,engine): custom style — image upload, AI-extract prompt, painter ref Add an upload button to the custom art-style entry: the client downscales the image to a 512px WebP (base64) and sends it to the new route /api/parse-style-image, where a vision LLM parses it into an English style prompt that is filled back into the textarea. The image itself is passed through sessionStorage → /api/start → Session.styleReferenceImage, and painter.collectReferenceImages places it in slot 0, so every scene in the session uses it as a reference image to anchor the art style (brush / color / mood), taking priority over priorScene. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 19:15:19 +08:00
Zonghao Yuan	dc5ecd60f6	refactor: flatten monorepo to single web package (#12) Flatten the pnpm monorepo (apps/web + packages/) into a single web package at the repo root. - Move app/lib/components/scripts/public to root; drop apps/web and packages/ wrappers - Rewrite tsconfig paths (@infiplot/) to ./lib/; turbopack.root = __dirname - Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths - Regenerate pnpm-lock.yaml to drop stale workspace importers - Bump engines.node to >=22 to match wrangler Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-03 00:55:45 +08:00

12 Commits
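Commit ca73a41a0b describes fetchBeatAudio shaping its request body according to the server's TTS provider: a StepFun server gets only the lightweight stepfunVoiceId / voiceDescription, while the ~220KB Xiaomi reference audio is omitted. The sketch below shows that idea in isolation. Only the field names stepfunVoiceId, voiceDescription, characterName, and the optional voice come from the commit message; the types `CharacterSketch` and `BeatAudioBodySketch` and the helper `buildBeatAudioBody` are hypothetical, and the real client code may differ.

```typescript
// Sketch only: hypothetical types loosely modeled on the fields named in
// commit ca73a41a0b. Not the repository's actual fetchBeatAudio code.
type TtsProvider = "xiaomi" | "stepfun";

interface CharacterSketch {
  name: string;
  voiceDescription: string;
  stepfunVoiceId?: string;
  // Prebaked Xiaomi reference clip (roughly 220KB of base64 per the commit).
  referenceAudio?: { base64: string; mimeType: string };
}

interface BeatAudioBodySketch {
  text: string;
  characterName?: string;
  voiceDescription?: string;
  stepfunVoiceId?: string;
  // Optional, so a StepFun-bound request can leave it out entirely.
  voice?: { provider: "xiaomi"; referenceAudioBase64: string; mimeType: string };
}

function buildBeatAudioBody(
  provider: TtsProvider,
  character: CharacterSketch,
  text: string,
): BeatAudioBodySketch {
  if (provider === "stepfun") {
    // Only short strings travel; the heavy reference audio is not attached.
    return {
      text,
      characterName: character.name,
      voiceDescription: character.voiceDescription,
      stepfunVoiceId: character.stepfunVoiceId,
    };
  }
  // Xiaomi path: attach the reference audio when the character has one.
  const body: BeatAudioBodySketch = { text, characterName: character.name };
  if (character.referenceAudio) {
    body.voice = {
      provider: "xiaomi",
      referenceAudioBase64: character.referenceAudio.base64,
      mimeType: character.referenceAudio.mimeType,
    };
  }
  return body;
}
```

Because `voice` is optional on the body type, the same request shape serves both providers; JSON.stringify also drops an undefined stepfunVoiceId, which leaves the server free to fall back to a keyword-based pick.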
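The provider split in commit 19bbee16fe (provision by the configured URL host, synthesize by the voice's own `provider` tag) maps naturally onto a TypeScript discriminated union. A minimal sketch: the two CharacterVoice shapes follow the commit message, while `detectTtsProvider` and `describeSynthRoute` are hypothetical helper names standing in for the real provision and synth dispatch.

```typescript
// Sketch only: union shapes taken from commit 19bbee16fe; helper names are
// hypothetical and do not claim to match the repository.
type CharacterVoice =
  | { provider: "xiaomi"; referenceAudioBase64: string; mimeType: string }
  | { provider: "stepfun"; voiceId: string; model: string; mimeType: string };

// Provisioning picks a provider from the base URL's host, as the commit describes.
function detectTtsProvider(baseUrl: string): CharacterVoice["provider"] {
  return new URL(baseUrl).hostname.includes("stepfun.com") ? "stepfun" : "xiaomi";
}

// Synthesis dispatches on the voice's own tag, so a session holding voices
// from both providers still routes each beat correctly.
function describeSynthRoute(voice: CharacterVoice): string {
  switch (voice.provider) {
    case "xiaomi":
      return `xiaomi (reference audio, ${voice.mimeType})`;
    case "stepfun":
      return `stepfun (${voice.model}, preset ${voice.voiceId})`;
    default: {
      // Compile-time exhaustiveness check; unreachable for well-typed input.
      const unreachable: never = voice;
      throw new Error(`unknown provider: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Inside each `case`, TypeScript narrows `voice` to one variant, so accessing `voiceId` on a Xiaomi voice is a compile error rather than a runtime surprise; the `never` default fails compilation if a third provider is added without a branch.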
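The same commit maps the LLM's Chinese voiceDescription onto a preset via gender + age + tone keyword scoring, then spreads similar descriptions across the top three candidates with a deterministic hash. The sketch below illustrates that approach under several assumptions: the five-entry catalog and its IDs are invented stand-ins for the 32 real presets, gender is treated as a hard filter, and the hash is 32-bit FNV-1a over the character name. `pickPresetIdSketch` is a hypothetical name; the repository's function is pickStepfunVoiceId, whose actual logic may differ.

```typescript
// Sketch only: catalog, weighting, and hash choice are assumptions, not the
// repository's pickStepfunVoiceId implementation.
interface VoicePreset {
  id: string;
  gender: "male" | "female";
  keywords: string[]; // Chinese age/tone words the description may contain
}

// Invented stand-in for the 32-entry catalog.
const PRESETS: VoicePreset[] = [
  { id: "f-gentle", gender: "female", keywords: ["温柔", "甜美", "少女"] }, // gentle, sweet, girl
  { id: "f-cool", gender: "female", keywords: ["冷静", "成熟", "御姐"] }, // calm, mature
  { id: "m-young", gender: "male", keywords: ["少年", "活泼", "阳光"] }, // youthful, lively
  { id: "m-deep", gender: "male", keywords: ["低沉", "成熟", "沉稳"] }, // deep, mature, steady
  { id: "m-elder", gender: "male", keywords: ["老年", "沙哑", "慈祥"] }, // elderly, hoarse, kind
];

// 32-bit FNV-1a: a cheap, deterministic string hash.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function pickPresetIdSketch(
  description: string,
  characterName: string,
  catalog: VoicePreset[] = PRESETS,
): string {
  if (catalog.length === 0) throw new Error("empty voice catalog");
  // "女" marks a female voice, "男" a male one; otherwise keep every preset.
  const gender = description.includes("女") ? "female" : description.includes("男") ? "male" : undefined;
  const sameGender = gender ? catalog.filter((p) => p.gender === gender) : [];
  const candidates = sameGender.length > 0 ? sameGender : catalog;
  // Score by matched age/tone keywords; break ties by id so ordering is stable.
  const top3 = candidates
    .map((preset) => ({ preset, score: preset.keywords.filter((k) => description.includes(k)).length }))
    .sort((a, b) => b.score - a.score || a.preset.id.localeCompare(b.preset.id))
    .slice(0, 3);
  // Hashing the name spreads similar descriptions across the top candidates.
  return top3[fnv1a(characterName) % top3.length].preset.id;
}
```

The trade-off is deliberate: the hash sometimes picks the second- or third-best match instead of the single best one, in exchange for two characters with near-identical descriptions rarely sharing a voice.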
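Commit 83fd5717e7 says normalizeBaseUrl tolerates base URLs with or without /v1 (or a pasted /chat/completions endpoint) and appends the per-protocol version segment only for bare hosts. A minimal sketch of that rule, assuming the signature shown and that query strings may be dropped; the real function's handling of other paths may differ.

```typescript
// Sketch only: the name comes from commit 83fd5717e7; the signature, default
// segment, and suffix handling are assumptions.
function normalizeBaseUrl(raw: string, versionSegment = "/v1"): string {
  const url = new URL(raw.trim());
  // Drop trailing slashes so "/v1/" and "/v1" compare equal.
  let path = url.pathname.replace(/\/+$/, "");
  // Tolerate a full endpoint pasted in place of a base URL.
  const endpoint = "/chat/completions";
  if (path.endsWith(endpoint)) {
    path = path.slice(0, -endpoint.length);
  }
  // Only a bare host (empty path) gets the per-protocol version segment;
  // an existing path such as "/v1" or "/api/v3" is kept as-is.
  if (path === "") {
    path = versionSegment;
  }
  // Query strings and fragments are intentionally dropped.
  return `${url.origin}${path}`;
}
```

For example, `normalizeBaseUrl("https://api.example.com")` yields `https://api.example.com/v1`, and a protocol with a different version segment can pass it explicitly, e.g. `normalizeBaseUrl("https://g.example.com", "/v1beta")`.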