infiplot-web

Author	SHA1	Message	Date
yuanzonghao	64cf9c330d	refactor(share): remove GALLERY_SECRET, use plaintext + SHA-256 integrity for .infiplot files The encrypted .infiplot format (AES-256-GCM via GALLERY_SECRET) provided no meaningful security — the payload is AI-generated story content with no credentials or PII, and the project is open source. Replace with plaintext + SHA-256 integrity check (format v2). Story share is now always enabled without requiring a server secret. - galleryCrypto.ts: AES-256-GCM → plaintext + SHA-256 hash; remove secret param - 4 API routes: remove GALLERY_SECRET guard and 503 fallback - story-unpack: forward specific error messages (v1 compat, hash mismatch) - gallery/page.tsx: remove stale AES-GCM comment - AGENTS.md: document gallery-pack/gallery-unpack routes - .env.example, wrangler.jsonc: remove GALLERY_SECRET references Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-18 21:41:56 +08:00
yuanzonghao	ca73a41a0b	feat(tts): StepFun voice selection via CharacterDesigner + provider-aware beat-audio Make homepage cards and live sessions produce sound when the server is configured for StepFun TTS, instead of silently failing (the prebaked Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in Fast Origin Transfer). Three coordinated changes: 1. CharacterDesigner now picks a StepFun preset voice id directly from the 32-entry catalog in the SAME LLM call that designs the character — zero extra latency, LLM-grade match quality. The Xiaomi prompt path is byte-identical to history (verified programmatically) so cache hit rate and voice quality are preserved. pickStepfunVoiceId (keyword scorer) remains the fallback for orphan speakers / invalid LLM picks. 2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the single source of truth, shared by the scorer, the CharacterDesigner prompt, /api/tts-provider, and the offline enrich script. 3. A new GET /api/tts-provider endpoint lets the client probe the server's TTS provider at /play mount. fetchBeatAudio then shapes its request body: on a StepFun server it sends the lightweight stepfunVoiceId / voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving ~13MB per protagonist per session on prebaked cards). requestBeatAudio re-provisions on a provider mismatch before synth, so audio never goes silent on a cross-provider replay or mid-session provider flip. New type fields are all optional and backward-compatible: Character.stepfunVoiceId, BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made optional. AGENTS.md updated for the new route, type fields, dependency map, and StepFun voice-selection flow.	2026-06-15 12:49:25 +08:00
Zonghao Yuan	0dea2f8e36	fix(ai-client): clean up regressions from OpenAI SDK migration and canvas frame fix (#74 ) Three follow-ups to `ef3b579` (OpenAI SDK migration) and `ebe39ef` (canvas frame): - .env.example / config.ts / AGENTS.md: anthropic & google native protocols were removed with the Vercel AI SDK, but .env.example and AGENTS.md still advertised them. Rewrite the docs to point Claude/Gemini at their OpenAI-compatible endpoints (api.anthropic.com/v1, generativelanguage.googleapis.com/v1beta/openai), drop the dead Gemini "Nano Banana" image example, sync AGENTS.md (text/vision protocol list, image protocol list, the "OpenAI/Gemini via AI SDK" reference note), and append a short hint in readProvider() error message guiding anthropic/google users to openai_compatible instead of a bare rejection. - chat.ts: drop the unsafe `as { prompt_tokens_details?: ... }` cast; read cached_tokens straight off the SDK's CompletionUsage type. Add a comment noting the OpenAI usage object reports cache reads only (no cache-write count), so the create cost the old AI SDK path logged is unrecoverable. - PlayCanvas.tsx: revert <img key={imageUrl}> to key={imageUrl.slice(-48)}. The gpt-image/mock paths emit multi-MB data URIs; using the full string as React's reconciliation key adds avoidable diff overhead during the frequent re-renders. Matches the existing <audio> element's key convention. Validation: pnpm typecheck passes. (pnpm lint fails on a pre-existing Next 16 `next lint` CLI issue, identical on staging — unrelated to this change.)	2026-06-14 13:36:19 +08:00
yuanzonghao	e68e7e1690	feat(engine): add opt-in image timeout and scene-paint hedging IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout); IMAGE_HEDGE_MS races a second identical scene-paint request when the first is still pending past the threshold. Both default to OFF when unset, preserving historical behavior for self-hosted deploys. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-13 11:21:47 +08:00
$DESKTOP-I1T6TF3\Q$ DESKTOP-I1T6TF3\Q	19bbee16fe	feat(tts): add StepFun preset-voice provider, route by URL + voice tag Add StepFun step-tts-mini / step-tts-2 / stepaudio-2.5-tts as an alternate TTS provider alongside Xiaomi MiMo. Auto-detected from TTS_BASE_URL host (contains `stepfun.com` → StepFun; otherwise → MiMo), mirroring how the image client infers Runware from `*.runware.ai`. CharacterVoice becomes a discriminated union on `provider`: - xiaomi: { referenceAudioBase64, mimeType } — unchanged - stepfun: { voiceId, model, mimeType } — preset voice ID + chosen model Provision dispatches on the current cfg's base URL; synthesis dispatches on the voice's own `provider` tag so a session with mixed voices (e.g. a provider switch mid-development) routes each beat through the correct protocol. xiaomiSynthesize now guards against being called with a non- xiaomi voice, surfacing the bug as a clear runtime error instead of a TypeScript narrow violation at the access site. StepFun has no voicedesign equivalent — only preset voices + voice cloning from a reference audio upload. Cloning would require an extra asset per character, so v1 maps the LLM's Chinese voiceDescription to one of the 32 published preset IDs via gender + age + tone keyword scoring, with a deterministic hash spread across the top-3 candidates so multiple characters with similar descriptions don't collapse onto the identical preset. lineDelivery is accepted but not yet propagated to StepFun's voice_label.emotion / .style fields — left as a follow-up. beat-audio route validation relaxed from `voice.referenceAudioBase64` (xiaomi-shaped) to `voice.provider` (shape-agnostic), so stepfun voices pass the gate; provider-specific shape errors still surface from the synth function. Observed latency on InfiPlot's dev loop: StepFun step-tts-mini median ~2.3s per beat with 0% timeouts across the test session, vs MiMo's median ~8s with the long tail tripping the existing 15s synth budget on roughly 2 of 3 beats. Pricing: step-tts-mini ¥0.9/万字符 (~¥0.14 per typical 50-beat session) vs MiMo TTS currently free under the Token Plan creator incentive. AGENTS.md provider matrix updated to describe both providers and the discriminated-union dispatch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-08 17:15:02 +08:00
baizhi958216	0abd5f1525	feat(play): add encrypted story sharing	2026-06-07 17:13:27 +08:00
baizhi958216	1bdd4dfd13	feat(dx): add missing nextjs default rules Signed-off-by: baizhi958216 <1475289190@qq.com>	2026-06-06 21:58:04 +08:00
baizhi958216	5a7daa8452	feat(play): add history dialog Signed-off-by: baizhi958216 <1475289190@qq.com>	2026-06-06 20:52:10 +08:00
baizhi958216	aef4771d2e	feat(dx): add agents.md Signed-off-by: baizhi958216 <1475289190@qq.com>	2026-06-06 19:49:32 +08:00

9 Commits