Make homepage cards and live sessions produce sound when the server is
configured for StepFun TTS, instead of silently failing (the prebaked
Xiaomi voice was useless on a StepFun server, and wasted ~220KB/beat in
Fast Origin Transfer).
Three coordinated changes:
1. CharacterDesigner now picks a StepFun preset voice id directly from the
32-entry catalog in the SAME LLM call that designs the character — zero
extra latency, LLM-grade match quality. The Xiaomi prompt path is
byte-identical to history (verified programmatically) so cache hit rate
and voice quality are preserved. pickStepfunVoiceId (keyword scorer)
remains the fallback for orphan speakers / invalid LLM picks.
2. The 32-preset catalog moves to lib/tts-client/stepfun-voices.json as the
single source of truth, shared by the scorer, the CharacterDesigner
prompt, /api/tts-provider, and the offline enrich script.
3. A new GET /api/tts-provider endpoint lets the client probe the server's
TTS provider at /play mount. fetchBeatAudio then shapes its request body:
on a StepFun server it sends the lightweight stepfunVoiceId /
voiceDescription and omits the ~220KB Xiaomi reference audio (FOT saving
~13MB per protagonist per session on prebaked cards). requestBeatAudio
re-provisions on a provider mismatch before synth, so audio never goes
silent on a cross-provider replay or mid-session provider flip.
New type fields are all optional and backward-compatible: Character.stepfunVoiceId,
BeatAudioRequest.voiceDescription/characterName/stepfunVoiceId, voice made
optional. AGENTS.md updated for the new route, type fields, dependency map,
and StepFun voice-selection flow.
IMAGE_TIMEOUT_MS sets a per-attempt hard deadline (AbortSignal.timeout);
IMAGE_HEDGE_MS races a second identical scene-paint request when the
first is still pending past the threshold. Both default to OFF when
unset, preserving historical behavior for self-hosted deploys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Hash the lowercased description (matching the case-insensitive scoring)
so the same archetype text picks the same preset regardless of case.
- Thread the character name through provisionVoice -> stepfunProvision as
the hash salt, so two characters that share archetype keywords spread
across the top-N candidate presets instead of collapsing on one voice.
Xiaomi path is unaffected (voicedesign mints a unique clip per call).
Eliminate the dual code path (raw fetch vs AI SDK) for text and vision.
All providers now go through createLanguageModel() + generateText(),
removing chatOpenAiCompatible/analyzeOpenAiCompatible, the manual Usage
type, summarizeUsage, and responseFormat plumbing from 8 call sites.
Key fix: @ai-sdk/openai v3 defaults to the Responses API (/responses);
DeepSeek only supports Chat Completions, so we use .chat() explicitly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a `tag` option to chat() and have it print one `[cache] <tag>
hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are
probed in order so the same logger works across providers:
- DeepSeek (v3+): usage.prompt_cache_hit_tokens / *_miss_tokens
- OpenAI / o-series: usage.prompt_tokens_details.cached_tokens
- Anthropic: usage.cache_read_input_tokens / *_creation_*
When none of them are present (MiMo / local Ollama / others) we still
print prompt + completion totals so the cost baseline is visible.
Tag every callsite so the log is greppable:
architect / writer / character-designer / cinematographer / insert-beat
This is the prerequisite for the prefix-cache reordering work that
follows — without per-agent visibility there's no way to tell if a
prompt rearrangement actually moved the needle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flatten the pnpm monorepo (apps/web + packages/*) into a single web package at the repo root.
- Move app/lib/components/scripts/public to root; drop apps/web and packages/* wrappers
- Rewrite tsconfig paths (@infiplot/*) to ./lib/*; turbopack.root = __dirname
- Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths
- Regenerate pnpm-lock.yaml to drop stale workspace importers
- Bump engines.node to >=22 to match wrangler
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>