fix(tts): persist stepfunVoiceId on Character + harden probe race

Two follow-ups from pr-agent review of #79:

1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId.
   On a StepFun server the client omits the voice payload to save FOT,
   so it echoed back only voiceDescription, and the server re-scored
   via pickStepfunVoiceId on every beat instead of honoring the LLM
   pick. The "CharacterDesigner picks a preset id" mechanism was
   therefore effectively bypassed on live StepFun sessions; it only
   worked for prebaked cards, which carry stepfunVoiceId in their JSON.
   Persist stepfunVoiceId onto the Character so the client→server
   round-trip keeps the LLM selection.

2. fetchBeatAudio's null-provider branch (probe pending) required
   speaker.voice and silently dropped a stepfun-only speaker. Accept any
   synthesizable source (voice | stepfunVoiceId | voiceDescription) so a
   slow getTtsProvider probe can't drop audio during the first scene's
   fetch window. The server's resolveVoice normalizes regardless of
   which fields arrive.
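
A minimal sketch of the round-trip described in item 1, under stated assumptions: only the field names (voice, stepfunVoiceId, voiceDescription) and the name pickStepfunVoiceId come from this message. The Character and DesignerOutput shapes, toCharacter, resolveStepfunVoice, the preset ids, and the scorer body are hypothetical stand-ins, not the project's actual code.

```typescript
// Hypothetical shapes; only the three voice-related field names are from the commit.
interface Character {
  name: string;
  voice?: string;
  stepfunVoiceId?: string;
  voiceDescription?: string;
}

// Hypothetical output of the CharacterDesigner LLM step.
interface DesignerOutput {
  name: string;
  voiceDescription: string;
  stepfunVoiceId?: string;
}

// Stand-in for the server-side scorer. The real signature and logic are unknown.
function pickStepfunVoiceId(description: string): string {
  return description.includes("deep") ? "preset-deep" : "preset-default";
}

// The fix in spirit: carry the LLM-picked id onto the Character so the
// client can echo it back on every beat request.
function toCharacter(d: DesignerOutput): Character {
  return {
    name: d.name,
    voiceDescription: d.voiceDescription,
    stepfunVoiceId: d.stepfunVoiceId,
  };
}

// Server side: honor an explicit id, and re-score only when none arrived.
function resolveStepfunVoice(c: Character): string | undefined {
  if (c.stepfunVoiceId) return c.stepfunVoiceId;
  if (c.voiceDescription) return pickStepfunVoiceId(c.voiceDescription);
  return undefined;
}
```

With the id persisted, resolveStepfunVoice returns the designer's pick; without it, every beat falls through to the description-based scorer, which is the behavior item 1 describes as the bug.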
yuanzonghao
2026-06-15 13:05:36 +08:00
parent ff03f3c085
commit 375f401c8f
2 changed files with 10 additions and 3 deletions
+6 -3
@@ -879,14 +879,17 @@ function PlayInner() {
 // - BYO (xiaomi): baked voice OR voiceDescription to provision locally.
 // - Server stepfun: stepfunVoiceId or voiceDescription — no Xiaomi
 // `voice` needed (saves the ~220KB reference-audio FOT).
-// - Server xiaomi / unknown: rely on speaker.voice (the server will
-// normalize if provider mismatch — but we still need *something*).
+// - Server xiaomi / unknown (probe pending): accept ANY synthesizable
+// source. The null case covers the race where getTtsProvider hasn't
+// resolved before the first beat fetch fires — without this widening
+// a stepfun-only speaker (no Xiaomi voice) would be silently dropped.
+// The server resolves + normalizes regardless of which fields arrive.
 if (byo) {
   if (!speaker.voice && !speaker.voiceDescription) return;
 } else if (serverProvider === "stepfun") {
   if (!speaker.stepfunVoiceId && !speaker.voiceDescription) return;
 } else {
-  if (!speaker.voice) return;
+  if (!speaker.voice && !speaker.stepfunVoiceId && !speaker.voiceDescription) return;
 }
 if (beatAudioAbortRef.current.has(beat.id)) return;
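
The gating above can be read as a pure predicate, which makes the probe-pending case easy to check in isolation. This is a sketch, not the code in the changed file: canFetchBeatAudio, Speaker, and ServerProvider are hypothetical names, and null is assumed to stand for "getTtsProvider has not resolved yet", as the comment in the hunk suggests.

```typescript
// Assumed provider values; null models the pending getTtsProvider probe.
type ServerProvider = "stepfun" | "xiaomi" | null;

// Hypothetical speaker shape using the three fields named in the diff.
interface Speaker {
  voice?: string;
  stepfunVoiceId?: string;
  voiceDescription?: string;
}

// Returns true when the speaker carries at least one source the chosen
// path can synthesize from. Mirrors the three branches in the hunk.
function canFetchBeatAudio(
  speaker: Speaker,
  byo: boolean,
  serverProvider: ServerProvider,
): boolean {
  if (byo) {
    // BYO (xiaomi): a baked voice, or a description to provision locally.
    return Boolean(speaker.voice || speaker.voiceDescription);
  }
  if (serverProvider === "stepfun") {
    // Server stepfun: no Xiaomi voice payload is needed.
    return Boolean(speaker.stepfunVoiceId || speaker.voiceDescription);
  }
  // Server xiaomi or probe pending: accept any synthesizable source.
  return Boolean(
    speaker.voice || speaker.stepfunVoiceId || speaker.voiceDescription,
  );
}

// A stepfun-only speaker while the probe is still pending is no longer dropped.
const stepfunOnly: Speaker = { stepfunVoiceId: "some-preset" };
const allowedWhilePending = canFetchBeatAudio(stepfunOnly, false, null);
```

Under the old final branch (speaker.voice only), the same call would have returned false for a stepfun-only speaker, which is the dropped-audio race item 2 of the message describes.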