fix(tts): persist stepfunVoiceId on Character + harden probe race

Two follow-ups from pr-agent review of #79:

1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId, so
   on a StepFun server the client (which omits the voice payload to save
   FOT) echoed back only voiceDescription, and the server re-scored via
   pickStepfunVoiceId every beat instead of honoring the LLM pick. The
   whole "CharacterDesigner picks a preset id" mechanism was effectively
   bypassed on live StepFun sessions (it only worked for prebaked cards,
   which carry stepfunVoiceId in their JSON). Persist stepfunVoiceId onto
   the Character so the client→server round-trip keeps the LLM selection.

2. fetchBeatAudio's null-provider branch (probe pending) required
   speaker.voice and silently dropped a stepfun-only speaker. Accept any
   synthesizable source (voice | stepfunVoiceId | voiceDescription) so a
   slow getTtsProvider probe can't drop audio during the first scene's
   fetch window. The server resolveVoice normalizes regardless of which
   fields arrive.
@@ -879,14 +879,17 @@ function PlayInner() {
     // - BYO (xiaomi): baked voice OR voiceDescription to provision locally.
     // - Server stepfun: stepfunVoiceId or voiceDescription — no Xiaomi
     //   `voice` needed (saves the ~220KB reference-audio FOT).
-    // - Server xiaomi / unknown: rely on speaker.voice (the server will
-    //   normalize if provider mismatch — but we still need *something*).
+    // - Server xiaomi / unknown (probe pending): accept ANY synthesizable
+    //   source. The null case covers the race where getTtsProvider hasn't
+    //   resolved before the first beat fetch fires — without this widening
+    //   a stepfun-only speaker (no Xiaomi voice) would be silently dropped.
+    //   The server resolves + normalizes regardless of which fields arrive.
     if (byo) {
       if (!speaker.voice && !speaker.voiceDescription) return;
     } else if (serverProvider === "stepfun") {
       if (!speaker.stepfunVoiceId && !speaker.voiceDescription) return;
     } else {
-      if (!speaker.voice) return;
+      if (!speaker.voice && !speaker.stepfunVoiceId && !speaker.voiceDescription) return;
     }
 
     if (beatAudioAbortRef.current.has(beat.id)) return;
@@ -308,6 +308,9 @@ export async function directScene(
   // On the StepFun path, thread the LLM-selected stepfunVoiceId from the card
   // into provision — it lets stepfunProvision honor the catalog pick instead
   // of falling back to the keyword scorer (same network cost: still zero).
+  // ALSO persist it onto the Character so the client can echo it back on a
+  // StepFun server (where it skips the ~220KB voice payload) and the server
+  // resolveVoice honors the LLM pick at synth time instead of re-scoring.
   const voicePromises = cards.map((card) =>
     provisionCharacterVoice(config, card.voiceDescription, card.name, {
       stepfunVoiceId: card.stepfunVoiceId,
@@ -316,6 +319,7 @@ export async function directScene(
         name: card.name,
         voiceDescription: card.voiceDescription,
         voice,
+        stepfunVoiceId: card.stepfunVoiceId,
       }),
     ),
   );
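The widened gate in fetchBeatAudio can be read as a standalone predicate. The following is a minimal sketch, not the code from this commit: the `Speaker` and `Provider` types and the `canSynthesize` name are hypothetical, and the `"xiaomi"` provider literal is an assumption (only `"stepfun"` appears in the diff). `null` stands for the state where the getTtsProvider probe has not resolved yet.

```typescript
// Hypothetical, simplified shapes; the real Character/speaker type is not shown in this commit.
type Provider = "stepfun" | "xiaomi" | null; // null = provider probe still pending (assumed encoding)

interface Speaker {
  voice?: string;            // Xiaomi reference-audio payload
  stepfunVoiceId?: string;   // LLM-selected StepFun preset id
  voiceDescription?: string; // free-text description used as a fallback
}

// Returns true when the beat's audio fetch should proceed, false when it should be skipped.
function canSynthesize(speaker: Speaker, byo: boolean, serverProvider: Provider): boolean {
  if (byo) {
    // BYO: needs a baked voice or a description to provision locally.
    return Boolean(speaker.voice || speaker.voiceDescription);
  }
  if (serverProvider === "stepfun") {
    // Server StepFun: the preset id or a description is enough.
    return Boolean(speaker.stepfunVoiceId || speaker.voiceDescription);
  }
  // Server xiaomi or probe pending: accept any synthesizable source.
  return Boolean(speaker.voice || speaker.stepfunVoiceId || speaker.voiceDescription);
}

// A stepfun-only speaker is no longer dropped while the probe is pending.
const stepfunOnly: Speaker = { stepfunVoiceId: "preset-1" };
console.log(canSynthesize(stepfunOnly, false, null));
```

Under the old else-branch (`if (!speaker.voice) return;`) the same speaker would have been skipped, which is the race the second follow-up describes.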