375f401c8f
Two follow-ups from pr-agent review of #79:

1. director.ts voicePromises built a Character WITHOUT stepfunVoiceId, so on a StepFun server the client (which omits the voice payload to save FOT) echoed back only voiceDescription, and the server re-scored via pickStepfunVoiceId every beat instead of honoring the LLM pick. The whole "CharacterDesigner picks a preset id" mechanism was effectively bypassed on live StepFun sessions (it only worked for prebaked cards, which carry stepfunVoiceId in their JSON). Persist stepfunVoiceId onto the Character so the client→server round-trip keeps the LLM selection.

2. fetchBeatAudio's null-provider branch (probe pending) required speaker.voice and silently dropped a stepfun-only speaker. Accept any synthesizable source (voice | stepfunVoiceId | voiceDescription) so a slow getTtsProvider probe can't drop audio during the first scene's fetch window. The server resolveVoice normalizes regardless of which fields arrive.
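
A rough sketch of the two changes, for illustration only. The field names voice, stepfunVoiceId and voiceDescription come from this message; the Character shape, DesignedVoice, buildCharacter and hasSynthesizableSource are hypothetical stand-ins and are not taken from director.ts or fetchBeatAudio.

```typescript
// Hypothetical Character shape; the real type likely has more fields.
interface Character {
  name: string;
  voice?: string;            // assumed: provider-specific voice payload
  stepfunVoiceId?: string;   // preset id chosen by the LLM
  voiceDescription?: string; // free-text description of the voice
}

// Hypothetical result of the character-design step.
interface DesignedVoice {
  stepfunVoiceId?: string;
  voiceDescription?: string;
}

// Change 1 (sketch): copy stepfunVoiceId onto the Character so it
// survives the client->server round-trip instead of being re-scored.
function buildCharacter(name: string, designed: DesignedVoice): Character {
  const character: Character = { name };
  if (designed.voiceDescription !== undefined) {
    character.voiceDescription = designed.voiceDescription;
  }
  if (designed.stepfunVoiceId !== undefined) {
    character.stepfunVoiceId = designed.stepfunVoiceId;
  }
  return character;
}

// Change 2 (sketch): while the provider probe is still pending, accept
// any of the three sources rather than requiring speaker.voice.
function hasSynthesizableSource(speaker: Character): boolean {
  return Boolean(
    speaker.voice || speaker.stepfunVoiceId || speaker.voiceDescription
  );
}
```

With this shape, a speaker that carries only stepfunVoiceId passes the pending-probe check, and a speaker with none of the three fields is still skipped.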