feat(web): optional bring-your-own Xiaomi MiMo TTS key (browser-side synthesis)

Public users share one server TTS key, so under concurrent load Xiaomi's
per-key RPM/TPM limits are exceeded and playback goes silent. This adds an
OPTIONAL path: a user can store their own Xiaomi MiMo key in the browser
and synthesize voice client-side against Xiaomi's CORS-open endpoints. The
key lives only in localStorage and is never sent to or logged by our
server; the shared server key still serves everyone who does not opt in.
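
A minimal sketch of the browser-side storage described above, under stated
assumptions: the storage key name, the ClientTtsConfig fields, and the helper
names writeTtsConfig/readTtsConfig are hypothetical and are not taken from
lib/clientTtsConfig.ts. The store is passed in so the same code works with
window.localStorage in the browser and with an in-memory stand-in elsewhere.

```typescript
// Hypothetical config shape; the real fields are not shown in this commit view.
type ClientTtsConfig = { apiKey: string; region: string };

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "tts.clientConfig"; // assumed key name

// Persist the user's key locally; nothing here talks to a server.
function writeTtsConfig(store: KeyValueStore, config: ClientTtsConfig): void {
  store.setItem(STORAGE_KEY, JSON.stringify(config));
}

// Return the stored config, or null when nothing usable is stored.
function readTtsConfig(store: KeyValueStore): ClientTtsConfig | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const fields = parsed as Record<string, unknown>;
      const apiKey = fields.apiKey;
      const region = fields.region;
      if (typeof apiKey === "string" && apiKey.length > 0 && typeof region === "string") {
        return { apiKey, region };
      }
    }
  } catch {
    // Corrupt JSON is treated the same as "no key stored".
  }
  return null;
}

// In-memory stand-in for localStorage, used for demonstration and tests.
function createMemoryStore(): KeyValueStore {
  const map = new Map<string, string>();
  return {
    getItem: (key) => map.get(key) ?? null,
    setItem: (key, value) => { map.set(key, value); },
    removeItem: (key) => { map.delete(key); },
  };
}
```

In a browser the caller would pass window.localStorage as the store. A null
result means the user has not opted in, so the shared server key applies.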

- components/TtsKeyModal.tsx: shared key modal (key-family + region picker),
  reused by both the home and play pages
- app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens
  in place instead of redirecting to the home page
- app/page.tsx: home page consumes the shared modal + readStoredTtsConfig
- lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets
- app/api/{start,scene,insert-beat}: thread per-request voice; lib/types adds
  an optional clientTts flag to the three request types
- docs/xiaomi-tts-key.md + README note

Verified with tsc --noEmit (exit 0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
yuanzonghao
2026-06-04 11:24:16 +08:00
parent 24b674d792
commit b0b2e922d3
13 changed files with 843 additions and 48 deletions
@@ -300,6 +300,12 @@ export type StartRequest = {
styleGuide: string;
/** Optional user-uploaded style reference image — see Session.styleReferenceImage. */
styleReferenceImage?: string;
/**
* When true, the client supplied its own Xiaomi TTS key and will provision and
* synthesize voices in the browser (the key never touches our server). The
* route then drops `config.tts` so the engine skips all server-side TTS work.
*/
clientTts?: boolean;
};
// /api/parse-style-image — vision LLM extracts a textual painting-style
@@ -332,6 +338,8 @@ export type StartResponse = {
// (frontend synthesizes a speculative exit).
export type SceneRequest = {
session: Session;
/** See StartRequest.clientTts — drops server-side TTS for BYO-key clients. */
clientTts?: boolean;
};
export type SceneResponse = {
@@ -389,6 +397,8 @@ export type VisionResponse = {
export type InsertBeatRequest = {
session: Session;
freeformAction: string;
/** See StartRequest.clientTts — drops server-side TTS for BYO-key clients. */
clientTts?: boolean;
};
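
The `clientTts` flag on the three request types above is documented as making the route drop `config.tts`. A minimal sketch of that step, assuming a hypothetical `EngineConfig` shape and a hypothetical helper `configForRequest` (neither is shown in this diff), could look like this:

```typescript
// Hypothetical config shapes; the real engine config is not part of this diff.
type TtsConfig = { apiKey: string };
type EngineConfig = { llm: { model: string }; tts?: TtsConfig };

// Return the config the engine should use for one request.
// When the client opted into browser-side TTS, the `tts` section is removed
// from a copy, so the shared server config object is never mutated.
function configForRequest(config: EngineConfig, clientTts?: boolean): EngineConfig {
  if (!clientTts) return config;
  const copy: EngineConfig = { ...config };
  delete copy.tts;
  return copy;
}
```

Working on a copy keeps the shared server key available to every other request, which matches the commit's statement that users who do not opt in are still served by the server key.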
/** Partial beat fields produced by the insert-beat director. */