feat(web): optional bring-your-own Xiaomi MiMo TTS key (browser-side synthesis)
Public users share one server TTS key, so Xiaomi's per-key RPM/TPM limits
cause silent playback under concurrency. This adds an OPTIONAL path: a user
can store their own Xiaomi MiMo key in the browser and synthesize voice
client-side against Xiaomi's CORS-open endpoints. The key lives only in
localStorage and is never sent to or logged by our server; the shared server
key still serves everyone who does not opt in.
- components/TtsKeyModal.tsx: shared key modal (key-family + region picker),
reused by both the home and play pages
- app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens
in place instead of redirecting to the home page
- app/page.tsx: home page consumes the shared modal + readStoredTtsConfig
- lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets
- app/api/{start,scene,insert-beat}: thread per-request voice; lib/types update
- docs/xiaomi-tts-key.md + README note
Verified with tsc --noEmit (exit 0).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -300,6 +300,12 @@ export type StartRequest = {
|
||||
styleGuide: string;
|
||||
/** Optional user-uploaded style reference image — see Session.styleReferenceImage. */
|
||||
styleReferenceImage?: string;
|
||||
/**
|
||||
* When true the client supplied its own Xiaomi TTS key and will provision +
|
||||
* synth voices in the browser (key never touches our server). The route then
|
||||
* drops `config.tts` so the engine skips all server-side TTS work.
|
||||
*/
|
||||
clientTts?: boolean;
|
||||
};
|
||||
|
||||
// /api/parse-style-image — vision LLM extracts a textual painting-style
|
||||
@@ -332,6 +338,8 @@ export type StartResponse = {
|
||||
// (frontend synthesizes a speculative exit).
|
||||
export type SceneRequest = {
|
||||
session: Session;
|
||||
/** See StartRequest.clientTts — drops server-side TTS for BYO-key clients. */
|
||||
clientTts?: boolean;
|
||||
};
|
||||
|
||||
export type SceneResponse = {
|
||||
@@ -389,6 +397,8 @@ export type VisionResponse = {
|
||||
export type InsertBeatRequest = {
|
||||
session: Session;
|
||||
freeformAction: string;
|
||||
/** See StartRequest.clientTts — drops server-side TTS for BYO-key clients. */
|
||||
clientTts?: boolean;
|
||||
};
|
||||
|
||||
/** Partial beat fields produced by the insert-beat director. */
|
||||
|
||||
Reference in New Issue
Block a user