infiplot-web/apps/web/lib/config.ts
Zonghao Yuan fcd4e6c1ab feat(tts): Xiaomi MiMo per-beat voice + MOCK_IMAGE testing aid (#3)
Adds optional Xiaomi MiMo TTS layer on top of the scene/beat engine and a MOCK_IMAGE flag for cheap local TTS iteration.

- Per-character voice provisioning via MiMo voice design → clone, reference audio persisted in session
- Per-line free-form delivery direction (the Director writes instructions such as "mustering courage yet shy, voice trembling"; these are sent to MiMo's director channel and never read aloud)
- Per-beat audio served with the scene response; frontend plays via hidden <audio> with typewriter synced to audio duration; mute toggle persisted via localStorage lazy initializer
- Graceful degradation: any TTS step failing → silent beat, game continues
- MOCK_IMAGE=true returns a sharp-generated placeholder PNG so local TTS iteration doesn't burn image tokens
- Recommended config in .env.example: MiMo Token Plan covers TEXT/VISION/TTS with one key (mimo-v2.5-pro for text, mimo-v2.5 omni for vision, mimo-v2.5-tts for TTS)
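
As a rough illustration of the "typewriter synced to audio duration" point, the per-character delay could be derived from the clip length, with a fixed pace for silent beats. This is only a sketch: `typewriterDelayMs` and the 40 ms fallback are hypothetical and not taken from the repository.

```typescript
// Hypothetical helper; the name and the 40 ms default are illustrative assumptions.
function typewriterDelayMs(
  text: string,
  audioDurationSec: number | undefined,
  fallbackMs = 40,
): number {
  // Count code points so CJK characters and emoji each take one step.
  const chars = Array.from(text).length;
  if (chars === 0) return 0;

  // Silent beat (TTS disabled or a TTS step failed): use a fixed pace.
  if (
    audioDurationSec === undefined ||
    !Number.isFinite(audioDurationSec) ||
    audioDurationSec <= 0
  ) {
    return fallbackMs;
  }

  // Spread the characters evenly across the audio clip.
  return (audioDurationSec * 1000) / chars;
}
```

With this shape, a beat whose audio failed to generate still reveals its text at a readable speed, which matches the graceful-degradation behaviour described above.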

Squashed from #3:
- feat(tts): Xiaomi MiMo per-beat voicing + per-session character voices + free-text delivery direction
- feat(engine): MOCK_IMAGE placeholder image for easier local testing
- fix(tts): address Copilot review on PR #3
- fix(tts): Copilot round-2 review feedback

Known limitation: Session.characters carries the full WAV reference audio (~200-300KB/character base64) and round-trips through every /api/scene, /api/vision, /api/insert-beat request. This is intrinsic to MiMo's design→clone model (voice identity IS the audio, no server-side voiceId). Fixing this requires server-side storage, which is out of scope; documented for future hardening.
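
To get a feel for the size of that round-trip, the base64 overhead can be estimated from the raw audio sizes. The sketch below relies only on the standard base64 expansion (4 output characters per 3 input bytes, padded); the function names and the example sizes are hypothetical and do not reflect the actual `Session` shape.

```typescript
// Length of the padded base64 encoding of `rawBytes` bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Hypothetical estimate: extra bytes added to each request by the
// characters' reference audio, given the raw WAV size of each clip.
function referenceAudioOverheadBytes(rawWavSizes: number[]): number {
  return rawWavSizes.reduce((sum, size) => sum + base64Length(size), 0);
}

// Illustrative sizes only: two characters with 150,000- and 225,000-byte clips.
const perRequest = referenceAudioOverheadBytes([150_000, 225_000]);
console.log(`${perRequest} bytes of reference audio per request`);
```

Since the overhead grows linearly with the number of voiced characters, a session with many characters pays this cost on every call to the affected endpoints.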

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-05-28 20:45:21 +08:00

46 lines
1.3 KiB
TypeScript

import type { EngineConfig, TtsConfig } from "@yume/types";

/** Reads a required environment variable; throws if it is unset or empty. */
function readVar(name: string): string {
  const v = process.env[name];
  if (!v) throw new Error(`Missing required environment variable: ${name}`);
  return v;
}

/** Reads an optional environment variable; unset or empty yields undefined. */
function readOptionalVar(name: string): string | undefined {
  const v = process.env[name];
  return v && v.length > 0 ? v : undefined;
}

/** Builds the TTS config, or returns undefined when TTS is not fully configured. */
function loadTtsConfig(): TtsConfig | undefined {
  const baseUrl = readOptionalVar("TTS_BASE_URL");
  const apiKey = readOptionalVar("TTS_API_KEY");
  const speechModel = readOptionalVar("TTS_SPEECH_MODEL");
  // Missing any → TTS disabled (game runs silently).
  if (!baseUrl || !apiKey || !speechModel) return undefined;
  return { baseUrl, apiKey, speechModel };
}

/** Assembles the engine config; text, image, and vision settings are required. */
export function loadEngineConfig(): EngineConfig {
  return {
    text: {
      baseUrl: readVar("TEXT_BASE_URL"),
      apiKey: readVar("TEXT_API_KEY"),
      model: readVar("TEXT_MODEL"),
    },
    image: {
      baseUrl: readVar("IMAGE_BASE_URL"),
      apiKey: readVar("IMAGE_API_KEY"),
      model: readVar("IMAGE_MODEL"),
    },
    vision: {
      baseUrl: readVar("VISION_BASE_URL"),
      apiKey: readVar("VISION_API_KEY"),
      model: readVar("VISION_MODEL"),
    },
    tts: loadTtsConfig(),
    // Only the exact string "true" enables the placeholder image.
    mockImage: readOptionalVar("MOCK_IMAGE") === "true",
  };
}
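
The optional-TTS and `MOCK_IMAGE` rules above can be exercised without touching `process.env` by passing the environment in as a plain object. The following is a minimal sketch under that assumption: `TtsConfigSketch`, `loadTtsConfigFrom`, and `parseMockImage` are hypothetical stand-ins, not part of `@yume/types` or this file.

```typescript
// Hypothetical stand-in for TtsConfig from "@yume/types".
interface TtsConfigSketch {
  baseUrl: string;
  apiKey: string;
  speechModel: string;
}

type Env = Record<string, string | undefined>;

// Same rule as readOptionalVar: unset or empty counts as absent.
function optional(env: Env, name: string): string | undefined {
  const v = env[name];
  return v && v.length > 0 ? v : undefined;
}

// All three variables must be present; otherwise TTS is disabled.
function loadTtsConfigFrom(env: Env): TtsConfigSketch | undefined {
  const baseUrl = optional(env, "TTS_BASE_URL");
  const apiKey = optional(env, "TTS_API_KEY");
  const speechModel = optional(env, "TTS_SPEECH_MODEL");
  if (!baseUrl || !apiKey || !speechModel) return undefined;
  return { baseUrl, apiKey, speechModel };
}

// Only the exact, lowercase string "true" turns the mock image on.
function parseMockImage(env: Env): boolean {
  return optional(env, "MOCK_IMAGE") === "true";
}

// Placeholder values for illustration only.
const partial = loadTtsConfigFrom({ TTS_BASE_URL: "https://tts.example.invalid" });
console.log(partial === undefined ? "TTS disabled" : "TTS enabled");
```

Injecting the environment this way keeps the parsing rules unit-testable; note that values such as `"TRUE"` or `"1"` leave the mock image disabled under the strict comparison.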