From fcd4e6c1ab088092a3529e3d8112e5231912cb4e Mon Sep 17 00:00:00 2001
From: Zonghao Yuan <64521992+zonghaoyuan@users.noreply.github.com>
Date: Thu, 28 May 2026 20:45:21 +0800
Subject: [PATCH] feat(tts): Xiaomi MiMo per-beat voice + MOCK_IMAGE testing aid (#3)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds an optional Xiaomi MiMo TTS layer on top of the scene/beat engine and a
MOCK_IMAGE flag for cheap local TTS iteration.

- Per-character voice provisioning via MiMo voice design → clone, with
  reference audio persisted in the session
- Per-line free-form delivery direction (the Director writes instructions in
  the style of "鼓起勇气又害羞,声音发颤", i.e. "mustering courage yet shy,
  voice trembling"; these are sent to MiMo's director channel and never read
  aloud)
- Per-beat audio served with the scene response; frontend plays via hidden