infiplot-web

infiplot/infiplot-web

Commit Graph

Author

SHA1

Message

Date

DESKTOP-I1T6TF3\Q

04f22249c9

fix(tts): make stepfun preset pick case-stable and per-character

- Hash the lowercased description (matching the case-insensitive scoring)
  so the same archetype text picks the same preset regardless of case.
- Thread the character name through provisionVoice -> stepfunProvision as
  the hash salt, so two characters that share archetype keywords spread
  across the top-N candidate presets instead of collapsing on one voice.

Xiaomi path is unaffected (voicedesign mints a unique clip per call).
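
The salted, case-insensitive pick described in this commit can be sketched as follows. This is a minimal illustration, not the repository's code: `fnv1a` and `pickPreset` are hypothetical names, and the FNV-1a-style hash over UTF-16 code units is an assumed choice, since the commit does not say which hash is used.

```typescript
// Hypothetical 32-bit FNV-1a-style hash over UTF-16 code units (assumed, not from the repo).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Pick one preset from the already-ranked top-N candidates.
// The description is lowercased so case changes do not move the pick;
// the character name acts as the salt so similar characters can spread out.
function pickPreset(
  candidates: readonly string[],
  description: string,
  characterName: string,
): string {
  if (candidates.length === 0) {
    throw new Error("no candidate presets");
  }
  const key = `${characterName}\u0000${description.toLowerCase()}`;
  return candidates[fnv1a(key) % candidates.length];
}
```

With this shape, `pickPreset(ids, "Calm Young Woman", "Alice")` and `pickPreset(ids, "calm young woman", "Alice")` return the same preset, while a different character name may land on a different candidate.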

2026-06-09 09:14:44 +08:00

DESKTOP-I1T6TF3\Q

19bbee16fe

feat(tts): add StepFun preset-voice provider, route by URL + voice tag

Add StepFun step-tts-mini / step-tts-2 / stepaudio-2.5-tts as an alternate
TTS provider alongside Xiaomi MiMo. Auto-detected from TTS_BASE_URL host
(contains `stepfun.com` → StepFun; otherwise → MiMo), mirroring how the
image client infers Runware from `*.runware.ai`.
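
A rough sketch of the host-based detection described above. The function name `detectTtsProvider` and the fallback to MiMo for an unparsable URL are assumptions made for illustration; the commit only states the `stepfun.com` substring rule.

```typescript
type TtsProvider = "xiaomi" | "stepfun";

// Infer the provider from the host of TTS_BASE_URL.
// Assumption: an unparsable URL falls back to the default (MiMo) branch.
function detectTtsProvider(baseUrl: string): TtsProvider {
  let host: string;
  try {
    host = new URL(baseUrl).hostname.toLowerCase();
  } catch {
    return "xiaomi";
  }
  // A host containing `stepfun.com` selects StepFun; anything else selects MiMo.
  return host.includes("stepfun.com") ? "stepfun" : "xiaomi";
}
```

For example, a base URL whose host is `api.stepfun.com` would select `"stepfun"`, and any other host would select `"xiaomi"`.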

CharacterVoice becomes a discriminated union on `provider`:
- xiaomi: { referenceAudioBase64, mimeType } — unchanged
- stepfun: { voiceId, model, mimeType } — preset voice ID + chosen model

Provision dispatches on the current cfg's base URL; synthesis dispatches
on the voice's own `provider` tag so a session with mixed voices (e.g. a
provider switch mid-development) routes each beat through the correct
protocol. xiaomiSynthesize now guards against being called with a
non-xiaomi voice, surfacing the bug as a clear runtime error instead of a
TypeScript narrowing violation at the access site.
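
The union and the tag-based dispatch might look roughly like this. The field names come from the commit message; `describeRoute` and `assertXiaomiVoice` are hypothetical helpers that stand in for the real synthesis functions, which are not shown here.

```typescript
type XiaomiVoice = {
  provider: "xiaomi";
  referenceAudioBase64: string;
  mimeType: string;
};

type StepfunVoice = {
  provider: "stepfun";
  voiceId: string;
  model: string;
  mimeType: string;
};

type CharacterVoice = XiaomiVoice | StepfunVoice;

// Dispatch on the voice's own tag, not on the current config,
// so mixed-provider sessions route each beat correctly.
function describeRoute(voice: CharacterVoice): string {
  switch (voice.provider) {
    case "xiaomi":
      return `xiaomi: reference clip (${voice.mimeType})`;
    case "stepfun":
      return `stepfun: preset ${voice.voiceId} on ${voice.model}`;
    default: {
      // Exhaustiveness check: fails to compile if a new provider is added.
      const unreachable: never = voice;
      throw new Error(`unknown provider: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Runtime guard in the spirit of the xiaomiSynthesize check described above.
function assertXiaomiVoice(voice: CharacterVoice): XiaomiVoice {
  if (voice.provider !== "xiaomi") {
    throw new Error(`xiaomiSynthesize called with a ${voice.provider} voice`);
  }
  return voice;
}
```

The guard turns a mis-routed voice into an explicit error message at the call boundary.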

StepFun has no voicedesign equivalent — only preset voices + voice
cloning from a reference audio upload. Cloning would require an extra
asset per character, so v1 maps the LLM's Chinese voiceDescription to one
of the 32 published preset IDs via gender + age + tone keyword scoring,
with a deterministic hash spread across the top-3 candidates so multiple
characters with similar descriptions don't collapse onto the identical
preset. lineDelivery is accepted but not yet propagated to StepFun's
voice_label.emotion / .style fields — left as a follow-up.
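
The keyword-scoring step could be sketched as below. Everything here is illustrative: the preset IDs and keyword lists are made up (the real table of 32 presets is not shown in this log), and `topCandidates` is a hypothetical name. The sketch covers only the ranking; the final pick among the top candidates is the deterministic hash described above.

```typescript
// Hypothetical preset table; the real IDs and keywords are not in this log.
interface Preset {
  id: string;
  keywords: string[]; // gender, age, and tone keywords
}

const PRESETS: Preset[] = [
  { id: "preset-a", keywords: ["女", "青年", "温柔"] },
  { id: "preset-b", keywords: ["女", "青年", "活泼"] },
  { id: "preset-c", keywords: ["男", "中年", "沉稳"] },
  { id: "preset-d", keywords: ["男", "青年", "阳光"] },
];

// Score each preset by how many of its keywords appear in the description,
// then return the IDs of the n best. Ties keep table order so the result is stable.
function topCandidates(
  description: string,
  presets: readonly Preset[],
  n = 3,
): string[] {
  const text = description.toLowerCase();
  return presets
    .map((preset, index) => ({
      id: preset.id,
      index,
      score: preset.keywords.filter((k) => text.includes(k.toLowerCase())).length,
    }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, n)
    .map((candidate) => candidate.id);
}
```

With the sample table, a description such as "温柔的青年女性" ranks `preset-a` first, followed by `preset-b` and `preset-d`.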

beat-audio route validation relaxed from `voice.referenceAudioBase64`
(xiaomi-shaped) to `voice.provider` (shape-agnostic), so stepfun voices
pass the gate; provider-specific shape errors still surface from the
synth function.
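
A shape-agnostic gate of this kind might be written as a small type guard. `hasVoiceProvider` is a hypothetical name, and requiring a non-empty string is an assumption; the commit only says the route now checks `voice.provider`.

```typescript
// Accept any voice object that carries a provider tag; leave
// provider-specific field checks to the synth function.
function hasVoiceProvider(voice: unknown): voice is { provider: string } {
  if (typeof voice !== "object" || voice === null) {
    return false;
  }
  const provider = (voice as { provider?: unknown }).provider;
  return typeof provider === "string" && provider.length > 0;
}
```

A legacy payload that carries only `referenceAudioBase64` and no `provider` tag would be rejected by this gate, while both xiaomi and stepfun voices pass.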

Observed latency on InfiPlot's dev loop: StepFun step-tts-mini median
~2.3s per beat with 0% timeouts across the test session, vs MiMo's
median ~8s with the long tail tripping the existing 15s synth budget
on roughly 2 of 3 beats. Pricing: step-tts-mini ¥0.9 per 10,000
characters (~¥0.14 per typical 50-beat session) vs MiMo TTS currently
free under the Token Plan creator incentive.
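
The per-session estimate can be reproduced with a short calculation. The average of 31 characters per beat is an assumption back-derived from the quoted figures; it is not stated in the commit. `sessionCostYuan` is a hypothetical helper.

```typescript
// Quoted price: ¥0.9 per 10,000 characters.
const YUAN_PER_10K_CHARS = 0.9;

// Estimate the cost of one session from beat count and average beat length.
function sessionCostYuan(beats: number, avgCharsPerBeat: number): number {
  const totalChars = beats * avgCharsPerBeat;
  return (totalChars / 10_000) * YUAN_PER_10K_CHARS;
}

// Assumed average of 31 characters per beat: 50 * 31 = 1,550 characters,
// and 1,550 / 10,000 * 0.9 = 0.1395, which rounds to the quoted ~¥0.14.
const estimate = sessionCostYuan(50, 31);
```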

AGENTS.md provider matrix updated to describe both providers and the
discriminated-union dispatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-08 17:15:02 +08:00

Zonghao Yuan

dc5ecd60f6

refactor: flatten monorepo to single web package (#12)

Flatten the pnpm monorepo (apps/web + packages/*) into a single web package at the repo root.

- Move app/lib/components/scripts/public to root; drop apps/web and packages/* wrappers
- Rewrite tsconfig paths (@infiplot/*) to ./lib/*; turbopack.root = __dirname
- Update Vercel (no root-directory) and Cloudflare (pnpm build:cf at root) deploy paths
- Regenerate pnpm-lock.yaml to drop stale workspace importers
- Bump engines.node to >=22 to match wrangler

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-03 00:55:45 +08:00

3 Commits