# =============================================================
# InfiPlot: AI real-time interactive story game
# Recommended setup: Xiaomi MiMo Token Plan for TEXT / VISION / TTS
# (one API key covers all three) + Runware for IMAGE (FLUX.2 [klein]).
#
# TEXT / VISION accept any OpenAI-compatible endpoint (OpenRouter,
# OpenAI, Anthropic via proxy, Gemini, DeepSeek, Ollama, ...).
# TTS uses Xiaomi MiMo's own voice design / clone protocol (not
# OpenAI-compatible; appends -voicedesign / -voiceclone to TTS_SPEECH_MODEL).
#
# IMAGE uses Runware's own task-array protocol (not OpenAI-compatible);
# the adapter posts an `imageInference` task to IMAGE_BASE_URL.
# =============================================================

# ---- 1. Text LLM · scene director ----------------------------------
# Any OpenAI-compatible endpoint works: OpenAI, Anthropic (via proxy),
# Gemini, OpenRouter, DeepSeek, OpenCode, MiMo, local Ollama, …
# Recommended starters:
#   A. DeepSeek v4-flash direct (https://api.deepseek.com/v1): pay-as-you-go,
#      fastest first-token latency, very stable JSON output.
#   B. OpenCode Go (https://opencode.ai/zen/go/v1): $10/mo flat-rate bundle of
#      12 open-source models (DeepSeek v4-flash, Qwen, Kimi, GLM, MiMo, …).
#      Cheaper at high volume, but with higher tail latency.
#   C. MiMo v2.5 via Xiaomi Token Plan: the same tp- key also covers
#      VISION + TTS.
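# Sketch of option C, assuming the Token Plan endpoint and model id shown in
# section 3 (VISION) also serve TEXT requests. Verify against your plan
# before uncommenting, and comment out the active TEXT_* lines if you do.
#   TEXT_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
#   TEXT_API_KEY=tp-xxx
#   TEXT_MODEL=mimo-v2.5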
TEXT_BASE_URL=https://api.deepseek.com/v1
TEXT_API_KEY=sk-xxx
TEXT_MODEL=deepseek-v4-flash

# ---- 2. Image generator (renders the scene background) -------------
# Recommended: Runware + FLUX.2 [klein] 9B KV, a distilled 4-step model with
# sub-second inference at ~$0.0008/image. Sign up at https://runware.ai
# AIR ids for FLUX.2 [klein] variants:
#   runware:400@1 · 4B (smaller)
#   runware:400@6 · 9B KV (recommended; fastest at 16:9)
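# To try the smaller 4B variant, swap the IMAGE_MODEL value for the line
# below (a sketch; cost and speed for the 4B model are not stated here):
#   IMAGE_MODEL=runware:400@1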
IMAGE_BASE_URL=https://api.runware.ai/v1
IMAGE_API_KEY=runware-xxx
IMAGE_MODEL=runware:400@6

# ---- 3. Vision model · multimodal click interpretation -------------
# Recommended: MiMo v2.5 (multimodal; accepts image_url content parts).
VISION_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
VISION_API_KEY=tp-xxx
VISION_MODEL=mimo-v2.5

# ---- 4. TTS · Xiaomi MiMo (optional; leave blank to disable) -------
# Per-character voice design → clone, with per-line delivery direction.
# Voice identity = the reference audio kept in the session (no server expiry).
# The adapter appends -voicedesign / -voiceclone to TTS_SPEECH_MODEL.
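# With TTS_SPEECH_MODEL=mimo-v2.5-tts, the appended ids would be
# mimo-v2.5-tts-voicedesign and mimo-v2.5-tts-voiceclone.
# To disable TTS, a sketch assuming "blank" means empty values for all three
# variables (this file does not say which one the engine checks):
#   TTS_BASE_URL=
#   TTS_API_KEY=
#   TTS_SPEECH_MODEL=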
TTS_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
TTS_API_KEY=tp-xxx
TTS_SPEECH_MODEL=mimo-v2.5-tts

# ---- 5. MOCK_IMAGE · skip image generation (cheap TTS testing) -----
# true → return a placeholder image instead of calling the image model.
# Text/story/voice still run normally. Great for iterating on TTS.
MOCK_IMAGE=false