Merge staging into main (#3)

* feat(engine): Architect agent + cross-scene StoryState coherence

  Add a dedicated Architect LLM call at session start that expands the terse
  world/style prompt into a persistent story bible (logline, genre,
  second-person protagonist, cast, engineered opening hook). The bible seeds a
  StoryState the Writer reads and patches every scene, carried + merged across
  cuts (applyStoryStatePatch) so the story keeps a spine from beat one instead
  of jumping between scenes.

  - prompts: inject web-novel / short-drama / galgame craft into Writer +
    Architect; Writer emits storyStatePatch to update the running bible
  - director: parallelize voice + non-entry portraits with the Painter (only
    entry-beat portraits block paint) to offset Architect latency
  - architect: chat/parse guarded so a malformed response never aborts start
  - types: StoryState / StoryStatePatch; required on Start/SceneResponse

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix (#2)

  * docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix

    - LICENSE: add GNU AGPL v3 with InfiPlot copyright notice
    - README.md: rewrite for open-source project, fix TTS description (TTS
      uses MiMo's own protocol, not OpenAI-compatible)
    - README.zh-CN.md: add Simplified Chinese translation
    - README.ja.md: add Japanese translation
    - package.json: change license from UNLICENSED to AGPL-3.0-only

  * fix: address Copilot review - .env.example TTS comment, zh-CN formatting

    - .env.example: clarify TTS uses MiMo's own protocol, not OpenAI-compatible
    - README.md: 'land paper after paper' → 'publish paper after paper'
    - README.zh-CN.md: add spaces around '5 月', fix code formatting for model
      names (deepseek-v4-flash)

  ---------

  Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 13:41:37 +08:00
parent 16707cc255
commit 588b668d14
16 changed files with 1639 additions and 162 deletions
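
The commit message describes a StoryState that the Writer patches every scene and that is merged across cuts via applyStoryStatePatch. The real implementation is not shown in this part of the diff, so the following is only a minimal sketch of what such a merge could look like. The field names (`logline`, `genre`, `protagonist`, `cast`, `openingHook`), the `CastMember` shape, and the merge policy (scalars overwritten, cast merged by name) are assumptions drawn from the bible contents listed above, not the project's actual types.

```typescript
// Hypothetical shapes; the project's real StoryState may differ.
interface CastMember {
  name: string;
  role: string;
}

interface StoryState {
  logline: string;
  genre: string;
  protagonist: string;
  cast: CastMember[];
  openingHook: string;
}

// A patch may touch any subset of the bible.
type StoryStatePatch = Partial<StoryState>;

// Assumed policy: scalar fields are overwritten when the patch provides them;
// cast entries are merged by name so existing characters persist across scenes.
// The input state is never mutated.
function applyStoryStatePatch(
  state: StoryState,
  patch: StoryStatePatch,
): StoryState {
  const castByName = new Map<string, CastMember>();
  for (const member of state.cast) castByName.set(member.name, member);
  for (const member of patch.cast ?? []) castByName.set(member.name, member);

  return {
    logline: patch.logline ?? state.logline,
    genre: patch.genre ?? state.genre,
    protagonist: patch.protagonist ?? state.protagonist,
    cast: Array.from(castByName.values()),
    openingHook: patch.openingHook ?? state.openingHook,
  };
}

// Usage: the Architect seeds the bible once, then each scene's patch is folded in.
const bible: StoryState = {
  logline: "You wake up owing a debt to a city that does not exist.",
  genre: "urban fantasy",
  protagonist: "you",
  cast: [{ name: "Mira", role: "guide" }],
  openingHook: "A stranger hands you your own obituary.",
};

const afterScene = applyStoryStatePatch(bible, {
  cast: [
    { name: "Mira", role: "rival" },
    { name: "Oren", role: "debt collector" },
  ],
});
```

Returning a fresh object rather than mutating `state` matches how the hunks below copy `storyState` into a new session object on each transition, although whether the real function is pure is not confirmed by this diff.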
@@ -3,9 +3,11 @@
 # Recommended setup: Xiaomi MiMo Token Plan for TEXT / VISION / TTS
 # (one API key covers all three) + Runware for IMAGE (FLUX.2 [klein]).
 #
-# TEXT / VISION / TTS use OpenAI-compatible endpoints (any OpenAI-
+# TEXT / VISION use any OpenAI-compatible endpoint (any OpenAI-
 # compatible host works: OpenRouter, OpenAI, Anthropic via proxy,
 # Gemini, DeepSeek, Ollama, ...).
+# TTS uses Xiaomi MiMo's own voice design / clone protocol
+# (not OpenAI-compatible; appends -voicedesign / -voiceclone).
 #
 # IMAGE uses Runware's own task-array protocol (not OpenAI-compatible);
 # the adapter posts an `imageInference` task to IMAGE_BASE_URL.
@@ -187,6 +187,7 @@ function prefetchScenePath(
        const carriedBase: Session = {
          ...baseSession,
          characters: data.characters,
+          storyState: data.storyState,
        };
        prefetchScenePath(pool, carriedBase, [...steps, nextStep], depth + 1);
      }
@@ -539,6 +540,7 @@ function PlayInner() {
            },
          ],
          characters: data.characters,
+          storyState: data.storyState,
        };
        visitedBeatsRef.current = [data.scene.entryBeatId];
        setSession(initial);
@@ -635,6 +637,7 @@ function PlayInner() {
          },
        ],
        characters: result.characters,
+        storyState: result.storyState,
      };
      visitedBeatsRef.current = [result.scene.entryBeatId];
      setSession(newSession);