docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix (#2)

* docs: add AGPL-3.0 license, README i18n, and TTS accuracy fix - LICENSE: add GNU AGPL v3 with InfiPlot copyright notice - README.md: rewrite for open-source project, fix TTS description (TTS uses MiMo's own protocol, not OpenAI-compatible) - README.zh-CN.md: add Simplified Chinese translation - README.ja.md: add Japanese translation - package.json: change license from UNLICENSED to AGPL-3.0-only * fix: address Copilot review — .env.example TTS comment, zh-CN formatting - .env.example: clarify TTS uses MiMo's own protocol, not OpenAI-compatible - README.md: 'land paper after paper' → 'publish paper after paper' - README.zh-CN.md: add spaces around '5 月', fix code formatting for model names (deepseek-v4-flash)
2026-06-02 13:39:54 +08:00
parent 70d0927a3e
commit 9a3511f220
6 changed files with 1063 additions and 53 deletions
@@ -1,60 +1,101 @@
+English · [简体中文](README.zh-CN.md) · [日本語](README.ja.md)
+
 # InfiPlot

-> A real-time AI-generated interactive story game — painted scene by scene. You talk and explore within a scene; when the story turns a corner, it paints the next. You click. It paints. The story unfolds.
+> The first real-time, AI-generated interactive story game — describe the scene you've always fantasized about and watch it come alive in front of you: immersive, visual, and yours to step into. Every plot beat, every image, every character is designed and generated on the fly by a multimodal AI engine — all to grant that childhood wish of crossing over into the cartoon.
+
+[![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](LICENSE)
+[![GitHub stars](https://img.shields.io/github/stars/zonghaoyuan/infiplot?style=social)](https://github.com/zonghaoyuan/infiplot/stargazers)
+
+**▶ Free to play during the beta, no setup required — [infiplot.com](https://infiplot.com)**
+
+If you find this project interesting, a free ⭐ on the repo means the world to us — and it genuinely helps more people discover it. Thank you!
+
+---
+
+## What is InfiPlot
+
+InfiPlot is the world's first interactive story game with content generated by AI in real time. There are no pre-written plots, no pre-made characters, not even pre-recorded voices — everything is generated on demand, tailored to you. We're aiming for an experience on par with bishōjo games (galgame), otome games, and full-motion-video (FMV) games, while adding a personal, *participatory fantasy* — a richer, more immersive audio-visual experience that gives your imagination and curiosity free rein.
+
+In one line: what we're building is an AI-generated, real-time take on *Love Is All Around* (《完蛋！我被美女包围了！》).
+
+Whether you're a six-year-old, a twenty-something, thirty-five, or sixty, there's a fantasy here that belongs to you and you alone:
+
+Learn magic in the world of Harry Potter; become the one everyone at school adores and confesses to; publish paper after paper in top journals and conferences with grant money to spare; step into *Empresses in the Palace* and live out the court intrigue; or return to your younger self and make a different choice about something you regret…
+
+---
+
+## Why we built it
+
+We're a group of young people from Tsinghua University and other schools.
+
+On one hand, we're longtime, devoted players of galgames, otome games, FMV, and AI role-play games. Even while enjoying them, we kept imagining how much more delightful and thrilling it would be if the story choices weren't fixed in advance — or if you could truly interact with an AI character in depth, instead of just texting it through a chat app.
+
+On the other hand, we happen to know a little about large-model technology: enough to turn ideas into working software quickly with AI, and to have formed some modest views on the technical paths available and the limits of what today's tech can build.
+
+The spark came on April 22, 2026, when [@zan2434](https://x.com/zan2434) and others released [flipbook](https://flipbook.page/). We were stunned and delighted by this entirely new form of interaction. So one day in May, we agreed on the spot to build something like this — both to help people live out the fantasies they'd once set aside, and to explore the new modes of interaction that multimodal models make possible.
+
+The project is still very early and many features are far from polished. We'd love your feedback — open an [issue](https://github.com/zonghaoyuan/infiplot/issues), or join our dev team and explore the new possibilities with us, and satisfy your own curiosity.
+
+Get in touch: hi@infiplot.com

 ---

 ## How it works

-The story unfolds as a sequence of **scenes**. Each scene is one AI-painted background plus a short tree of **beats** — moments of narration, dialogue, and the occasional choice. You tap through a scene's beats and the image stays put; only when a choice leads somewhere genuinely new — another place, a new point of view, a jump in time — does the AI paint the next scene.
+Built on text, image, and audio models, we've assembled a multi-agent framework to deliver on InfiPlot's goal. We split the agents into five roles — **Architect, Writer, Character Designer, Cinematographer, and Painter** — that work together to keep the plot coherent, the characters consistent, and the scenes continuous, all while making the story as compelling as we can.

-```
-entering a scene
-        │
-        ▼
-1. Text LLM     directs the whole scene at once — a background prompt
-                plus a tree of beats (narration / dialogue / choices)
-        │
-        ▼
-2. Image model  paints the background once, 16:9, no UI baked in
-        │
-        ▼
-[ tap through beats — no model calls, instant ]
-        │
-        ├─ in-scene choice ──────▶ jump to another beat (instant)
-        │
-        └─ scene-change choice ──▶ the next scene
-                                   (usually pre-generated — see below)
+We call each complete playthrough a **story**.
+
+A story unfolds as a sequence of **scenes**. Each scene is one AI-painted background plus a short tree of **beats** — moments of narration, dialogue, and the occasional choice. You tap through a scene's beats and the image stays put; only when a choice leads somewhere genuinely new — another place, a new point of view, a jump in time — does the AI paint the next scene.
+
+```mermaid
+flowchart TD
+    U["Your input: world setting + art style"] --> A["Architect<br/>parses input → full story structure (first step)"]
+    A --> W["Writer<br/>directs this scene's beats: narration · dialogue · choices"]
+    subgraph SCENE["Generating one scene"]
+        direction TB
+        W --> C["Character Designer<br/>portrait + voice (parallel, per new character)"]
+        W --> S["Cinematographer<br/>shot composition + background prompt"]
+        C --> P["Painter<br/>renders the 16:9 background using portraits as reference"]
+        S --> P
+    end
+    P --> SC["One scene: background image + beat tree"]
+    SC -. speculatively pre-generate the next scene .-> W
 ```

-While you're reading one scene, the engine **speculatively generates the scenes your choices could lead to** — and, for unavoidable next steps, the scene after that. By the time you pick a direction, its image is usually already painted, so the cut feels instant.
+While you're reading one scene, the engine **speculatively generates the scenes your choices could lead to** — and, for unavoidable next steps, the scene after that. By the time you pick a direction, its image is usually already painted, so the cut feels instant. If you still notice some lag today, don't worry — we're working hard to bring it down.

-Clicking the background itself (not a button) routes through a **vision** model: it reads where you tapped and decides whether you're exploring the current scene (it inserts a beat — no new image) or moving on (a new scene).
+Clicking the background itself (not a button) routes through a **vision** model: it reads where you tapped and decides whether you're exploring the current scene (it inserts a beat — no new image) or moving on (a new scene). This builds on a valuable lesson we learned from flipbook, and we believe it will become one of InfiPlot's defining features — taking the experience to the next level.

-There is no traditional game UI baked into the art. The AI paints the world in whatever style you pick — "stick figure on grid paper" or "cyberpunk noir" — and the dialogue panel and choice buttons are a light HTML layer drawn on top, tuned to sit over the scene.
+There is no traditional game UI baked into the art. The AI paints the world in whatever style you pick — "stick figure on grid paper" or "cyberpunk noir" — and the dialogue panel and choice buttons are a light HTML layer drawn on top, tuned to sit over the scene. In other words, the UI fits the story of each playthrough, rather than staying the same every time.

 ---

 ## One-click deploy

-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision%2Ftts.&envLink=https://github.com/zonghaoyuan/infiplot%23environment-variables)
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&root-directory=apps/web&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot%23configuration-guide)

-After deploy, set the environment variables (see below) in your Vercel project. Nine are required; TTS is optional (leave blank to run silently); `MOCK_IMAGE=true` skips image generation for cheap TTS-only testing. The Vercel project's **Root Directory** must be set to `apps/web` (the deploy button passes this; if you configure manually, set it in Project Settings).
+After deploy, set your environment variables in the Vercel project — see the [Configuration guide](#configuration-guide) below. The Vercel project's **Root Directory** must be `apps/web` (the deploy button passes this automatically; if you configure manually, set it in Project Settings).

 ---

-## Environment variables
+## Configuration guide

-Three required providers + optional TTS. Text, Vision, and TTS accept any OpenAI-compatible endpoint (OpenAI, Anthropic via OpenAI-compat proxy, Gemini, OpenRouter, DeepSeek, local Ollama, …). Image goes to **Runware** (its own task-array protocol, not OpenAI-compatible).
+InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenAI-compatible endpoint** (OpenAI, Anthropic via an OpenAI-compat proxy, Gemini, OpenRouter, DeepSeek, local Ollama, …), so you can mix and match freely. **Image** currently goes to **Runware** (its own task-array protocol, not OpenAI-compatible), chosen for its combination of latency and cost. **TTS** uses **Xiaomi MiMo**'s own voice design / clone protocol (also not OpenAI-compatible) — per-character voice design, clone, and per-line delivery direction.
+
+**1. Choose your providers**

 | Provider | Variables | Required? | Recommended |
 |---|---|---|---|
-| Text · story director  | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL`        | ✅ | `claude-opus-4-7` via Anthropic |
-| Image · UI renderer    | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL`     | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
-| Vision · click reader  | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL`  | ✅ | `gemini-3-flash` via Google |
-| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo |
+| Text · story director  | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL`        | ✅ | `deepseek-v4-flash` via DeepSeek |
+| Image · scene renderer  | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL`     | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
+| Vision · click reader  | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL`  | ✅ | `gemini-3.5-flash` via Google |
+| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo (own protocol, not OpenAI-compat) |

-There's also a flag for cheap testing:
+**2. Set the environment variables**
+
+Set these in your Vercel project (**Settings → Environment Variables**), or in `apps/web/.env.local` for local runs. Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:

 | Variable | Effect |
 |---|---|
@@ -62,38 +103,46 @@ There's also a flag for cheap testing:

 See `apps/web/.env.example` for the exact shape.

+**3. Mind the cost**
+
+With the recommended trio, each **scene** is dominated by the text-LLM call. The FLUX.2 [klein] 9B KV image is roughly **\$0.00078** per scene (1792×1024, 4 steps, sub-second); the text call is the rest (very cheap with deepseek-v4-flash). Tapping through a scene's beats is free. To keep transitions instant, the engine also **pre-generates scenes you might pick but don't** — so real spend runs somewhat higher than the scenes you actually see. There is no rate limiting or auth out of the box — if you make your deployment public, your bill will reflect that. Add limits (and consider lowering the prefetch depth) before sharing widely.
+
 ---

-## Local development
+## Roadmap

-Requires Node 20+ and pnpm 9+.
+- [ ] Latency you can't perceive
+- [ ] Mobile browser support
+- [ ] Compatibility with most providers
+- [ ] User accounts and login
+- [ ] Upgrade from static images to motion video
+- [ ] Free-form player input mid-story
+- [ ] Voice interaction
+- [ ] Story sharing
+- [ ] Mobile app
+
+---
+
+## Contributing
+
+Issues and pull requests are welcome. To run InfiPlot locally (Node 20+, pnpm 9+):

 ```bash
 pnpm install
-cp apps/web/.env.example apps/web/.env.local
-# fill in env vars (9 required + optional TTS/MOCK_IMAGE)
-pnpm dev
-# open http://localhost:3000
+cp apps/web/.env.example apps/web/.env.local   # fill in your keys — see the Configuration guide
+pnpm dev                                        # open http://localhost:3000
 ```

+By contributing, you agree that your contributions are licensed under the AGPL-3.0.
+
 ---

-## Project layout
+## Star history

-```
-infiplot/
-├── apps/web/              Next.js 16 app — pages + API routes (Vercel root)
-└── packages/
-    ├── types/             shared TypeScript types
-    ├── ai-client/         unified OpenAI-compatible clients + Runware adapter
-    ├── tts-client/        Xiaomi MiMo TTS adapter
-    └── engine/            multi-agent AI orchestration (open core)
-```
-
-`packages/engine` is the open core — pure TS, no Next.js or browser dependency. Import it directly to build your own interactive-narrative front-end (Tauri, Electron, CLI, anywhere).
+[![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date)

 ---

-## Cost & limits
+## License

-With the recommended trio, each **scene** is dominated by the text-LLM call. The FLUX.2 [klein] 9B KV image is roughly **\$0.001** per scene (1792×1024, 4 steps, sub-second); the text call is the rest. Tapping through a scene's beats is free. To keep transitions instant, the engine also **pre-generates scenes you might pick but don't** — so real spend runs somewhat higher than the scenes you actually see. There is no rate limiting or auth out of the box — if you make your deployment public, your bill will reflect that. Add limits (and consider lowering the prefetch depth) before sharing widely.
+[AGPL-3.0](LICENSE) © InfiPlot. The core is fully open source. AGPL's "network use is distribution" clause means anyone who runs a modified version as a network service must also publish their source — this keeps the core open while leaving room for a future hosted or commercial edition.