Compare commits

...

10 Commits

Author SHA1 Message Date
Zonghao Yuan 09e898f6b0 Merge pull request #121 from zonghaoyuan/readme-demo-video
docs(readme): add demo video and restore config guide
2026-06-29 13:16:07 +08:00
yuanzonghao d05b564110 docs(readme): add demo video and restore config guide inline
Add demo video with inline <video> playback in all three README
language variants (zh/en/ja), hosted via GitHub user-attachments.

Restore the configuration guide from docs/configuration*.md back into
the main READMEs (positioned between Deploy and Roadmap) for better
discoverability. docs/configuration*.md files preserved for Vercel
button envLink references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-29 13:15:05 +08:00
Zonghao Yuan 3d80ac7f1c Merge pull request #120 from zonghaoyuan/readme-restructure
docs(readme): restructure for star conversion optimization
2026-06-29 12:20:07 +08:00
yuanzonghao 7dac77e200 docs(readme): restructure for star conversion optimization
- Restructure overview with scannable bullet lists for capabilities
- Move screenshots up (after overview), reduce from 14 to 6
- Extract configuration guide to docs/configuration{,.en,.ja}.md
- Update Vercel deploy button envLink to point to new config docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-29 12:16:03 +08:00
yuanzonghao c66ee38ddd docs(readme): mark story save & cloud sync as done in roadmap
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-28 23:33:49 +08:00
Zonghao Yuan 801e9004da Merge pull request #117 from zonghaoyuan/cloudflare-migration
feat(persistence): bidirectional local/cloud story sync (Supabase)
2026-06-28 23:09:43 +08:00
Kai ki 6ba5307c6c fix(persistence): address PR #117 review feedback
Adopt 8 PR-agent (Qodo) findings; 4 declined (concurrency already guarded by
the putSyncedRecord/markRecordSynced guards + RPC optimistic concurrency;
SQL-injection / won-equality / microtask-race are false positives — see PR reply).

- markRecordSynced: guard on updatedAt too — softDeleteStory doesn't bump rev,
  so a same-rev newer local tombstone must not be marked synced by an older
  push's ack (symmetric with putSyncedRecord's guard)
- recordToEnvelope: fallback timestamps to 0 not Date.now() (a corrupt record
  should lose LWW, not win as "now")
- push/delete routes: validate rev/updatedAt as finite -> 400 (was silent 200);
  push: Content-Length pre-check before buffering the body
- pushDeletion: idbGet a single record instead of a full-store scan
- manifest: Cache-Control private,no-store + client fetch cache:no-store
- cloudSyncClient: Array.isArray narrowing on items/blobs
- RPC: `if found` instead of `v_row.id is not null` after RETURNING INTO

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 11:52:09 +08:00
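Review note on the `markRecordSynced` bullet above: the point is that an acknowledgement for an older push must not flag a newer local tombstone as synced, even though the soft delete left `rev` unchanged. A minimal sketch of such a guard follows. The `LocalRecord` shape, the `synced` flag, and the in-memory `Map` (standing in for IndexedDB) are assumptions for illustration, not the code in `localStore`.

```typescript
// Hypothetical local record shape; the real localStore schema may differ.
interface LocalRecord {
  id: string;
  rev: number;
  updatedAt: number; // epoch milliseconds
  deleted: boolean;  // true for a tombstone
  synced: boolean;
}

// In-memory stand-in for the IndexedDB store, for illustration only.
const store = new Map<string, LocalRecord>();

// Marks a record as synced only if it is still the exact version that was pushed.
// Returns true when the flag was set.
function markRecordSynced(id: string, pushedRev: number, pushedUpdatedAt: number): boolean {
  const current = store.get(id);
  if (!current) return false;
  // Checking rev alone is not enough: a soft delete keeps rev and only moves updatedAt.
  if (current.rev !== pushedRev || current.updatedAt !== pushedUpdatedAt) return false;
  current.synced = true;
  return true;
}
```

With this shape, an ack that carries the pre-delete `updatedAt` is ignored, so the tombstone stays unsynced and is picked up by a later push.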
Kai ki 739af60848 Merge remote-tracking branch 'origin/staging' into cloudflare-migration
2026-06-28 11:21:20 +08:00
Kai ki ff12b2759f feat(persistence): bidirectional local/cloud story sync (Supabase)
Connect the previously-skeleton cloudStore to the client with a full
bidirectional reconcile engine. Commercial build (AUTH_ENABLED) only; the
open-source build is byte-for-byte unchanged — all cloud paths short-circuit
when AUTH_ENABLED is false.

- cloudSync.ts: reconcile engine — decideAction (pure, LWW rev->updatedAt with
  tombstone priority) + syncOnLogin/pushOnSave/pushDeletion (best-effort,
  serialized, isAuthed-gated)
- cloudSyncClient.ts: browser fetch bridge (short-circuit + fault-tolerant)
- /api/stories/{manifest,pull,push,delete}: RLS-guarded sync endpoints
- upsert_story_if_newer RPC: optimistic concurrency (SECURITY INVOKER,
  auth.uid() injection, rev->updated_at guard, revoked from public)
- cloudStore: +manifest/pullBlobs, save->RPC {stored,won}, softDelete w/ rev
- localStore: +listAllRecordsForSync/putSyncedRecord/markRecordSynced
  (concurrency-guarded sync writes); types: +StorySyncMeta/StorySyncEnvelope
- facade + UserChip: inject pushOnSave/pushDeletion + login-triggered reconcile

Sync model: full reconcile on login + background push on save (no Realtime;
eventual consistency). Conflict resolution: last-write-wins.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 11:20:47 +08:00
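Review note on the reconcile engine described above: `decideAction` is summarized as a pure last-write-wins comparison on `rev`, then `updatedAt`, with tombstone priority. The sketch below shows one way that ordering could look. The `SyncMeta` shape, the action names, and the reading of "tombstone priority" as a tie-break (a deletion wins only when `rev` and `updatedAt` are both equal) are assumptions, not the contents of `cloudSync.ts`.

```typescript
// Hypothetical metadata shape; the real StorySyncMeta may differ.
interface SyncMeta {
  rev: number;
  updatedAt: number; // epoch milliseconds
  deleted: boolean;  // true for a tombstone
}

type SyncAction = "push" | "pull" | "noop";

// Returns >0 if a is newer, <0 if b is newer, 0 on an exact tie.
function compareMeta(a: SyncMeta, b: SyncMeta): number {
  if (a.rev !== b.rev) return a.rev - b.rev;
  if (a.updatedAt !== b.updatedAt) return a.updatedAt - b.updatedAt;
  // Assumed tie-break: a tombstone outranks a live record.
  return Number(a.deleted) - Number(b.deleted);
}

// Pure decision function: no I/O, so it can be unit-tested in isolation.
function decideAction(local: SyncMeta | undefined, remote: SyncMeta | undefined): SyncAction {
  if (!local && !remote) return "noop";
  if (!remote) return "push"; // only the local copy exists
  if (!local) return "pull";  // only the cloud copy exists
  const order = compareMeta(local, remote);
  if (order > 0) return "push";
  if (order < 0) return "pull";
  return "noop";
}
```

Keeping the decision pure means the side-effecting parts (`syncOnLogin`, `pushOnSave`, `pushDeletion`) only have to execute the returned action.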
Zonghao Yuan cb5daf58ce fix(ci): grant CLA bot pull-request write permission (#115)
The CLA Assistant workflow had `pull-requests: read`, which prevented
the GITHUB_TOKEN from posting the sign-CLA comment on PRs. Change to
`pull-requests: write` so the bot can comment.

Also removed the `protect-cla-signatures` ruleset (GitHub-side) that
marked the signatures branch as protected, blocking the bot from
pushing signature records.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-27 20:04:10 +08:00
19 changed files with 1410 additions and 421 deletions
+1 -1
@@ -21,7 +21,7 @@ on:
 # branch-protection required check (cla/cla-assistant.yml) reports against.
 permissions:
   contents: read
-  pull-requests: read
+  pull-requests: write
   issues: write
   statuses: write
+131 -114
@@ -27,9 +27,22 @@ InfiPlot is an interactive story game with content generated by AI in real time.
In one line: what we're building is an AI-generated, real-time take on *Love Is All Around* (《完蛋!我被美女包围了!》). In one line: what we're building is an AI-generated, real-time take on *Love Is All Around* (《完蛋!我被美女包围了!》).
Whether you're a six-year-old, a twenty-something, thirty-five, or sixty, there's a fantasy here that belongs to you and you alone: Whatever your age, there's a fantasy here that belongs to you alone:
Learn magic in the world of Harry Potter; become the one everyone at school adores and confesses to; publish paper after paper in top journals and conferences with grant money to spare; step into *Empresses in the Palace* and live out the court intrigue; or return to your younger self and make a different choice about something you regret… - Learn magic in the world of Harry Potter
- Become the one everyone at school adores and confesses to
- Publish top-tier papers and never run out of grant money
- Step into *Empresses in the Palace* and live out court intrigue
- Return to your younger self and choose differently about something you regret
- ……
Core capabilities:
- **Multi-agent collaboration** — Writer, Character Designer, Cinematographer, and Painter work together to keep story coherent and characters consistent
- **Speculative generation** — by the time you choose, the next scene is usually already painted; transitions feel instant
- **Click to explore** — tap anywhere on the scene; a vision model interprets your intent and responds
- **AI voice acting** — every character gets a unique voice, via Xiaomi MiMo (free) or StepFun (paid, higher quality)
- **Any art style** — stick figures, cyberpunk, watercolor, manga… generate in whatever style you want
--- ---
@@ -39,6 +52,53 @@ Free to play, no setup required: [infiplot.com](https://infiplot.com)
--- ---
## 🎬 Demo
<div align="center">
<video src="https://github.com/user-attachments/assets/414f0534-50c4-46d3-bc85-c681283b8c79" controls width="100%"></video>
</div>
---
## 📸 Screenshots
<table>
<tr>
<td><a href="docs/screenshots/1.webp"><img src="docs/screenshots/1.webp" width="420" alt="InfiPlot screenshot 1"></a></td>
<td><a href="docs/screenshots/3.webp"><img src="docs/screenshots/3.webp" width="420" alt="InfiPlot screenshot 3"></a></td>
</tr>
<tr>
<td><a href="docs/screenshots/6.webp"><img src="docs/screenshots/6.webp" width="420" alt="InfiPlot screenshot 6"></a></td>
<td><a href="docs/screenshots/8.webp"><img src="docs/screenshots/8.webp" width="420" alt="InfiPlot screenshot 8"></a></td>
</tr>
<tr>
<td><a href="docs/screenshots/12.webp"><img src="docs/screenshots/12.webp" width="420" alt="InfiPlot screenshot 12"></a></td>
<td><a href="docs/screenshots/14.webp"><img src="docs/screenshots/14.webp" width="420" alt="InfiPlot screenshot 14"></a></td>
</tr>
</table>
---
## How it works
Built on text, image, and audio models, we've assembled a multi-agent framework to deliver on InfiPlot's goal. We split the agents into four roles — **Writer, Character Designer, Cinematographer, and Painter** — that work together to keep the plot coherent, the characters consistent, and the scenes continuous, all while making the story as compelling as we can. The Writer also handles overall story architecture.
We call each complete playthrough a **story**.
A story unfolds as a sequence of scenes. Each scene is one AI-painted background plus a short tree of beats — moments of narration, dialogue, and the occasional choice. You tap through a scene's beats and the image stays put; only when a choice leads somewhere genuinely new — another place, a new point of view, a jump in time — does the AI paint the next scene.
<div align="center">
<img src="docs/pipeline.en.svg" alt="InfiPlot story generation pipeline" width="680">
</div>
While you're reading one scene, the engine speculatively generates the scenes your choices could lead to — and, for unavoidable next steps, the scene after that. By the time you pick a direction, its image is usually already painted, so the cut feels instant. If you still notice some lag today, don't worry — we're working hard to bring it down.
Clicking the background itself (not a button) routes through a vision model: it reads where you tapped and decides whether you're exploring the current scene (it inserts a beat — no new image) or moving on (a new scene). This builds on a valuable lesson we learned from flipbook, and we believe it will become one of InfiPlot's defining features — taking the experience to the next level.
There is no traditional game UI baked into the art. The AI paints the world in whatever style you pick — "stick figure on grid paper" or "cyberpunk noir" — and the dialogue panel and choice buttons are a light HTML layer drawn on top, tuned to sit over the scene. In other words, the UI fits the story of each playthrough, rather than staying the same every time.
---
## Deploy ## Deploy
InfiPlot offers multiple deployment options. For personal use, we recommend the one-click Vercel deploy; to self-host on your own server or local machine, use Docker. InfiPlot offers multiple deployment options. For personal use, we recommend the one-click Vercel deploy; to self-host on your own server or local machine, use Docker.
@@ -48,10 +108,10 @@ InfiPlot offers multiple deployment options. For personal use, we recommend the
Cloudflare deployment requires the Workers Paid Plan because the scene pipeline needs longer CPU time. OpenDeploy lets your AI agent handle the deployment for you. Cloudflare deployment requires the Workers Paid Plan because the scene pipeline needs longer CPU time. OpenDeploy lets your AI agent handle the deployment for you.
<a href="https://opendeploy.dev/github/zonghaoyuan/infiplot"><img src="https://oss.opendeploy.dev/static/deploy-with-your-agent.svg" alt="Deploy with your agent" height="34"></a>&nbsp; <a href="https://opendeploy.dev/github/zonghaoyuan/infiplot"><img src="https://oss.opendeploy.dev/static/deploy-with-your-agent.svg" alt="Deploy with your agent" height="34"></a>&nbsp;
<a href="https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.en.md%23configuration-guide"><img src="https://vercel.com/button" alt="Deploy with Vercel" height="34"></a>&nbsp; <a href="https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/docs/configuration.en.md"><img src="https://vercel.com/button" alt="Deploy with Vercel" height="34"></a>&nbsp;
<a href="https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" height="34"></a> <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" height="34"></a>
After deploy, fill in the environment variables — see the [Configuration guide](#configuration-guide) below. The repo root is the app itself: Vercel needs no special root directory; on Cloudflare, just set the build command to `pnpm build:cf`. After deploy, set up your environment variables following the [Configuration guide](#configuration-guide). The repo root is the app itself: Vercel needs no special root directory; on Cloudflare, just set the build command to `pnpm build:cf`.
### Docker (self-hosted) ### Docker (self-hosted)
@@ -79,58 +139,81 @@ Visit `http://localhost:3000` to start playing.
--- ---
## 📸 Screenshots ## Configuration guide
<table> InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenAI-compatible endpoint**, so you can mix and match freely — for Google Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). For Anthropic Claude, a compatible gateway (e.g. LiteLLM) is recommended — Anthropic's official endpoint offers an OpenAI-compatible layer but no caching, which raises cost and latency. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol — per-character voice design, clone, and per-line delivery direction; free) and **StepFun** (32 preset voices, auto-matched by AI; paid but better quality).
<tr>
<td><a href="docs/screenshots/1.webp"><img src="docs/screenshots/1.webp" width="420" alt="InfiPlot screenshot 1"></a></td> ### 1. Choose your providers
<td><a href="docs/screenshots/2.webp"><img src="docs/screenshots/2.webp" width="420" alt="InfiPlot screenshot 2"></a></td>
</tr> | Provider | Variables | Required? | Recommended |
<tr> |---|---|---|---|
<td><a href="docs/screenshots/3.webp"><img src="docs/screenshots/3.webp" width="420" alt="InfiPlot screenshot 3"></a></td> | Text · story director | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | `deepseek-v4-flash` via DeepSeek |
<td><a href="docs/screenshots/4.webp"><img src="docs/screenshots/4.webp" width="420" alt="InfiPlot screenshot 4"></a></td> | Image · scene renderer | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
</tr> | Vision · click reader | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | `gemini-3.5-flash` via Google |
<tr> | TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo (free); paid alternative: `step-tts-2` via [StepFun](https://www.stepfun.com) |
<td><a href="docs/screenshots/5.webp"><img src="docs/screenshots/5.webp" width="420" alt="InfiPlot screenshot 5"></a></td>
<td><a href="docs/screenshots/6.webp"><img src="docs/screenshots/6.webp" width="420" alt="InfiPlot screenshot 6"></a></td> > **Optional · explicit protocol override**: each provider slot accepts a `*_PROVIDER` variable (`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`) to force a specific protocol. **Leave unset for backwards-compatible defaults** — text/vision default to OpenAI-compatible, image auto-detects from `*_BASE_URL` (`runware.ai` → Runware, otherwise OpenAI-compatible; models served via OpenAI protocol on `runware.ai` — such as `image-2-vip` — are handled as OpenAI-compatible; override with `IMAGE_PROVIDER` when needed).
</tr> >
<tr> > | Value | Applies to | Description |
<td><a href="docs/screenshots/7.webp"><img src="docs/screenshots/7.webp" width="420" alt="InfiPlot screenshot 7"></a></td> > |---|---|---|
<td><a href="docs/screenshots/8.webp"><img src="docs/screenshots/8.webp" width="420" alt="InfiPlot screenshot 8"></a></td> > | `openai_compatible` (default) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
</tr> > | `openai` | Image | OpenAI `gpt-image`, supports reference-image editing |
<tr> > | `runware` | Image | Runware task-array protocol |
<td><a href="docs/screenshots/9.webp"><img src="docs/screenshots/9.webp" width="420" alt="InfiPlot screenshot 9"></a></td> >
<td><a href="docs/screenshots/10.webp"><img src="docs/screenshots/10.webp" width="420" alt="InfiPlot screenshot 10"></a></td> > Text and vision **only** support `openai_compatible`. For Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). For Claude, a compatible gateway (e.g. LiteLLM) is recommended — Anthropic's official endpoint offers an OpenAI-compatible layer but no caching, raising cost and latency.
</tr> >
<tr> > `*_BASE_URL` works with or without a trailing `/v1` (or even a trailing `/chat/completions`) — the engine normalizes automatically.
<td><a href="docs/screenshots/11.webp"><img src="docs/screenshots/11.webp" width="420" alt="InfiPlot screenshot 11"></a></td>
<td><a href="docs/screenshots/12.webp"><img src="docs/screenshots/12.webp" width="420" alt="InfiPlot screenshot 12"></a></td> ### 2. Set the environment variables
</tr>
<tr> Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:
<td><a href="docs/screenshots/13.webp"><img src="docs/screenshots/13.webp" width="420" alt="InfiPlot screenshot 13"></a></td>
<td><a href="docs/screenshots/14.webp"><img src="docs/screenshots/14.webp" width="420" alt="InfiPlot screenshot 14"></a></td> | Variable | Effect |
</tr> |---|---|
</table> | `MOCK_IMAGE=true` | Skip image generation; the renderer returns a static placeholder. Story, voice, and choices still run normally. Great for iterating on TTS without burning Runware credits. |
Where to set them (see `.env.example` for the exact shape):
- **Local dev** — `.env.local`
- **Vercel** — Project Settings → Environment Variables
- **Cloudflare Workers** — from the repo root, run `wrangler secret put <NAME>` for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). For a private staging instance, gate the Worker behind [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) — zero-code email-whitelist auth in front of the Worker.
### 3. Mind the cost
With the recommended trio, each scene's cost comes mainly from the image generation model. The FLUX.2 [klein] 9B KV image is roughly **$0.00078** per scene (1792×1024, 4 steps, sub-second); the text model uses `deepseek-v4-flash`, so text costs are negligible by comparison. Tapping through a scene's beats is free. To keep transitions instant, the engine also pre-generates scenes you might pick but ultimately don't — so real spend runs somewhat higher than the scenes you actually see.
### 4. Image proxy (optional)
By default the browser fetches images directly from the provider — no setup needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and you're completely unaffected. You only want this if you hit progressive "top-to-bottom" image loading (Chrome's `ERR_QUIC_PROTOCOL_ERROR` on some networks paints partial PNGs row by row): deploy a tiny Cloudflare Worker that re-fetches images server-side and serves them atomically over HTTP/2. One-click deploy at **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then paste the `workers.dev` URL it prints into `NEXT_PUBLIC_IMAGE_PROXY_URL`.
### 5. Let players bring their own voice Key (optional, recommended)
Xiaomi rate-limits the TTS model by RPM/TPM. When a public deployment has many people playing at once through a single shared `TTS_API_KEY`, those limits are easy to hit — the symptom is **story and visuals work fine, but there's no audio**. To fix this, players can optionally enter **their own** Xiaomi MiMo key on the homepage (free to obtain). Synthesis then runs **browser-direct to Xiaomi**, the **key stays in the player's browser and never touches your server**, and they get stable voice with lower latency. It's purely additive: leave it blank and playback falls back to your server key exactly as before.
See the [Bring-your-own voice Key guide](docs/xiaomi-tts-key.md) for how to obtain and enter one.
--- ---
## How it works ## Roadmap
Built on text, image, and audio models, we've assembled a multi-agent framework to deliver on InfiPlot's goal. We split the agents into four roles — **Writer, Character Designer, Cinematographer, and Painter** — that work together to keep the plot coherent, the characters consistent, and the scenes continuous, all while making the story as compelling as we can. The Writer also handles overall story architecture. **Completed**
We call each complete playthrough a **story**. - [x] Latency optimized to ~10s
- [x] Vision-based image interaction
- [x] One-click deploy & custom model config
- [x] Frontend API Key & model setup
- [x] Mobile web support
- [x] Story sharing (`.infiplot` format)
- [x] OpenDeploy quick deployment
- [x] Story save & resume (local + cloud sync)
A story unfolds as a sequence of scenes. Each scene is one AI-painted background plus a short tree of beats — moments of narration, dialogue, and the occasional choice. You tap through a scene's beats and the image stays put; only when a choice leads somewhere genuinely new — another place, a new point of view, a jump in time — does the AI paint the next scene. **To Do**
<div align="center"> - [ ] Mobile app & creator platform
<img src="docs/pipeline.en.svg" alt="InfiPlot story generation pipeline" width="680"> - [ ] ComfyUI custom image generation
</div> - [ ] Reduce latency to under 5s
- [ ] Custom character cards & world settings
While you're reading one scene, the engine speculatively generates the scenes your choices could lead to — and, for unavoidable next steps, the scene after that. By the time you pick a direction, its image is usually already painted, so the cut feels instant. If you still notice some lag today, don't worry — we're working hard to bring it down. - [ ] Prompt cache hit-rate optimization
Clicking the background itself (not a button) routes through a vision model: it reads where you tapped and decides whether you're exploring the current scene (it inserts a beat — no new image) or moving on (a new scene). This builds on a valuable lesson we learned from flipbook, and we believe it will become one of InfiPlot's defining features — taking the experience to the next level.
There is no traditional game UI baked into the art. The AI paints the world in whatever style you pick — "stick figure on grid paper" or "cyberpunk noir" — and the dialogue panel and choice buttons are a light HTML layer drawn on top, tuned to sit over the scene. In other words, the UI fits the story of each playthrough, rather than staying the same every time.
--- ---
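Review note on the restored configuration guide in this hunk: it says `*_BASE_URL` is accepted with or without a trailing `/v1`, or even a trailing `/chat/completions`, and that the engine normalizes it. The diff does not show that logic, so the following is only one possible rule, written as a sketch: strip trailing slashes and a pasted `/chat/completions`, then append `/v1` only when no path is left. The function name `normalizeBaseUrl` and the `api.example.com` host are hypothetical.

```typescript
// One possible normalization rule (an assumption, not the engine's actual code).
function normalizeBaseUrl(raw: string): string {
  // Drop surrounding whitespace and any trailing slashes.
  let url = raw.trim().replace(/\/+$/, "");
  // Drop a full endpoint path if the user pasted one.
  url = url.replace(/\/chat\/completions$/, "");
  // Append /v1 only for a bare origin, so custom paths such as
  // .../v1beta/openai are left untouched.
  const hasPath = /^https?:\/\/[^/]+\/.+/.test(url);
  return hasPath ? url : `${url}/v1`;
}
```

Under this rule a bare origin, the same origin with `/v1/`, and the same origin with `/v1/chat/completions` all resolve to one value, while the Gemini endpoint quoted in the guide passes through unchanged.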
@@ -156,72 +239,6 @@ Scan to join our **beta community on QQ** (group ID `575404333`) to share feedba
--- ---
## Configuration guide
InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenAI-compatible endpoint**, so you can mix and match freely — for Google Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). For Anthropic Claude, a compatible gateway (e.g. LiteLLM) is recommended — Anthropic's official endpoint offers an OpenAI-compatible layer but no caching, which raises cost and latency. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol — per-character voice design, clone, and per-line delivery direction; free) and **StepFun** (32 preset voices, auto-matched by AI; paid but better quality).
**1. Choose your providers**
| Provider | Variables | Required? | Recommended |
|---|---|---|---|
| Text · story director | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | `deepseek-v4-flash` via DeepSeek |
| Image · scene renderer | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
| Vision · click reader | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | `gemini-3.5-flash` via Google |
| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo (free); paid alternative: `step-tts-2` via [StepFun](https://www.stepfun.com) |
**2. Set the environment variables**
Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:
| Variable | Effect |
|---|---|
| `MOCK_IMAGE=true` | Skip image generation; the renderer returns a static placeholder. Story, voice, and choices still run normally. Great for iterating on TTS without burning Runware credits. |
Where to set them (see `.env.example` for the exact shape):
- **Local dev** — `.env.local`
- **Vercel** — Project Settings → Environment Variables
- **Cloudflare Workers** — from the repo root, run `wrangler secret put <NAME>` for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). For a private staging instance, gate the Worker behind [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) — zero-code email-whitelist auth in front of the Worker.
**3. Mind the cost**
With the recommended trio, each scene's cost comes mainly from the image generation model. The FLUX.2 [klein] 9B KV image is roughly **\$0.00078** per scene (1792×1024, 4 steps, sub-second); the text model uses `deepseek-v4-flash`, so text costs are negligible by comparison. Tapping through a scene's beats is free. To keep transitions instant, the engine also pre-generates scenes you might pick but ultimately don't — so real spend runs somewhat higher than the scenes you actually see.
**4. Image proxy (optional)**
By default the browser fetches images directly from the provider — no setup needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and you're completely unaffected. You only want this if you hit progressive "top-to-bottom" image loading (Chrome's `ERR_QUIC_PROTOCOL_ERROR` on some networks paints partial PNGs row by row): deploy a tiny Cloudflare Worker that re-fetches images server-side and serves them atomically over HTTP/2. One-click deploy at **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then paste the `workers.dev` URL it prints into `NEXT_PUBLIC_IMAGE_PROXY_URL`.
**5. Let players bring their own voice Key (optional, recommended)**
Xiaomi rate-limits the TTS model by RPM/TPM. When a public deployment has many people playing at once through a single shared `TTS_API_KEY`, those limits are easy to hit — the symptom is **story and visuals work fine, but there's no audio**. To fix this, players can optionally enter **their own** Xiaomi MiMo key on the homepage (free to obtain). Synthesis then runs **browser-direct to Xiaomi**, the **key stays in the player's browser and never touches your server**, and they get stable voice with lower latency. It's purely additive: leave it blank and playback falls back to your server key exactly as before.
See the [Bring-your-own voice Key guide](docs/xiaomi-tts-key.md) for how to obtain and enter one.
---
## Roadmap
**Completed**
- [x] Latency optimized to ~10s
- [x] Vision-based image interaction
- [x] One-click deploy & custom model config
- [x] Frontend API Key & model setup
- [x] Mobile web support
- [x] Story sharing (`.infiplot` format)
- [x] OpenDeploy quick deployment
**To Do**
- [ ] Mobile app & creator platform
- [ ] ComfyUI custom image generation
- [ ] Reduce latency to under 5s
- [ ] Story save & resume
- [ ] Custom character cards & world settings
- [ ] Prompt cache hit-rate optimization
---
## Star history ## Star history
[![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date) [![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date)
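Review note on the `*_PROVIDER` override documented in this file's configuration guide: image requests default to auto-detection from the base URL (`runware.ai` selects Runware, anything else is OpenAI-compatible), models served over the OpenAI protocol on `runware.ai` stay OpenAI-compatible, and `IMAGE_PROVIDER` forces a protocol. A minimal sketch of that resolution order follows. The function name and the way Runware-native models are recognized (an identifier shaped like `runware:400@6`) are assumptions, since the diff does not show how the engine tells them apart.

```typescript
type ImageProvider = "openai_compatible" | "openai" | "runware";

const KNOWN_PROVIDERS: readonly ImageProvider[] = ["openai_compatible", "openai", "runware"];

// Assumption: Runware-native model IDs look like "source:id@version".
const RUNWARE_MODEL_ID = /^[a-z0-9_-]+:\d+@\d+$/i;

function resolveImageProvider(baseUrl: string, model: string, override?: string): ImageProvider {
  // 1. An explicit, recognized IMAGE_PROVIDER value always wins.
  const forced = KNOWN_PROVIDERS.find((p) => p === override);
  if (forced) return forced;
  // 2. Otherwise auto-detect: Runware host plus a Runware-style model ID.
  const onRunware = baseUrl.toLowerCase().includes("runware.ai");
  if (onRunware && RUNWARE_MODEL_ID.test(model)) return "runware";
  // 3. Everything else falls back to the OpenAI-compatible protocol.
  return "openai_compatible";
}
```

An unset or unrecognized override falls through to auto-detection, which matches the guide's advice to leave the variable unset for the default behavior.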
+131 -114
@@ -27,9 +27,22 @@ InfiPlot は、AI がコンテンツをリアルタイムに生成するイン
ひとことで言えば、私たちが作っているのは、AI がリアルタイムにコンテンツを生成する『Love Is All Around(完蛋!我被美女包围了!)』です。 ひとことで言えば、私たちが作っているのは、AI がリアルタイムにコンテンツを生成する『Love Is All Around(完蛋!我被美女包围了!)』です。
6 歳の子どもでも、20 代の若者でも、35 歳でも、60 歳でも —— ここにはあなただけのファンタジーがあります: どんな方でも、ここにはあなただけのファンタジーがあります:
ハリー・ポッターの世界で魔法を学ぶ。学校で誰もが憧れ、想いを寄せる存在になる。トップ誌・トップ会議に論文を出し続け、研究費にも事欠かない。『宮廷の諍い女(甄嬛传)』の世界で宮廷の駆け引きを味わう。あるいは若い頃に戻り、悔いの残るあの選択をやり直す…… - ハリー・ポッターの世界で魔法を学ぶ
- 学校で誰もが憧れ、想いを寄せる存在になる
- トップ誌・トップ会議に論文を出し続け、研究費にも事欠かない
- 『宮廷の諍い女(甄嬛传)』の世界で宮廷の駆け引きを味わう
- 若い頃に戻り、悔いの残るあの選択をやり直す
- ……
コア機能:
- **マルチエージェント連携** — 脚本家・キャラクターデザイナー・撮影監督・絵師がそれぞれの役割を担い、物語の一貫性とキャラクターの統一感を保つ
- **先読み生成** — あなたが選択する頃には、次のシーンはたいてい描き上がっていて、切り替えは一瞬
- **クリック探索** — 画面のどこでもタップでき、ビジョンモデルがあなたの意図を解釈して応答
- **AI ボイス** — キャラクターごとに固有の声を生成。Xiaomi MiMo(無料)または StepFun(有料・高品質)に対応
- **自由な画風** — 棒人間、サイバーパンク、水彩、漫画……あらゆるスタイルで生成可能
--- ---
@@ -39,6 +52,53 @@ InfiPlot は、AI がコンテンツをリアルタイムに生成するイン
--- ---
## 🎬 デモ
<div align="center">
<video src="https://github.com/user-attachments/assets/414f0534-50c4-46d3-bc85-c681283b8c79" controls width="100%"></video>
</div>
---
## 📸 スクリーンショット
<table>
<tr>
<td><a href="docs/screenshots/1.webp"><img src="docs/screenshots/1.webp" width="420" alt="InfiPlot スクリーンショット 1"></a></td>
<td><a href="docs/screenshots/3.webp"><img src="docs/screenshots/3.webp" width="420" alt="InfiPlot スクリーンショット 3"></a></td>
</tr>
<tr>
<td><a href="docs/screenshots/6.webp"><img src="docs/screenshots/6.webp" width="420" alt="InfiPlot スクリーンショット 6"></a></td>
<td><a href="docs/screenshots/8.webp"><img src="docs/screenshots/8.webp" width="420" alt="InfiPlot スクリーンショット 8"></a></td>
</tr>
<tr>
<td><a href="docs/screenshots/12.webp"><img src="docs/screenshots/12.webp" width="420" alt="InfiPlot スクリーンショット 12"></a></td>
<td><a href="docs/screenshots/14.webp"><img src="docs/screenshots/14.webp" width="420" alt="InfiPlot スクリーンショット 14"></a></td>
</tr>
</table>
---
## 仕組み
テキスト・画像・音声モデルを基盤に、私たちは InfiPlot の目標を実現するためのマルチエージェント・フレームワークを構築しました。エージェントを **脚本家(Writer)・キャラクターデザイナー(Character Designer)・撮影監督(Cinematographer)・絵師(Painter)** の 4 つの役割に分け、互いに連携させることで、物語の一貫性・キャラクターの一貫性・シーンの連続性を保ちつつ、できる限り魅力的な物語を目指します。脚本家は物語全体の構造設計も兼ねています。
一回のプレイ全体を、私たちは**ストーリー(story)**と呼んでいます。
物語は一連のシーン(scene)として展開します。各シーンは、AI が描いた 1 枚の背景画と、短いビート(beat)のツリー —— ナレーション、セリフ、ときおりの選択肢 —— で構成されます。シーン内のビートをタップしていく間、画像はそのまま動きません。選択肢が本当に新しい場所 —— 別の空間、新しい視点、時間の跳躍 —— へ導いたときだけ、AI は次のシーンを描きます。
<div align="center">
<img src="docs/pipeline.ja.svg" alt="InfiPlot 物語生成パイプライン" width="680">
</div>
あなたがひとつのシーンを読んでいる間に、エンジンは選択肢が導きうるシーンを先回りして生成します —— 避けられない次の一歩については、そのさらに先のシーンまで。あなたが方向を選ぶ頃には、その画像はたいてい描き上がっているので、切り替えは一瞬に感じられます。いまはまだ多少の遅延を感じるかもしれませんが、ご安心ください —— 私たちは鋭意改善に取り組んでいます。
ボタンではなく背景そのものをクリックすると、ビジョン(vision)モデルを経由します。タップした位置を読み取り、いまのシーンを探索しているのか(新しい画像なしでビートを挿入)、先へ進もうとしているのか(新しいシーン)を判断します。これは flipbook から学んだ貴重な知見に基づくもので、この機能はいずれ InfiPlot を特徴づける鍵となり、プレイ体験をもう一段引き上げてくれると信じています。
アートの中には、従来型のゲーム UI は一切焼き込まれていません。AI は、あなたが選んだ任意のスタイル —— 「方眼紙の棒人間」でも「サイバーパンク・ノワール」でも —— で世界を描きます。セリフ枠と選択肢ボタンは、その上に重ねた軽量な HTML レイヤーで、シーンになじむよう調整されています。つまり UI は、毎回同じではなく、そのプレイの物語に寄り添って変化するのです。
---
## デプロイ ## デプロイ
InfiPlot は複数のデプロイ方法に対応しています。個人利用には Vercel のワンクリックデプロイをおすすめします。自分のサーバーやローカルマシンで動かしたい場合は Docker を使ってください。 InfiPlot は複数のデプロイ方法に対応しています。個人利用には Vercel のワンクリックデプロイをおすすめします。自分のサーバーやローカルマシンで動かしたい場合は Docker を使ってください。
@@ -48,10 +108,10 @@ InfiPlot は複数のデプロイ方法に対応しています。個人利用
Cloudflare へのデプロイはシーンパイプラインがより長い CPU 時間を必要とするため、Workers Paid Plan が必要です。OpenDeploy では AI エージェントにデプロイを任せることができます。 Cloudflare へのデプロイはシーンパイプラインがより長い CPU 時間を必要とするため、Workers Paid Plan が必要です。OpenDeploy では AI エージェントにデプロイを任せることができます。
<a href="https://opendeploy.dev/github/zonghaoyuan/infiplot"><img src="https://oss.opendeploy.dev/static/deploy-with-your-agent.svg" alt="Deploy with your agent" height="34"></a>&nbsp; <a href="https://opendeploy.dev/github/zonghaoyuan/infiplot"><img src="https://oss.opendeploy.dev/static/deploy-with-your-agent.svg" alt="Deploy with your agent" height="34"></a>&nbsp;
<a href="https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.ja.md%23%E8%A8%AD%E5%AE%9A%E3%82%AC%E3%82%A4%E3%83%89"><img src="https://vercel.com/button" alt="Deploy with Vercel" height="34"></a>&nbsp; <a href="https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/docs/configuration.ja.md"><img src="https://vercel.com/button" alt="Deploy with Vercel" height="34"></a>&nbsp;
<a href="https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" height="34"></a> <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" height="34"></a>
デプロイ後、環境変数を設定してください —— 下記の[設定ガイド](#設定ガイド)を参照。リポジトリのルートがアプリ本体です:Vercel では特別なルート設定は不要です。Cloudflare ではビルドコマンドを `pnpm build:cf` に設定するだけで済みます。 デプロイ後、[設定ガイド](#設定ガイド)に従って環境変数を設定してください。リポジトリのルートがアプリ本体です:Vercel では特別なルート設定は不要です。Cloudflare ではビルドコマンドを `pnpm build:cf` に設定するだけで済みます。
### Docker デプロイ(セルフホスト) ### Docker デプロイ(セルフホスト)
@@ -79,58 +139,81 @@ docker compose up -d
--- ---
## 📸 スクリーンショット ## 設定ガイド
<table> InfiPlot は 4 種類のモデルプロバイダと通信します。**テキスト(Text)・ビジョン(Vision)は、任意の OpenAI 互換エンドポイント**を使用でき、自由に組み合わせられます —— Google Gemini を使う場合は、`*_BASE_URL` をその OpenAI 互換エンドポイント(`https://generativelanguage.googleapis.com/v1beta/openai`)に向けるだけです。Anthropic Claude を使う場合は、互換ゲートウェイ(LiteLLM など)の経由を推奨します —— Anthropic の公式エンドポイントは OpenAI 互換レイヤーを提供していますがキャッシュ非対応のため、コストとレイテンシが上昇します。**画像(Image)**は **Runware**(独自の task-array プロトコル)と **OpenAI**`gpt-image`)に対応します。**音声(TTS)**は **Xiaomi MiMo**(独自の音声デザイン/クローンプロトコル —— キャラクターごとの音声デザイン、クローン、行ごとの抑揚指示に対応、無料)と **StepFun**(32 種のプリセット音声を AI が自動マッチング、有料ですがより高品質)に対応します。
<tr>
<td><a href="docs/screenshots/1.webp"><img src="docs/screenshots/1.webp" width="420" alt="InfiPlot スクリーンショット 1"></a></td> ### 1. プロバイダを選ぶ
<td><a href="docs/screenshots/2.webp"><img src="docs/screenshots/2.webp" width="420" alt="InfiPlot スクリーンショット 2"></a></td>
</tr> | プロバイダ | 環境変数 | 必須? | 推奨 |
<tr> |---|---|---|---|
<td><a href="docs/screenshots/3.webp"><img src="docs/screenshots/3.webp" width="420" alt="InfiPlot スクリーンショット 3"></a></td> | Text · ストーリー監督 | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | DeepSeek の `deepseek-v4-flash` |
<td><a href="docs/screenshots/4.webp"><img src="docs/screenshots/4.webp" width="420" alt="InfiPlot スクリーンショット 4"></a></td> | Image · シーン描画 | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | [Runware](https://runware.ai) の `runware:400@6`FLUX.2 [klein] 9B KV |
</tr> | Vision · クリック解釈 | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | Google の `gemini-3.5-flash` |
<tr> | TTS · キャラクター音声 | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | 任意 —— 空欄なら無音で動作 | Xiaomi MiMo の `mimo-v2.5-tts`(無料);有料の選択肢:[StepFun](https://www.stepfun.com) の `step-tts-2` |
<td><a href="docs/screenshots/5.webp"><img src="docs/screenshots/5.webp" width="420" alt="InfiPlot スクリーンショット 5"></a></td>
<td><a href="docs/screenshots/6.webp"><img src="docs/screenshots/6.webp" width="420" alt="InfiPlot スクリーンショット 6"></a></td> > **オプション · プロトコルの明示的指定**:各プロバイダスロットには `*_PROVIDER` 変数(`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`)を追加して、使用するプロトコルを明示的に指定できます。**未設定なら後方互換のデフォルト**を維持します —— テキスト/ビジョンは OpenAI 互換がデフォルト、画像は `*_BASE_URL` から自動判定(`runware.ai` → Runware、それ以外は OpenAI 互換。`runware.ai` 上で OpenAI プロトコルで提供されるモデル —— `image-2-vip` など —— は OpenAI 互換として処理されます。必要に応じて `IMAGE_PROVIDER` で上書きしてください)。
</tr> >
<tr> > | 値 | 対象 | 説明 |
<td><a href="docs/screenshots/7.webp"><img src="docs/screenshots/7.webp" width="420" alt="InfiPlot スクリーンショット 7"></a></td> > |---|---|---|
<td><a href="docs/screenshots/8.webp"><img src="docs/screenshots/8.webp" width="420" alt="InfiPlot スクリーンショット 8"></a></td> > | `openai_compatible`(デフォルト) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
</tr> > | `openai` | Image | OpenAI `gpt-image`、参照画像編集に対応 |
<tr> > | `runware` | Image | Runware task-array プロトコル |
<td><a href="docs/screenshots/9.webp"><img src="docs/screenshots/9.webp" width="420" alt="InfiPlot スクリーンショット 9"></a></td> >
<td><a href="docs/screenshots/10.webp"><img src="docs/screenshots/10.webp" width="420" alt="InfiPlot スクリーンショット 10"></a></td> > テキストとビジョンは `openai_compatible` **のみ**対応。Gemini を使う場合は `*_BASE_URL` をその OpenAI 互換エンドポイント(`https://generativelanguage.googleapis.com/v1beta/openai`)に向けてください。Claude を使う場合は互換ゲートウェイ(LiteLLM など)の経由を推奨 —— Anthropic の公式エンドポイントは OpenAI 互換レイヤーを提供していますが、キャッシュ非対応のためコストとレイテンシが上昇します。
</tr> >
<tr> > `*_BASE_URL` は末尾に `/v1` があってもなくても(`/chat/completions` まで付いていても)正常に動作します —— エンジンが自動で正規化します。
<td><a href="docs/screenshots/11.webp"><img src="docs/screenshots/11.webp" width="420" alt="InfiPlot スクリーンショット 11"></a></td>
<td><a href="docs/screenshots/12.webp"><img src="docs/screenshots/12.webp" width="420" alt="InfiPlot スクリーンショット 12"></a></td> ### 2. 環境変数を設定する
</tr>
<tr> 9 つの変数が必須で、TTS は任意です(空欄なら無音で動作)。低コストなテスト用のフラグもあります。
<td><a href="docs/screenshots/13.webp"><img src="docs/screenshots/13.webp" width="420" alt="InfiPlot スクリーンショット 13"></a></td>
<td><a href="docs/screenshots/14.webp"><img src="docs/screenshots/14.webp" width="420" alt="InfiPlot スクリーンショット 14"></a></td> | 変数 | 効果 |
</tr> |---|---|
</table> | `MOCK_IMAGE=true` | 画像生成をスキップし、レンダラが静的なプレースホルダを返します。ストーリー・音声・選択肢は通常どおり動作します。Runware のクレジットを消費せずに TTS を調整するのに最適です。 |
設定場所(正確なフォーマットは `.env.example` を参照):
- **ローカル開発** —— `.env.local`
- **Vercel** —— Project Settings → Environment Variables
- **Cloudflare Workers** —— リポジトリのルートから各変数について `wrangler secret put <NAME>` を実行するか、ダッシュボード(Workers → infiplot → Settings → Variables and Secrets)で設定します。ステージング環境にアクセス制限を掛けたい場合は、Worker の前に [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) を挟むと、ゼロコードでメール許可リスト方式の認証が利用できます。
### 3. コストに注意
推奨の 3 点セットでは、各シーンのコストは主に画像生成モデルによるものです。FLUX.2 [klein] 9B KV の画像は 1 シーンあたり概ね **$0.00078**1792×1024、4 ステップ、サブ秒)。テキストモデルは `deepseek-v4-flash` を使用するため、テキストコストは比較になりません。シーン内のビートをタップしていくのは無料です。切り替えを一瞬に保つため、エンジンは選ぶ可能性はあるが最終的に選ばないシーンも先行生成します —— そのため実際の支出は、あなたが実際に見るシーン数よりやや高くなります。
### 4. 画像プロキシ(オプション)
デフォルトではブラウザが画像プロバイダーに直接アクセスするため、設定は不要です —— `NEXT_PUBLIC_IMAGE_PROXY_URL` を空欄のままにすれば、まったく影響ありません。画像が「上から順に」表示される現象(一部のネットワークで Chrome の `ERR_QUIC_PROTOCOL_ERROR` により PNG が行ごとに描画される)に遭遇した場合のみ必要です。小さな Cloudflare Worker をデプロイすると、画像をサーバー側で再取得し HTTP/2 で一括返却します。ワンクリックデプロイは **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)** を参照し、出力された `workers.dev` の URL を `NEXT_PUBLIC_IMAGE_PROXY_URL` に設定してください。
### 5. プレイヤー自身の音声 Key(任意・推奨)
Xiaomi は TTS モデルに RPM/TPM 制限を設けています。公開デプロイで多数のプレイヤーが単一の `TTS_API_KEY` を共有して同時にプレイすると、この制限に達しやすく、**ストーリーも画像も正常なのに音声だけ出ない**という症状になります。対策として、プレイヤーはトップページで**自分の** Xiaomi MiMo Key(無料で取得可)を任意で入力できます。合成は**ブラウザから Xiaomi へ直接**行われ、**Key はプレイヤーのブラウザ内にのみ保存され、あなたのサーバーを一切経由しません**。これにより安定した音声と低遅延が得られます。完全な追加機能であり、未入力ならこれまで通りサーバー側の Key にフォールバックします。
取得・入力の手順は [音声 Key 持ち込みガイド](docs/xiaomi-tts-key.md) を参照してください。
--- ---
## 仕組み ## Roadmap
テキスト・画像・音声モデルを基盤に、私たちは InfiPlot の目標を実現するためのマルチエージェント・フレームワークを構築しました。エージェントを **脚本家(Writer)・キャラクターデザイナー(Character Designer)・撮影監督(Cinematographer)・絵師(Painter** の 4 つの役割に分け、互いに連携させることで、物語の一貫性・キャラクターの一貫性・シーンの連続性を保ちつつ、できる限り魅力的な物語を目指します。脚本家は物語全体の構造設計も兼ねています。 **実装済み**
一回のプレイ全体を、私たちは**ストーリー(story)**と呼んでいます。 - [x] レイテンシを約 10 秒に最適化
- [x] ビジョンベース画像インタラクション
- [x] ワンクリックデプロイ&カスタムモデル設定
- [x] フロントエンドで API Key・モデル設定
- [x] モバイル Web 対応
- [x] ストーリー共有(`.infiplot` 形式)
- [x] OpenDeploy クイックデプロイ
- [x] ストーリーの保存・再開(ローカル + クラウド同期)
物語は一連のシーン(scene)として展開します。各シーンは、AI が描いた 1 枚の背景画と、短いビート(beat)のツリー —— ナレーション、セリフ、ときおりの選択肢 —— で構成されます。シーン内のビートをタップしていく間、画像はそのまま動きません。選択肢が本当に新しい場所 —— 別の空間、新しい視点、時間の跳躍 —— へ導いたときだけ、AI は次のシーンを描きます。 **未実装**
<div align="center"> - [ ] モバイルアプリ&クリエイタープラットフォーム
<img src="docs/pipeline.ja.svg" alt="InfiPlot 物語生成パイプライン" width="680"> - [ ] ComfyUI カスタム画像生成対応
</div> - [ ] レイテンシを 5 秒以内に短縮
- [ ] カスタムキャラクターカード&世界観設定
あなたがひとつのシーンを読んでいる間に、エンジンは選択肢が導きうるシーンを先回りして生成します —— 避けられない次の一歩については、そのさらに先のシーンまで。あなたが方向を選ぶ頃には、その画像はたいてい描き上がっているので、切り替えは一瞬に感じられます。いまはまだ多少の遅延を感じるかもしれませんが、ご安心ください —— 私たちは鋭意改善に取り組んでいます。 - [ ] プロンプトキャッシュヒット率の最適化
ボタンではなく背景そのものをクリックすると、ビジョン(vision)モデルを経由します。タップした位置を読み取り、いまのシーンを探索しているのか(新しい画像なしでビートを挿入)、先へ進もうとしているのか(新しいシーン)を判断します。これは flipbook から学んだ貴重な知見に基づくもので、この機能はいずれ InfiPlot を特徴づける鍵となり、プレイ体験をもう一段引き上げてくれると信じています。
アートの中には、従来型のゲーム UI は一切焼き込まれていません。AI は、あなたが選んだ任意のスタイル —— 「方眼紙の棒人間」でも「サイバーパンク・ノワール」でも —— で世界を描きます。セリフ枠と選択肢ボタンは、その上に重ねた軽量な HTML レイヤーで、シーンになじむよう調整されています。つまり UI は、毎回同じではなく、そのプレイの物語に寄り添って変化するのです。
--- ---
@@ -155,72 +238,6 @@ docker compose up -d
--- ---
## 設定ガイド
InfiPlot は 4 種類のモデルプロバイダと通信します。**テキスト(Text)・ビジョン(Vision)は、任意の OpenAI 互換エンドポイント**を使用でき、自由に組み合わせられます —— Google Gemini を使う場合は、`*_BASE_URL` をその OpenAI 互換エンドポイント(`https://generativelanguage.googleapis.com/v1beta/openai`)に向けるだけです。Anthropic Claude を使う場合は、互換ゲートウェイ(LiteLLM など)の経由を推奨します —— Anthropic の公式エンドポイントは OpenAI 互換レイヤーを提供していますがキャッシュ非対応のため、コストとレイテンシが上昇します。**画像(Image)**は **Runware**(独自の task-array プロトコル)と **OpenAI**`gpt-image`)に対応します。**音声(TTS)**は **Xiaomi MiMo**(独自の音声デザイン/クローンプロトコル —— キャラクターごとの音声デザイン、クローン、行ごとの抑揚指示に対応、無料)と **StepFun**(32 種のプリセット音声を AI が自動マッチング、有料ですがより高品質)に対応します。
**1. プロバイダを選ぶ**
| プロバイダ | 環境変数 | 必須? | 推奨 |
|---|---|---|---|
| Text · ストーリー監督 | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | DeepSeek の `deepseek-v4-flash` |
| Image · シーン描画 | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | [Runware](https://runware.ai) の `runware:400@6`FLUX.2 [klein] 9B KV |
| Vision · クリック解釈 | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | Google の `gemini-3.5-flash` |
| TTS · キャラクター音声 | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | 任意 —— 空欄なら無音で動作 | Xiaomi MiMo の `mimo-v2.5-tts`(無料);有料の選択肢:[StepFun](https://www.stepfun.com) の `step-tts-2` |
**2. 環境変数を設定する**
9 つの変数が必須で、TTS は任意です(空欄なら無音で動作)。低コストなテスト用のフラグもあります。
| 変数 | 効果 |
|---|---|
| `MOCK_IMAGE=true` | 画像生成をスキップし、レンダラが静的なプレースホルダを返します。ストーリー・音声・選択肢は通常どおり動作します。Runware のクレジットを消費せずに TTS を調整するのに最適です。 |
設定場所(正確なフォーマットは `.env.example` を参照):
- **ローカル開発** —— `.env.local`
- **Vercel** —— Project Settings → Environment Variables
- **Cloudflare Workers** —— リポジトリのルートから各変数について `wrangler secret put <NAME>` を実行するか、ダッシュボード(Workers → infiplot → Settings → Variables and Secrets)で設定します。ステージング環境にアクセス制限を掛けたい場合は、Worker の前に [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) を挟むと、ゼロコードでメール許可リスト方式の認証が利用できます。
**3. コストに注意**
推奨の 3 点セットでは、各シーンのコストは主に画像生成モデルによるものです。FLUX.2 [klein] 9B KV の画像は 1 シーンあたり概ね **$0.00078**1792×1024、4 ステップ、サブ秒)。テキストモデルは `deepseek-v4-flash` を使用するため、テキストコストは比較になりません。シーン内のビートをタップしていくのは無料です。切り替えを一瞬に保つため、エンジンは選ぶ可能性はあるが最終的に選ばないシーンも先行生成します —— そのため実際の支出は、あなたが実際に見るシーン数よりやや高くなります。
**4. 画像プロキシ(オプション)**
デフォルトではブラウザが画像プロバイダーに直接アクセスするため、設定は不要です —— `NEXT_PUBLIC_IMAGE_PROXY_URL` を空欄のままにすれば、まったく影響ありません。画像が「上から順に」表示される現象(一部のネットワークで Chrome の `ERR_QUIC_PROTOCOL_ERROR` により PNG が行ごとに描画される)に遭遇した場合のみ必要です。小さな Cloudflare Worker をデプロイすると、画像をサーバー側で再取得し HTTP/2 で一括返却します。ワンクリックデプロイは **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)** を参照し、出力された `workers.dev` の URL を `NEXT_PUBLIC_IMAGE_PROXY_URL` に設定してください。
**5. プレイヤー自身の音声 Key(任意・推奨)**
Xiaomi は TTS モデルに RPM/TPM 制限を設けています。公開デプロイで多数のプレイヤーが単一の `TTS_API_KEY` を共有して同時にプレイすると、この制限に達しやすく、**ストーリーも画像も正常なのに音声だけ出ない**という症状になります。対策として、プレイヤーはトップページで**自分の** Xiaomi MiMo Key(無料で取得可)を任意で入力できます。合成は**ブラウザから Xiaomi へ直接**行われ、**Key はプレイヤーのブラウザ内にのみ保存され、あなたのサーバーを一切経由しません**。これにより安定した音声と低遅延が得られます。完全な追加機能であり、未入力ならこれまで通りサーバー側の Key にフォールバックします。
取得・入力の手順は [音声 Key 持ち込みガイド](docs/xiaomi-tts-key.md) を参照してください。
---
## Roadmap
**実装済み**
- [x] レイテンシを約 10 秒に最適化
- [x] ビジョンベース画像インタラクション
- [x] ワンクリックデプロイ&カスタムモデル設定
- [x] フロントエンドで API Key・モデル設定
- [x] モバイル Web 対応
- [x] ストーリー共有(`.infiplot` 形式)
- [x] OpenDeploy クイックデプロイ
**未実装**
- [ ] モバイルアプリ&クリエイタープラットフォーム
- [ ] ComfyUI カスタム画像生成対応
- [ ] レイテンシを 5 秒以内に短縮
- [ ] ストーリーの保存・再開
- [ ] カスタムキャラクターカード&世界観設定
- [ ] プロンプトキャッシュヒット率の最適化
---
## スター推移 ## スター推移
[![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date) [![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date)
+131 -126
@@ -27,9 +27,22 @@ InfiPlot是一款AI实时生成内容的互动剧情游戏,这里没有预设
用一句话说,我们要做的是一款用AI实时生成内容的《完蛋!我被美女包围了!》 用一句话说,我们要做的是一款用AI实时生成内容的《完蛋!我被美女包围了!》
无论你是六岁的小朋友,20岁的年轻人,35岁的青年还是60岁的长者,都能在这里满足独属于你的幻想: 无论你是,都能在这里满足独属于你的幻想:
穿越到哈利波特世界学习魔法、成为学校里所有异性青睐和表达爱意的对象、顶刊顶会发不停科研经费拿到手软、穿越到甄嬛传体验宫廷斗争、或者重返年轻为遗憾的事情重新做选择...... - 穿越到哈利波特世界学习魔法
- 成为学校里所有异性青睐和表达爱意的对象
- 顶刊顶会发不停,科研经费拿到手软
- 穿越到甄嬛传体验宫廷斗争
- 重返年轻,为遗憾的事情重新做选择
- ......
核心能力:
- **多智能体协作** — 编剧、角色设计师、场景布置师、画家各司其职,保证剧情连贯、角色一致
- **预测式生成** — 你做出选择时,下一幕通常已经画好,切换瞬间完成
- **点击探索** — 直接点击画面任意位置,vision 模型会理解你的意图并做出响应
- **AI 配音** — 每个角色拥有独特声线,支持小米 MiMo(免费)和 StepFun(付费高品质)
- **风格自由** — 火柴人、赛博朋克、水彩、漫画......任意风格都能生成
--- ---
@@ -39,6 +52,53 @@ InfiPlot是一款AI实时生成内容的互动剧情游戏,这里没有预设
--- ---
## 🎬 Demo
<div align="center">
<video src="https://github.com/user-attachments/assets/414f0534-50c4-46d3-bc85-c681283b8c79" controls width="100%"></video>
</div>
---
## 📸 游戏截图
<table>
<tr>
<td><a href="docs/screenshots/1.webp"><img src="docs/screenshots/1.webp" width="420" alt="InfiPlot 游戏截图 1"></a></td>
<td><a href="docs/screenshots/3.webp"><img src="docs/screenshots/3.webp" width="420" alt="InfiPlot 游戏截图 3"></a></td>
</tr>
<tr>
<td><a href="docs/screenshots/6.webp"><img src="docs/screenshots/6.webp" width="420" alt="InfiPlot 游戏截图 6"></a></td>
<td><a href="docs/screenshots/8.webp"><img src="docs/screenshots/8.webp" width="420" alt="InfiPlot 游戏截图 8"></a></td>
</tr>
<tr>
<td><a href="docs/screenshots/12.webp"><img src="docs/screenshots/12.webp" width="420" alt="InfiPlot 游戏截图 12"></a></td>
<td><a href="docs/screenshots/14.webp"><img src="docs/screenshots/14.webp" width="420" alt="InfiPlot 游戏截图 14"></a></td>
</tr>
</table>
---
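## How it works
Built on text, image, and audio models, we created a multi-agent framework to realize InfiPlot's goals. Agents are split into four roles (Writer, Character Designer, Set Designer, and Painter) that cooperate to keep the plot coherent, the characters consistent, and the scenes consistent while making the story as engaging as possible. The Writer is also responsible for planning the overall structure of the plot.
We call the whole experience of one playthrough a story.
A story unfolds as a series of scenes. Each scene consists of one AI-painted background image plus a tree of short beats: narration, dialogue, and the occasional choice. While you click through a scene beat by beat, the image never changes. The AI paints the next scene only when a choice takes you somewhere genuinely new: a different space, a different viewpoint, or a jump in time.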
<div align="center">
<img src="docs/pipeline.zh.svg" alt="InfiPlot 生成流水线流程图" width="680">
</div>
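While you are reading a scene, the engine speculatively generates the scenes your choices could lead to, and for an unavoidable next step it generates one scene further ahead. By the time you actually pick a direction, that scene's image is usually already painted, so the transition is instant. If you still notice some latency today, don't worry: we are working on it.
Clicking the background itself, rather than a button, goes through a vision model: it reads where you clicked and decides whether you are exploring the current scene (a beat is inserted and no new image is generated) or moving on (a new scene is generated). This builds on a valuable insight we learned from flipbook, and we believe it will become a key InfiPlot feature that takes the play experience to the next level.
Going forward, no conventional game UI will be baked into the picture. The AI paints the whole world in whatever style you choose, whether "stick figures on graph paper" or "cyberpunk noir", while the dialogue box and choice buttons are just a lightweight HTML layer on top of the image, carefully tuned to fit the scene. In other words, the UI matches the current story on every playthrough instead of staying the same.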
---
## 部署 ## 部署
InfiPlot 支持多种部署方式。个人使用推荐 Vercel 一键部署;想部署到自己的服务器或本地运行,可以用 Docker。 InfiPlot 支持多种部署方式。个人使用推荐 Vercel 一键部署;想部署到自己的服务器或本地运行,可以用 Docker。
@@ -48,10 +108,10 @@ InfiPlot 支持多种部署方式。个人使用推荐 Vercel 一键部署;想
Cloudflare 部署因场景流水线需要更长 CPU 时间,需要 Workers Paid Plan。OpenDeploy 支持让 AI Agent 帮你完成部署。 Cloudflare 部署因场景流水线需要更长 CPU 时间,需要 Workers Paid Plan。OpenDeploy 支持让 AI Agent 帮你完成部署。
<a href="https://opendeploy.dev/github/zonghaoyuan/infiplot"><img src="https://oss.opendeploy.dev/static/deploy-with-your-agent.svg" alt="Deploy with your agent" height="34"></a>&nbsp; <a href="https://opendeploy.dev/github/zonghaoyuan/infiplot"><img src="https://oss.opendeploy.dev/static/deploy-with-your-agent.svg" alt="Deploy with your agent" height="34"></a>&nbsp;
<a href="https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot%23%E9%85%8D%E7%BD%AE%E6%95%99%E7%A8%8B"><img src="https://vercel.com/button" alt="Deploy with Vercel" height="34"></a>&nbsp; <a href="https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/docs/configuration.md"><img src="https://vercel.com/button" alt="Deploy with Vercel" height="34"></a>&nbsp;
<a href="https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" height="34"></a> <a href="https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot"><img src="https://deploy.workers.cloudflare.com/button" alt="Deploy to Cloudflare" height="34"></a>
部署完成后,填好环境变量 —— 详见下方的[配置教程](#配置教程)。仓库根目录就是应用本身:Vercel 无需额外设置 root directory;在 Cloudflare 上把构建命令设为 `pnpm build:cf` 即可。 部署完成后,按照[配置教程](#配置教程)设置环境变量即可开始游戏。仓库根目录就是应用本身:Vercel 无需额外设置 root directory;在 Cloudflare 上把构建命令设为 `pnpm build:cf` 即可。
### Docker 部署(自托管) ### Docker 部署(自托管)
@@ -79,58 +139,81 @@ docker compose up -d
--- ---
## 📸 游戏截图 ## 配置教程
<table> InfiPlot 会与四类模型供应商通信。**文本(Text)和视觉(Vision** 只走 OpenAI 兼容接口——想用 Google Gemini 的话,把 `*_BASE_URL` 指向其 OpenAI 兼容端点(`https://generativelanguage.googleapis.com/v1beta/openai`)即可;想用 Anthropic Claude 的话,推荐通过兼容网关(如 LiteLLM)转发,官方 OpenAI 兼容层不支持缓存,可能推高成本与延迟。**图像(Image)** 支持 **Runware**(其自有 task-array 协议)与 **OpenAI**`gpt-image`)。**语音(TTS** 支持**小米 MiMo**(自有的音色设计/克隆协议——支持角色级音色设计、克隆与逐行演绎指导,免费)和 **StepFun 阶跃星辰**(32 个预设音色,由 AI 自动匹配,付费但体验更好)。
<tr>
<td><a href="docs/screenshots/1.webp"><img src="docs/screenshots/1.webp" width="420" alt="InfiPlot 游戏截图 1"></a></td> ### 1. 选择你的供应商
<td><a href="docs/screenshots/2.webp"><img src="docs/screenshots/2.webp" width="420" alt="InfiPlot 游戏截图 2"></a></td>
</tr> | 供应商 | 环境变量 | 是否必填 | 推荐 |
<tr> |---|---|---|---|
<td><a href="docs/screenshots/3.webp"><img src="docs/screenshots/3.webp" width="420" alt="InfiPlot 游戏截图 3"></a></td> | Text · 剧情导演 | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | DeepSeek 的 `deepseek-v4-flash` |
<td><a href="docs/screenshots/4.webp"><img src="docs/screenshots/4.webp" width="420" alt="InfiPlot 游戏截图 4"></a></td> | Image · 场景渲染 | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | [Runware](https://runware.ai) 的 `runware:400@6`FLUX.2 [klein] 9B KV |
</tr> | Vision · 点击解读 | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | Google 的 `gemini-3.5-flash` |
<tr> | TTS · 角色配音 | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | 可选 —— 留空则静音运行 | 小米 MiMo 的 `mimo-v2.5-tts`(免费);付费可选 [StepFun](https://www.stepfun.com) 的 `step-tts-2` |
<td><a href="docs/screenshots/5.webp"><img src="docs/screenshots/5.webp" width="420" alt="InfiPlot 游戏截图 5"></a></td>
<td><a href="docs/screenshots/6.webp"><img src="docs/screenshots/6.webp" width="420" alt="InfiPlot 游戏截图 6"></a></td> > **可选 · 指定接口协议**:每类模型都可加一个 `*_PROVIDER` 变量(`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`)显式选择接口协议。**不设则保持向后兼容**——文本/视觉默认走 OpenAI 兼容接口,图像按 `*_BASE_URL` 自动判断(`runware.ai` → Runware,否则 OpenAI 兼容;个别在 `runware.ai` 上以 OpenAI 协议提供的模型——如 `image-2-vip`——会按 OpenAI 兼容处理,需要时用 `IMAGE_PROVIDER` 显式覆盖即可)。
</tr> >
<tr> > | 取值 | 适用 | 说明 |
<td><a href="docs/screenshots/7.webp"><img src="docs/screenshots/7.webp" width="420" alt="InfiPlot 游戏截图 7"></a></td> > |---|---|---|
<td><a href="docs/screenshots/8.webp"><img src="docs/screenshots/8.webp" width="420" alt="InfiPlot 游戏截图 8"></a></td> > | `openai_compatible`(默认) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
</tr> > | `openai` | Image | OpenAI `gpt-image`,支持参考图编辑 |
<tr> > | `runware` | Image | Runware task-array 协议 |
<td><a href="docs/screenshots/9.webp"><img src="docs/screenshots/9.webp" width="420" alt="InfiPlot 游戏截图 9"></a></td> >
<td><a href="docs/screenshots/10.webp"><img src="docs/screenshots/10.webp" width="420" alt="InfiPlot 游戏截图 10"></a></td> > 文本和视觉**仅**支持 `openai_compatible`。要用 Gemini,把 `*_BASE_URL` 指向其 OpenAI 兼容端点(`https://generativelanguage.googleapis.com/v1beta/openai`)即可。要用 Claude,推荐通过兼容网关(如 LiteLLM)转发——Anthropic 官方端点虽提供 OpenAI 兼容层,但不支持缓存,会推高成本与延迟。
</tr> >
<tr> > 此外,`*_BASE_URL` 带不带 `/v1`(甚至末尾多写了 `/chat/completions`)都能正常工作——引擎会自动规范化。
<td><a href="docs/screenshots/11.webp"><img src="docs/screenshots/11.webp" width="420" alt="InfiPlot 游戏截图 11"></a></td>
<td><a href="docs/screenshots/12.webp"><img src="docs/screenshots/12.webp" width="420" alt="InfiPlot 游戏截图 12"></a></td> ### 2. 填写环境变量
</tr>
<tr> 九个变量为必填;TTS 可选(留空则静音运行)。此外还有一个用于低成本测试的开关:
<td><a href="docs/screenshots/13.webp"><img src="docs/screenshots/13.webp" width="420" alt="InfiPlot 游戏截图 13"></a></td>
<td><a href="docs/screenshots/14.webp"><img src="docs/screenshots/14.webp" width="420" alt="InfiPlot 游戏截图 14"></a></td> | 变量 | 作用 |
</tr> |---|---|
</table> | `MOCK_IMAGE=true` | 跳过图像生成,渲染器返回一张静态占位图。剧情、语音、选项照常运行。非常适合在不消耗 Runware 额度的情况下调试 TTS。 |
在哪里设置(确切字段见 `.env.example`):
- **本地开发** —— `.env.local`
- **Vercel** —— Project Settings → Environment Variables
- **Cloudflare Workers** —— 在仓库根目录下逐个执行 `wrangler secret put <NAME>`,或在 dashboard 里设置(Workers → infiplot → Settings → Variables and Secrets)。如果要给 staging 加访问限制,可以在 Worker 前面挂一个 [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/)(零代码,邮箱白名单)。
### 3. 注意成本
使用推荐的三件套时,每一幕场景的开销主要来自图像生成模型。FLUX.2 [klein] 9B KV 的图像大约 **$0.00078** 一张(1792×1024,4 步,亚秒级);文本模型使用 `deepseek-v4-flash` 时,成本极低。逐拍点过一个场景是免费的。为了让切换瞬间完成,引擎还会预测式地生成那些你可能选、但最终可能没选的场景 —— 所以真实花费会比你实际看到的场景数略高一些。
### 4. 图片代理(可选)
默认浏览器直连图片供应商,无需任何配置 —— 留空 `NEXT_PUBLIC_IMAGE_PROXY_URL` 即可,完全不受影响。只有当你遇到图片「层层加载」(Chrome 在某些网络下 `ERR_QUIC_PROTOCOL_ERROR` 导致 PNG 逐行渲染)时才需要它:部署一个极小的 Cloudflare Worker,把图片改为服务端转发 + HTTP/2 原子返回。一键部署见 **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**,然后把它给出的 `workers.dev` 地址填进 `NEXT_PUBLIC_IMAGE_PROXY_URL`
### 5. 玩家自带配音 Key(可选,推荐)
小米对 TTS 模型有 RPM/TPM 限额。当你的公共部署有多人同时游玩、共用同一把 `TTS_API_KEY` 时,很容易撞到限额,表现为**剧情、画面都正常,唯独没有声音**。为此,玩家可以在首页可选地填入**自己的**小米 MiMo Key(免费申请)——配音请求由**浏览器直连小米**完成,**Key 只存在玩家本地、绝不经过你的服务器**,从而获得稳定配音与更低延迟。这是纯增强:不填则照常使用你部署的服务器 Key,行为不变。
申请与填写步骤见 [自带配音 Key 教程](docs/xiaomi-tts-key.md)。
--- ---
## 工作原理 ## Roadmap
基于文本、图像和音频模型,我们搭建了一个多智能体框架来实现InfiPlot的目标。我们把agent分为编剧、角色设计师、场景布置师和画家四个职能,让他们之间相互配合,在保证剧情连贯性、角色一致性、场景一致性的基础上,尽可能使得剧情足够富有吸引力。其中编剧同时负责剧情的整体架构规划。 **已实现**
我们把每一次游玩的整体体验称为故事(story)。 - [x] 延迟优化至约 10 秒
- [x] 视觉识图交互
- [x] 一键部署与自定义模型配置
- [x] 前端直配 API Key 与模型
- [x] 移动端 Web 适配
- [x] 剧情分享(`.infiplot` 格式)
- [x] OpenDeploy 快速部署
- [x] 剧情存档与续玩(本地 + 云端同步)
故事以一连串场景(scene)的形式展开。每个场景由一张 AI 绘制的背景图,加上一棵简短的节拍(beat)树组成 —— 也就是旁白、对话和偶尔出现的选项。你逐拍点过一个场景时,画面始终不变;只有当某个选项把你带到真正全新的地方 —— 换了空间、换了视角、跳跃了时间 —— AI 才会绘制下一幕场景。 **未实现**
<div align="center"> - [ ] 移动端 App 与创作平台
<img src="docs/pipeline.zh.svg" alt="InfiPlot 生成流水线流程图" width="680"> - [ ] 兼容 ComfyUI 自定义生图
</div> - [ ] 延迟压缩至 5 秒以内
- [ ] 自定义角色卡与世界观
当你正在阅读一幕场景时,引擎会预测式地生成你的选项可能通向的那些场景 —— 对于无法回避的下一步,还会再往前生成一幕。等你真正选定方向时,那一幕的图通常已经画好了,于是切换瞬间完成、毫无停顿。如果你现在仍然感到有些延迟,别担心,我们正在努力优化它。 - [ ] Prompt 缓存命中率优化
直接点击背景本身(而非按钮)会走一个视觉(vision)模型:它读取你点击的位置,判断你是在探索当前场景(于是插入一个节拍 —— 不生成新图),还是要继续前进(生成一幕新场景)。这是基于我们从flipbook那里学到的宝贵认知,我们相信这个功能会在未来成为InfiPlot的关键功能,让你的游玩体验更上一层楼。
未来,画面里将没有烤进任何传统的游戏 UI。AI 会用你选择的任意风格来描绘整个世界 —— 「方格纸上的火柴人」也好,「赛博朋克黑色电影」也罢 —— 而对话框和选项按钮,只是叠在画面之上、并为贴合场景而精心调校过的一层轻量 HTML。也就是说,每次游玩时,UI都会契合当前的故事,而不是一成不变。
--- ---
@@ -155,84 +238,6 @@ docker compose up -d
--- ---
## 配置教程
InfiPlot 会与四类模型供应商通信。**文本(Text)和视觉(Vision** 只走 OpenAI 兼容接口——想用 Google Gemini 的话,把 `*_BASE_URL` 指向其 OpenAI 兼容端点(`https://generativelanguage.googleapis.com/v1beta/openai`)即可;想用 Anthropic Claude 的话,推荐通过兼容网关(如 LiteLLM)转发,官方 OpenAI 兼容层不支持缓存,可能推高成本与延迟。**图像(Image)** 支持 **Runware**(其自有 task-array 协议)与 **OpenAI**`gpt-image`)。**语音(TTS** 支持**小米 MiMo**(自有的音色设计/克隆协议——支持角色级音色设计、克隆与逐行演绎指导,免费)和 **StepFun 阶跃星辰**(32 个预设音色,由 AI 自动匹配,付费但体验更好)。
**1. 选择你的供应商**
| 供应商 | 环境变量 | 是否必填 | 推荐 |
|---|---|---|---|
| Text · 剧情导演 | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | DeepSeek 的 `deepseek-v4-flash` |
| Image · 场景渲染 | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | [Runware](https://runware.ai) 的 `runware:400@6`FLUX.2 [klein] 9B KV |
| Vision · 点击解读 | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | Google 的 `gemini-3.5-flash` |
| TTS · 角色配音 | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | 可选 —— 留空则静音运行 | 小米 MiMo 的 `mimo-v2.5-tts`(免费);付费可选 [StepFun](https://www.stepfun.com) 的 `step-tts-2` |
> **可选 · 指定接口协议**:每类模型都可加一个 `*_PROVIDER` 变量(`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`)显式选择接口协议。**不设则保持向后兼容**——文本/视觉默认走 OpenAI 兼容接口,图像按 `*_BASE_URL` 自动判断(`runware.ai` → Runware,否则 OpenAI 兼容;个别在 `runware.ai` 上以 OpenAI 协议提供的模型——如 `image-2-vip`——会按 OpenAI 兼容处理,需要时用 `IMAGE_PROVIDER` 显式覆盖即可)。
>
> | 取值 | 适用 | 说明 |
> |---|---|---|
> | `openai_compatible`(默认) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
> | `openai` | Image | OpenAI `gpt-image`,支持参考图编辑 |
> | `runware` | Image | Runware task-array 协议 |
>
> 文本和视觉**仅**支持 `openai_compatible`。要用 Gemini,把 `*_BASE_URL` 指向其 OpenAI 兼容端点(`https://generativelanguage.googleapis.com/v1beta/openai`)即可。要用 Claude,推荐通过兼容网关(如 LiteLLM)转发——Anthropic 官方端点虽提供 OpenAI 兼容层,但不支持缓存,会推高成本与延迟。
>
> 此外,`*_BASE_URL` 带不带 `/v1`(甚至末尾多写了 `/chat/completions`)都能正常工作——引擎会自动规范化。
**2. 填写环境变量**
九个变量为必填;TTS 可选(留空则静音运行)。此外还有一个用于低成本测试的开关:
| 变量 | 作用 |
|---|---|
| `MOCK_IMAGE=true` | 跳过图像生成,渲染器返回一张静态占位图。剧情、语音、选项照常运行。非常适合在不消耗 Runware 额度的情况下调试 TTS。 |
在哪里设置(确切字段见 `.env.example`):
- **本地开发** —— `.env.local`
- **Vercel** —— Project Settings → Environment Variables
- **Cloudflare Workers** —— 在仓库根目录下逐个执行 `wrangler secret put <NAME>`,或在 dashboard 里设置(Workers → infiplot → Settings → Variables and Secrets)。如果要给 staging 加访问限制,可以在 Worker 前面挂一个 [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/)(零代码,邮箱白名单)。
**3. 注意成本**
使用推荐的三件套时,每一幕场景的开销主要来自图像生成模型。FLUX.2 [klein] 9B KV 的图像大约 **$0.00078** 一张(1792×1024,4 步,亚秒级);文本模型使用 `deepseek-v4-flash` 时,成本极低。逐拍点过一个场景是免费的。为了让切换瞬间完成,引擎还会预测式地生成那些你可能选、但最终可能没选的场景 —— 所以真实花费会比你实际看到的场景数略高一些。
**4. 图片代理(可选)**
默认浏览器直连图片供应商,无需任何配置 —— 留空 `NEXT_PUBLIC_IMAGE_PROXY_URL` 即可,完全不受影响。只有当你遇到图片「层层加载」(Chrome 在某些网络下 `ERR_QUIC_PROTOCOL_ERROR` 导致 PNG 逐行渲染)时才需要它:部署一个极小的 Cloudflare Worker,把图片改为服务端转发 + HTTP/2 原子返回。一键部署见 **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**,然后把它给出的 `workers.dev` 地址填进 `NEXT_PUBLIC_IMAGE_PROXY_URL`
**5. 玩家自带配音 Key(可选,推荐)**
小米对 TTS 模型有 RPM/TPM 限额。当你的公共部署有多人同时游玩、共用同一把 `TTS_API_KEY` 时,很容易撞到限额,表现为**剧情、画面都正常,唯独没有声音**。为此,玩家可以在首页可选地填入**自己的**小米 MiMo Key(免费申请)——配音请求由**浏览器直连小米**完成,**Key 只存在玩家本地、绝不经过你的服务器**,从而获得稳定配音与更低延迟。这是纯增强:不填则照常使用你部署的服务器 Key,行为不变。
申请与填写步骤见 [自带配音 Key 教程](docs/xiaomi-tts-key.md)。
---
## Roadmap
**已实现**
- [x] 延迟优化至约 10 秒
- [x] 视觉识图交互
- [x] 一键部署与自定义模型配置
- [x] 前端直配 API Key 与模型
- [x] 移动端 Web 适配
- [x] 剧情分享(`.infiplot` 格式)
- [x] OpenDeploy 快速部署
**未实现**
- [ ] 移动端 App 与创作平台
- [ ] 兼容 ComfyUI 自定义生图
- [ ] 延迟压缩至 5 秒以内
- [ ] 剧情存档与续玩
- [ ] 自定义角色卡与世界观
- [ ] Prompt 缓存命中率优化
---
## Star 趋势 ## Star 趋势
[![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date) [![Star History Chart](https://api.star-history.com/svg?repos=zonghaoyuan/infiplot&type=Date)](https://star-history.com/#zonghaoyuan/infiplot&Date)
+38
@@ -0,0 +1,38 @@
import { NextResponse } from "next/server";
import { requireUser } from "@/lib/supabase/guard";
import { cloudSoftDeleteStory } from "@/lib/persistence/cloudStore";
export const runtime = "nodejs";
// POST /api/stories/delete — body { id, rev, deletedAt } → { ok }. Propagates a
// soft-delete (tombstone) under the same optimistic-concurrency guard as push.
// requireUser 401s an unauthenticated commercial caller; on the open-source
// build cloudSoftDeleteStory short-circuits to false.
export async function POST(req: Request) {
  const auth = await requireUser();
  if (auth instanceof NextResponse) return auth;

  let body: { id?: unknown; rev?: unknown; deletedAt?: unknown };
  try {
    body = await req.json();
  } catch {
    return NextResponse.json({ error: "invalid json" }, { status: 400 });
  }

  const id = typeof body.id === "string" ? body.id : "";
  if (!id) {
    return NextResponse.json({ error: "missing id" }, { status: 400 });
  }
  // Validate rev/deletedAt as finite values (see push route rationale): reject
  // bad input with 400 rather than letting NaN/Infinity reach the PostgREST
  // filter or toISOString().
  if (typeof body.rev !== "number" || !Number.isFinite(body.rev) || body.rev <= 0) {
    return NextResponse.json({ error: "invalid rev" }, { status: 400 });
  }
  if (typeof body.deletedAt !== "number" || !Number.isFinite(body.deletedAt)) {
    return NextResponse.json({ error: "invalid deletedAt" }, { status: 400 });
  }

  const ok = await cloudSoftDeleteStory(id, body.rev, body.deletedAt);
  return NextResponse.json({ ok });
}
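The validation in this route can also be read as a pure function, which makes the accepted and rejected inputs easy to test in isolation. The sketch below is illustrative only: `parseDeleteBody`, `DeleteRequest`, and `ParseResult` are hypothetical names that do not appear in the PR, and the extra guard for a non-object body (for example a JSON `null`) is an assumption about desirable behavior, not something the route is shown to do.

```typescript
// Hypothetical helper, not part of the PR: mirrors the route's checks
// for { id, rev, deletedAt } as a pure, testable function.
type DeleteRequest = { id: string; rev: number; deletedAt: number };
type ParseResult =
  | { ok: true; value: DeleteRequest }
  | { ok: false; error: string };

function parseDeleteBody(raw: unknown): ParseResult {
  // Assumption: treat a non-object body (e.g. JSON `null`) like invalid JSON.
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "invalid json" };
  }
  const body = raw as Record<string, unknown>;

  const id = typeof body.id === "string" ? body.id : "";
  if (!id) return { ok: false, error: "missing id" };

  // rev must be a finite, strictly positive number (rejects NaN/Infinity).
  const rev = body.rev;
  if (typeof rev !== "number" || !Number.isFinite(rev) || rev <= 0) {
    return { ok: false, error: "invalid rev" };
  }

  // deletedAt must be a finite number (an epoch timestamp in the route).
  const deletedAt = body.deletedAt;
  if (typeof deletedAt !== "number" || !Number.isFinite(deletedAt)) {
    return { ok: false, error: "invalid deletedAt" };
  }

  return { ok: true, value: { id, rev, deletedAt } };
}
```

A route could map any `ok: false` result to a 400 response carrying `error`, and pass `value` on to the soft-delete call otherwise.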
+22
@@ -0,0 +1,22 @@
import { NextResponse } from "next/server";
import { requireUser } from "@/lib/supabase/guard";
import { cloudStoryManifest } from "@/lib/persistence/cloudStore";
export const runtime = "nodejs";
// GET /api/stories/manifest — the reconcile diff basis: every cloud row for the
// signed-in user (INCLUDING tombstones), projected to {id, rev, updatedAt,
// deletedAt} without the bulky session_jsonb. Pure passthrough to cloudStore;
// requireUser 401s an unauthenticated commercial-build caller, and on the
// open-source build (AUTH_ENABLED=false) cloudStoryManifest short-circuits to []
// without ever constructing a Supabase client.
export async function GET() {
  const auth = await requireUser();
  if (auth instanceof NextResponse) return auth;

  const items = await cloudStoryManifest();
  return NextResponse.json(
    { items },
    { headers: { "Cache-Control": "private, no-store" } },
  );
}
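The comment above describes the manifest as the basis for a reconcile diff. The PR's actual reconcile logic is not shown here, so the following is only a minimal sketch of one way such a diff could work, assuming a simple "higher `rev` wins" rule. `planReconcile` and the exact field types of `ManifestItem` are hypothetical.

```typescript
// Hypothetical sketch, not the PR's reconcile code. Field names follow the
// manifest projection {id, rev, updatedAt, deletedAt}; the types are assumed.
type ManifestItem = {
  id: string;
  rev: number;
  updatedAt: number;
  deletedAt: number | null; // non-null marks a tombstone
};

function planReconcile(
  local: ManifestItem[],
  cloud: ManifestItem[],
): { pull: string[]; push: string[] } {
  const localById = new Map(local.map((item) => [item.id, item] as const));
  const cloudById = new Map(cloud.map((item) => [item.id, item] as const));

  const pull: string[] = [];
  const push: string[] = [];

  // Pull rows that are missing locally or newer in the cloud.
  for (const item of cloud) {
    const mine = localById.get(item.id);
    if (!mine || item.rev > mine.rev) pull.push(item.id);
  }
  // Push rows that are missing in the cloud or newer locally.
  for (const item of local) {
    const theirs = cloudById.get(item.id);
    if (!theirs || item.rev > theirs.rev) push.push(item.id);
  }
  return { pull, push };
}
```

Because tombstones are ordinary rows in both lists, a deletion with a higher `rev` travels through the same comparison as any other change. Rows with equal `rev` on both sides are left alone in this sketch.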
+33
@@ -0,0 +1,33 @@
import { NextResponse } from "next/server";
import { requireUser } from "@/lib/supabase/guard";
import { cloudPullBlobs } from "@/lib/persistence/cloudStore";
export const runtime = "nodejs";
// Cap per request — reconcile chunks its pull set, so one call never asks for an
// unbounded id list (a denial-of-wallet / oversized-response guard).
const MAX_PULL_IDS = 200;
// POST /api/stories/pull — body { ids: string[] } → { blobs: StorySyncEnvelope[] }
// (full payloads, INCLUDING tombstones, for write-back into the local store).
// Pure passthrough to cloudStore; same auth/short-circuit story as manifest.
export async function POST(req: Request) {
const auth = await requireUser();
if (auth instanceof NextResponse) return auth;
let body: { ids?: unknown };
try {
body = await req.json();
} catch {
return NextResponse.json({ error: "invalid json" }, { status: 400 });
}
const ids = Array.isArray(body.ids)
? body.ids
.filter((x): x is string => typeof x === "string" && x.length > 0)
.slice(0, MAX_PULL_IDS)
: [];
const blobs = await cloudPullBlobs(ids);
return NextResponse.json({ blobs });
}
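Because the route silently truncates the id list at `MAX_PULL_IDS`, a caller has to split larger sets itself or ids beyond the cap are dropped. A minimal sketch of such chunking follows; `chunkIds` is a hypothetical name, and the real reconcile code is not shown in this diff.

```typescript
// Hypothetical client-side helper; the cap mirrors MAX_PULL_IDS in the route.
const MAX_PULL_IDS = 200;

// Split an id list into consecutive chunks of at most `size` entries.
function chunkIds(ids: readonly string[], size: number = MAX_PULL_IDS): string[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk would then be sent as its own `POST /api/stories/pull` request body.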
+74
@@ -0,0 +1,74 @@
import { NextResponse } from "next/server";
import { coerceOrientation } from "@infiplot/types";
import { requireUser } from "@/lib/supabase/guard";
import { cloudSaveStory } from "@/lib/persistence/cloudStore";
import { coerceEpoch, type StorySyncEnvelope } from "@/lib/persistence/types";
export const runtime = "nodejs";
// Matches story-pack's 12 MB doc ceiling — a slim Session (voice +
// styleReferenceImage stripped) is far smaller, so this only rejects
// pathological payloads, never normal saves.
const MAX_PUSH_BYTES = 12_000_000;
// POST /api/stories/push — body StorySyncEnvelope → { stored, won }. Pure
// passthrough to the optimistic-concurrency RPC; won=false means a newer cloud
// row was preserved. requireUser 401s an unauthenticated commercial caller; on
// the open-source build cloudSaveStory short-circuits to { stored:null, won:false }.
export async function POST(req: Request) {
const auth = await requireUser();
if (auth instanceof NextResponse) return auth;
// Pre-check Content-Length to reject an oversized body before buffering it.
// The post-read byteLength check below still covers chunked/omitted headers.
const contentLength = req.headers.get("content-length");
if (contentLength && Number(contentLength) > MAX_PUSH_BYTES) {
return NextResponse.json({ error: "payload too large" }, { status: 413 });
}
let raw: string;
try {
raw = await req.text();
} catch {
return NextResponse.json({ error: "invalid body" }, { status: 400 });
}
if (Buffer.byteLength(raw, "utf8") > MAX_PUSH_BYTES) {
return NextResponse.json({ error: "payload too large" }, { status: 413 });
}
let env: StorySyncEnvelope;
try {
env = JSON.parse(raw) as StorySyncEnvelope;
} catch {
return NextResponse.json({ error: "invalid json" }, { status: 400 });
}
if (!env?.id || typeof env.id !== "string") {
return NextResponse.json({ error: "missing id" }, { status: 400 });
}
// Validate the LWW-ordering fields as finite values: a non-finite rev /
// updatedAt would otherwise reach the RPC, throw at toISOString(), and surface
// as a silent { stored:null, won:false } 200 — return 400 so the caller can
// diagnose a bad request rather than mistake it for a normal lost conflict.
if (typeof env.rev !== "number" || !Number.isFinite(env.rev) || env.rev <= 0) {
return NextResponse.json({ error: "invalid rev" }, { status: 400 });
}
if (typeof env.updatedAt !== "number" || !Number.isFinite(env.updatedAt)) {
return NextResponse.json({ error: "invalid updatedAt" }, { status: 400 });
}
if (
env.deletedAt != null &&
(typeof env.deletedAt !== "number" || !Number.isFinite(env.deletedAt))
) {
return NextResponse.json({ error: "invalid deletedAt" }, { status: 400 });
}
// Defensive coercion at the trust boundary (the slim session itself is left to
// the client — it's reconstructible and never security-sensitive after slim).
const result = await cloudSaveStory({
...env,
orientation: coerceOrientation(env.orientation),
updatedAt: coerceEpoch(env.updatedAt, 0),
deletedAt: env.deletedAt == null ? null : coerceEpoch(env.deletedAt, 0),
});
return NextResponse.json(result);
}
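The size guard above measures UTF-8 bytes, not string length. The two differ for non-ASCII text, which story content is likely to contain. The sketch below shows the difference using the standard `TextEncoder` API instead of Node's `Buffer`; the helper name `utf8ByteLength` is illustrative.

```typescript
// UTF-8 byte length of a string, using the standard TextEncoder API.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

const ascii = "abc"; // three characters, one byte each
const cjk = "剧情"; // two characters, three UTF-8 bytes each
```

Comparing `raw.length` against `MAX_PUSH_BYTES` would undercount a CJK-heavy payload by up to a factor of three, so a byte-based check is the safer choice for a byte limit.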
+10 -1
@@ -3,6 +3,7 @@
  import { useCallback, useEffect, useState } from "react";
  import { AUTH_ENABLED } from "@/lib/supabase/config";
  import { createClient } from "@/lib/supabase/client";
+ import { syncOnLogin } from "@/lib/persistence/cloudSync";
  import type { AuthChangeEvent, Session, User } from "@supabase/supabase-js";
  export function UserChip() {
@@ -15,8 +16,16 @@ export function UserChip() {
  supabase.auth.getUser().then(({ data }: { data: { user: User | null } }) => setUser(data.user));
  const {
  data: { subscription },
- } = supabase.auth.onAuthStateChange((_event: AuthChangeEvent, session: Session | null) => {
+ } = supabase.auth.onAuthStateChange((event: AuthChangeEvent, session: Session | null) => {
  setUser(session?.user ?? null);
+ // A signed-in user — a fresh login (SIGNED_IN) OR an already-authed mount
+ // (INITIAL_SESSION fires on subscribe with the current session) — triggers
+ // a full reconcile. syncOnLogin serializes via its in-flight guard, so
+ // overlapping events never run concurrent syncs (Req 4.1, 4.2, 4.3). This
+ // is the single global trigger point; AuthModal instances don't duplicate it.
+ if (session?.user && (event === "SIGNED_IN" || event === "INITIAL_SESSION")) {
+ void syncOnLogin();
+ }
  });
  return () => subscription.unsubscribe();
  }, []);
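The comment above relies on `syncOnLogin` having an in-flight guard, but that guard is not part of this hunk. One common way to get that behavior is a single-flight wrapper, sketched below under that assumption; `singleFlight` and `syncOnce` are hypothetical names and not the project's code.

```typescript
// Hypothetical single-flight wrapper: concurrent callers share one in-flight
// promise, and the slot is cleared once that run settles.
function singleFlight<T>(task: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight) return inFlight;
    const run = task().finally(() => {
      inFlight = null;
    });
    inFlight = run;
    return run;
  };
}

let runs = 0;
const syncOnce = singleFlight(async () => {
  runs += 1; // executes synchronously, up to the first await
  await Promise.resolve();
});

const first = syncOnce();
const second = syncOnce(); // joins the run started by `first`
```

With this shape, a `SIGNED_IN` event arriving while an `INITIAL_SESSION` sync is still running would receive the same promise and start no second reconcile.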
+52
@@ -0,0 +1,52 @@
# Configuration guide
InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenAI-compatible endpoint**, so you can mix and match freely — for Google Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). For Anthropic Claude, a compatible gateway (e.g. LiteLLM) is recommended — Anthropic's official endpoint offers an OpenAI-compatible layer but no caching, which raises cost and latency. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol — per-character voice design, clone, and per-line delivery direction; free) and **StepFun** (32 preset voices, auto-matched by AI; paid but better quality).
## 1. Choose your providers
| Provider | Variables | Required? | Recommended |
|---|---|---|---|
| Text · story director | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | `deepseek-v4-flash` via DeepSeek |
| Image · scene renderer | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
| Vision · click reader | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | `gemini-3.5-flash` via Google |
| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo (free); paid alternative: `step-tts-2` via [StepFun](https://www.stepfun.com) |
> **Optional · explicit protocol override**: each provider slot accepts a `*_PROVIDER` variable (`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`) to force a specific protocol. **Leave unset for backwards-compatible defaults** — text/vision default to OpenAI-compatible, image auto-detects from `*_BASE_URL` (`runware.ai` → Runware, otherwise OpenAI-compatible; models served via OpenAI protocol on `runware.ai` — such as `image-2-vip` — are handled as OpenAI-compatible; override with `IMAGE_PROVIDER` when needed).
>
> | Value | Applies to | Description |
> |---|---|---|
> | `openai_compatible` (default) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
> | `openai` | Image | OpenAI `gpt-image`, supports reference-image editing |
> | `runware` | Image | Runware task-array protocol |
>
> Text and vision **only** support `openai_compatible`. For Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). For Claude, a compatible gateway (e.g. LiteLLM) is recommended — Anthropic's official endpoint offers an OpenAI-compatible layer but no caching, raising cost and latency.
>
> `*_BASE_URL` works with or without a trailing `/v1` (or even a trailing `/chat/completions`) — the engine normalizes automatically.
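The image auto-detection rule described above can be sketched as a small resolver. This is an illustration under assumptions: `resolveImageProvider` is a hypothetical name, the host comparison is a guess at how `runware.ai` is matched, and the exception for models such as `image-2-vip` is deliberately left out.

```typescript
type ImageProvider = "openai_compatible" | "openai" | "runware";

const KNOWN_PROVIDERS: readonly ImageProvider[] = ["openai_compatible", "openai", "runware"];

// Hypothetical resolver: an explicit override wins; otherwise the host of the
// base URL decides. The model-specific exception from the guide is omitted.
function resolveImageProvider(baseUrl: string, override?: string): ImageProvider {
  const explicit = KNOWN_PROVIDERS.find((p) => p === override);
  if (explicit) return explicit;

  let host = "";
  try {
    host = new URL(baseUrl).hostname.toLowerCase();
  } catch {
    // Unparseable URL: fall through to the default below.
  }
  const isRunware = host === "runware.ai" || host.endsWith(".runware.ai");
  return isRunware ? "runware" : "openai_compatible";
}
```

Under these assumptions, setting `IMAGE_PROVIDER=openai` would take precedence even when `IMAGE_BASE_URL` points at a Runware host.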
## 2. Set the environment variables
Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:
| Variable | Effect |
|---|---|
| `MOCK_IMAGE=true` | Skip image generation; the renderer returns a static placeholder. Story, voice, and choices still run normally. Great for iterating on TTS without burning Runware credits. |
Where to set them (see `.env.example` for the exact shape):
- **Local dev** — `.env.local`
- **Vercel** — Project Settings → Environment Variables
- **Cloudflare Workers** — from the repo root, run `wrangler secret put <NAME>` for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). For a private staging instance, gate the Worker behind [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) — zero-code email-whitelist auth in front of the Worker.
## 3. Mind the cost
With the recommended trio, each scene's cost comes mainly from the image generation model. The FLUX.2 [klein] 9B KV image is roughly **\$0.00078** per scene (1792×1024, 4 steps, sub-second); the text model uses `deepseek-v4-flash`, so text costs are negligible by comparison. Tapping through a scene's beats is free. To keep transitions instant, the engine also pre-generates scenes you might pick but ultimately don't — so real spend runs somewhat higher than the scenes you actually see.
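The per-scene figure above turns into a session estimate with simple multiplication. The sketch below is a rough calculator; `prefetchedPerSeen` is an assumed ratio of speculatively generated scenes to scenes actually viewed, since the guide gives no number for it.

```typescript
// Per-scene image price quoted in the guide, in US dollars.
const IMAGE_COST_PER_SCENE_USD = 0.00078;

// Estimate image spend for a session. `prefetchedPerSeen` is an ASSUMED ratio
// of pre-generated scenes to viewed scenes; the guide does not state one.
function estimateImageCostUsd(scenesSeen: number, prefetchedPerSeen: number = 0): number {
  const generated = scenesSeen * (1 + prefetchedPerSeen);
  return generated * IMAGE_COST_PER_SCENE_USD;
}
```

For example, 100 viewed scenes with no prefetching come to about 0.078 USD, and the same session with one assumed prefetched scene per viewed scene comes to about 0.156 USD.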
## 4. Image proxy (optional)
By default the browser fetches images directly from the provider — no setup needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and you're completely unaffected. You only want this if you hit progressive "top-to-bottom" image loading (Chrome's `ERR_QUIC_PROTOCOL_ERROR` on some networks paints partial PNGs row by row): deploy a tiny Cloudflare Worker that re-fetches images server-side and serves them atomically over HTTP/2. One-click deploy at **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then paste the `workers.dev` URL it prints into `NEXT_PUBLIC_IMAGE_PROXY_URL`.
## 5. Let players bring their own voice Key (optional, recommended)
Xiaomi rate-limits the TTS model by RPM/TPM. When a public deployment has many people playing at once through a single shared `TTS_API_KEY`, those limits are easy to hit — the symptom is **story and visuals work fine, but there's no audio**. To fix this, players can optionally enter **their own** Xiaomi MiMo key on the homepage (free to obtain). Synthesis then runs **browser-direct to Xiaomi**, the **key stays in the player's browser and never touches your server**, and they get stable voice with lower latency. It's purely additive: leave it blank and playback falls back to your server key exactly as before.
See the [Bring-your-own voice Key guide](xiaomi-tts-key.md) for how to obtain and enter one.
+52
@@ -0,0 +1,52 @@
# Configuration guide
InfiPlot communicates with four kinds of model providers. **Text and Vision can use any OpenAI-compatible endpoint**, so they can be combined freely. To use Google Gemini, just point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Anthropic Claude, going through a compatible gateway (such as LiteLLM) is recommended: Anthropic's official endpoint provides an OpenAI-compatible layer but does not support caching, so cost and latency go up. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol, covering per-character voice design, cloning, and per-line intonation direction; free) and **StepFun** (32 preset voices, matched automatically by AI; paid but higher quality).
## 1. Choose your providers
| Provider | Environment variables | Required? | Recommended |
|---|---|---|---|
| Text · story director | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | `deepseek-v4-flash` from DeepSeek |
| Image · scene rendering | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) from [Runware](https://runware.ai) |
| Vision · click interpretation | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | `gemini-3.5-flash` from Google |
| TTS · character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | Optional; runs silently if left blank | `mimo-v2.5-tts` from Xiaomi MiMo (free); paid option: `step-tts-2` from [StepFun](https://www.stepfun.com) |
> **Optional · explicit protocol selection**: each provider slot accepts an additional `*_PROVIDER` variable (`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`) that explicitly selects the protocol to use. **When unset, the backwards-compatible defaults are kept**: text/vision default to OpenAI-compatible, and image is auto-detected from `*_BASE_URL` (`runware.ai` → Runware, otherwise OpenAI-compatible. Models served over the OpenAI protocol on `runware.ai`, such as `image-2-vip`, are handled as OpenAI-compatible. Override with `IMAGE_PROVIDER` when needed).
>
> | Value | Applies to | Description |
> |---|---|---|
> | `openai_compatible` (default) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
> | `openai` | Image | OpenAI `gpt-image`, supports reference-image editing |
> | `runware` | Image | Runware task-array protocol |
>
> Text and vision support `openai_compatible` **only**. To use Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Claude, going through a compatible gateway (such as LiteLLM) is recommended: Anthropic's official endpoint provides an OpenAI-compatible layer, but it does not support caching, so cost and latency go up.
>
> `*_BASE_URL` works with or without a trailing `/v1` (and even with `/chat/completions` appended); the engine normalizes it automatically.
## 2. Set the environment variables
Nine variables are required, and TTS is optional (runs silently if left blank). There is also a flag for low-cost testing.
| Variable | Effect |
|---|---|
| `MOCK_IMAGE=true` | Skips image generation; the renderer returns a static placeholder. Story, voice, and choices run as usual. Ideal for tuning TTS without consuming Runware credits. |
Where to set them (see `.env.example` for the exact format):
- **Local dev**: `.env.local`
- **Vercel**: Project Settings → Environment Variables
- **Cloudflare Workers**: from the repo root, run `wrangler secret put <NAME>` for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). To restrict access to a staging environment, put [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) in front of the Worker for zero-code, email-allowlist authentication.
## 3. Mind the cost
With the recommended trio, each scene's cost comes mainly from the image generation model. A FLUX.2 [klein] 9B KV image is roughly **$0.00078** per scene (1792×1024, 4 steps, sub-second). The text model is `deepseek-v4-flash`, so text costs are negligible by comparison. Tapping through the beats within a scene is free. To keep transitions instant, the engine also pre-generates scenes you might choose but ultimately do not, so actual spend runs somewhat higher than the number of scenes you actually see.
## 4. Image proxy (optional)
By default the browser accesses the image provider directly, so no setup is needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and nothing changes. You only need this if you run into images appearing "from the top down" (on some networks, Chrome's `ERR_QUIC_PROTOCOL_ERROR` causes PNGs to be painted row by row). Deploying a small Cloudflare Worker re-fetches images server-side and returns them in one piece over HTTP/2. See **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)** for one-click deploy, then set the `workers.dev` URL it prints as `NEXT_PUBLIC_IMAGE_PROXY_URL`.
## 5. Players' own voice Key (optional, recommended)
Xiaomi applies RPM/TPM limits to the TTS model. When many players on a public deployment play at the same time through a single shared `TTS_API_KEY`, those limits are easy to reach, and the symptom is that **story and images work fine but there is no audio**. As a countermeasure, players can optionally enter **their own** Xiaomi MiMo Key (free to obtain) on the home page. Synthesis then goes **directly from the browser to Xiaomi**, and the **Key is stored only in the player's browser and never passes through your server**. This gives stable voice and lower latency. It is purely additive: if nothing is entered, playback falls back to the server-side Key as before.
See the [Bring-your-own voice Key guide](xiaomi-tts-key.md) for how to obtain and enter one.
+52
@@ -0,0 +1,52 @@
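# Configuration guide
InfiPlot talks to four kinds of model providers. **Text and Vision** only go through OpenAI-compatible endpoints. To use Google Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Anthropic Claude, forwarding through a compatible gateway (such as LiteLLM) is recommended, because the official OpenAI-compatible layer does not support caching and may raise cost and latency. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol, with per-character voice design, cloning, and per-line delivery direction; free) and **StepFun (阶跃星辰)** (32 preset voices, matched automatically by AI; paid but a better experience).
## 1. Choose your providers
| Provider | Environment variables | Required? | Recommended |
|---|---|---|---|
| Text · story director | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL` | ✅ | `deepseek-v4-flash` from DeepSeek |
| Image · scene rendering | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL` | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) from [Runware](https://runware.ai) |
| Vision · click interpretation | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | `gemini-3.5-flash` from Google |
| TTS · character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | Optional; runs silently if left blank | `mimo-v2.5-tts` from Xiaomi MiMo (free); paid option: `step-tts-2` from [StepFun](https://www.stepfun.com) |
> **Optional · specify the interface protocol**: each model type accepts an additional `*_PROVIDER` variable (`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`) to select the protocol explicitly. **Leave unset to stay backwards compatible**: text/vision default to the OpenAI-compatible interface, and image is auto-detected from `*_BASE_URL` (`runware.ai` → Runware, otherwise OpenAI-compatible; a few models served over the OpenAI protocol on `runware.ai`, such as `image-2-vip`, are handled as OpenAI-compatible, and `IMAGE_PROVIDER` can override this explicitly when needed).
>
> | Value | Applies to | Description |
> |---|---|---|
> | `openai_compatible` (default) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
> | `openai` | Image | OpenAI `gpt-image`, supports reference-image editing |
> | `runware` | Image | Runware task-array protocol |
>
> Text and vision support `openai_compatible` **only**. To use Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Claude, forwarding through a compatible gateway (such as LiteLLM) is recommended: Anthropic's official endpoint provides an OpenAI-compatible layer, but it does not support caching, which raises cost and latency.
>
> In addition, `*_BASE_URL` works with or without `/v1` (or even with a trailing `/chat/completions`); the engine normalizes it automatically.
## 2. Fill in the environment variables
Nine variables are required; TTS is optional (runs silently if left blank). There is also a switch for low-cost testing:
| Variable | Effect |
|---|---|
| `MOCK_IMAGE=true` | Skips image generation; the renderer returns a static placeholder image. Story, voice, and choices run as usual. Well suited to debugging TTS without consuming Runware credits. |
Where to set them (see `.env.example` for the exact fields):
- **Local dev**: `.env.local`
- **Vercel**: Project Settings → Environment Variables
- **Cloudflare Workers**: from the repo root, run `wrangler secret put <NAME>` for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). To restrict access to staging, put [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/applications/) in front of the Worker (zero code, email allowlist).
## 3. Mind the cost
With the recommended trio, each scene's cost comes mainly from the image generation model. A FLUX.2 [klein] 9B KV image is roughly **$0.00078** each (1792×1024, 4 steps, sub-second); when the text model is `deepseek-v4-flash`, its cost is extremely low. Tapping through a scene beat by beat is free. To make transitions instant, the engine also speculatively generates scenes you might choose but may ultimately not, so real spend runs slightly higher than the number of scenes you actually see.
## 4. Image proxy (optional)
By default the browser connects directly to the image provider and no configuration is needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and nothing is affected. You only need it if you run into "layer-by-layer" image loading (on some networks, Chrome's `ERR_QUIC_PROTOCOL_ERROR` causes PNGs to render row by row): deploy a tiny Cloudflare Worker that forwards images server-side and returns them atomically over HTTP/2. See **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)** for one-click deploy, then put the `workers.dev` address it gives you into `NEXT_PUBLIC_IMAGE_PROXY_URL`.
## 5. Let players bring their own voice Key (optional, recommended)
Xiaomi applies RPM/TPM quotas to the TTS model. When your public deployment has many people playing at once and sharing a single `TTS_API_KEY`, the quota is easy to hit, and the symptom is that **story and visuals are fine but there is no sound**. For this reason, players can optionally enter **their own** Xiaomi MiMo Key (free to apply for) on the home page. Voice requests are then made **directly from the browser to Xiaomi**, and the **Key stays on the player's machine and never passes through your server**, which gives stable voice and lower latency. This is purely additive: if nothing is entered, the server Key you deployed is used as before, with no change in behavior.
See the [Bring-your-own voice Key guide](xiaomi-tts-key.md) for how to apply for and enter one.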
+11 -4
@@ -14,6 +14,7 @@ import {
  loadStorySession as loadSession,
  softDeleteStory,
  } from "@/lib/persistence/localStore";
+ import { pushOnSave, pushDeletion } from "@/lib/persistence/cloudSync";
  export type SaveResult =
  | { ok: true; storyId: string }
@@ -23,9 +24,11 @@ export type SaveResult =
  * never throws, never blocks gameplay/navigation. */
  export async function saveStory(session: Session): Promise<SaveResult> {
  const rec = await saveStorySession(session);
- return rec
- ? { ok: true, storyId: rec.id }
- : { ok: false, error: "无法保存到本地存储" };
+ if (!rec) return { ok: false, error: "无法保存到本地存储" };
+ // Fire-and-forget cloud push. pushOnSave short-circuits when auth is off /
+ // the user is signed out, so the open-source build sees no behavior change.
+ void pushOnSave(rec);
+ return { ok: true, storyId: rec.id };
  }
  /** List saved stories for the "我的剧情" page (newest first). */
@@ -40,5 +43,9 @@ export async function loadStorySession(id: string): Promise<Session | null> {
  /** Delete a saved story (soft-delete). Returns false if not found. */
  export async function deleteStory(storyId: string): Promise<boolean> {
- return softDeleteStory(storyId);
+ const ok = await softDeleteStory(storyId);
+ // Fire-and-forget tombstone propagation. pushDeletion short-circuits when auth
+ // is off / signed out, so the open-source build sees no behavior change.
+ if (ok) void pushDeletion(storyId);
+ return ok;
  }
+138 -60
@@ -1,16 +1,22 @@
- // Cloud story repository — server-only Supabase persistence skeleton for the
- // COMMERCIAL build. Mirrors the local repository (lib/persistence/localStore.ts)
- // method-for-method so next-phase local-first bidirectional sync can treat the
- // cloud as a layer over the local store rather than a parallel branch.
+ // Cloud story repository — server-only Supabase persistence for the COMMERCIAL
+ // build. Mirrors the local repository (lib/persistence/localStore.ts) so the
+ // reconcile engine (lib/persistence/cloudSync.ts) can treat the cloud as a layer
+ // over the local store.
  //
- // This phase is a SKELETON: no API route exposes these functions and no client
- // calls them. When AUTH_ENABLED is false (the open-source build) every method
- // short-circuits to a safe value on its first line and never touches Supabase.
+ // When AUTH_ENABLED is false (the open-source build) every method short-circuits
+ // to a safe value on its first line and never touches Supabase.
  //
  // Isolation is by RLS only: the SSR client carries the user's anon key + cookie,
  // and every public.stories policy is keyed on auth.uid() = user_id — so no
  // service_role key is used and no query needs a manual user filter for safety
  // (the explicit .eq("user_id") below is belt-and-suspenders + index alignment).
+ //
+ // Optimistic concurrency:
+ // - cloudSaveStory upserts via the upsert_story_if_newer RPC (needs INSERT-if-
+ // absent + a conditional overwrite, which PostgREST upsert can't express).
+ // - cloudSoftDeleteStory is UPDATE-only (a story never pushed has no cloud row
+ // to tombstone), so it expresses the same rev→updatedAt guard with a
+ // PostgREST .or() filter — no RPC needed.
  import "server-only";
@@ -18,7 +24,7 @@ import type { Session } from "@infiplot/types";
  import { coerceOrientation } from "@infiplot/types";
  import { AUTH_ENABLED } from "@/lib/supabase/config";
  import { createClient } from "@/lib/supabase/server";
- import type { SlimStoryBlob, StoryMeta } from "./types";
+ import type { SlimStoryBlob, StoryMeta, StorySyncMeta, StorySyncEnvelope } from "./types";
  import { coerceEpoch } from "./types";
  /** One row of public.stories (snake_case columns ↔ SlimStoryBlob + sync meta). */
@@ -78,63 +84,75 @@ function rowToMeta(row: StoryRow): StoryMeta {
  };
  }
+ /** Full-blob projection for the sync layer: blob + (updatedAt, deletedAt) so
+ * reconcile has the LWW-ordering fields. Carries tombstones (deletedAt may be
+ * non-null) — a pulled cloud tombstone mirrors a remote soft-delete locally. */
+ function rowToEnvelope(row: StoryRow): StorySyncEnvelope {
+ return {
+ id: row.id,
+ worldSetting: row.world_setting ?? "",
+ styleGuide: row.style_guide ?? "",
+ orientation: coerceOrientation(row.orientation),
+ sceneCount: row.scene_count ?? 0,
+ rev: row.rev ?? 1,
+ session: row.session_jsonb,
+ updatedAt: coerceEpoch(row.updated_at, 0),
+ deletedAt: row.deleted_at ? coerceEpoch(row.deleted_at, 0) : null,
+ };
+ }
  // ── Public API ──────────────────────────────────────────────────────────────
  //
- // CONTRACT NOTE (CR-15): these methods are the cloud COUNTERPARTS of
- // lib/persistence/localStore.ts, but their return shapes are intentionally NOT
- // identical — the local store returns rich StoryRecord/Session values (carrying
- // schemaVersion/createdAt/updatedAt/deletedAt/syncState), while the cloud store
- // returns the leaner SlimStoryBlob. When next-phase bidirectional sync lands it
- // must map StoryRecord ↔ SlimStoryBlob ↔ Session in one reconciliation layer
- // rather than assuming a single shared shape; the intended convergence is a
- // common envelope (SlimStoryBlob + sync-meta) at both edges. Documented here so
- // the asymmetry is a known, bounded cost, not a surprise.
+ // CONTRACT NOTE: the sync methods (manifest/pull/save/softDelete) speak the
+ // StorySyncEnvelope/StorySyncMeta shapes — the convergence envelope the
+ // reconcile engine maps StoryRecord ↔ envelope in one place. The legacy
+ // cloudLoadStory/cloudListStories (leaner SlimStoryBlob/StoryMeta) are retained
+ // for non-sync callers; reconcile does not use them.
- /** Upsert one story for the current user. onConflict targets the `id` PK; the
- * caller-supplied rev/updated_at are written verbatim and created_at is left to
- * the DB default (insert only). NOTE (CR-10): this is last-write-wins — there is
- * no `updated_at`-monotonic guard, so a slow concurrent writer can clobber newer
- * cloud state; the next-phase sync layer must add an optimistic-concurrency
- * predicate (e.g. only overwrite when excluded.updated_at > stories.updated_at)
- * before this is wired to real multi-device traffic. Returns the stored blob, or
- * null when auth is off / unauthenticated / the write failed (incl. an RLS-hidden
- * cross-user id collision surfacing as a PK violation). */
+ /** Upsert one story for the current user via the optimistic-concurrency RPC.
+ * Returns `{ stored, won }`:
+ * - won=true → our version is now the cloud row (fresh insert, winning
+ * update, or already-equal no-op);
+ * - won=false → a NEWER cloud row existed and was preserved; `stored` is that
+ * newer row so the caller can reconcile by pulling it back.
+ * Auth off / unauthenticated / write failure → `{ stored: null, won: false }`. */
  export async function cloudSaveStory(
- blob: SlimStoryBlob,
- ): Promise<SlimStoryBlob | null> {
- if (!AUTH_ENABLED) return null;
+ env: StorySyncEnvelope,
+ ): Promise<{ stored: StorySyncEnvelope | null; won: boolean }> {
+ if (!AUTH_ENABLED) return { stored: null, won: false };
  const userId = await currentUserId();
- if (!userId) return null;
+ if (!userId) return { stored: null, won: false };
  try {
  const supabase = await createClient();
- const { data, error } = await supabase
- .from("stories")
- .upsert(
- {
- id: blob.id,
- user_id: userId,
- world_setting: blob.worldSetting ?? "",
- style_guide: blob.styleGuide ?? "",
- orientation: coerceOrientation(blob.orientation),
- scene_count: blob.sceneCount ?? 0,
- rev: blob.rev ?? 1,
- updated_at: new Date().toISOString(),
- deleted_at: null,
- session_jsonb: blob.session,
- },
- { onConflict: "user_id,id" },
- )
- .select()
- .single();
- if (error || !data) return null;
- return rowToBlob(data as StoryRow);
+ const { data, error } = await supabase.rpc("upsert_story_if_newer", {
+ p_id: env.id,
+ p_world: env.worldSetting ?? "",
+ p_style: env.styleGuide ?? "",
+ p_orientation: coerceOrientation(env.orientation),
+ p_scene_count: env.sceneCount ?? 0,
+ p_rev: env.rev ?? 1,
+ p_updated_at: new Date(env.updatedAt).toISOString(),
+ p_deleted_at: env.deletedAt ? new Date(env.deletedAt).toISOString() : null,
+ p_session: env.session,
+ });
+ if (error || !data) return { stored: null, won: false };
+ // The RPC `returns public.stories` (a single composite); supabase-js may
+ // hand it back as the object or wrapped in an array — normalize both.
+ const row = (Array.isArray(data) ? data[0] : data) as StoryRow | undefined;
+ if (!row) return { stored: null, won: false };
+ const stored = rowToEnvelope(row);
+ // We won iff the stored row IS our version. A stale write returns the newer
+ // cloud row, whose (rev, updatedAt) differ from what we sent → won=false.
+ const won = stored.rev === env.rev && stored.updatedAt === env.updatedAt;
+ return { stored, won };
  } catch {
- return null;
+ return { stored: null, won: false };
  }
  }
  /** Load one story's slim blob for the current user. Tombstoned / absent / not
- * owned (RLS) → null. */
+ * owned (RLS) → null. Retained for non-sync callers (reconcile uses
+ * cloudPullBlobs, which carries tombstones + sync-ordering fields). */
  export async function cloudLoadStory(id: string): Promise<SlimStoryBlob | null> {
  if (!AUTH_ENABLED) return null;
  const userId = await currentUserId();
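The `won` determination in `cloudSaveStory` above reduces to comparing two version stamps. The sketch below isolates that comparison; `VersionStamp` and `didWin` are hypothetical names, and the real envelope carries more fields than shown here.

```typescript
// Minimal shape for illustration; the real StorySyncEnvelope has more fields.
interface VersionStamp {
  rev: number;
  updatedAt: number; // epoch milliseconds
}

// The push "won" only if the row the server stored carries exactly the
// version that was sent. A newer preserved cloud row differs in at least one field.
function didWin(sent: VersionStamp, stored: VersionStamp): boolean {
  return stored.rev === sent.rev && stored.updatedAt === sent.updatedAt;
}
```

Exact equality on `updatedAt` assumes the timestamp survives the round trip through the database at millisecond precision, which this sketch does not verify.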
@@ -180,21 +198,81 @@ export async function cloudListStories(): Promise<StoryMeta[]> {
  }
  }
- /** Soft-delete one story (set the tombstone) for the current user so the
- * deletion can propagate. Absent / not owned / write failed → false. */
- export async function cloudSoftDeleteStory(id: string): Promise<boolean> {
+ /** Reconcile diff basis: ALL the current user's rows (INCLUDING tombstones),
+ * projected to lightweight {id, rev, updatedAt, deletedAt}. Explicit column
+ * list so it never pulls session_jsonb. Auth off / unauth → []. */
+ export async function cloudStoryManifest(): Promise<StorySyncMeta[]> {
+ if (!AUTH_ENABLED) return [];
+ const userId = await currentUserId();
+ if (!userId) return [];
+ try {
+ const supabase = await createClient();
+ const { data, error } = await supabase
+ .from("stories")
+ .select("id, rev, updated_at, deleted_at")
+ .eq("user_id", userId);
+ if (error || !data) return [];
+ return (data as StoryRow[]).map((row) => ({
+ id: row.id,
+ rev: row.rev ?? 1,
+ updatedAt: coerceEpoch(row.updated_at, 0),
+ deletedAt: row.deleted_at ? coerceEpoch(row.deleted_at, 0) : null,
+ }));
+ } catch {
+ return [];
+ }
+ }
+ /** Pull full envelopes for the given ids (INCLUDING tombstones — a pulled cloud
+ * tombstone mirrors a remote soft-delete locally). Empty ids / auth off /
+ * unauth → []. */
+ export async function cloudPullBlobs(
+ ids: string[],
+ ): Promise<StorySyncEnvelope[]> {
+ if (!AUTH_ENABLED) return [];
+ if (!ids.length) return [];
+ const userId = await currentUserId();
+ if (!userId) return [];
+ try {
+ const supabase = await createClient();
+ const { data, error } = await supabase
+ .from("stories")
+ .select()
+ .eq("user_id", userId)
+ .in("id", ids);
+ if (error || !data) return [];
+ return (data as StoryRow[]).map(rowToEnvelope);
+ } catch {
+ return [];
+ }
+ }
+ /** Propagate a soft-delete (tombstone) for the current user, with the same
+ * optimistic-concurrency guard as the save RPC expressed as a PostgREST .or()
+ * filter: only stamp when the incoming version is newer (rev higher, or rev
+ * tie with a later updatedAt). UPDATE-only — a story never pushed has no cloud
+ * row and needs no tombstone (returns false, which the caller treats as
+ * "nothing to delete remotely"). Auth off / unauth / not-newer / absent →
+ * false. */
+ export async function cloudSoftDeleteStory(
+ id: string,
+ rev: number,
+ deletedAt: number,
+ ): Promise<boolean> {
  if (!AUTH_ENABLED) return false;
  const userId = await currentUserId();
  if (!userId) return false;
  try {
  const supabase = await createClient();
- const now = new Date().toISOString();
+ const deletedIso = new Date(deletedAt).toISOString();
  const { data, error } = await supabase
  .from("stories")
- .update({ deleted_at: now, updated_at: now })
- .eq("id", id)
+ .update({ deleted_at: deletedIso, updated_at: deletedIso, rev })
  .eq("user_id", userId)
- .is("deleted_at", null)
+ .eq("id", id)
+ // Quote the timestamptz value so PostgREST parses the colons/dots in the
+ // ISO string as a literal, not filter syntax.
+ .or(`rev.lt.${rev},and(rev.eq.${rev},updated_at.lt."${deletedIso}")`)
  .select("id");
  if (error || !data || data.length === 0) return false;
  return true;
@@ -0,0 +1,247 @@
// Reconcile engine — the bidirectional local↔cloud sync orchestration for the
// COMMERCIAL build. Browser-only. This is the single place that maps
// StoryRecord ↔ StorySyncEnvelope ↔ StorySyncMeta and owns every merge decision.
//
// Triggers (all best-effort, never throw, never block gameplay):
// - syncOnLogin(): full reconcile on sign-in / authed mount, serialized so a
// second trigger joins the in-flight run instead of racing it.
// - pushOnSave(record): fire-and-forget single push after a local autosave.
// - pushDeletion(id): fire-and-forget tombstone propagation after a soft-delete.
//
// Conflict policy is last-write-wins: rev wins; on a rev tie, the later
// updatedAt wins (decideAction). A losing side is overwritten — acceptable for
// single-player, full-snapshot galgame saves (see design.md conflict tradeoff).
import { AUTH_ENABLED } from "@/lib/supabase/config";
import { isAuthed } from "@/lib/authResume";
import {
pullManifest,
pullBlobs,
pushBlob,
pushDelete,
} from "./cloudSyncClient";
import {
listAllRecordsForSync,
putSyncedRecord,
markRecordSynced,
} from "./localStore";
import { idbGet, STORIES_STORE } from "./idb";
import { coerceEpoch, type StoryRecord, type StorySyncMeta, type StorySyncEnvelope } from "./types";
// Keep in lockstep with the pull route's MAX_PULL_IDS.
const PULL_CHUNK = 200;
type ReconcileAction = "push" | "pull" | "delete-remote" | "noop";
/** Which side is newer by the LWW order (rev, then updatedAt). Pure. */
function newerSide(
local: StoryRecord,
cloud: StorySyncMeta,
): "local" | "cloud" | "equal" {
const lr = local.rev ?? 1;
const cr = cloud.rev ?? 1;
if (lr > cr) return "local";
if (lr < cr) return "cloud";
const lu = coerceEpoch(local.updatedAt, 0);
const cu = coerceEpoch(cloud.updatedAt, 0);
if (lu > cu) return "local";
if (lu < cu) return "cloud";
return "equal";
}
/** Pure merge decision for one id (no I/O) — implements the design decision
* table incl. tombstone priority ("the newer operation wins"). A soft-delete
* carries (rev, updatedAt) and is compared like an edit. NOTE softDeleteStory
* does NOT bump rev, so within the SAME rev a later-updatedAt delete propagates
* and a later-updatedAt edit resurrects; ACROSS revs the rev-primary LWW order
* applies (a higher-rev edit beats a wall-clock-later but lower-rev delete).
* Exported for the decision-matrix test.
*
* - only cloud, live → pull
* - only cloud, tombstone→ noop (don't materialize an already-reaped / never-held
* tombstone — avoids a 30-day-reap → re-pull-of-blob loop)
* - only local, live → push
* - only local, tombstone→ noop (no cloud row to delete; reaped locally)
* - both, local newer → tombstone ? delete-remote : push
* - both, cloud newer → pull
* - both, equal → noop (reconcile marks the local record synced if it is not yet flagged) */
export function decideAction(
local: StoryRecord | undefined,
cloud: StorySyncMeta | undefined,
): ReconcileAction {
if (!local && cloud) return cloud.deletedAt ? "noop" : "pull";
if (local && !cloud) return local.deletedAt ? "noop" : "push";
if (!local || !cloud) return "noop"; // both undefined — unreachable in reconcile
const side = newerSide(local, cloud);
if (side === "local") return local.deletedAt ? "delete-remote" : "push";
if (side === "cloud") return "pull";
return "noop";
}
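The decision table above can be exercised with a small standalone sketch. The `Version` shape and the `decide`, `live` and `dead` helpers are hypothetical stand-ins (the real `StoryRecord` / `StorySyncMeta` carry more fields and go through `coerceEpoch`); the sketch only mirrors the rev-then-updatedAt ordering and the tombstone rules described in the comment.

```typescript
// Hypothetical minimal shape; not the project's StoryRecord / StorySyncMeta.
type Version = { rev: number; updatedAt: number; deletedAt: number | null };
type Action = "push" | "pull" | "delete-remote" | "noop";

function decide(local?: Version, cloud?: Version): Action {
  if (!local && cloud) return cloud.deletedAt ? "noop" : "pull";
  if (local && !cloud) return local.deletedAt ? "noop" : "push";
  if (!local || !cloud) return "noop";
  // rev is the primary order; updatedAt only breaks a rev tie.
  const localNewer =
    local.rev > cloud.rev ||
    (local.rev === cloud.rev && local.updatedAt > cloud.updatedAt);
  const cloudNewer =
    cloud.rev > local.rev ||
    (cloud.rev === local.rev && cloud.updatedAt > local.updatedAt);
  if (localNewer) return local.deletedAt ? "delete-remote" : "push";
  if (cloudNewer) return "pull";
  return "noop";
}

const live = (rev: number, updatedAt: number): Version => ({ rev, updatedAt, deletedAt: null });
const dead = (rev: number, updatedAt: number): Version => ({ rev, updatedAt, deletedAt: updatedAt });

const matrix: Action[] = [
  decide(undefined, live(1, 100)),    // only cloud, live
  decide(undefined, dead(1, 100)),    // only cloud, tombstone
  decide(live(1, 100), undefined),    // only local, live
  decide(dead(1, 100), undefined),    // only local, tombstone
  decide(dead(1, 200), live(1, 100)), // same rev, later local delete
  decide(live(2, 50), dead(1, 200)),  // higher-rev edit vs wall-clock-later delete
  decide(live(1, 100), live(2, 50)),  // cloud has the higher rev
  decide(live(1, 100), live(1, 100)), // identical versions
];
```

The sixth case is the one the comment calls out: a higher-rev edit beats a delete that happened later on the wall clock but carries a lower rev.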
/** StoryRecord → envelope for push (carries the LWW-ordering fields). */
function recordToEnvelope(rec: StoryRecord): StorySyncEnvelope {
return {
id: rec.id,
worldSetting: rec.worldSetting ?? "",
styleGuide: rec.styleGuide ?? "",
orientation: rec.orientation,
sceneCount: rec.sceneCount ?? 0,
rev: rec.rev ?? 1,
session: rec.session,
updatedAt: coerceEpoch(rec.updatedAt, 0),
deletedAt: rec.deletedAt == null ? null : coerceEpoch(rec.deletedAt, 0),
};
}
function chunk<T>(arr: T[], size: number): T[][] {
const out: T[][] = [];
for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
return out;
}
/** Push one local record; on a lost optimistic-concurrency race (won=false)
* pull the newer cloud row back instead. Each step swallows its own errors. */
async function pushOne(rec: StoryRecord): Promise<void> {
const res = await pushBlob(recordToEnvelope(rec));
if (!res) return; // network/auth failure → leave pending for next reconcile
if (res.won) {
await markRecordSynced(rec.id, rec.rev ?? 1, coerceEpoch(rec.updatedAt, 0));
} else if (res.stored) {
await putSyncedRecord(res.stored); // we lost → adopt the newer cloud state
}
}
/** Full bidirectional reconcile. Diffs the local set (incl. tombstones) against
* the cloud manifest, then applies each id's action, every item fault-tolerant
* (one failure skips that id, never the whole pass). */
async function reconcile(): Promise<void> {
const [localRecords, manifest] = await Promise.all([
listAllRecordsForSync(),
pullManifest(),
]);
const localById = new Map(localRecords.map((r) => [r.id, r]));
const cloudById = new Map(manifest.map((m) => [m.id, m]));
const allIds = new Set<string>([...localById.keys(), ...cloudById.keys()]);
const toPull: string[] = [];
const toPush: StoryRecord[] = [];
const toDelete: StoryRecord[] = [];
const toMarkSynced: StoryRecord[] = [];
for (const id of allIds) {
const local = localById.get(id);
const cloud = cloudById.get(id);
switch (decideAction(local, cloud)) {
case "pull":
toPull.push(id);
break;
case "push":
if (local) toPush.push(local);
break;
case "delete-remote":
if (local) toDelete.push(local);
break;
case "noop":
// Already consistent on both sides but local not yet flagged synced —
// align its syncState (guard on cloud existing so a local-only tombstone
// isn't wrongly marked synced).
if (local && cloud && local.syncState !== "synced") toMarkSynced.push(local);
break;
}
}
// Pull (batched, chunked to the route cap).
for (const ids of chunk(toPull, PULL_CHUNK)) {
try {
const blobs = await pullBlobs(ids);
for (const b of blobs) {
try {
await putSyncedRecord(b);
} catch {
/* skip this id */
}
}
} catch {
/* skip this chunk (consistent with the push/delete loops' fault isolation) */
}
}
// Push.
for (const rec of toPush) {
try {
await pushOne(rec);
} catch {
/* leave pending */
}
}
// Tombstone propagation.
for (const rec of toDelete) {
try {
const ok = await pushDelete(rec.id, rec.rev ?? 1, coerceEpoch(rec.deletedAt, Date.now()));
if (ok) await markRecordSynced(rec.id, rec.rev ?? 1, coerceEpoch(rec.updatedAt, 0));
// !ok → cloud has a newer row; the next reconcile pulls it back.
} catch {
/* leave pending */
}
}
// Mark already-consistent records synced.
for (const rec of toMarkSynced) {
try {
await markRecordSynced(rec.id, rec.rev ?? 1, coerceEpoch(rec.updatedAt, 0));
} catch {
/* best-effort */
}
}
}
// ── Public triggers ─────────────────────────────────────────────────────────
// Serialize full syncs: a second trigger joins the in-flight run rather than
// starting a concurrent reconcile (Req 4.3). Module-level, mirrors the play
// page's saveChain dedup idea.
let inFlight: Promise<void> | null = null;
/** Trigger a full reconcile on sign-in / authed mount. Serialized + best-effort;
* short-circuits when auth is off or the user isn't signed in. */
export async function syncOnLogin(): Promise<void> {
if (!AUTH_ENABLED) return;
if (inFlight) return inFlight;
inFlight = (async () => {
try {
if (!(await isAuthed())) return;
await reconcile();
} catch {
/* best-effort */
} finally {
inFlight = null;
}
})();
return inFlight;
}
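The join-the-in-flight-run idea can be sketched in isolation. `doWork`, `syncOnce`, `inFlightRun` and the `runs` counter are hypothetical names for illustration; the sketch assumes the work function reaches at least one real `await`, as `reconcile()` does.

```typescript
let inFlightRun: Promise<void> | null = null;
let runs = 0; // counts how many times the underlying work actually started

// Hypothetical stand-in for the real reconcile pass.
async function doWork(): Promise<void> {
  runs += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
}

/** Start the work, or hand back the promise of the run already in progress. */
function syncOnce(): Promise<void> {
  if (inFlightRun) return inFlightRun;
  inFlightRun = (async () => {
    try {
      await doWork();
    } catch {
      /* best-effort: swallow */
    } finally {
      inFlightRun = null; // allow the next trigger to start a fresh run
    }
  })();
  return inFlightRun;
}
```

Two back-to-back calls to `syncOnce()` share one promise and start the work once; after the run settles, the next call starts a new run.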
/** Fire-and-forget single push after a local autosave. Leaves the record pending
* on any failure so the next reconcile re-pushes it. */
export async function pushOnSave(record: StoryRecord): Promise<void> {
if (!AUTH_ENABLED || !record?.id) return;
try {
if (!(await isAuthed())) return;
await pushOne(record);
} catch {
/* leave pending */
}
}
/** Fire-and-forget tombstone propagation after a local soft-delete. Reads the
* local tombstone for its rev/deletedAt, then pushes the delete. */
export async function pushDeletion(id: string): Promise<void> {
if (!AUTH_ENABLED || !id) return;
try {
if (!(await isAuthed())) return;
const rec = await idbGet<StoryRecord>(STORIES_STORE, id);
if (!rec || !rec.deletedAt) return; // not a tombstone / already gone
const ok = await pushDelete(id, rec.rev ?? 1, coerceEpoch(rec.deletedAt, Date.now()));
if (ok) await markRecordSynced(id, rec.rev ?? 1, coerceEpoch(rec.updatedAt, 0));
} catch {
/* leave pending */
}
}
@@ -0,0 +1,81 @@
// Network bridge — the ONLY fetch layer between the local store / reconcile
// engine and the cloud story API. Browser-only (imports the public AUTH_ENABLED
// flag, never the server-only cloudStore).
//
// Two-layer short-circuit:
// 1. AUTH_ENABLED=false (open-source build) → every method returns a safe empty
// value on its first line and NEVER issues a request.
// 2. The signed-in gate is enforced ONCE by the caller — the reconcile engine
// checks isAuthed() before touching this bridge — so methods here don't
// re-run getUser() per call. If an unauthenticated request slips through
// anyway, the route 401s and the fault-tolerant fetch below maps it to the
// same safe empty value.
//
// Every request is fully fault-tolerant: any non-2xx / network error / parse
// failure resolves to a safe value and never throws (best-effort sync).
import { AUTH_ENABLED } from "@/lib/supabase/config";
import type { StorySyncMeta, StorySyncEnvelope } from "./types";
async function postJson<T>(url: string, body: unknown): Promise<T | null> {
try {
const res = await fetch(url, {
method: "POST",
headers: { "content-type": "application/json" },
body: JSON.stringify(body),
});
if (!res.ok) return null;
return (await res.json()) as T;
} catch {
return null;
}
}
/** GET the cloud manifest (all rows incl. tombstones, lightweight). [] on any
* failure / auth off. */
export async function pullManifest(): Promise<StorySyncMeta[]> {
if (!AUTH_ENABLED) return [];
try {
const res = await fetch("/api/stories/manifest", { method: "GET", cache: "no-store" });
if (!res.ok) return [];
const data = (await res.json()) as { items?: unknown };
return Array.isArray(data.items) ? (data.items as StorySyncMeta[]) : [];
} catch {
return [];
}
}
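As a sketch of the "safe empty value" idea, the payload handling can be separated from the fetch and made slightly stricter. This is a hypothetical variant, not what `pullManifest` does (the source only checks `Array.isArray` and casts); `Meta`, `isMeta` and `parseManifest` are illustrative names.

```typescript
// Hypothetical local copy of the manifest item shape.
type Meta = { id: string; rev: number; updatedAt: number; deletedAt: number | null };

function isMeta(v: unknown): v is Meta {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o["id"] === "string" &&
    typeof o["rev"] === "number" &&
    typeof o["updatedAt"] === "number" &&
    (o["deletedAt"] === null || typeof o["deletedAt"] === "number")
  );
}

/** Map any payload to a manifest array; malformed input yields []. */
function parseManifest(payload: unknown): Meta[] {
  if (typeof payload !== "object" || payload === null) return [];
  const items = (payload as { items?: unknown }).items;
  // Drop malformed entries instead of failing the whole manifest.
  return Array.isArray(items) ? items.filter(isMeta) : [];
}
```

Dropping malformed items one by one matches the per-id fault isolation used elsewhere in the reconcile pass, at the cost of silently ignoring bad rows.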
/** Pull full envelopes for the given ids. [] on empty ids / failure / auth off. */
export async function pullBlobs(ids: string[]): Promise<StorySyncEnvelope[]> {
if (!AUTH_ENABLED || ids.length === 0) return [];
const data = await postJson<{ blobs?: unknown }>("/api/stories/pull", { ids });
return Array.isArray(data?.blobs) ? (data.blobs as StorySyncEnvelope[]) : [];
}
/** Push one envelope through the optimistic-concurrency RPC. Returns the
* `{ stored, won }` result, or null on failure / auth off (caller leaves the
* record pending for the next reconcile). */
export async function pushBlob(
env: StorySyncEnvelope,
): Promise<{ stored: StorySyncEnvelope | null; won: boolean } | null> {
if (!AUTH_ENABLED) return null;
return postJson<{ stored: StorySyncEnvelope | null; won: boolean }>(
"/api/stories/push",
env,
);
}
/** Propagate a soft-delete tombstone. false on failure / auth off / not-newer. */
export async function pushDelete(
id: string,
rev: number,
deletedAt: number,
): Promise<boolean> {
if (!AUTH_ENABLED) return false;
const data = await postJson<{ ok?: boolean }>("/api/stories/delete", {
id,
rev,
deletedAt,
});
return data?.ok ?? false;
}
@@ -11,7 +11,7 @@ import type { Session } from "@infiplot/types";
import { coerceOrientation } from "@infiplot/types";
import { idbGet, idbGetAll, idbPut, idbDelete, idbCount, STORIES_STORE } from "./idb";
import { slimSession } from "./sessionSlim";
import { STORY_SCHEMA_VERSION, coerceEpoch, type StoryRecord, type StoryMeta, type StorySyncEnvelope } from "./types";
/** Max number of non-tombstoned stories retained locally. IndexedDB has ample
* quota, so this is generous vs the old localStorage cap of 20; it aligns with
@@ -186,3 +186,80 @@ export async function softDeleteStory(id: string): Promise<boolean> {
};
return idbPut(STORIES_STORE, updated);
}
// ── Sync support (story-cloud-sync) ─────────────────────────────────────────
// These are the cloud-sync counterparts to the user-write path above. The
// distinction matters: saveStorySession is a USER write (bumps rev,
// synced→pending), while putSyncedRecord is a SYNC write (cloud is
// authoritative: takes the cloud rev verbatim, marks synced, never bumps).
/** Reconcile diff basis (local side): ALL records INCLUDING tombstones, with
* rev/syncState intact — the local mirror of cloudStoryManifest's
* tombstone-inclusive scan. [] when storage is unavailable. */
export async function listAllRecordsForSync(): Promise<StoryRecord[]> {
return idbGetAll<StoryRecord>(STORIES_STORE);
}
/** Write a cloud-pulled version as the authoritative synced baseline:
* rev/updatedAt/deletedAt taken from the envelope, syncState="synced", and
* rev is NOT bumped (unlike saveStorySession). createdAt is preserved if a
* local record already exists, else seeded from the envelope's updatedAt (the
* cloud row carries no createdAt; createdAt is display-only). Keeps the
* schemaVersion invariant and the slim session as-is. Returns false on write
* failure (Req 3.3, 3.6). Runs retention housekeeping after a durable write. */
export async function putSyncedRecord(
env: StorySyncEnvelope,
): Promise<boolean> {
if (!env?.id) return false;
const existing = await idbGet<StoryRecord>(STORIES_STORE, env.id);
// Concurrency guard (symmetric with markRecordSynced's rev guard): if the local
// record was updated to a strictly newer version (rev → updatedAt) between
// reconcile's decision snapshot and this write, don't clobber it — leave it
// (pending) for the next reconcile to re-push. Otherwise a local autosave that
// lands mid-reconcile could be overwritten by a now-stale cloud version (a
// legitimate LWW winner silently lost).
if (existing) {
const er = existing.rev ?? 1;
const nr = env.rev ?? 1;
const eu = coerceEpoch(existing.updatedAt, 0);
const nu = coerceEpoch(env.updatedAt, 0);
if (er > nr || (er === nr && eu > nu)) return false;
}
const record: StoryRecord = {
id: env.id,
schemaVersion: STORY_SCHEMA_VERSION,
worldSetting: env.worldSetting ?? "",
styleGuide: env.styleGuide ?? "",
orientation: coerceOrientation(env.orientation),
sceneCount: env.sceneCount ?? 0,
createdAt: existing
? coerceEpoch(existing.createdAt, env.updatedAt)
: coerceEpoch(env.updatedAt, Date.now()),
updatedAt: coerceEpoch(env.updatedAt, Date.now()),
rev: env.rev ?? 1,
deletedAt: env.deletedAt == null ? null : coerceEpoch(env.deletedAt, Date.now()),
syncState: "synced",
session: env.session,
};
const ok = await idbPut(STORIES_STORE, record);
if (ok) await enforceRetentionCap();
return ok;
}
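The concurrency guard can be illustrated with an in-memory stand-in for the IndexedDB store. `Rec`, `Incoming`, `localStore` and `applyCloudVersion` are hypothetical names; the sketch keeps only the ordering fields and omits the session payload, `createdAt` handling and retention housekeeping.

```typescript
// Hypothetical minimal record; the real StoryRecord carries the session and more.
type Rec = { id: string; rev: number; updatedAt: number; syncState: "pending" | "synced" };
type Incoming = { id: string; rev: number; updatedAt: number };

const localStore = new Map<string, Rec>();

/** Apply a cloud version unless the local copy is strictly newer (rev, then updatedAt). */
function applyCloudVersion(incoming: Incoming): boolean {
  const existing = localStore.get(incoming.id);
  if (existing) {
    const localNewer =
      existing.rev > incoming.rev ||
      (existing.rev === incoming.rev && existing.updatedAt > incoming.updatedAt);
    if (localNewer) return false; // keep the pending local edit for the next push
  }
  localStore.set(incoming.id, { ...incoming, syncState: "synced" });
  return true;
}

// Timeline: reconcile decided to pull rev 2, then a local autosave landed rev 3.
localStore.set("s1", { id: "s1", rev: 3, updatedAt: 300, syncState: "pending" });
const staleApplied = applyCloudVersion({ id: "s1", rev: 2, updatedAt: 250 });
// A story the device has never seen is simply written as synced.
const freshApplied = applyCloudVersion({ id: "s2", rev: 1, updatedAt: 100 });
```

In this timeline the stale cloud version is rejected and `s1` stays pending at rev 3, while `s2` is stored as synced.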
/** Mark a local record synced after a successful push, aligning syncState to
* the cloud-acknowledged baseline, but ONLY if the local record still matches
* both the rev and the updatedAt we pushed. A newer local edit (rev moved past
* what we pushed) or a same-rev tombstone (updatedAt moved) is left pending so
* the next reconcile re-pushes the newer content. No-op if the record is gone
* or already synced (Req 8.1). */
export async function markRecordSynced(id: string, rev: number, updatedAt: number): Promise<void> {
const rec = await idbGet<StoryRecord>(STORIES_STORE, id);
if (!rec) return;
// Guard on BOTH rev and updatedAt. softDeleteStory bumps updatedAt WITHOUT
// bumping rev, so a same-rev-but-newer local tombstone produced while a push
// was in flight must NOT be marked synced by that older push's ack (it still
// owes a delete push). Symmetric with putSyncedRecord's concurrency guard.
if ((rec.rev ?? 1) !== rev) return;
if (coerceEpoch(rec.updatedAt, 0) !== coerceEpoch(updatedAt, 0)) return;
if (rec.syncState === "synced") return;
await idbPut(STORIES_STORE, { ...rec, syncState: "synced" });
}
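A small in-memory sketch shows why the guard checks updatedAt as well as rev. `Rec`, `store` and `ackPush` are hypothetical names, and the scenario assumes (as the comment states) that a soft-delete moves updatedAt without bumping rev.

```typescript
// Hypothetical minimal record for illustration only.
type Rec = {
  id: string;
  rev: number;
  updatedAt: number;
  deletedAt: number | null;
  syncState: "pending" | "synced";
};

const store = new Map<string, Rec>();

/** Flag the record synced only if it is still exactly the version that was pushed. */
function ackPush(id: string, rev: number, updatedAt: number): void {
  const rec = store.get(id);
  if (!rec) return;
  if (rec.rev !== rev || rec.updatedAt !== updatedAt) return;
  if (rec.syncState === "synced") return;
  store.set(id, { ...rec, syncState: "synced" });
}

// A push of (rev 3, updatedAt 100) is in flight...
store.set("s1", { id: "s1", rev: 3, updatedAt: 100, deletedAt: null, syncState: "pending" });
// ...and the user soft-deletes before the ack arrives: updatedAt moves, rev does not.
store.set("s1", { ...store.get("s1")!, updatedAt: 200, deletedAt: 200 });

ackPush("s1", 3, 100); // stale ack for the pre-delete version: ignored
const afterStaleAck = store.get("s1")!.syncState;

ackPush("s1", 3, 200); // ack for the tombstone itself: accepted
const afterMatchingAck = store.get("s1")!.syncState;
```

With a rev-only guard the first ack would have marked the tombstone synced, and the delete would never have reached the cloud.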
@@ -99,3 +99,34 @@ export type StoryRecord = {
* structured-clones objects, so this is stored as-is (no JSON.stringify). */
session: Session;
};
// ── Cloud-sync wire types (story-cloud-sync) ────────────────────────────────
/** Manifest projection of one cloud story — the lightweight metadata the
* reconcile engine diffs against the local set. Unlike `StoryMeta` it CARRIES
* the tombstone (`deletedAt`) and `rev`, because reconcile needs both to pick
* a winner (rev → updatedAt last-write-wins) and to propagate soft-deletes.
* Never carries the session blob — the manifest is the cheap diff basis. */
export type StorySyncMeta = {
id: string;
rev: number;
/** epoch ms */
updatedAt: number;
/** Soft-delete tombstone (epoch ms) or null. */
deletedAt: number | null;
};
/** Full-payload carrier for pull/push between the local store and the cloud.
* Extends the shared `SlimStoryBlob` with the two sync-ordering fields:
* - `updatedAt` is the CLIENT-recorded modification time (NOT a server
* `now()`), so when two devices collide on the same `rev`, `updatedAt`
* stays a meaningful last-write-wins tiebreaker rather than always-now.
* - `deletedAt` lets a tombstone ride the same envelope (delete propagation).
* `rev` is already on `SlimStoryBlob`, so the envelope = blob + (updatedAt,
* deletedAt). This is the single shape crossing the API at pull/push. */
export type StorySyncEnvelope = SlimStoryBlob & {
/** epoch ms */
updatedAt: number;
/** Soft-delete tombstone (epoch ms) or null. */
deletedAt: number | null;
};
@@ -0,0 +1,97 @@
-- Story cloud sync — optimistic-concurrency upsert RPC (story-cloud-sync).
--
-- Why an RPC (not a plain .upsert): the bare upsert in cloudStore.cloudSaveStory
-- was last-write-wins with NO monotonic guard, so a slow concurrent writer could
-- clobber newer cloud state. This function moves the "only overwrite when newer"
-- decision into SQL, matching the reconcile decision table (rev wins; on a rev
-- tie, the later updated_at wins). A stale write leaves the cloud row untouched
-- and returns the CURRENT cloud row, so the client can detect it lost and pull
-- the newer state back instead of erroring.
--
-- Security model: SECURITY INVOKER (the default, stated explicitly) so the
-- existing RLS policies on public.stories (auth.uid() = user_id) still apply —
-- no service_role, no RLS bypass. user_id is injected from auth.uid(), never
-- from the client, so a caller cannot write rows for another user. Granted to
-- the `authenticated` role only.
--
-- Idempotent: create or replace + idempotent grant — safe to re-run.
create or replace function public.upsert_story_if_newer(
p_id text,
p_world text,
p_style text,
p_orientation text,
p_scene_count integer,
p_rev integer,
p_updated_at timestamptz,
p_deleted_at timestamptz,
p_session jsonb
)
returns public.stories
language plpgsql
security invoker
as $$
declare
v_uid uuid := auth.uid();
v_row public.stories;
begin
-- Defense in depth: RLS would already reject an anonymous write, but failing
-- fast here avoids inserting with a null user_id and yields a clearer error.
if v_uid is null then
raise exception 'upsert_story_if_newer: not authenticated';
end if;
insert into public.stories (
id, user_id, world_setting, style_guide, orientation,
scene_count, rev, created_at, updated_at, deleted_at, session_jsonb
)
values (
p_id, v_uid, coalesce(p_world, ''), coalesce(p_style, ''),
coalesce(p_orientation, 'landscape'), coalesce(p_scene_count, 0),
coalesce(p_rev, 1), now(), coalesce(p_updated_at, now()),
p_deleted_at, p_session
)
on conflict (user_id, id) do update
set world_setting = excluded.world_setting,
style_guide = excluded.style_guide,
orientation = excluded.orientation,
scene_count = excluded.scene_count,
rev = excluded.rev,
updated_at = excluded.updated_at,
deleted_at = excluded.deleted_at,
session_jsonb = excluded.session_jsonb
-- Optimistic-concurrency guard: overwrite ONLY when the incoming version is
-- strictly newer. created_at is intentionally NOT in the SET list, so an
-- update preserves the original insert timestamp.
where excluded.rev > public.stories.rev
or (excluded.rev = public.stories.rev
and excluded.updated_at > public.stories.updated_at)
returning * into v_row;
-- FOUND is the idiomatic PL/pgSQL test for whether RETURNING produced a row:
-- true on a fresh insert OR a winning update; false when the row already
-- existed AND the where-guard rejected the update (stale write). In the stale
-- case fall through and return the current cloud row so the caller sees it
-- lost and can reconcile by pulling the newer cloud state.
if found then
return v_row;
end if;
select * into v_row
from public.stories
where user_id = v_uid and id = p_id;
return v_row;
end;
$$;
-- Lock down execution. Postgres grants EXECUTE to PUBLIC by default on function
-- creation, which would let the `anon` role reach this RPC via PostgREST. The
-- SECURITY INVOKER + null check + RLS would still reject an anonymous call, but
-- least-privilege says don't rely on the function body as the only gate —
-- revoke PUBLIC, then grant only the authenticated role.
revoke execute on function public.upsert_story_if_newer(
text, text, text, text, integer, integer, timestamptz, timestamptz, jsonb
) from public;
grant execute on function public.upsert_story_if_newer(
text, text, text, text, integer, integer, timestamptz, timestamptz, jsonb
) to authenticated;
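The function's contract (overwrite only when strictly newer, otherwise return the current row) can be modelled in memory to see what a client observes. This is a sketch, not the RPC: `Row`, `table`, `upsertIfNewer` and `didWin` are hypothetical, and deriving `won` by comparing the returned row's (rev, updatedAt) with what was sent is an assumption about the client side, not something the migration defines.

```typescript
// Hypothetical reduced row: only the ordering fields plus an opaque payload.
type Row = { id: string; rev: number; updatedAt: number; payload: string };

const table = new Map<string, Row>();

/** In-memory model of the guard: insert, or overwrite only when strictly newer. */
function upsertIfNewer(incoming: Row): Row {
  const current = table.get(incoming.id);
  if (!current) {
    table.set(incoming.id, incoming); // fresh insert
    return incoming;
  }
  const newer =
    incoming.rev > current.rev ||
    (incoming.rev === current.rev && incoming.updatedAt > current.updatedAt);
  if (newer) {
    table.set(incoming.id, incoming); // winning update
    return incoming;
  }
  return current; // stale write: table untouched, caller gets the current row
}

// Assumed client-side check: the write won if the stored version is the one sent.
const didWin = (sent: Row, stored: Row): boolean =>
  stored.rev === sent.rev && stored.updatedAt === sent.updatedAt;

const first = upsertIfNewer({ id: "s1", rev: 1, updatedAt: 100, payload: "a" });
const staleSent: Row = { id: "s1", rev: 1, updatedAt: 50, payload: "b" };
const staleResult = upsertIfNewer(staleSent);
const tie = upsertIfNewer({ id: "s1", rev: 1, updatedAt: 150, payload: "c" });
const higher = upsertIfNewer({ id: "s1", rev: 2, updatedAt: 10, payload: "d" });
```

Here the stale write comes back carrying payload `"a"`, which is how the caller learns it lost and can adopt the newer cloud state instead of treating the call as an error.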