infiplot-web

DESKTOP-I1T6TF3\Q 298ecd4ec0 perf(engine): reorder Writer/Cinematographer prompts for prefix caching

Goal: lift prompt-cache hit rate from the ~75% baseline toward 95%+
on DeepSeek/MiMo-style 64-token chunked prefix caches. Both providers
match a stable byte-identical prefix from message[0]; once a single
byte changes everything after it misses, so the trick is to push every
session-stable bit to the front and concentrate per-call churn in a
short suffix.
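
A minimal sketch of why churn position matters, assuming the cache reuses only whole 64-token chunks matched from the start of the prompt. `cachedPrefixTokens` is a hypothetical helper, and tokens are modeled as plain numbers; real tokenization and cache accounting are provider-specific.

```typescript
// Hypothetical model of a chunked prefix cache: only whole chunks that match
// from token 0 count as hits. Not a provider API.
function cachedPrefixTokens(prev: number[], next: number[], chunkSize = 64): number {
  const limit = Math.min(prev.length, next.length);
  let common = 0;
  while (common < limit && prev[common] === next[common]) common++;
  // A partially matching trailing chunk is treated as a miss.
  return Math.floor(common / chunkSize) * chunkSize;
}

// 200 fake tokens; change one token at the front vs. one at the back.
const baseTokens = Array.from({ length: 200 }, (_, i) => i);
const churnAtFront = [999, ...baseTokens.slice(1)];
const churnAtBack = [...baseTokens.slice(0, 199), 999];

const hitFront = cachedPrefixTokens(baseTokens, churnAtFront); // 0
const hitBack = cachedPrefixTokens(baseTokens, churnAtBack);   // 192
```

Under this model, a single changed token at the front forfeits the whole prompt, while the same change at the back still reuses three full chunks.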

Four coordinated changes:

1. Split storyState rendering into spine + dynamic.

renderStoryStateSpine: logline / genreTags / protagonist / castNotes
— Architect-set fields that StoryStatePatch literally cannot touch
(the type only declares the 4 volatile ones; coerce and apply both
cherry-pick), so spine bytes are guaranteed stable for the entire
session. Goes in the STABLE PREFIX.

renderStoryStateDynamic: synopsis / openThreads / relationships /
nextHook — the Writer rewrites these every scene via storyStatePatch.
Goes in the DYNAMIC SUFFIX.

renderStoryState kept as a convenience wrapper that joins both, for
anything that still wants the merged bible.
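
A rough sketch of the split, assuming a simple line-per-field text format. The function and field names come from this message; the field types, the output format, and `applyStoryStatePatch` are assumptions for illustration, not the repository's actual code.

```typescript
// Field names follow the commit message; the types are assumed.
interface StoryState {
  logline: string;
  genreTags: string[];
  protagonist: string;
  castNotes: string;
  synopsis: string;
  openThreads: string[];
  relationships: string[];
  nextHook: string;
}

// The patch type declares only the four volatile fields, so spine fields cannot be patched.
type StoryStatePatch = Partial<
  Pick<StoryState, "synopsis" | "openThreads" | "relationships" | "nextHook">
>;

function renderStoryStateSpine(s: StoryState): string {
  return [
    `logline: ${s.logline}`,
    `genreTags: ${s.genreTags.join(", ")}`,
    `protagonist: ${s.protagonist}`,
    `castNotes: ${s.castNotes}`,
  ].join("\n");
}

function renderStoryStateDynamic(s: StoryState): string {
  return [
    `synopsis: ${s.synopsis}`,
    `openThreads: ${s.openThreads.join("; ")}`,
    `relationships: ${s.relationships.join("; ")}`,
    `nextHook: ${s.nextHook}`,
  ].join("\n");
}

// Convenience wrapper that joins both halves into the merged bible.
function renderStoryState(s: StoryState): string {
  return `${renderStoryStateSpine(s)}\n${renderStoryStateDynamic(s)}`;
}

// Hypothetical apply step: cherry-picks the volatile fields only.
function applyStoryStatePatch(s: StoryState, p: StoryStatePatch): StoryState {
  return {
    ...s,
    synopsis: p.synopsis ?? s.synopsis,
    openThreads: p.openThreads ?? s.openThreads,
    relationships: p.relationships ?? s.relationships,
    nextHook: p.nextHook ?? s.nextHook,
  };
}

const stateBefore: StoryState = {
  logline: "A courier finds a map",
  genreTags: ["adventure"],
  protagonist: "Lin",
  castNotes: "Lin is cautious",
  synopsis: "Lin picks up the parcel.",
  openThreads: ["who sent the map?"],
  relationships: [],
  nextHook: "a knock at the door",
};
const stateAfter = applyStoryStatePatch(stateBefore, {
  synopsis: "Lin opens the parcel.",
  nextHook: "the map glows",
});
```

Because the patch can only reach the dynamic fields, the spine renders to the same string before and after any patch.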

2. Rewrite buildWriterUserMessage with a stable/dynamic split.

STABLE PREFIX (byte-identical or pure append across consecutive calls):
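- 世界观 / 画风 (worldview / art style; session-immutable scalars)
- story bible spine
- 已登记角色 (registered characters) [sentinel: "（以下每行一个已登记角色，开场前为空。）", i.e. "one registered character per line below; empty before the opening"] + entries
- 已使用的 sceneKey (sceneKeys already used) [sentinel] + entries
- 场景历史，已完结 (scene history, completed) [sentinel] + archivedHistory entries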
↑ archivedHistory = history.slice(0, -1), NOT the full history
— the live entry (history[-1]) keeps mutating mid-scene as the
player walks new beats and speculative prefetches snapshot it
at different moments, so it MUST stay out of the stable prefix
or the byte-monotonic invariant breaks.

DYNAMIC SUFFIX:
- storyState dynamic patch
- last-beat snippet (the exact emotional cliffhanger to continue from)
- lastExit hint
- format reminder tail

The previous structure put the full storyState (including patched
fields) at the very top of the user message, so the very first byte
of the user message changed every scene — user-side cache hit was
effectively 0% across the board.
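
The ordering above can be sketched as follows. This is a simplified illustration, not the real `buildWriterUserMessage`: the `WriterInput` shape, `buildStablePrefix`, and the `FORMAT_REMINDER` placeholder are assumptions, and the character and sceneKey lists and all sentinel lines are omitted for brevity.

```typescript
// Hypothetical input shape; every field is an already-rendered string.
interface WriterInput {
  world: string;     // 世界观
  style: string;     // 画风
  spine: string;     // rendered story bible spine
  dynamic: string;   // rendered volatile storyState fields
  history: string[]; // one entry per scene; the last entry is the live scene
  lastBeat: string;
  lastExit: string;
}

const FORMAT_REMINDER = "(format reminder)"; // placeholder text, assumed

function buildStablePrefix(input: WriterInput): string {
  // The live entry is excluded because it keeps mutating mid-scene.
  const archivedHistory = input.history.slice(0, -1);
  return [
    `世界观: ${input.world}`,
    `画风: ${input.style}`,
    input.spine,
    "场景历史，已完结",
    ...archivedHistory,
  ].join("\n");
}

function buildWriterUserMessage(input: WriterInput): string {
  const dynamicSuffix = [input.dynamic, input.lastBeat, input.lastExit, FORMAT_REMINDER].join("\n");
  return `${buildStablePrefix(input)}\n${dynamicSuffix}`;
}

const sharedFields = { world: "w", style: "s", spine: "spine", dynamic: "d", lastBeat: "b", lastExit: "e" };
const midSceneA: WriterInput = { ...sharedFields, history: ["scene1", "scene2 (2 beats)"] };
const midSceneB: WriterInput = { ...sharedFields, history: ["scene1", "scene2 (5 beats)"] };
const nextScene: WriterInput = { ...sharedFields, history: ["scene1", "scene2 final", "scene3 (1 beat)"] };
```

In this sketch, two snapshots of the same live scene share an identical stable prefix, and moving on to the next scene only appends to it.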

3. Sentinel pattern for variable-length sections.

Every list (characters / sceneKeys / archivedHistory) now emits a
constant placeholder line after its header REGARDLESS of whether
it has entries. With the old "if empty print '（暂无）' ('none yet') else print
entries" pattern, adding the first item silently rewrites those
placeholder bytes — the byte at offset N moves from a Chinese
parenthesis to a dash, prefix cache torched. The sentinel line is
the same bytes whether the list has 0 or N items; new items are
pure appends after it.
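
A small sketch contrasting the two patterns. The sentinel string is the one quoted above for the character list; `renderListOld`, `renderListSentinel`, and the `- ` entry format are assumptions made for illustration.

```typescript
const CHARACTER_SENTINEL = "（以下每行一个已登记角色，开场前为空。）";

// Old pattern: the placeholder disappears as soon as the list gets an entry.
function renderListOld(header: string, items: string[]): string {
  const body = items.length === 0 ? ["（暂无）"] : items.map((item) => `- ${item}`);
  return [header, ...body].join("\n");
}

// Sentinel pattern: the same line is emitted for 0 or N items; entries append after it.
function renderListSentinel(header: string, items: string[]): string {
  return [header, CHARACTER_SENTINEL, ...items.map((item) => `- ${item}`)].join("\n");
}

const listHeader = "已登记角色";
const oldKeepsPrefix = renderListOld(listHeader, ["Lin"]).startsWith(renderListOld(listHeader, []));
const sentinelKeepsPrefix = renderListSentinel(listHeader, ["Lin"]).startsWith(
  renderListSentinel(listHeader, []),
);
```

Here `oldKeepsPrefix` is `false` and `sentinelKeepsPrefix` is `true`: only the sentinel rendering of the empty list remains a byte prefix of the one-item rendering.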

4. Rewrite buildCinematographerUserMessage.

New CINE_STABLE_HINT constant (~80 tokens of fixed guidance) glued
right after the session-stable styleGuide line, so the stable prefix
is long enough to cross at least one full 64-token chunk boundary
beyond the system prompt. The per-scene inputs (sceneSummary,
entryBeatActive, entryBeatSpeaker policy, prior-sceneKey continuity
hint) all moved into the dynamic suffix below.

Verified (see [cache] / [debug-writer] logs from staging): hash of
500-byte slices of the user message is byte-identical across two
same-historyLen Writer calls through the entire stable prefix; only
the dynamic suffix slice differs. The remaining cache-hit gap under
MiMo is a server-side quirk (hit plateaus near 3072 tokens, occasionally
jumps to 4096); on DeepSeek the same prefix should hit fully.
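
One way to run this kind of check, sketched under assumptions: the message is UTF-8 encoded, cut into 500-byte slices, and each slice is hashed so two calls can be compared slice by slice. The FNV-1a hash and the helper names are illustrative choices; the commit does not say which hash the debug logs use.

```typescript
// 32-bit FNV-1a over raw bytes (illustrative; any stable hash would do).
function fnv1a(bytes: Uint8Array): number {
  let hash = 0x811c9dc5;
  bytes.forEach((byte) => {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  });
  return hash;
}

// Hash each fixed-size byte slice of the UTF-8 encoded message.
function sliceHashes(message: string, sliceBytes = 500): number[] {
  const bytes = new TextEncoder().encode(message);
  const hashes: number[] = [];
  for (let start = 0; start < bytes.length; start += sliceBytes) {
    hashes.push(fnv1a(bytes.subarray(start, start + sliceBytes)));
  }
  return hashes;
}

// Index of the first slice whose hash differs, or -1 if every slice matches.
function firstDivergingSlice(a: string, b: string, sliceBytes = 500): number {
  const ha = sliceHashes(a, sliceBytes);
  const hb = sliceHashes(b, sliceBytes);
  const count = Math.max(ha.length, hb.length);
  for (let i = 0; i < count; i++) {
    if (ha[i] !== hb[i]) return i;
  }
  return -1;
}

const sharedPrefix = "x".repeat(1200);
const divergingAt = firstDivergingSlice(`${sharedPrefix}AAA`, `${sharedPrefix}BBB`); // 2
```

With a 1200-byte shared prefix, slices 0 and 1 match and the first difference shows up in slice 2, which is the pattern expected when only the dynamic suffix changes.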

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-03 10:42:33 +08:00

Name	Last commit	Date
app	chore(play): remove session-id readout and decorative footer mark	2026-06-03 16:00:16 +08:00
components	fix(play): scene image renders as 1px sliver while CDN bytes still arrive	2026-06-03 07:24:42 +08:00
docs	docs: replace README screenshots with 14-image 2-column gallery	2026-06-03 06:53:46 +08:00
lib	perf(engine): reorder Writer/Cinematographer prompts for prefix caching	2026-06-03 10:42:33 +08:00
public	chore(web): swap in 6 curated male covers	2026-06-03 04:11:26 +08:00
scripts	feat(web): gender-differentiated 4:5 covers + per-card styleGuide prebake	2026-06-03 02:26:35 +08:00
.env.example	feat: add privacy-friendly Umami page-view analytics (#15)	2026-06-03 01:14:55 +08:00
.gitignore	feat: add Cloudflare Workers deployment alongside Vercel	2026-06-02 21:47:03 +08:00
LICENSE	docs: streamline 3 READMEs and fix EN language switcher (#6)	2026-06-02 15:33:08 +08:00
next-env.d.ts	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
next.config.ts	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
open-next.config.ts	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
package.json	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
pnpm-lock.yaml	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
postcss.config.mjs	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
README.en.md	docs: replace README screenshots with 14-image 2-column gallery	2026-06-03 06:53:46 +08:00
README.ja.md	docs: replace README screenshots with 14-image 2-column gallery	2026-06-03 06:53:46 +08:00
README.md	docs: replace README screenshots with 14-image 2-column gallery	2026-06-03 06:53:46 +08:00
tailwind.config.ts	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
tsconfig.json	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
vercel.json	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00
wrangler.jsonc	refactor: flatten monorepo to single web package (#12)	2026-06-03 00:55:45 +08:00

README.en.md

An interactive story game, generated in real time for you

简体中文 · English · 日本語

⚡ Overview

InfiPlot is an interactive story game with content generated by AI in real time. There are no pre-written plots and no pre-made characters — everything is generated on demand, tailored to you.

In one line: what we're building is an AI-generated, real-time take on Love Is All Around (《完蛋！我被美女包围了！》).

Whether you're a six-year-old, a twenty-something, thirty-five, or sixty, there's a fantasy here that belongs to you and you alone:

Learn magic in the world of Harry Potter; become the one everyone at school adores and confesses to; publish paper after paper in top journals and conferences with grant money to spare; step into Empresses in the Palace and live out the court intrigue; or return to your younger self and make a different choice about something you regret…

🌐 Live Demo

Free to play, no setup required: infiplot.com

One-click deploy

InfiPlot deploys to both Vercel and Cloudflare Workers. Cloudflare deployment requires the Workers Paid Plan because the scene pipeline needs longer CPU time; for personal use, the one-click Vercel deploy is recommended.

After deploy, fill in the environment variables — see the Configuration guide below. The repo root is the app itself: Vercel needs no special root directory; on Cloudflare, just set the build command to pnpm build:cf.

📸 Screenshots

How it works

Built on text, image, and audio models, we've assembled a multi-agent framework to deliver on InfiPlot's goal. We split the agents into five roles — Architect, Writer, Character Designer, Cinematographer, and Painter — that work together to keep the plot coherent, the characters consistent, and the scenes continuous, all while making the story as compelling as we can.

We call each complete playthrough a story.

A story unfolds as a sequence of scenes. Each scene is one AI-painted background plus a short tree of beats — moments of narration, dialogue, and the occasional choice. You tap through a scene's beats and the image stays put; only when a choice leads somewhere genuinely new — another place, a new point of view, a jump in time — does the AI paint the next scene.

While you're reading one scene, the engine speculatively generates the scenes your choices could lead to — and, for unavoidable next steps, the scene after that. By the time you pick a direction, its image is usually already painted, so the cut feels instant. If you still notice some lag today, don't worry — we're working hard to bring it down.
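
As a rough illustration of the idea (not the engine's actual code), speculative generation can be modeled as a map of in-flight promises keyed by choice. `ScenePrefetcher`, `SceneGenerator`, and the string scene payload are all hypothetical names for this sketch.

```typescript
// Hypothetical generator: resolves to some representation of a finished scene.
type SceneGenerator = (choiceId: string) => Promise<string>;

class ScenePrefetcher {
  private readonly generate: SceneGenerator;
  private readonly inFlight = new Map<string, Promise<string>>();

  constructor(generate: SceneGenerator) {
    this.generate = generate;
  }

  // Start generating every scene the current choices could lead to.
  prefetch(choiceIds: string[]): void {
    for (const id of choiceIds) {
      if (!this.inFlight.has(id)) this.inFlight.set(id, this.generate(id));
    }
  }

  // Reuse the speculative result when there is one; otherwise generate on demand.
  pick(choiceId: string): Promise<string> {
    const scene = this.inFlight.get(choiceId) ?? this.generate(choiceId);
    this.inFlight.clear(); // branches not taken are dropped
    return scene;
  }
}

let generatorCalls = 0;
const prefetcher = new ScenePrefetcher(async (id) => {
  generatorCalls++;
  return `scene:${id}`;
});
prefetcher.prefetch(["stay", "leave"]);
const callsAfterPrefetch = generatorCalls;
const pickedScene = prefetcher.pick("leave"); // already in flight, no new call
const callsAfterPick = generatorCalls;
```

The scenes generated for branches the player never takes are the source of the extra spend described in the cost section.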

Clicking the background itself (not a button) routes through a vision model: it reads where you tapped and decides whether you're exploring the current scene (it inserts a beat — no new image) or moving on (a new scene). This builds on a valuable lesson we learned from flipbook, and we believe it will become one of InfiPlot's defining features — taking the experience to the next level.

There is no traditional game UI baked into the art. The AI paints the world in whatever style you pick — "stick figure on grid paper" or "cyberpunk noir" — and the dialogue panel and choice buttons are a light HTML layer drawn on top, tuned to sit over the scene. In other words, the UI fits the story of each playthrough, rather than staying the same every time.

Team & Vision

We're a group of young people from Tsinghua University and other schools.

On one hand, we're longtime, devoted players of galgames, otome games, FMV, and AI role-play games. Even while enjoying them, we kept imagining how much more delightful and thrilling it would be if the story choices weren't fixed in advance — or if you could truly interact with an AI character in depth, instead of just texting it through a chat app.

On the other hand, we happen to know a little about large-model technology: enough to turn ideas into working software quickly with AI, and to have formed some modest views on the technical paths available and the limits of what today's tech can build.

The spark came on April 22, 2026, when @zan2434 and others released flipbook. We were stunned and delighted by this entirely new form of interaction.

So one day in May, we agreed on the spot to build something like this — both to help people live out the fantasies they'd once set aside, and to explore the new modes of interaction that multimodal models make possible.

The project is still very early and many features are far from polished. We'd love your feedback: open an issue, or join our dev team to explore these new possibilities with us and satisfy your own curiosity.

Get in touch: hi@infiplot.com

Scan to join our beta community on QQ (group ID 575404333) to share feedback and help shape the project:

InfiPlot beta community QQ group QR code

Configuration guide

InfiPlot talks to four kinds of model providers. Text and Vision use any OpenAI-compatible endpoint, so you can mix and match freely. Image currently goes to Runware (its own task-array protocol, not OpenAI-compatible). TTS uses Xiaomi MiMo's own voice design / clone protocol — per-character voice design, clone, and per-line delivery direction.

1. Choose your providers

Provider	Variables	Required?	Recommended
Text · story director	`TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL`	✅	`deepseek-v4-flash` via DeepSeek
Image · scene renderer	`IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL`	✅	`runware:400@6` (FLUX.2 [klein] 9B KV) via Runware
Vision · click reader	`VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL`	✅	`gemini-3.5-flash` via Google
TTS · per-character voice	`TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL`	optional — leave blank to run silently	`mimo-v2.5-tts` via Xiaomi MiMo

2. Set the environment variables

Nine variables are required; TTS is optional (leave blank to run silently). There's also a flag for cheap testing:

Variable	Effect
`MOCK_IMAGE=true`	Skip image generation; the renderer returns a static placeholder. Story, voice, and choices still run normally. Great for iterating on TTS without burning Runware credits.

Where to set them (see .env.example for the exact shape):

Local dev — .env.local
Vercel — Project Settings → Environment Variables
Cloudflare Workers — from the repo root, run wrangler secret put <NAME> for each variable, or set them in the dashboard (Workers → infiplot → Settings → Variables and Secrets). For a private staging instance, gate the Worker behind Cloudflare Access — zero-code email-whitelist auth in front of the Worker.
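
Before deploying, it can help to confirm that all nine required variables are set. The sketch below is a hypothetical check, not part of the repository; the variable names are the ones listed in the provider table above, and `.env.example` remains the authoritative reference.

```typescript
// The nine required variables from the provider table (TTS is optional and not checked).
const REQUIRED_VARS = [
  "TEXT_BASE_URL", "TEXT_API_KEY", "TEXT_MODEL",
  "IMAGE_BASE_URL", "IMAGE_API_KEY", "IMAGE_MODEL",
  "VISION_BASE_URL", "VISION_API_KEY", "VISION_MODEL",
];

// Returns the names of required variables that are unset or blank.
function missingVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => (env[name] ?? "").trim() === "");
}

// Example with placeholder values only; TTS variables are left out on purpose.
const exampleEnv: Record<string, string | undefined> = {};
for (const name of REQUIRED_VARS) exampleEnv[name] = "placeholder";
const missingInExample = missingVars(exampleEnv); // []
```

In a Node.js context this could be called as `missingVars(process.env)` at startup, failing fast with the list of missing names.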

3. Mind the cost

With the recommended trio, each scene's cost comes mainly from the image generation model. The FLUX.2 [klein] 9B KV image is roughly $0.00078 per scene (1792×1024, 4 steps, sub-second); the text model uses deepseek-v4-flash, so text costs are negligible by comparison. Tapping through a scene's beats is free. To keep transitions instant, the engine also pre-generates scenes you might pick but ultimately don't — so real spend runs somewhat higher than the scenes you actually see.
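
A back-of-the-envelope estimate of image spend, using the per-scene figure above. The ratio of discarded speculative scenes per scene actually seen is an assumption that depends on how many choices each scene offers; the value 2 below is purely illustrative, and `estimateImageSpendUsd` is a hypothetical helper.

```typescript
const IMAGE_COST_PER_SCENE_USD = 0.00078; // figure quoted in this README

// scenesSeen: scenes the player actually viewed.
// discardedPerSeen: assumed average of pre-generated scenes never shown, per scene seen.
function estimateImageSpendUsd(scenesSeen: number, discardedPerSeen: number): number {
  const scenesGenerated = scenesSeen * (1 + discardedPerSeen);
  return scenesGenerated * IMAGE_COST_PER_SCENE_USD;
}

// 100 scenes seen, 2 discarded per seen scene: 300 images, about $0.234.
const exampleSpendUsd = estimateImageSpendUsd(100, 2);
```

Text and vision calls are not included in this estimate.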

Roadmap

Make generation latency imperceptible
Compatibility with more model providers
Free-form player input mid-story
Mobile browser support
User accounts and login
Upgrade from static images to motion video
Voice interaction
Share the story you're playing
Mobile app
