Addresses Copilot review on PR #9:
- /api/vision: add MAX_ANNOTATED_BYTES (3 MB) cap on annotatedImageBase64,
plus an explicit type/non-empty check. The browser annotator resizes to
768 px wide (typically 200-800 KB of base64), so a 3 MB cap rejects abusive
direct-API payloads that would otherwise inflate upstream vision LLM costs.
- annotateClient: replace `img.src = ""` on timeout with
`img.removeAttribute("src")` to avoid the legacy browser behavior of
treating an empty src as a request for the current document URL.
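A minimal sketch of the size and type guard described above. Only MAX_ANNOTATED_BYTES and annotatedImageBase64 come from this commit; the function name, the result shape, the 400/413 status codes, and the choice to measure the encoded string length (rather than the decoded bytes) are assumptions for illustration.

```typescript
// 3 MB cap named in the commit message.
const MAX_ANNOTATED_BYTES = 3 * 1024 * 1024;

// Hypothetical result shape; the real route may respond differently.
type Validation =
  | { ok: true; base64: string }
  | { ok: false; status: number; error: string };

// Assumption: the cap applies to the base64 string length (ASCII, so 1 char = 1 byte).
function validateAnnotatedImage(body: unknown): Validation {
  const value =
    typeof body === "object" && body !== null
      ? (body as { annotatedImageBase64?: unknown }).annotatedImageBase64
      : undefined;

  // Explicit type + non-empty check.
  if (typeof value !== "string" || value.length === 0) {
    return { ok: false, status: 400, error: "annotatedImageBase64 must be a non-empty string" };
  }
  // Reject oversized payloads before anything is forwarded upstream.
  if (value.length > MAX_ANNOTATED_BYTES) {
    return { ok: false, status: 413, error: "annotatedImageBase64 exceeds 3 MB" };
  }
  return { ok: true, base64: value };
}
```

Running the check before any upstream call means an oversized direct-API payload never reaches the vision LLM.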
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
InfiPlot now deploys to either Vercel or Cloudflare Workers — both
targets are first-class. The project is fully stateless (sessions live
on the client), so the Cloudflare side needs only Workers + Workers
Assets and zero D1/KV/R2.
- apps/web/wrangler.jsonc — nodejs_compat, Assets binding, 60s CPU
limit (Workers Paid required; matches vercel.json maxDuration). I/O
wait does not count against this budget — fits the LLM-bound
workload that's most of the runtime.
- apps/web/open-next.config.ts — minimal defineCloudflareConfig (no
cache needed since the engine is stateless).
- apps/web/package.json — added build:cf / preview:cf / deploy:cf via
@opennextjs/cloudflare + wrangler (both devDeps); sharp moved from
dependencies to devDependencies (only used by the manual
optimize-home-images.mjs / localize-firstact-images.mjs scripts now).
- .gitignore — .open-next, .wrangler, .dev.vars.
- READMEs (3 langs) — Deploy to Cloudflare button next to Vercel,
plus a Cloudflare section in the env-var setup (wrangler secret put
+ Cloudflare Access for staging access control).
Verified: pnpm typecheck + pnpm build (Vercel path) + pnpm build:cf
(OpenNext bundle: worker 4 KB, server 24 MB, assets 32 MB / 186
files — all within Workers limits) + pnpm preview:cf with the full
play loop (start → scene → background click → CORS-clean Canvas
annotation via Runware CDN → vision LLM → insert-beat) all green.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The vision pipeline used sharp to draw a click marker on the scene image
server-side (engine/src/annotate.ts) and to render the MOCK_IMAGE
placeholder PNG (engine/src/mockImage.ts). Both moved off the runtime:
- annotateClick → apps/web/lib/annotateClient.ts (Canvas 2D in the
browser; toDataURL → raw PNG base64 forwarded to /api/vision). Saves
a server-side image re-fetch per click and frees the engine from
sharp's native binding (which doesn't run on Cloudflare Workers).
- mockImageDataUri → self-describing SVG data URI (no rendering needed).
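As an illustration of the SVG placeholder idea, here is a hedged sketch. The commit only names mockImageDataUri; the signature, default dimensions, colors, and label handling below are assumptions.

```typescript
// Hypothetical signature: the real mockImageDataUri may take different arguments.
function mockImageDataUri(label: string, width = 1024, height = 576): string {
  // Escape the characters that would break the SVG markup.
  const safe = label
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">` +
    `<rect width="100%" height="100%" fill="#222"/>` +
    `<text x="50%" y="50%" fill="#eee" font-size="32" text-anchor="middle" ` +
    `dominant-baseline="middle">${safe}</text>` +
    `</svg>`;

  // Percent-encode instead of base64: no image library or binary step is involved.
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}
```

Because the result is plain text, it needs no native binding and runs the same way on Vercel and on Workers.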
VisionRequest contract changes: prevImageUrl + click → annotatedImageBase64.
Server forwards the bytes straight to the vision LLM as image_url.
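The browser-side annotation step can be split into two pure helpers that are easy to reason about: one that plans the resized canvas and marker position, and one that strips the data URL prefix that toDataURL produces. This is a sketch under assumptions: the helper names, the click being expressed in natural image pixels, the no-upscale rule, and the marker radius are all hypothetical; only the 768 px target width and the raw base64 hand-off come from the commit history.

```typescript
// Assumption: click coordinates are in the image's natural pixel space.
interface Click {
  x: number;
  y: number;
}

interface AnnotationPlan {
  width: number;
  height: number;
  markerX: number;
  markerY: number;
  markerRadius: number;
}

// Compute canvas size and marker position for a downscaled copy of the image.
function planAnnotation(
  naturalWidth: number,
  naturalHeight: number,
  click: Click,
  targetWidth = 768,
): AnnotationPlan {
  const scale = Math.min(1, targetWidth / naturalWidth); // never upscale (assumption)
  const width = Math.round(naturalWidth * scale);
  const height = Math.round(naturalHeight * scale);
  return {
    width,
    height,
    markerX: Math.round(click.x * scale),
    markerY: Math.round(click.y * scale),
    markerRadius: Math.max(8, Math.round(width * 0.02)), // hypothetical sizing rule
  };
}

// canvas.toDataURL("image/png") yields "data:image/png;base64,<payload>";
// the API in this sketch receives only the payload.
function dataUrlToRawBase64(dataUrl: string): string {
  const comma = dataUrl.indexOf(",");
  if (!dataUrl.startsWith("data:") || comma === -1) {
    throw new Error("not a data URL");
  }
  return dataUrl.slice(comma + 1);
}
```

In the browser, the plan would feed a canvas of plan.width × plan.height: drawImage for the scene, then a circle at (markerX, markerY).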
sharp is removed from packages/engine entirely and from next.config.ts's
serverExternalPackages. apps/web/package.json + lockfile cleanup ships
in the follow-up Cloudflare deployment commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drops the fa-qq penguin icon and the "扫码加入,或搜索群号" ("scan to join,
or search the group number") call-to-action in favor of a plain
"QQ群号:575404333" ("QQ group number: 575404333") label. The QR code right
above already implies scanning, and the column header names the group.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fills the long-empty "内 测 用 户 群" (beta tester group) placeholder (was
"群二维码 / 邀请链接(待补充)", i.e. "group QR code / invite link (to be
added)") on the homepage contact grid with the real QQ group QR (group ID
575404333) plus a scan-or-search line.
Mirrors it across all three READMEs as a scan-to-join block right
after the contact line, rendered from apps/web/public/qq-group.webp
(760×760 QR-only crop with a white quiet zone, ~45KB).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
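Rewrites all 64 homepage cards (32 男性向, male-oriented, + 32 女性向,
female-oriented) as short-drama hook stories (战神归来 "War God Returns" /
重生分手前夜 "Reborn on the Eve of the Breakup" / 系统选妃 "System Consort
Selection" / 穿成乙游男配 "Transmigrated as an Otome-Game Supporting Male
Character" / 末世异能 "Apocalypse Superpowers" / 民国谍战 "Republican-Era
Espionage" / 修真渡劫 "Cultivation Tribulation" …) and regenerates each
cover via FLUX in its assigned art style (12 styles spread across 64 cards)
at 832×1024 (≈4:5).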
Click-to-play path: cards now jump straight to /play?card=<name> and hydrate
Session from /home/firstact/<name>.json — the engine pipeline (Architect +
Writer + CharacterDesigner + Painter) has been pre-run for 44/64 cards. The
remaining 20 (m14/m29/f14..f31) are pending an LLM credit top-up; their
clicks fall through to live /api/start for now.
Runware-hosted first-scene images are downloaded into /home/firstscene/
and the JSONs are rewritten to point at the local webp, so click → first
image is bounded by local-disk decode (~100ms) instead of CDN round-trip.
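The click-to-play fallback above can be sketched as follows. The two URLs come from this commit; loadFirstAct, the FetchLike type, the loose StartResponse alias, and the startBody parameter are hypothetical stand-ins so the example stays self-contained.

```typescript
// Loose stand-in: the real StartResponse type lives in the engine package.
type StartResponse = Record<string, unknown>;

// Minimal fetch-shaped dependency so the sketch runs without a network.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function loadFirstAct(
  card: string,
  fetchImpl: FetchLike,
  startBody: unknown, // hypothetical payload for the live path
): Promise<{ source: "prebaked" | "live"; data: StartResponse }> {
  // 1. Try the prebaked first act shipped as a static asset.
  try {
    const res = await fetchImpl(`/home/firstact/${encodeURIComponent(card)}.json`);
    if (res.ok) {
      return { source: "prebaked", data: (await res.json()) as StartResponse };
    }
  } catch {
    // Network or parse failure: fall through to the live pipeline.
  }

  // 2. Card not prebaked yet: run the engine live.
  const live = await fetchImpl("/api/start", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(startBody),
  });
  if (!live.ok) throw new Error("live /api/start failed");
  return { source: "live", data: (await live.json()) as StartResponse };
}
```

With this shape, the cards still pending prebake need no special casing: a missing JSON file simply takes the second branch.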
Scripts:
- scripts/generate-home-images.mjs — rewrites all 64 cover prompts, per-card
styles baked into prompts, 832×1024 dims to match StoryCard aspect
- scripts/prebake-firstacts.mjs — POST /api/start × 64 with concurrency
4, saves StartResponse to public/home/firstact/<name>.json
- scripts/localize-firstact-images.mjs — downloads each prebaked imageUrl
to public/home/firstscene/<name>.webp (q80, ≤1600px) and rewrites JSON
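The "× 64 with concurrency 4" pattern in the prebake script can be expressed with a small worker-lane helper. This is a generic sketch, not the script's actual code; mapWithConcurrency is a hypothetical name.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight,
// preserving input order in the result array.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

In a prebake-style script the worker would POST one card to /api/start and write the response to disk; a limit of 4 keeps the LLM and image providers from being hit with 64 simultaneous sessions.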
README: adds Screenshots section (3×3 gallery) to README.md / README.zh-CN.md,
9 in-game shots compressed to docs/screenshots/*.webp (7.5MB → 680KB).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
UI layout (PlayCanvas + play/page.tsx):
- "F · 全 · 屏" (F · fullscreen) button (renamed from 演 · 示, "demo", to
match what users actually mean by F) floats above the canvas,
right-aligned, via a new `aboveCanvas` ReactNode slot that lives on the
relative inline-block image wrapper at `bottom-full right-0`. It hugs the
actual image right edge regardless of aspect ratio.
- "有 · 声 / 静 · 音" (sound on / mute) button mirrors that on the left via
a new `aboveCanvasLeft` slot.
- Both slots also render inside the loading placeholder so the two
controls appear from frame one, before the scene image arrives.
- InfiPlot back-link grows from 15px to 22/26px (mobile/desktop) with
a slightly larger arrow, matching the brand's presence on the
homepage hero.
- Canvas-bottom metadata row (image dims on left, tutorial hint on
right) dropped. The "—" placeholder and "···" loading state looked
like stray punctuation; users found them noisy.
- Footer collapses to a single centered "Ⅰ · Ⅰ" mark.
Audio gating logic (play/page.tsx):
- Collapse the two-flag audio gate into one source of truth. The
homepage "语音配音" (voice dubbing) choice no longer lives in a separate
`audioEnabledRef` flag that gates `fetchBeatAudio` independently
of the in-page mute state. Instead the `muted` useState lazy
initializer reads `sessionStorage["infiplot:custom"].audioEnabled`
and projects it inversely (audioEnabled=false → muted=true) so
the 静音/有声 button correctly reflects the homepage selection
from the first frame. The in-page toggle remains the source of
truth from then on (persisted to localStorage:infiplot:muted).
- This fixes a visible disconnect where picking "关闭" (off) on the
homepage left the play page showing 有声 because the in-page
state had no link to the homepage choice.
- The sessionStorage read uses the renamed key "infiplot:custom"
(the infiplot rename PR changed it from yume:custom on the home
side but the play side hadn't been updated to match).
No new TTS quota is ever burned while muted: fetchBeatAudio's
mutedRef.current early-return is the only path to /api/beat-audio
and is checked before the fetch fires; mute transitions also abort
in-flight requests.
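The lazy initializer logic above can be isolated as a pure function, which makes the inverse projection easy to check. This is a sketch: initialMuted is a hypothetical name, and both the "1" storage encoding and the rule that a present homepage choice takes precedence over the persisted toggle are assumptions.

```typescript
// customJson:  raw value of sessionStorage["infiplot:custom"] (or null)
// storedMuted: raw value of localStorage["infiplot:muted"] (or null)
function initialMuted(customJson: string | null, storedMuted: string | null): boolean {
  if (customJson !== null) {
    try {
      const custom = JSON.parse(customJson) as { audioEnabled?: unknown } | null;
      // Inverse projection: audioEnabled=false → muted=true.
      if (typeof custom?.audioEnabled === "boolean") return !custom.audioEnabled;
    } catch {
      // Malformed payload: fall back to the persisted toggle.
    }
  }
  // Assumption: the in-page toggle is persisted as "1" when muted.
  return storedMuted === "1";
}

// In the component this would be used as a lazy initializer, for example:
//   useState(() => initialMuted(
//     sessionStorage.getItem("infiplot:custom"),
//     localStorage.getItem("infiplot:muted"),
//   ));
```

Keeping the read inside the initializer means it runs once, so later in-page toggles are never overwritten by the homepage value.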
Home (apps/web/app/page.tsx):
- StoryCard locked to uniform aspectRatio "4 / 5". The previous
"placeholder 4/5 → naturalRatio after onLoad" flow coupled card
height to lazy-load order: cards still below the fold sat at the
placeholder ratio while above-the-fold cards snapped to their
image's actual ratio (1.6 landscape vs 0.75 portrait vs 1.23
squarish), so the gallery looked inconsistent until a hard refresh
re-decoded everything from cache synchronously. Fixed ratio +
object-cover removes the coupling.
- StoryCard hover overlay collapsed from two sibling layers
(backdrop-blur + mask-image + dark gradient sibling) into one
element with a pure rgba(0,0,0,…) linear-gradient and an opacity
transition. Chromium does not animate backdrop-filter cleanly when
combined with mask-image on an empty element — the first hover
frame shows a full rectangular blur before the mask kicks in, then
snaps to the feathered shape ("矩形磨砂 → 渐变磨砂", rectangular frost → gradient frost). One layer,
one transitioning property, no compositing race.
Play (apps/web/app/play/page.tsx):
- Header back-link "云梦" → "InfiPlot" using the same serif + italic
ember "Plot" treatment as the homepage wordmark. Resolved against
the parallel plain-text rebrand already on infiplot/staging by
keeping the styled version for brand consistency.
Add a dedicated Architect LLM call at session start that expands the terse
world/style prompt into a persistent story bible (logline, genre, second-
person protagonist, cast, engineered opening hook). The bible seeds a
StoryState the Writer reads and patches every scene, carried + merged
across cuts (applyStoryStatePatch) so the story keeps a spine from beat
one instead of jumping between scenes.
- prompts: inject web-novel / short-drama / galgame craft into Writer +
Architect; Writer emits storyStatePatch to update the running bible
- director: parallelize voice + non-entry portraits with the Painter
(only entry-beat portraits block paint) to offset Architect latency
- architect: chat/parse guarded so a malformed response never aborts start
- types: StoryState / StoryStatePatch; required on Start/SceneResponse
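To make the carry-and-merge step concrete, here is a hedged sketch of what a patch merge could look like. StoryState, StoryStatePatch, and applyStoryStatePatch are named in this commit, but every field and the merge rules below (upsert cast by name, append notes) are assumptions; the real types are not shown here.

```typescript
// Hypothetical field set, loosely based on the bible contents listed above.
interface CastMember {
  name: string;
  role: string;
}

interface StoryState {
  logline: string;
  genre: string;
  protagonist: string;
  cast: CastMember[];
  hook: string;
  notes: string[]; // hypothetical running notes added by the Writer
}

interface StoryStatePatch {
  cast?: CastMember[];
  notes?: string[];
}

// Returns a new state; the input is never mutated, so a malformed or
// missing patch simply leaves the previous bible in place.
function applyStoryStatePatch(
  state: StoryState,
  patch?: StoryStatePatch | null,
): StoryState {
  if (!patch) return state;

  // Upsert cast members by name so recurring characters are updated, not duplicated.
  const cast = new Map<string, CastMember>();
  for (const member of state.cast) cast.set(member.name, member);
  for (const member of patch.cast ?? []) cast.set(member.name, member);

  return {
    ...state,
    cast: Array.from(cast.values()),
    notes: [...state.notes, ...(patch.notes ?? [])],
  };
}
```

Treating the patch as optional mirrors the guarded-parse idea: a scene whose Writer output lacks a usable patch still carries the existing spine forward.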
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
On mount the mute effect fired alongside the scene effect (both call
prefetchSceneAudio), so the initial /api/beat-audio batch was dispatched
twice — the first set aborted mid-flight. Track the previous muted value
in a ref and only re-prefetch on a real transition, leaving the mount-time
synthesis to the scene effect. Addresses Copilot review on PR #9.
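The "only on a real transition" rule can be sketched outside React as a small gate that remembers the previous value. createTransitionGate is a hypothetical helper; in the component the same role is played by a ref holding the last muted value.

```typescript
// Returns a function that reports whether `next` differs from the value
// seen on the previous call (or from `initial` on the first call).
function createTransitionGate<T>(initial: T): (next: T) => boolean {
  let previous = initial;
  return (next: T): boolean => {
    const changed = !Object.is(previous, next);
    previous = next;
    return changed;
  };
}

// React-shaped usage (comment only, since it needs a component context):
//   const prevMuted = useRef(muted);
//   useEffect(() => {
//     if (prevMuted.current === muted) return; // mount: no transition
//     prevMuted.current = muted;
//     if (!muted) prefetchSceneAudio();
//   }, [muted]);
```

Seeding the gate with the initial value is what suppresses the mount-time run, so the scene effect stays the only caller on first render.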
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Revises the InfiPlot homepage from the initial prototype pass.
Stories data model
- Replaces the artificial 7-hero + 16-gallery split with a flat
per-gender model: 30 preset stories each for 男性向 / 女性向.
- Renames assets hero*/gallery* → m{0..29} / f{0..29}; same index
shares aspect ratio across genders so the gender crossfade never
jumps card height.
- Fills in the missing 女性向 set and expands both genders to 30.
Cards
- StoryCard measures aspect ratio at runtime from the loaded image
(onLoad → naturalWidth/Height), fixing the frosted-caption band
reflow on lazy image load. Drops ready/fallback props; single
masonry map over STORIES[gender].
Hero input
- Single-line <input> → auto-growing <textarea> (rows=1, resize-none)
so long prompts and long card seeds are fully visible. Enter submits,
Shift+Enter inserts a newline.
- lining-nums on the input so digits sit on the baseline instead of
Cormorant's default old-style figures.
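A hedged sketch of the two textarea behaviours above, written against minimal structural types so it runs without a DOM. shouldSubmit and autoGrow are hypothetical names, and the isComposing guard (so that confirming an IME candidate with Enter does not submit) is an addition not stated in the commit.

```typescript
// Subset of KeyboardEvent that the decision needs.
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing?: boolean;
}

// Enter submits; Shift+Enter falls through so the textarea inserts a newline.
function shouldSubmit(event: KeyLike): boolean {
  return event.key === "Enter" && !event.shiftKey && !event.isComposing;
}

// Subset of HTMLTextAreaElement that auto-grow needs.
interface GrowableLike {
  style: { height: string };
  scrollHeight: number;
}

// Reset to "auto" first so the element can also shrink, then fit the content.
function autoGrow(el: GrowableLike): void {
  el.style.height = "auto";
  el.style.height = `${el.scrollHeight}px`;
}
```

In a real handler, a true result from shouldSubmit would be followed by preventDefault() before submitting, and autoGrow would run on every input event.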
Typography / styles
- layout.tsx: editorial fonts (Cormorant Garamond + Inter via
--font-serif / --font-sans) + Font Awesome; drops Patrick Hand /
Noto Sans SC and the hand-drawn SVG jitter filters.
- globals.css trimmed to the editorial base (paper grain, hairline,
num, ripple); play/page.tsx font/style follow-up.
Scripts
- generate-home-images.mjs reworked into a flat 2×30 idempotent
Runware FLUX.2 generator.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rebuilds the landing page from the prototype: 1900px scale-to-fit hero with
hand-drawn SVG-jitter frames, typewriter input + start button, 5 horizontal
collapsible category selectors (with style-picker modal), 7 scattered hero
cards over a 16-card masonry gallery, and project intro panel.
Each card is filled with a Runware FLUX.2 image, pre-generated and stored as
WebP (~2 MB total for 30 cards). Hero card content + image switches by
性向 (男性向 / 女性向); gallery stays shared.
Hover overlay on every card shows title + outline in a bottom-up dark
gradient, matching the prior homepage's interaction style.
Bug fixes uncovered by tracing the form-state → engine pipeline:
- 「语音配音:关闭」 ("voice dubbing: off") was previously stuffed into
styleGuide (consumed only by FLUX, ignored by TTS). Now serialized as an
audioEnabled boolean in the sessionStorage payload; the play page's
fetchBeatAudio early-returns when it is false, so no /api/beat-audio
request fires.
- 「绘画风格:自动」 ("art style: auto") used to pass the literal Chinese
phrase "由模型根据 prompt 自动判断画风" ("let the model infer the art style
from the prompt") to FLUX, which painted it as text. Now maps to the
二次元/galgame (anime/galgame) default prompt.
Adds reusable scripts under apps/web/scripts/:
- generate-home-images.mjs — Runware FLUX.2 idempotent batch generator
- optimize-home-images.mjs — sharp WebP downscale + recompress
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move vercel.json to apps/web/ with correct route paths; cap scene route
maxDuration 120→60s for Hobby. Root vercel.json removed. Vercel project's
Root Directory must be set to apps/web (Deploy button URL passes this).
- Switch image transport from base64-in-JSON to Runware-hosted URLs:
generateImage now uses outputType=URL and returns {imageUrl, imageUuid};
StartResponse/SceneResponse carry imageUrl; VisionRequest carries
prevImageUrl (server re-fetches the bytes for click annotation). This
eliminates the 4.5MB serverless body-size risk.
- Painter and director prefer URL over UUID for referenceImages — the UUID
returned by Runware imageInference isn't always recognized in the refs
pipeline (surfaces as `failedToTransferImage`).
- Client preloads scene images via `new Image().decode()` before committing
to React state, so URL transitions render instantly; prefetched scenes
also warm the HTTP cache.
- jsonParser uses the jsonrepair package (replaces hand-rolled repair) and
adds a targeted preRepair regex for the missing-key-close-quote pattern
that jsonrepair couldn't disambiguate. Full raw model output dumped on
failure for diagnostic visibility.
- Default text provider switched to DeepSeek v4-flash via direct API
(significantly more stable JSON than MiMo v2.5-pro). VISION/TTS stay on
MiMo (DeepSeek has no multimodal / TTS offerings).
- next.config: drop dead experimental.serverActions.bodySizeLimit (no
server actions used).
- README: real Deploy button URL (zonghaoyuan/yume + root-directory=apps/web
+ TTS/MOCK_IMAGE in env list); refreshed env vars table with optional
TTS section.
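To illustrate the missing-key-close-quote case mentioned for jsonParser, here is a sketch of a targeted pre-repair pass. The commit does not show the actual regex, so the pattern below is hypothetical, and the jsonrepair step is left out to keep the example dependency-free; preRepair and parseLenient are illustrative names.

```typescript
// Hypothetical pattern: matches `{"key: ` or `, "key: ` where the key's
// closing quote is missing. The lookahead requires whitespace after the
// colon so values like "http://..." are not touched.
const MISSING_KEY_CLOSE_QUOTE = /([{,]\s*)"([A-Za-z_][A-Za-z0-9_]*)\s*:(?=\s)/g;

function preRepair(raw: string): string {
  return raw.replace(MISSING_KEY_CLOSE_QUOTE, '$1"$2":');
}

// Only repair after a strict parse fails: the regex can misfire on valid
// input (for example an array element such as "y: z").
function parseLenient(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return JSON.parse(preRepair(raw));
  }
}
```

For example, `{"name: "Ann", "age": 3}` fails strict parsing, is rewritten to `{"name": "Ann", "age": 3}`, and then parses normally. In the real pipeline, the general-purpose repair would presumably run after this targeted pass.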
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce median scene-load latency from ~30-80s to ~17-25s by switching image generation to Runware FLUX.2 [klein] 9B KV and moving per-beat TTS synthesis off the scene response into a new lazy /api/beat-audio endpoint with hard timeout + abort support.
- feat(image): migrate to Runware FLUX.2 [klein] 9B KV — task-array API, $0.001/image, sub-second inference.
- feat(tts): split /api/scene into directScene + image + voicedesign-provisioning; lazily synth per beat via /api/beat-audio with 15s hard timeout + AbortSignal threaded to MiMo so timed-out calls don't keep burning sockets/quota; client fans out per-beat fetches on scene-id change with abort + identity-check finally to prevent cross-scene beat-id collisions.
- refactor(tts): slim BeatAudioRequest to { beat, voice } — ~800KB per-beat upload dropped to ~160KB by sending only the speaker's voice instead of the full session.
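The hard-timeout-plus-abort behaviour described for /api/beat-audio can be sketched with the standard AbortController. withHardTimeout is a hypothetical helper name; the commit only states that a 15s timeout exists and that the AbortSignal is threaded through to the upstream call.

```typescript
// Rejects after `ms` milliseconds and aborts the signal passed to `run`,
// so an upstream fetch that honours the signal is cancelled as well.
function withHardTimeout<T>(
  ms: number,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // cancel the in-flight work first
      reject(new Error(`timed out after ${ms} ms`));
    }, ms);
  });

  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([run(controller.signal), timeout]).finally(() => {
    clearTimeout(timer);
  });
}
```

A route handler could wrap its synthesis call as `withHardTimeout(15_000, (signal) => fetch(url, { signal }))`, where `url` stands in for the TTS endpoint. The race guarantees the caller gets an answer at the deadline even if the upstream ignores the abort.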
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Adds optional Xiaomi MiMo TTS layer on top of the scene/beat engine and a MOCK_IMAGE flag for cheap local TTS iteration.
- Per-character voice provisioning via MiMo voice design → clone, reference audio persisted in session
- Per-line free-form delivery direction (Director writes "鼓起勇气又害羞,声音发颤" ("mustering courage yet shy, voice trembling") style instructions; sent to MiMo's director channel, never read aloud)
- Per-beat audio served with the scene response; frontend plays via hidden <audio> with typewriter synced to audio duration; mute toggle persisted via localStorage lazy initializer
- Graceful degradation: any TTS step failing → silent beat, game continues
- MOCK_IMAGE=true returns a sharp-generated placeholder PNG so local TTS iteration doesn't burn image tokens
- Recommended config in .env.example: MiMo Token Plan covers TEXT/VISION/TTS with one key (mimo-v2.5-pro for text, mimo-v2.5 omni for vision, mimo-v2.5-tts for TTS)
Squashed from #3:
- feat(tts): Xiaomi MiMo per-beat voiceover + per-session character voices + free-form voice direction
- feat(engine): MOCK_IMAGE placeholder image for local testing
- fix(tts): address Copilot review on PR #3
- fix(tts): Copilot round-2 review feedback
Known limitation: Session.characters carries the full WAV reference audio (~200-300KB/character base64) and round-trips through every /api/scene, /api/vision, /api/insert-beat request. This is intrinsic to MiMo's design→clone model (voice identity IS the audio, no server-side voiceId). Fixing requires server-side storage which is out of scope; documented for future hardening.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Replace the one-image-per-interaction model with scenes that hold multiple
dialogue beats. The image regenerates only on scene-change actions; tapping
through beats and in-scene choices are instant and zero-network.
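A minimal sketch of the scene/beat split, assuming hypothetical type and action names (Beat, Scene, PlayState, and step are illustrative, not the engine's real types). It shows why tapping through beats needs no network: only a scene-change action asks for a new scene and image.

```typescript
interface Beat {
  id: string;
  speaker: string | null;
  text: string;
}

interface Scene {
  id: string;
  imageUrl: string;
  beats: Beat[];
}

interface PlayState {
  scene: Scene;
  beatIndex: number;
}

// Hypothetical action set: advance a beat, jump within the scene, or leave it.
type Action =
  | { type: "advance" }
  | { type: "jump"; beatId: string }
  | { type: "change-scene" };

type Step =
  | { kind: "local"; state: PlayState } // resolved instantly on the client
  | { kind: "needs-scene" };            // requires a new scene + image

function step(state: PlayState, action: Action): Step {
  switch (action.type) {
    case "advance": {
      const last = state.scene.beats.length - 1;
      return {
        kind: "local",
        state: { ...state, beatIndex: Math.min(state.beatIndex + 1, last) },
      };
    }
    case "jump": {
      const index = state.scene.beats.findIndex((b) => b.id === action.beatId);
      return {
        kind: "local",
        state: index === -1 ? state : { ...state, beatIndex: index },
      };
    }
    case "change-scene":
      return { kind: "needs-scene" };
  }
}
```

Only the "needs-scene" result would trigger a request to the scene endpoint; everything else is a pure state update over the beats already in memory.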
Squashed from #2:
- feat: scene/beat architecture — decouple dialogue from image generation
- fix: harden LLM-output parsing, prefetch lifecycle, and typewriter (PR review)
- fix: dedupe beat ids; fallback narration on empty insert-beat (PR review #2)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
HTML choice buttons now call /api/interact directly, bypassing the ~4s Vision roundtrip. Free-form background clicks still go through Vision as before.
- image prompt: vertical 9:16 → landscape 16:9 cinematic, scene fills
canvas with bottom dialogue band and horizontal choice row
- image-client: pass size=1792x1024 hint (provider honors it → output is
now exact 16:9 instead of the model's default 1.75:1)
- PlayCanvas: drop 560px cap, use object-contain into available space,
add fullViewport prop for chrome-less presentation rendering
- play page: F / Esc shortcuts + Fullscreen API + fullscreenchange
sync; chrome-less black-letterbox overlay (bg-black) suited for
screen recording
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace external <link> to fonts.googleapis.com with next/font/google
for Cormorant Garamond and Inter. Fonts are now downloaded at build time
and served from /_next/static/media, exposed via --font-serif and
--font-sans CSS variables that Tailwind's fontFamily reads.
Eliminates runtime dependency on Google Fonts CDN (helpful for offline
or region-restricted deploys), avoids FOUT through next/font's
size-adjusted fallback, and removes two render-blocking external
stylesheet requests on first load.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Project is now private; remove LICENSE file, README license
section, and "MIT · MMXXVI" footer tags. Root package.json
license set to UNLICENSED.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Engine
- Split /api/vision out from /api/interact so client can drive
prefetch + cache lookup independently of click interpretation
- Image client switched to chat-completions+modalities API (OpenRouter/
provider style), supporting markdown image URL responses
- annotateClick now resizes to 768w before composite to keep vision
payloads small and avoid CDN timeouts
- Prompts updated to mention "JSON" in user messages (required by
Gemini's strict JSON mode)
- Shared fetchWithRetry helper: 2 retries for chat/image, 0 for vision
(with 60s hard timeout)
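The per-kind retry budget above (2 retries for chat/image, 0 for vision) can be sketched as a small wrapper. This is not the actual fetchWithRetry helper: withRetry, the Kind union, and the attempt callback are hypothetical, and the 60s vision timeout and any backoff are left out for brevity.

```typescript
type Kind = "chat" | "image" | "vision";

// Retry budgets taken from the commit message.
const RETRIES: Record<Kind, number> = { chat: 2, image: 2, vision: 0 };

// Calls `attempt` once, plus up to RETRIES[kind] more times on failure.
// `n` is the zero-based attempt number, handy for logging.
async function withRetry<T>(
  kind: Kind,
  attempt: (n: number) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (let n = 0; n <= RETRIES[kind]; n++) {
    try {
      return await attempt(n);
    } catch (err) {
      lastError = err; // remember the failure and try again if budget remains
    }
  }
  throw lastError;
}
```

Giving vision zero retries keeps a slow click interpretation from stacking multiple long waits in front of the user, while chat and image calls can absorb a transient provider error.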
Client
- Parallel prefetch of all three choice branches on each new frame
- Effect deliberately excludes phase from deps so a user click doesn't
abort in-flight prefetches
- Cache hit/miss/free-form fallback handled in handleClick
- PlayCanvas reads img naturalWidth/Height and adapts container to
whatever aspect AI returns (no more cropped third choice)
- max-width raised to 560px, max-height calc(100dvh - 200px)
Misc
- README env-path corrected to apps/web/.env.local
- users.md: BGM/TTS idea note
- .env.example moved into apps/web alongside next config
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Disable typed routes (default-on in Next 16; it loops infinitely with
the transpilePackages workspace setup, holding 500%+ CPU at idle)
- Pin turbopack.root to monorepo root so a stray ~/pnpm-lock.yaml
cannot misinfer the workspace boundary
- Commit pnpm-lock.yaml; ignore .claude/ local plugin state
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>