- Move vercel.json to apps/web/ with correct route paths; cap scene route
maxDuration 120→60s for Hobby. Root vercel.json removed. Vercel project's
Root Directory must be set to apps/web (Deploy button URL passes this).
- Switch image transport from base64-in-JSON to Runware-hosted URLs:
generateImage now uses outputType=URL and returns {imageUrl, imageUuid};
StartResponse/SceneResponse carry imageUrl; VisionRequest carries
prevImageUrl (server re-fetches the bytes for click annotation). This
eliminates the 4.5MB serverless body-size risk.
- Painter and director prefer URL over UUID for referenceImages — the UUID
returned by Runware imageInference isn't always recognized in the refs
pipeline (surfaces as `failedToTransferImage`).
- Client preloads scene images via `new Image().decode()` before committing
to React state, so URL transitions render instantly; prefetched scenes
also warm the HTTP cache.
- jsonParser uses the jsonrepair package (replaces hand-rolled repair) and
adds a targeted preRepair regex for the missing-key-close-quote pattern
that jsonrepair couldn't disambiguate. Full raw model output dumped on
failure for diagnostic visibility.
- Default text provider switched to DeepSeek v4-flash via direct API
(significantly more stable JSON than MiMo v2.5-pro). VISION/TTS stay on
MiMo (DeepSeek has no multimodal / TTS offerings).
- next.config: drop dead experimental.serverActions.bodySizeLimit (no
server actions used).
- README: real Deploy button URL (zonghaoyuan/yume + root-directory=apps/web
+ TTS/MOCK_IMAGE in env list); refreshed env vars table with optional
TTS section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce median scene-load latency from ~30-80s to ~17-25s by switching image generation to Runware FLUX.2 [klein] 9B KV and moving per-beat TTS synthesis off the scene response into a new lazy /api/beat-audio endpoint with hard timeout + abort support.
- feat(image): migrate to Runware FLUX.2 [klein] 9B KV — task-array API, $0.001/image, sub-second inference.
- feat(tts): split /api/scene into directScene + image + voicedesign-provisioning; lazily synth per beat via /api/beat-audio with 15s hard timeout + AbortSignal threaded to MiMo so timed-out calls don't keep burning sockets/quota; client fans out per-beat fetches on scene-id change with abort + identity-check finally to prevent cross-scene beat-id collisions.
- refactor(tts): slim BeatAudioRequest to { beat, voice } — ~800KB per-beat upload dropped to ~160KB by sending only the speaker's voice instead of the full session.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Replace the one-image-per-interaction model with scenes that hold multiple
dialogue beats. The image regenerates only on scene-change actions; tapping
through beats and in-scene choices are instant and zero-network.
Squashed from #2:
- feat: scene/beat architecture — decouple dialogue from image generation
- fix: harden LLM-output parsing, prefetch lifecycle, and typewriter (PR review)
- fix: dedupe beat ids; fallback narration on empty insert-beat (PR review #2)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Project is now private; remove LICENSE file, README license
section, and "MIT · MMXXVI" footer tags. Root package.json
license set to UNLICENSED.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Engine
- Split /api/vision out from /api/interact so client can drive
prefetch + cache lookup independently of click interpretation
- Image client switched to chat-completions+modalities API (OpenRouter/
provider style), supporting markdown image URL responses
- annotateClick now resizes to 768w before composite to keep vision
payloads small and avoid CDN timeouts
- Prompts updated to mention "JSON" in user messages (required by
Gemini's strict JSON mode)
- Shared fetchWithRetry helper: 2 retries for chat/image, 0 for vision
(with 60s hard timeout)
Client
- Parallel prefetch of all three choice branches on each new frame
- Effect deliberately excludes phase from deps so user-click doesn't
abort in-flight prefetches
- Cache hit/miss/free-form fallback handled in handleClick
- PlayCanvas reads img naturalWidth/Height and adapts container to
whatever aspect AI returns (no more cropped third choice)
- max-width raised to 560px, max-height calc(100dvh - 200px)
Misc
- README env-path corrected to apps/web/.env.local
- users.md: BGM/TTS idea note
- .env.example moved into apps/web alongside next config
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>