Eliminate the dual code path (raw fetch vs AI SDK) for text and vision.
All providers now go through createLanguageModel() + generateText(),
removing chatOpenAiCompatible/analyzeOpenAiCompatible, the manual Usage
type, summarizeUsage, and responseFormat plumbing from 8 call sites.
Key fix: @ai-sdk/openai v3 defaults to the Responses API (/responses);
DeepSeek only supports Chat Completions, so we use .chat() explicitly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add isComposing guard to the homepage prompt textarea so CJK users
no longer accidentally submit while composing. Also show a subtle
"Enter 发送 · Shift+Enter 换行" ("Enter to send · Shift+Enter for a new line")
hint when the input has content.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
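A minimal sketch of the composition guard described above, written as a pure function so it can be checked without a DOM. The name `shouldSubmitOnKey` and the `KeyLike` shape are hypothetical; only the `isComposing` check itself comes from the commit. The `keyCode === 229` fallback is an extra assumption for browsers that report an active IME that way.

```typescript
// Subset of KeyboardEvent fields the guard reads (hypothetical helper type).
interface KeyLike {
  key: string;
  shiftKey: boolean;
  isComposing: boolean;
  keyCode?: number;
}

// Returns true only for a plain Enter pressed outside an IME composition.
function shouldSubmitOnKey(e: KeyLike): boolean {
  // While a CJK candidate window is open, Enter confirms the candidate
  // and must not submit the prompt.
  if (e.isComposing || e.keyCode === 229) return false;
  // Shift+Enter inserts a new line instead of submitting.
  return e.key === "Enter" && !e.shiftKey;
}
```

In a React `onKeyDown` handler the flag would typically be read from `event.nativeEvent.isComposing`.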
Replace session.styleGuide with a descriptive placeholder before the
Architect runs, so its prompt reads a natural sentence instead of the
raw "auto" marker. Also wrap selectStyle in a try-catch so a transient
LLM failure falls back to 吉卜力 (Ghibli) instead of crashing session start.
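The fallback could look roughly like the sketch below. `selectStyleSafe`, the `StyleSelector` type, and `FALLBACK_STYLE` are hypothetical names; the real `selectStyle` signature is not shown in this commit, so it is passed in as a parameter here.

```typescript
// Assumed shape of the style selector: world setting in, preset name out.
type StyleSelector = (worldSetting: string) => Promise<string>;

// Fallback preset named in the commit message.
const FALLBACK_STYLE = "吉卜力";

// Wraps the selector so a rejected promise degrades to the fallback preset
// instead of propagating and aborting session start.
async function selectStyleSafe(
  selectStyle: StyleSelector,
  worldSetting: string,
): Promise<string> {
  try {
    return await selectStyle(worldSetting);
  } catch {
    return FALLBACK_STYLE;
  }
}
```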
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the user picks "自动" (Auto), the client sends styleGuide="auto" to the
server. The orchestrator then runs a lightweight style-selector LLM
call in parallel with the Architect. Both depend only on worldSetting,
so the selector adds no latency as long as it finishes before the
Architect does. The selector picks the best-matching
preset from STYLE_MAP based on genre, mood, and setting.
Also moves STYLE_MAP from page.tsx to lib/options.ts so it can be
shared between client and server.
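As an illustration of the parallel dispatch, the sketch below starts both calls before awaiting either one. `startSession` and its parameter names are hypothetical, and the Architect and selector are passed in as plain async functions because their real signatures are not part of this commit message.

```typescript
// Both calls take only the world setting, so neither has to wait for the other.
async function startSession<TPlan>(
  worldSetting: string,
  runArchitect: (worldSetting: string) => Promise<TPlan>,
  selectStyle: (worldSetting: string) => Promise<string>,
): Promise<{ plan: TPlan; style: string }> {
  // Promise.all receives two already-started promises; total wall time is
  // the slower of the two calls, not their sum.
  const [plan, style] = await Promise.all([
    runArchitect(worldSetting),
    selectStyle(worldSetting),
  ]);
  return { plan, style };
}
```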
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add multi-platform Docker image build (amd64 + arm64) with GitHub Actions
CI that pushes to GHCR on every merge to main. Users can self-host with
a single `docker compose up -d` command.
- Dockerfile: multi-stage build with Next.js standalone output (~150-200MB)
- docker-compose.yml: one-command self-hosted deployment
- .github/workflows/docker.yml: CI workflow with QEMU cross-compilation
- next.config.ts: conditional `output: "standalone"` via BUILD_STANDALONE env
- README (zh/en/ja): restructure deploy section to include Docker option
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gen-style-thumbs.ts script uses Bun-only APIs (import.meta.dir,
Bun.write) which fail TypeScript checking under the project's Next.js
tsconfig. Exclude the scripts directory from compilation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `isAuto` variable after magic-wand button removal
- Add focus-visible ring to style cards for keyboard accessibility
- Update DEFAULT_STYLE comment to match actual fallback (吉卜力, i.e. Ghibli)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rework custom style view: fixed modal height to match grid view, move
upload and preset-import controls to bottom toolbar alongside cancel/save,
textarea fills remaining space. Add bordered style to cancel button,
improve disabled save button visibility, remove per-card magic-wand
customize button, and add placeholder hint about English prompts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix @AGETNTS.md → @AGENTS.md typo in CLAUDE.md
- Remove ref read inside useMemo (React anti-pattern causing one-frame stale data)
- Simplify buildDialogueHistory to read visitedBeatIds directly from session.history,
which also fixes incorrect scene-ID matching when the same ID appears multiple times
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite all 20 STYLE_MAP prompts with precise art terminology (sfumato,
feibai, bokashi, broken-color, etc.) and richer color/texture descriptions.
KyoAni prompt now references Beyond the Boundary and Sound! Euphonium;
Ghibli references Spirited Away and Howl's Moving Castle. Regenerate all
style thumbnails using a two-step pipeline: DeepSeek picks an optimal
visual-novel scene per style, then Runware renders it. Add cache-busting
query param (thumbV) to thumbnail URLs. Include gen-style-thumbs.ts script
for future regeneration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign the painting-style picker inspired by Pollo AI: widen modal to
1400px, show styles as square thumbnail cards in a 4-column grid with
name labels below, add ember glow hover effect, and split custom-style
editing into its own view. Simplify style names (e.g. "京阿尼细腻日常" →
"京阿尼"), add 22 .webp preview thumbnails, and remove the per-preset
override mechanism in favor of a cleaner grid + custom flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code-level `export const maxDuration = 60` and vercel.json `functions`
block were overriding the dashboard's 300s setting, causing ~100 504
timeouts per day on /api/scene and /api/start. Removing them lets each
Vercel plan use its own default (60s Hobby, 300s Pro) without breaking
self-deployers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The BYO (Bring Your Own) API key configuration for LLM and image
generation will be re-implemented via Cloudflare Workers. Remove
the client-side implementation to prepare for that migration.
TTS (text-to-speech) BYO key support is intentionally preserved.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three transport-only optimizations that cut per-session Vercel Fast Origin Transfer (FOT) by ~50-60%:
P0 — Server strips voice.referenceAudioBase64 from already-known characters
in /api/scene and /api/insert-beat responses (defense-in-depth).
P1 — Client strips all voice data from session before sending to
/api/scene, /api/vision, and /api/insert-beat. Voices are retained locally
and re-merged from responses via mergeCharactersPreserveVoice(). The engine
only needs character names + visualDescriptions for scene generation.
P3 — /api/beat-audio returns binary audio (Response with Content-Type)
instead of JSON-wrapped base64, saving ~33% encoding overhead. Client
converts to blob URLs; PlayCanvas accepts a single audioSrc prop.
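A rough sketch of the P1 round trip: strip voices before the request, then re-attach the locally kept voices to the characters in the response. The `Character` and `Voice` shapes are assumptions, as is matching characters by `name`; the body of `mergeCharactersPreserveVoice` here is illustrative, not the project's actual implementation.

```typescript
interface Voice {
  referenceAudioBase64?: string;
}

interface Character {
  name: string;
  visualDescription: string;
  voice?: Voice;
}

// Build the outgoing payload with only the fields scene generation needs.
function stripVoices(characters: Character[]): Character[] {
  return characters.map((c) => ({
    name: c.name,
    visualDescription: c.visualDescription,
  }));
}

// Re-attach locally retained voices to characters returned by the server.
// Characters that are new in the response keep whatever voice the server sent.
function mergeCharactersPreserveVoice(
  local: Character[],
  incoming: Character[],
): Character[] {
  const localVoices = new Map<string, Voice>();
  for (const c of local) {
    if (c.voice) localVoices.set(c.name, c.voice);
  }
  return incoming.map((c) => {
    const kept = localVoices.get(c.name);
    return kept ? { ...c, voice: kept } : c;
  });
}
```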
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(security): harden BYO API header against SSRF and input abuse
- Add lib/validateUrl.ts with HTTPS-only + public-IP enforcement,
provider allowlist, IPv6 rejection, and userinfo-in-URL blocking.
- Add lib/byoHeaders.ts — single source of truth for client-side BYO
header construction (deduplicates app/page.tsx & app/play/page.tsx).
- config.ts: validate BYO endpoints via isPublicUrl(), cap header at
2 KB, truncate apiKey/model strings, sanitize log output.
- fetchWithRetry: default redirect to "manual" to block 302-to-intranet.
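The checks listed above might be combined along these lines. This is a simplified sketch, not the contents of lib/validateUrl.ts: the private-range table is an assumption, the provider allowlist is omitted, and only literal IP addresses are inspected (a hostname that resolves to a private address via DNS would still pass).

```typescript
// True for IPv4 literals in loopback, private, link-local, or CGNAT ranges.
function isPrivateIpv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 100 && b >= 64 && b <= 127)
  );
}

function isPublicUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false; // HTTPS only
  if (url.username !== "" || url.password !== "") return false; // no userinfo
  const host = url.hostname.toLowerCase();
  if (host.startsWith("[") || host.includes(":")) return false; // IPv6 literal
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  return !isPrivateIpv4(host);
}
```

Parsing with `URL` first means unusual IPv4 spellings are normalized to dotted-decimal form before the range check runs.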
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(security): address Copilot review — trim endpoint, strip control chars, drop unused import
- safeEndpoint: trim whitespace before URL validation
- safeString: strip ASCII control characters to prevent header injection
- play/page.tsx: remove unused BYO_STORAGE_KEY import
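For illustration, a sanitizer in the spirit of `safeString` could strip control characters before truncating. The signature and the `maxLen` parameter are assumptions; the commit only states that control characters are stripped and that apiKey/model strings are truncated.

```typescript
// Removes ASCII control characters (0x00-0x1F and 0x7F), which include the
// CR and LF bytes used for header injection, then trims and caps the length.
function safeString(value: unknown, maxLen: number): string {
  if (typeof value !== "string") return "";
  return value.replace(/[\x00-\x1f\x7f]/g, "").trim().slice(0, maxLen);
}
```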
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite docs/xiaomi-tts-key.md:
- Lead with the sk- (pay-as-you-go) key path as the recommended route,
since most users don't have a Token Plan subscription.
- Add direct link to the console/api-keys page.
- Polish Chinese prose throughout for natural phrasing and clarity
(replace jargon like "0x 计费" → "免费", "端点" → "服务地址", etc.).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set the session orientation in an isomorphic layout effect so portrait
phones don't flash the landscape loading chrome for a frame before the
bootstrap effect runs. State still inits to "landscape" for SSR-safety;
the correction now lands before first paint (no-op on landscape devices).
Addresses Copilot review on PR #31.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thread orientation (portrait|landscape) from client through API, engine,
and image gen. Portrait devices render 1024x1792 (9:16) full-bleed scenes;
desktop/landscape keeps 1792x1024 (16:9). Adds cover-aware click→image
coordinate mapping, session-locked orientation, a shared coerceOrientation
helper, and a choices overflow cap in portrait.
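The coordinate mapping can be sketched as follows, assuming the scene image is drawn with `object-fit: cover` and centered in its container. `clickToImage`, `Size`, and `Point` are hypothetical names, and the body of `coerceOrientation` is a guess at what such a helper does (default to landscape for anything unrecognized).

```typescript
type Orientation = "portrait" | "landscape";

// Anything other than the exact string "portrait" falls back to landscape.
function coerceOrientation(value: unknown): Orientation {
  return value === "portrait" ? "portrait" : "landscape";
}

interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Maps a click in container pixels to pixel coordinates in the source image.
function clickToImage(click: Point, container: Size, image: Size): Point {
  // "cover" scales the image until it fills both dimensions, so the larger
  // of the two ratios wins and the other axis overflows and is cropped.
  const scale = Math.max(
    container.width / image.width,
    container.height / image.height,
  );
  // Offsets are zero or negative: the cropped part sits outside the container.
  const offsetX = (container.width - image.width * scale) / 2;
  const offsetY = (container.height - image.height * scale) / 2;
  return {
    x: (click.x - offsetX) / scale,
    y: (click.y - offsetY) / scale,
  };
}
```

For a 1000x1000 container showing a 1792x1024 image, a click at the container center (500, 500) maps to the image center (896, 512).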
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- inferImageProtocol: match runware.ai by parsed hostname (exact match or
subdomain) instead of a bare substring, so notrunware.ai /
runware.ai.evil.com no longer misroute to the Runware protocol
- README: document the image-2-vip → OpenAI-compatible exception; correct the
Imagen wording (deprecated, EOL 2026-06-24 — not yet discontinued)
Addresses Copilot review on #30.
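The hostname check might look like the sketch below. `isRunwareHost` is a hypothetical helper name; the real `inferImageProtocol` presumably returns a protocol value rather than a boolean.

```typescript
// Matches runware.ai itself or any subdomain of it, based on the parsed
// hostname rather than a substring of the whole URL.
function isRunwareHost(endpoint: string): boolean {
  let host: string;
  try {
    host = new URL(endpoint).hostname.toLowerCase();
  } catch {
    return false; // unparseable endpoints never match
  }
  return host === "runware.ai" || host.endsWith(".runware.ai");
}
```

The leading dot in `.runware.ai` is what keeps `notrunware.ai` from matching, and comparing against the end of the hostname is what rejects `runware.ai.evil.com`.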
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- TEXT/VISION: add native Anthropic & Google Gemini paths via Vercel AI SDK,
selectable through TEXT_PROVIDER / VISION_PROVIDER (default openai_compatible)
- IMAGE: expand to openai (gpt-image) / google (Nano Banana) via AI SDK
alongside the existing Runware task-array and OpenAI-compatible REST paths
- normalizeBaseUrl: tolerate URLs with/without /v1 (or /chat/completions);
append the per-protocol version segment only for bare hosts
- config: readProvider() reads *_PROVIDER; types: ProviderProtocol + provider?
- deps: @ai-sdk/anthropic, @ai-sdk/google; docs in .env.example + README
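One way to read the `normalizeBaseUrl` rule is sketched here. The default `"/v1"` segment and the exact suffix handling are assumptions; the commit only says that URLs with or without /v1 (or /chat/completions) are tolerated and that the version segment is appended for bare hosts.

```typescript
// Accepts a bare host, a versioned base URL, or a full chat-completions URL,
// and returns a base URL without a trailing slash.
function normalizeBaseUrl(raw: string, versionSegment: string = "/v1"): string {
  // Drop trailing slashes, then a pasted full endpoint path if present.
  let url = raw.trim().replace(/\/+$/, "");
  url = url.replace(/\/chat\/completions$/, "");
  // new URL() throws on invalid input; callers are expected to handle that.
  const path = new URL(url).pathname;
  const isBareHost = path === "" || path === "/";
  // Only bare hosts get the version segment; anything with a path is trusted.
  return isBareHost ? url + versionSegment : url;
}
```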
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Harden the BYO-mode signal at the API boundary (start/scene/insert-beat):
only clientTts === true drops server TTS, so a stray truthy non-boolean can't
silently disable it. Add a non-blocking prefix hint in TtsKeyModal that warns
when the pasted key prefix (tp-/sk-) mismatches the selected key type — a
mismatch hits the wrong endpoint and plays silently, the symptom BYO fixes.
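Both checks are small enough to sketch directly. The function names and the `KeyType` values are hypothetical, and the mapping of `tp-` to Token Plan keys and `sk-` to pay-as-you-go keys is an assumption based on the prefixes named above.

```typescript
// Server TTS is skipped only for a literal boolean true; "true", 1, or {}
// arriving in the request body leave server TTS enabled.
function shouldUseServerTts(body: { clientTts?: unknown }): boolean {
  return body.clientTts !== true;
}

type KeyType = "token-plan" | "pay-as-you-go";

// Returns true when the pasted key's prefix contradicts the selected type.
// Unknown prefixes produce no hint, since the warning is non-blocking.
function keyPrefixMismatch(key: string, selected: KeyType): boolean {
  const k = key.trim();
  if (k.startsWith("tp-")) return selected !== "token-plan";
  if (k.startsWith("sk-")) return selected !== "pay-as-you-go";
  return false;
}
```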
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Public users share one server TTS key, so Xiaomi's per-key RPM/TPM limits
cause silent playback under concurrency. This adds an OPTIONAL path: a user
can store their own Xiaomi MiMo key in the browser and synthesize voice
client-side against Xiaomi's CORS-open endpoints. The key lives only in
localStorage and is never sent to or logged by our server; the shared server
key still serves everyone who does not opt in.
- components/TtsKeyModal.tsx: shared key modal (key-family + region picker),
reused by both the home and play pages
- app/play/page.tsx: silence nudge moved beside the mute toggle; modal opens
in place instead of redirecting to the home page
- app/page.tsx: home page consumes the shared modal + readStoredTtsConfig
- lib/clientTtsConfig.ts, lib/ttsPresets.ts: browser config + region presets
- app/api/{start,scene,insert-beat}: thread per-request voice; lib/types update
- docs/xiaomi-tts-key.md + README note
Verified with tsc --noEmit (exit 0).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Painter composites exactly plan.entryActiveCharacters into the entry
frame (the same roster the Cinematographer framed). Phase B is told to
reuse that roster, but only the entry beat's id was code-enforced — so an
LLM slip could leave a character in the painted frame that the runtime
entry beat says isn't there. Pin activeCharacters onto the plan's entry
beat as a last line of defense, mirroring the existing id pin.
Speaker is intentionally left to the prompt: it's coupled to line/TTS, so
overwriting it could mis-attribute or orphan Phase B's dialogue.
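A sketch of the pin, under assumed shapes: `Plan`, `Beat`, `entryBeatId`, and `pinEntryBeat` are hypothetical names, and the entry beat is assumed to be the first element of the Phase B output. Only `entryActiveCharacters` and `activeCharacters` are named in the commit.

```typescript
interface Plan {
  entryBeatId: string;
  entryActiveCharacters: string[];
}

interface Beat {
  id: string;
  activeCharacters: string[];
  speaker?: string;
  line?: string;
}

// Overwrites id and activeCharacters on the entry beat with the planned
// values so the runtime roster matches the painted frame. Speaker and line
// are deliberately left as Phase B produced them.
function pinEntryBeat(plan: Plan, beats: Beat[]): Beat[] {
  return beats.map((beat, index) =>
    index === 0
      ? {
          ...beat,
          id: plan.entryBeatId,
          activeCharacters: [...plan.entryActiveCharacters],
        }
      : beat,
  );
}
```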
Addresses Copilot review feedback on PR #27.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>