feat: prefetch, vision split, provider adapter, UI polish

Engine
- Split /api/vision out from /api/interact so the client can drive
  prefetch + cache lookup independently of click interpretation
- Image client switched to chat-completions+modalities API (OpenRouter/
  provider style), supporting markdown image URL responses
- annotateClick now resizes to 768px wide before compositing to keep
  vision payloads small and avoid CDN timeouts
- Prompts updated to mention "JSON" in user messages (required by
  Gemini's strict JSON mode)
- Shared fetchWithRetry helper: 2 retries for chat/image, 0 for vision
  (with 60s hard timeout)
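
The retry policy above could be sketched roughly as follows. This is an illustration only, not the repository's fetchWithRetry: the names CallKind, attemptsFor, and withRetry are hypothetical, and applying the 60s timeout to every attempt (rather than to vision calls only) is an assumption.

```typescript
// Sketch only: names and structure are assumptions, not the repo's code.
type CallKind = "chat" | "image" | "vision";

// Retry budget from the commit message: 2 for chat/image, 0 for vision.
const RETRIES: Record<CallKind, number> = { chat: 2, image: 2, vision: 0 };
const HARD_TIMEOUT_MS = 60_000; // assumed to apply to each attempt

function attemptsFor(kind: CallKind): number {
  return RETRIES[kind] + 1; // first try plus the retries
}

// `run` performs one attempt and should honour the abort signal,
// e.g. (signal) => fetch(url, { ...init, signal }).
async function withRetry<T>(
  kind: CallKind,
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = HARD_TIMEOUT_MS,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attemptsFor(kind); i++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await run(controller.signal);
    } catch (err) {
      lastError = err; // remember the failure; retry if budget remains
    } finally {
      clearTimeout(timer); // never leave a timer pending
    }
  }
  throw lastError;
}
```

With this shape a vision call fails fast after a single attempt, while chat and image calls get up to three attempts before the last error is rethrown.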

Client
- Parallel prefetch of all three choice branches on each new frame
- Prefetch effect deliberately excludes phase from deps so a user
  click doesn't abort in-flight prefetches
- Cache hit/miss/free-form fallback handled in handleClick
- PlayCanvas reads img naturalWidth/Height and adapts container to
  whatever aspect ratio the AI returns (no more cropped third choice)
- max-width raised to 560px, max-height calc(100dvh - 200px)
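
A minimal sketch of the prefetch and cache-lookup flow described above, under stated assumptions: Frame, Generate, PrefetchCache, and resolveClick are hypothetical names, and the real client keeps this state in React hooks rather than a class.

```typescript
// Sketch only: types and names are illustrative, not the repo's code.
type Frame = { imageUrl: string };
type Generate = (choiceId: string, signal: AbortSignal) => Promise<Frame>;

class PrefetchCache {
  private entries = new Map<string, Promise<Frame>>();
  private controller = new AbortController();

  // On each new frame, start every choice branch in parallel.
  start(choiceIds: string[], generate: Generate): void {
    this.controller.abort(); // drop prefetches belonging to the old frame
    this.controller = new AbortController();
    this.entries.clear();
    for (const id of choiceIds) {
      const pending = generate(id, this.controller.signal);
      pending.catch(() => {}); // a failed prefetch surfaces only if awaited
      this.entries.set(id, pending);
    }
  }

  // Hit: the in-flight or finished prefetch. Miss: undefined.
  lookup(choiceId: string | null): Promise<Frame> | undefined {
    return choiceId === null ? undefined : this.entries.get(choiceId);
  }
}

// choiceId is null when the click matched none of the offered choices
// (free-form); in that case, or on a miss, generate on demand.
function resolveClick(
  cache: PrefetchCache,
  choiceId: string | null,
  fallback: () => Promise<Frame>,
): Promise<Frame> {
  return cache.lookup(choiceId) ?? fallback();
}
```

Because a click only reads from the cache and never calls start, it cannot abort in-flight prefetches; only the arrival of a new frame does.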

Misc
- README env-path corrected to apps/web/.env.local
- users.md: BGM/TTS idea note
- .env.example moved into apps/web alongside next config

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yuanzonghao
2026-05-12 19:38:03 +08:00
parent ad4b09c744
commit 9cedfa66e4
20 changed files with 405 additions and 151 deletions
@@ -1,21 +0,0 @@
# =============================================================
# Dada — AI Visual Novel
# Three independently configurable AI providers
# Any OpenAI-compatible endpoint works (OpenAI, Anthropic, Gemini,
# OpenRouter, DeepSeek, Ollama, ...).
# =============================================================
# ---- 1. Text LLM (story director) -----------------------------
TEXT_BASE_URL=https://api.anthropic.com/v1
TEXT_API_KEY=sk-ant-xxx
TEXT_MODEL=claude-opus-4-7
# ---- 2. Image generator (renders the whole UI screen) ---------
IMAGE_BASE_URL=https://api.openai.com/v1
IMAGE_API_KEY=sk-xxx
IMAGE_MODEL=gpt-image-2
# ---- 3. Vision model (interprets where the user clicked) ------
VISION_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
VISION_API_KEY=xxx
VISION_MODEL=gemini-3-flash