fix(ai-client): clean up regressions from OpenAI SDK migration and canvas frame fix (#74)

Three follow-ups to ef3b579 (OpenAI SDK migration) and ebe39ef (canvas frame):

- .env.example / config.ts / AGENTS.md: anthropic & google native protocols
  were removed with the Vercel AI SDK, but .env.example and AGENTS.md still
  advertised them. Rewrite the docs to point Claude/Gemini at their
  OpenAI-compatible endpoints (api.anthropic.com/v1,
  generativelanguage.googleapis.com/v1beta/openai), drop the dead Gemini
  "Nano Banana" image example, sync AGENTS.md (text/vision protocol list,
  image protocol list, the "OpenAI/Gemini via AI SDK" reference note), and
  append a short hint to the readProvider() error message that points
  anthropic/google users to openai_compatible, rather than rejecting the
  value with no guidance.

- chat.ts: drop the unsafe `as { prompt_tokens_details?: ... }` cast; read
  cached_tokens straight off the SDK's CompletionUsage type. Add a comment
  noting that the OpenAI usage object reports cache reads only (no
  cache-write count), so the cache-creation cost that the old AI SDK path
  logged can no longer be recovered.

- PlayCanvas.tsx: revert <img key={imageUrl}> to key={imageUrl.slice(-48)}.
  The gpt-image/mock paths emit multi-MB data URIs; using the full string as
  React's reconciliation key adds avoidable diff overhead during the frequent
  re-renders. Matches the existing <audio> element's key convention.
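
To illustrate the readProvider() change in the first item, here is a minimal
sketch of a provider check that appends a migration hint. It is not the code
in config.ts: the function name readTextProvider, the REMOVED_PROVIDERS set,
and the message wording are all assumptions made for illustration.

```typescript
// Hypothetical sketch; the real readProvider() in config.ts may differ.
type TextProvider = "openai_compatible";

// Assumed list of protocol names that used to be accepted.
const REMOVED_PROVIDERS = new Set(["anthropic", "google"]);

function readTextProvider(raw: string | undefined): TextProvider {
  const value = (raw ?? "").trim().toLowerCase();

  // Unset or explicit default: nothing to validate further.
  if (value === "" || value === "openai_compatible") {
    return "openai_compatible";
  }

  let message = `Unsupported TEXT_PROVIDER "${value}".`;
  if (REMOVED_PROVIDERS.has(value)) {
    // Tell the user how to migrate instead of only rejecting the value.
    message +=
      " Native anthropic/google protocols were removed; use openai_compatible" +
      " and point TEXT_BASE_URL at the provider's OpenAI-compatible endpoint.";
  }
  throw new Error(message);
}
```

With this shape, an unset variable falls back to the default, while
TEXT_PROVIDER=anthropic fails with a message that names the replacement.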

Validation: pnpm typecheck passes. (pnpm lint fails on a pre-existing Next 16
`next lint` CLI issue, identical on staging — unrelated to this change.)
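
For the chat.ts item, a minimal sketch of reading the cached-token count
without a cast. UsageLike is a local stand-in that mirrors only the fields of
the SDK's CompletionUsage used here, so the snippet runs without the openai
package; cachedReadTokens is a hypothetical helper name, not the one in
chat.ts.

```typescript
// Local stand-in for the relevant part of the SDK's CompletionUsage type.
interface UsageLike {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Hypothetical helper: returns cache-read tokens, or 0 when not reported.
function cachedReadTokens(usage: UsageLike | undefined): number {
  // The usage object carries cache reads only; there is no cache-write
  // field to read here, so a cache-creation cost cannot be derived from it.
  return usage?.prompt_tokens_details?.cached_tokens ?? 0;
}

const sample: UsageLike = {
  prompt_tokens: 1200,
  completion_tokens: 80,
  total_tokens: 1280,
  prompt_tokens_details: { cached_tokens: 1024 },
};

console.log(cachedReadTokens(sample)); // prints 1024
```

Optional chaining keeps the read type-safe when a provider omits
prompt_tokens_details or the whole usage object.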
Zonghao Yuan
2026-06-14 13:36:19 +08:00
committed by GitHub
parent 9157454b46
commit 0dea2f8e36
5 changed files with 43 additions and 27 deletions
+21 -19
@@ -3,18 +3,22 @@
 # Recommended setup: Xiaomi MiMo Token Plan for TEXT / VISION / TTS
 # (one API key covers all three) + Runware for IMAGE (FLUX.2 [klein]).
 #
-# TEXT / VISION default to any OpenAI-compatible endpoint, and can switch to
-# native Anthropic or Google Gemini via TEXT_PROVIDER / VISION_PROVIDER.
+# TEXT / VISION / IMAGE all speak the OpenAI wire format. Anthropic Claude
+# and Google Gemini are reachable through their own OpenAI-compatible
+# endpoints (see TEXT_PROVIDER notes below) — no native protocol switch is
+# needed.
 # TTS uses Xiaomi MiMo's own voice design / clone protocol
 # (not OpenAI-compatible; appends -voicedesign / -voiceclone).
 #
-# IMAGE supports Runware (its own task-array protocol), OpenAI (gpt-image),
-# and Google Gemini (Nano Banana) via IMAGE_PROVIDER.
+# IMAGE supports Runware (its own task-array protocol) and OpenAI (gpt-image)
+# via IMAGE_PROVIDER.
 #
 # *_PROVIDER (optional) selects the wire protocol; leave unset for the
-# OpenAI-compatible default (image is auto-detected from the URL). Base URLs
-# tolerate a missing or extra /v1 (or a trailing /chat/completions) — the
-# engine normalizes them.
+# OpenAI-compatible default (image is auto-detected from the URL). Valid
+# values are openai_compatible / openai / runware — native "anthropic" /
+# "google" protocols were removed when the Vercel AI SDK was dropped.
+# Base URLs tolerate a missing or extra /v1 (or a trailing /chat/completions)
+# — the engine normalizes them.
 # =============================================================
 # ---- 1. Text LLM · scene director ----------------------------------
@@ -30,9 +34,11 @@
 TEXT_BASE_URL=https://api.deepseek.com/v1
 TEXT_API_KEY=sk-xxx
 TEXT_MODEL=deepseek-v4-flash
-# TEXT_PROVIDER: openai_compatible (default) | anthropic | google
-# anthropic → TEXT_BASE_URL=https://api.anthropic.com TEXT_MODEL=claude-sonnet-4-6
-# google → TEXT_BASE_URL=https://generativelanguage.googleapis.com TEXT_MODEL=gemini-3.5-flash
+# TEXT_PROVIDER: openai_compatible (default). This is the ONLY supported text
+# protocol. To use Claude or Gemini, leave TEXT_PROVIDER unset and point at
+# their OpenAI-compatible endpoints:
+# Claude → TEXT_BASE_URL=https://api.anthropic.com/v1 TEXT_MODEL=claude-sonnet-4-6
+# Gemini → TEXT_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai TEXT_MODEL=gemini-3.5-flash
 # TEXT_PROVIDER=openai_compatible
 # ---- 2. Image generator (renders the scene background) -------------
@@ -44,14 +50,10 @@ TEXT_MODEL=deepseek-v4-flash
 IMAGE_BASE_URL=https://api.runware.ai/v1
 IMAGE_API_KEY=runware-xxx
 IMAGE_MODEL=runware:400@6
-# IMAGE_PROVIDER: runware (auto-detected for runware.ai) | openai_compatible
-# | openai | google
+# IMAGE_PROVIDER: runware (auto-detected for runware.ai) | openai_compatible | openai
 # openai → gpt-image, supports referenceImages (character/scene continuity).
 # IMAGE_BASE_URL=https://api.openai.com IMAGE_MODEL=gpt-image-1
-# google → Gemini "Nano Banana" (Imagen is EOL 2026-06-24, do not use it).
-# IMAGE_BASE_URL=https://generativelanguage.googleapis.com
-# IMAGE_MODEL=gemini-2.5-flash-image
-# NOTE: openai/google return raw bytes → inlined as a data: URI for the session
+# NOTE: openai returns raw bytes → inlined as a data: URI for the session
 # (heavier per-call transport than Runware's UUID re-reference loop). Runware
 # stays fastest + cheapest for the scene-by-scene flow.
 # IMAGE_PROVIDER=runware
@@ -77,9 +79,9 @@ IMAGE_MODEL=runware:400@6
 VISION_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
 VISION_API_KEY=tp-xxx
 VISION_MODEL=mimo-v2.5
-# VISION_PROVIDER: openai_compatible (default) | anthropic | google
-# anthropic → VISION_BASE_URL=https://api.anthropic.com VISION_MODEL=claude-sonnet-4-6
-# google → VISION_BASE_URL=https://generativelanguage.googleapis.com VISION_MODEL=gemini-3.5-flash
+# VISION_PROVIDER: openai_compatible (default). Only openai_compatible is
+# supported — reach Claude/Gemini via their OpenAI-compatible endpoints
+# (same base URLs as TEXT above). Leave unset to use the default.
 # VISION_PROVIDER=openai_compatible
 # ---- 4. TTS (optional — leave blank to disable) --------------------