feat(ai-client): multi-provider compat — native Anthropic/Google + URL tolerance

- TEXT/VISION: add native Anthropic and Google Gemini paths via the Vercel AI SDK, selectable through TEXT_PROVIDER / VISION_PROVIDER (default: openai_compatible)
- IMAGE: expand to openai (gpt-image) / google (Nano Banana) via the AI SDK, alongside the existing Runware task-array and OpenAI-compatible REST paths
- normalizeBaseUrl: tolerate URLs with or without /v1 (or a trailing /chat/completions); append the per-protocol version segment only for bare hosts
- config: readProvider() reads *_PROVIDER; types: ProviderProtocol + provider?
- deps: @ai-sdk/anthropic, @ai-sdk/google; docs in .env.example + README

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
+28 -5
```diff
@@ -3,14 +3,18 @@
 # Recommended setup: Xiaomi MiMo Token Plan for TEXT / VISION / TTS
 # (one API key covers all three) + Runware for IMAGE (FLUX.2 [klein]).
 #
-# TEXT / VISION use any OpenAI-compatible endpoint (any OpenAI-
-# compatible host works: OpenRouter, OpenAI, Anthropic via proxy,
-# Gemini, DeepSeek, Ollama, ...).
+# TEXT / VISION default to any OpenAI-compatible endpoint, and can switch to
+# native Anthropic or Google Gemini via TEXT_PROVIDER / VISION_PROVIDER.
 # TTS uses Xiaomi MiMo's own voice design / clone protocol
 # (not OpenAI-compatible; appends -voicedesign / -voiceclone).
 #
-# IMAGE uses Runware's own task-array protocol (not OpenAI-compatible);
-# the adapter posts an `imageInference` task to IMAGE_BASE_URL.
+# IMAGE supports Runware (its own task-array protocol), OpenAI (gpt-image),
+# and Google Gemini (Nano Banana) via IMAGE_PROVIDER.
+#
+# *_PROVIDER (optional) selects the wire protocol; leave unset for the
+# OpenAI-compatible default (image is auto-detected from the URL). Base URLs
+# tolerate a missing or extra /v1 (or a trailing /chat/completions) — the
+# engine normalizes them.
 # =============================================================
 
 # ---- 1. Text LLM · scene director ----------------------------------
```
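A minimal TypeScript sketch of the base-URL tolerance described in the comment above. It assumes a `normalizeBaseUrl(raw, protocol)` signature and these per-protocol version segments: `/v1` for OpenAI-compatible hosts and Anthropic, `/v1beta` for Google (the default base paths of the OpenAI API, `@ai-sdk/anthropic` and `@ai-sdk/google`). The `Protocol` names and the signature are hypothetical; the commit only names the function and says the version segment is appended for bare hosts only.

```typescript
// Hypothetical protocol names for the TEXT / VISION slots.
type Protocol = "openai_compatible" | "anthropic" | "google";

// Assumed per-protocol version segments (the usual OpenAI, Anthropic and
// Gemini REST base paths).
const VERSION_SEGMENT: Record<Protocol, string> = {
  openai_compatible: "/v1",
  anthropic: "/v1",
  google: "/v1beta",
};

function normalizeBaseUrl(raw: string, protocol: Protocol): string {
  const url = new URL(raw.trim());
  // Drop trailing slashes, then a pasted-in /chat/completions endpoint.
  let path = url.pathname.replace(/\/+$/, "").replace(/\/chat\/completions$/, "");
  if (path === "") {
    // Bare host: append the protocol's version segment.
    path = VERSION_SEGMENT[protocol];
  }
  // Any other path (/v1, /v1beta, a proxy prefix, ...) is kept as given.
  // Query string and hash are dropped.
  return url.origin + path;
}
```

Under these assumptions `https://api.anthropic.com` becomes `https://api.anthropic.com/v1`, while `https://api.deepseek.com/v1/chat/completions` is trimmed back to `https://api.deepseek.com/v1`. A non-empty path without a version, such as a proxy prefix, is left untouched, which keeps the "bare hosts only" rule.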
```diff
@@ -26,6 +30,10 @@
 TEXT_BASE_URL=https://api.deepseek.com/v1
 TEXT_API_KEY=sk-xxx
 TEXT_MODEL=deepseek-v4-flash
+# TEXT_PROVIDER: openai_compatible (default) | anthropic | google
+# anthropic → TEXT_BASE_URL=https://api.anthropic.com TEXT_MODEL=claude-sonnet-4-6
+# google → TEXT_BASE_URL=https://generativelanguage.googleapis.com TEXT_MODEL=gemini-3.5-flash
+# TEXT_PROVIDER=openai_compatible
 
 # ---- 2. Image generator (renders the scene background) -------------
 # Recommended: Runware + FLUX.2 [klein] 9B KV — distilled 4-step model,
```
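The `*_PROVIDER` switch and the optional `provider?` field from the commit message might be read along these lines. This is a sketch: the `ProviderProtocol` members, the `ModelConfig` shape and `readModelConfig` are hypothetical, and only `readProvider()`, `ProviderProtocol` and an optional `provider` field are named by the commit. Here an unset variable stays `undefined` so each caller can apply its own default, and an unsupported value throws instead of silently falling back.

```typescript
// Hypothetical members; the commit only names the ProviderProtocol type.
type ProviderProtocol =
  | "openai_compatible"
  | "anthropic"
  | "google"
  | "openai"
  | "runware";

type Env = Record<string, string | undefined>;

// Hypothetical config shape with the optional provider field.
interface ModelConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
  provider?: ProviderProtocol; // undefined when <PREFIX>_PROVIDER is unset
}

const TEXT_PROTOCOLS: readonly ProviderProtocol[] = ["openai_compatible", "anthropic", "google"];

// Read <PREFIX>_PROVIDER (e.g. TEXT_PROVIDER), case-insensitively.
// Unset or blank returns undefined; an unsupported value throws.
function readProvider(
  env: Env,
  prefix: string,
  allowed: readonly ProviderProtocol[],
): ProviderProtocol | undefined {
  const raw = env[`${prefix}_PROVIDER`]?.trim().toLowerCase();
  if (!raw) return undefined;
  const match = allowed.find((p) => p === raw);
  if (match === undefined) {
    throw new Error(`${prefix}_PROVIDER=${raw} is not one of ${allowed.join(" | ")}`);
  }
  return match;
}

// Assemble the TEXT or VISION slot from its four variables.
function readModelConfig(env: Env, prefix: "TEXT" | "VISION"): ModelConfig {
  return {
    baseUrl: env[`${prefix}_BASE_URL`] ?? "",
    apiKey: env[`${prefix}_API_KEY`] ?? "",
    model: env[`${prefix}_MODEL`] ?? "",
    provider: readProvider(env, prefix, TEXT_PROTOCOLS),
  };
}
```

A caller would then pick the wire protocol with `cfg.provider ?? "openai_compatible"` for TEXT / VISION, while IMAGE can fall back to URL detection instead.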
```diff
@@ -36,12 +44,27 @@ TEXT_MODEL=deepseek-v4-flash
 IMAGE_BASE_URL=https://api.runware.ai/v1
 IMAGE_API_KEY=runware-xxx
 IMAGE_MODEL=runware:400@6
+# IMAGE_PROVIDER: runware (auto-detected for runware.ai) | openai_compatible
+# | openai | google
+# openai → gpt-image, supports referenceImages (character/scene continuity).
+# IMAGE_BASE_URL=https://api.openai.com IMAGE_MODEL=gpt-image-1
+# google → Gemini "Nano Banana" (Imagen is EOL 2026-06-24, do not use it).
+# IMAGE_BASE_URL=https://generativelanguage.googleapis.com
+# IMAGE_MODEL=gemini-2.5-flash-image
+# NOTE: openai/google return raw bytes → inlined as a data: URI for the session
+# (heavier per-call transport than Runware's UUID re-reference loop). Runware
+# stays fastest + cheapest for the scene-by-scene flow.
+# IMAGE_PROVIDER=runware
 
 # ---- 3. Vision model · multimodal click interpretation -------------
 # Recommended: MiMo V2.5 — multimodal, accepts image_url content parts.
 VISION_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
 VISION_API_KEY=tp-xxx
 VISION_MODEL=mimo-v2.5
+# VISION_PROVIDER: openai_compatible (default) | anthropic | google
+# anthropic → VISION_BASE_URL=https://api.anthropic.com VISION_MODEL=claude-sonnet-4-6
+# google → VISION_BASE_URL=https://generativelanguage.googleapis.com VISION_MODEL=gemini-3.5-flash
+# VISION_PROVIDER=openai_compatible
 
 # ---- 4. TTS · Xiaomi MiMo (optional — leave blank to disable) ------
 # Per-character voice design → clone, with per-line delivery direction.
```
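IMAGE is the one slot whose default is inferred from the URL. The following sketches that resolution, assuming the only auto-detected case is a `runware.ai` host (as the `IMAGE_PROVIDER` comment says) and that every other unset case falls back to the OpenAI-compatible REST path. `resolveImageProvider` and `ImageProtocol` are hypothetical names.

```typescript
// Hypothetical union of the four IMAGE_PROVIDER values listed in the comment.
type ImageProtocol = "runware" | "openai_compatible" | "openai" | "google";

const IMAGE_PROTOCOLS: readonly ImageProtocol[] = ["runware", "openai_compatible", "openai", "google"];

// An explicit IMAGE_PROVIDER wins; otherwise detect Runware from the host,
// and fall back to the OpenAI-compatible REST path (assumed default).
function resolveImageProvider(baseUrl: string, explicit?: string): ImageProtocol {
  const wanted = explicit?.trim().toLowerCase();
  if (wanted) {
    const match = IMAGE_PROTOCOLS.find((p) => p === wanted);
    if (match === undefined) {
      throw new Error(`IMAGE_PROVIDER=${wanted} is not one of ${IMAGE_PROTOCOLS.join(" | ")}`);
    }
    return match;
  }
  // Compare the parsed hostname, not a substring of the raw URL.
  const host = new URL(baseUrl).hostname;
  return host === "runware.ai" || host.endsWith(".runware.ai") ? "runware" : "openai_compatible";
}
```

Matching on the parsed hostname rather than a substring of the raw URL keeps a host such as `notrunware.ai` from being misdetected as Runware.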
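The NOTE in the image hunk says the openai / google paths return raw bytes that are inlined as a `data:` URI for the session. Here is a sketch of that step, assuming the bytes arrive as a `Uint8Array` and the caller knows the media type (the `image/png` default is an assumption). `toDataUri` is a hypothetical name; it uses the global `btoa`, available in browsers and Node 16+.

```typescript
// Hypothetical helper: inline raw image bytes as a base64 data: URI.
function toDataUri(bytes: Uint8Array, mediaType: string = "image/png"): string {
  // btoa expects a "binary string" (one char per byte), so build that first.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mediaType};base64,${btoa(binary)}`;
}
```

Base64 encodes every 3 bytes as 4 characters, so the inlined payload is roughly a third larger than the raw image. That overhead on every call is part of why the comment calls this path heavier than Runware's UUID re-reference loop.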