docs: sync README with recent changes (TTS, model support, Claude caching)

- Model support: text/vision now OpenAI-compatible only (Claude/Gemini via their OpenAI-compatible endpoints); drop stale native-protocol copy and the anthropic/google rows from the *_PROVIDER table
- TTS: document StepFun (step-tts-2, paid, better quality) alongside Xiaomi MiMo (free) across zh/en/ja; update Vercel deploy envDescription
- Claude: note that direct Anthropic endpoints lack caching, recommend gateway for full experience
- Cloudflare: preserve existing one-click deploy (compat work in progress)
2026-06-17 18:02:31 +08:00
parent 98d8e83c92
commit 51afeb09ac
4 changed files with 16 additions and 14 deletions
@@ -35,10 +35,12 @@ TEXT_BASE_URL=https://api.deepseek.com/v1
 TEXT_API_KEY=sk-xxx
 TEXT_MODEL=deepseek-v4-flash
 # TEXT_PROVIDER: openai_compatible (default). This is the ONLY supported text
-# protocol. To use Claude or Gemini, leave TEXT_PROVIDER unset and point at
+# protocol. To use Gemini, leave TEXT_PROVIDER unset and point at its
-# their OpenAI-compatible endpoints:
+# OpenAI-compatible endpoint (for Claude, see the gateway note below):
 #   Claude  → TEXT_BASE_URL=https://api.anthropic.com/v1  TEXT_MODEL=claude-sonnet-4-6
 #   Gemini  → TEXT_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai  TEXT_MODEL=gemini-3.5-flash
 # For Claude, a compatible gateway (e.g. LiteLLM) is recommended — Anthropic's
 # official endpoint offers an OpenAI-compatible layer but no caching, raising
 # cost and latency for this app's prefix-cache-driven design.
 # TEXT_PROVIDER=openai_compatible
 # ---- 2. Image generator (renders the scene background) -------------
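All of the text settings above amount to a standard OpenAI Chat Completions call, so switching between DeepSeek, Gemini, or a LiteLLM gateway only changes the three `TEXT_*` values. A minimal sketch, assuming the engine builds its requests roughly like this; `buildChatRequest` and the types are hypothetical names for illustration, not InfiPlot's code, and the request is only described, not sent:

```typescript
// Hypothetical shape of the three TEXT_* variables from .env.example.
interface TextEnv {
  TEXT_BASE_URL: string;
  TEXT_API_KEY: string;
  TEXT_MODEL: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Plain description of an HTTP request, so the sketch runs without network access.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds an OpenAI-compatible Chat Completions request from the TEXT_* settings.
function buildChatRequest(env: TextEnv, messages: ChatMessage[]): HttpRequest {
  const base = env.TEXT_BASE_URL.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${env.TEXT_API_KEY}`,
    },
    body: JSON.stringify({ model: env.TEXT_MODEL, messages }),
  };
}

// Example: Gemini through its OpenAI-compatible endpoint.
const req = buildChatRequest(
  {
    TEXT_BASE_URL: "https://generativelanguage.googleapis.com/v1beta/openai",
    TEXT_API_KEY: "sk-xxx",
    TEXT_MODEL: "gemini-3.5-flash",
  },
  [{ role: "user", content: "Hello" }],
);
console.log(req.url);
// → https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
```

To actually send it, the fields map directly onto `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.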
@@ -45,7 +45,7 @@ InfiPlot offers multiple deployment options. For personal use, we recommend the
 Cloudflare deployment requires the Workers Paid Plan because the scene pipeline needs more CPU time than the free plan allows.
-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.en.md%23configuration-guide) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot)
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.en.md%23configuration-guide) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot)
 After deploying, fill in the environment variables (see the [Configuration guide](#configuration-guide) below). The repo root is the app itself: Vercel needs no special root directory; on Cloudflare, just set the build command to `pnpm build:cf`.
@@ -154,7 +154,7 @@ Scan to join our **beta community on QQ** (group ID `575404333`) to share feedba
 ## Configuration guide
-InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenAI-compatible endpoint**, so you can mix and match freely. **Image** currently goes to **Runware** (its own task-array protocol, not OpenAI-compatible). **TTS** uses **Xiaomi MiMo**'s own voice design / clone protocol — per-character voice design, clone, and per-line delivery direction.
+InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenAI-compatible endpoint**, so you can mix and match freely. For Google Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). For Anthropic Claude, a compatible gateway (e.g. LiteLLM) is recommended: Anthropic's official endpoint offers an OpenAI-compatible layer but no caching, which raises cost and latency. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (free; its own voice design / clone protocol, with per-character voice design, cloning, and per-line delivery direction) and **StepFun** (paid but better quality; 32 preset voices, auto-matched by AI).
 **1. Choose your providers**
@@ -163,7 +163,7 @@ InfiPlot talks to four kinds of model providers. **Text and Vision use any OpenA
 | Text · story director  | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL`        | ✅ | `deepseek-v4-flash` via DeepSeek |
 | Image · scene renderer  | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL`     | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
 | Vision · click reader  | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL`  | ✅ | `gemini-3.5-flash` via Google |
-| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo |
+| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional — leave blank to run silently | `mimo-v2.5-tts` via Xiaomi MiMo (free); paid alternative: `step-tts-2` via [StepFun](https://www.stepfun.com) |
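The table boils down to one rule: the three required groups must be filled in, and TTS is either configured or switched off. A minimal sketch of that check, assuming a plain key/value environment; `loadProviders`, `readGroup`, and the choice to reject a half-filled group are illustrative assumptions rather than the project's actual loader, and the sample values (including the Runware base URL) are placeholders:

```typescript
type Env = Record<string, string | undefined>;

interface Endpoint {
  baseUrl: string;
  apiKey: string;
  model: string;
}

interface Providers {
  text: Endpoint;
  image: Endpoint;
  vision: Endpoint;
  tts: Endpoint | null; // null means "run silently"
}

// Reads <PREFIX>_BASE_URL, <PREFIX>_API_KEY and <PREFIX>_<modelKey>.
// Returns null when all three are blank; throws when only some are set
// (rejecting partial groups is an assumption of this sketch).
function readGroup(env: Env, prefix: string, modelKey = "MODEL"): Endpoint | null {
  const baseUrl = env[`${prefix}_BASE_URL`]?.trim() ?? "";
  const apiKey = env[`${prefix}_API_KEY`]?.trim() ?? "";
  const model = env[`${prefix}_${modelKey}`]?.trim() ?? "";
  if (!baseUrl && !apiKey && !model) return null;
  if (!baseUrl || !apiKey || !model) {
    throw new Error(`${prefix}: set all three variables or leave all blank`);
  }
  return { baseUrl, apiKey, model };
}

function loadProviders(env: Env): Providers {
  const required = (prefix: string): Endpoint => {
    const group = readGroup(env, prefix);
    if (group === null) throw new Error(`${prefix} provider is required`);
    return group;
  };
  return {
    text: required("TEXT"),
    image: required("IMAGE"),
    vision: required("VISION"),
    // TTS uses TTS_SPEECH_MODEL instead of TTS_MODEL, as in the table above.
    tts: readGroup(env, "TTS", "SPEECH_MODEL"),
  };
}

const exampleEnv: Env = {
  TEXT_BASE_URL: "https://api.deepseek.com/v1",
  TEXT_API_KEY: "sk-xxx",
  TEXT_MODEL: "deepseek-v4-flash",
  IMAGE_BASE_URL: "https://api.runware.ai/v1",
  IMAGE_API_KEY: "rw-xxx",
  IMAGE_MODEL: "runware:400@6",
  VISION_BASE_URL: "https://generativelanguage.googleapis.com/v1beta/openai",
  VISION_API_KEY: "gm-xxx",
  VISION_MODEL: "gemini-3.5-flash",
  // TTS_* left blank: voices are disabled.
};

const providers = loadProviders(exampleEnv);
console.log(providers.tts === null); // → true
```

Adding `TTS_BASE_URL`, `TTS_API_KEY`, and `TTS_SPEECH_MODEL` (for example `mimo-v2.5-tts` or `step-tts-2`) turns `tts` into a full endpoint without touching the other groups.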
 **2. Set the environment variables**
@@ -45,7 +45,7 @@ InfiPlot supports multiple deployment options. For personal use
 Cloudflare deployment requires the Workers Paid Plan because the scene pipeline needs more CPU time.
-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.ja.md%23%E8%A8%AD%E5%AE%9A%E3%82%AC%E3%82%A4%E3%83%89) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot)
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot/blob/main/README.ja.md%23%E8%A8%AD%E5%AE%9A%E3%82%AC%E3%82%A4%E3%83%89) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot)
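 After deploying, set the environment variables (see the [Configuration guide](#設定ガイド) below). The repository root is the app itself: Vercel needs no special root directory setting, and on Cloudflare you only need to set the build command to `pnpm build:cf`.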
@@ -153,7 +153,7 @@ docker compose up -d
 ## 設定ガイド
-InfiPlot communicates with four kinds of model providers. **Text and Vision can use any OpenAI-compatible endpoint** and can be mixed and matched freely. **Image** currently uses **Runware** (not OpenAI-compatible; its own task-array protocol). **TTS** uses **Xiaomi MiMo**'s own voice design / clone protocol, supporting per-character voice design, cloning, and per-line delivery direction.
+InfiPlot communicates with four kinds of model providers. **Text and Vision can use any OpenAI-compatible endpoint** and can be mixed and matched freely. To use Google Gemini, simply point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Anthropic Claude, routing through a compatible gateway (such as LiteLLM) is recommended: Anthropic's official endpoint provides an OpenAI-compatible layer but does not support caching, so cost and latency go up. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol, with per-character voice design, cloning, and per-line delivery direction; free) and **StepFun** (32 preset voices automatically matched by AI; paid, but higher quality).
 **1. Choose your providers**
@@ -162,7 +162,7 @@ InfiPlot communicates with four kinds of model providers. **Text
 | Text · story director  | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL`        | ✅ | `deepseek-v4-flash` via DeepSeek |
 | Image · scene renderer  | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL`     | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
 | Vision · click reader  | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL`  | ✅ | `gemini-3.5-flash` via Google |
-| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional (leave blank to run silently) | `mimo-v2.5-tts` via Xiaomi MiMo |
+| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional (leave blank to run silently) | `mimo-v2.5-tts` via Xiaomi MiMo (free); paid alternative: `step-tts-2` via [StepFun](https://www.stepfun.com) |
 **2. Set the environment variables**
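@@ -45,7 +45,7 @@ InfiPlot supports multiple deployment options. For personal use, one-click Vercel deployment is recommended; if you want
 Cloudflare deployment requires the Workers Paid Plan because the scene pipeline needs more CPU time.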
-[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%20uses%20MiMo%27s%20own%20protocol.&envLink=https://github.com/zonghaoyuan/infiplot%23%E9%85%8D%E7%BD%AE%E6%95%99%E7%A8%8B) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot)
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https://github.com/zonghaoyuan/infiplot&env=TEXT_BASE_URL,TEXT_API_KEY,TEXT_MODEL,IMAGE_BASE_URL,IMAGE_API_KEY,IMAGE_MODEL,VISION_BASE_URL,VISION_API_KEY,VISION_MODEL,TTS_BASE_URL,TTS_API_KEY,TTS_SPEECH_MODEL,MOCK_IMAGE&envDescription=Three%20required%20providers%20%2B%20optional%20TTS.%20Any%20OpenAI-compatible%20endpoint%20works%20for%20text%2Fvision.%20TTS%3A%20Xiaomi%20MiMo%20%28free%29%20or%20StepFun%20%28paid%2C%20better%20quality%29.&envLink=https://github.com/zonghaoyuan/infiplot%23%E9%85%8D%E7%BD%AE%E6%95%99%E7%A8%8B) &nbsp; [![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/zonghaoyuan/infiplot)
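 Once deployed, fill in the environment variables (see the [Configuration guide](#配置教程) below). The repository root is the app itself: Vercel needs no extra root directory setting; on Cloudflare, just set the build command to `pnpm build:cf`.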
@@ -153,7 +153,7 @@ docker compose up -d
 ## 配置教程
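-InfiPlot communicates with four kinds of model providers. **Text and Vision** use OpenAI-compatible endpoints by default and can also switch natively to **Anthropic** or **Google Gemini**. **Image** supports **Runware** (its own task-array protocol), **OpenAI** (`gpt-image`), and **Google Gemini** (Nano Banana). **TTS** uses **Xiaomi MiMo**'s own voice design / clone protocol, supporting per-character voice design, cloning, and per-line delivery direction.
+InfiPlot communicates with four kinds of model providers. **Text and Vision** go through OpenAI-compatible endpoints only. To use Google Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Anthropic Claude, routing through a compatible gateway (such as LiteLLM) is recommended, because the official OpenAI-compatible layer does not support caching, which may raise cost and latency. **Image** supports **Runware** (its own task-array protocol) and **OpenAI** (`gpt-image`). **TTS** supports **Xiaomi MiMo** (its own voice design / clone protocol, with per-character voice design, cloning, and per-line delivery direction; free) and **StepFun 阶跃星辰** (32 preset voices, matched automatically by AI; paid, but a better experience).
 **1. Choose your providers**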
@@ -162,18 +162,18 @@ InfiPlot communicates with four kinds of model providers. **Text and Visio
 | Text · story director  | `TEXT_BASE_URL` `TEXT_API_KEY` `TEXT_MODEL`        | ✅ | `deepseek-v4-flash` via DeepSeek |
 | Image · scene renderer  | `IMAGE_BASE_URL` `IMAGE_API_KEY` `IMAGE_MODEL`     | ✅ | `runware:400@6` (FLUX.2 [klein] 9B KV) via [Runware](https://runware.ai) |
 | Vision · click reader  | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL`  | ✅ | `gemini-3.5-flash` via Google |
-| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional (leave blank to run silently) | `mimo-v2.5-tts` via Xiaomi MiMo |
+| TTS · per-character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | optional (leave blank to run silently) | `mimo-v2.5-tts` via Xiaomi MiMo (free); paid option: `step-tts-2` via [StepFun](https://www.stepfun.com) |
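 > **Optional · choose the API protocol**: each model type can take a `*_PROVIDER` variable (`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`) to select its API protocol explicitly. **Leaving it unset keeps backward compatibility**: text/vision default to the OpenAI-compatible protocol, and image is auto-detected from `*_BASE_URL` (`runware.ai` → Runware, otherwise OpenAI-compatible; the few models that `runware.ai` serves over the OpenAI protocol, such as `image-2-vip`, are treated as OpenAI-compatible, and you can override this explicitly with `IMAGE_PROVIDER` when needed).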
 >
 > | Value | Applies to | Notes |
 > |---|---|---|
 > | `openai_compatible` (default) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
-> | `anthropic` | Text · Vision | Native Anthropic Messages API |
-> | `google` | Text · Vision · Image | Native Gemini; for images, use the Nano Banana family (e.g. `gemini-2.5-flash-image`); **do not use Imagen (deprecated, shutting down on 2026-06-24)** |
 > | `openai` | Image | OpenAI `gpt-image`, with reference-image editing |
 > | `runware` | Image | Runware task-array protocol |
 >
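 > Text and Vision support **only** `openai_compatible`. To use Gemini, point `*_BASE_URL` at its OpenAI-compatible endpoint (`https://generativelanguage.googleapis.com/v1beta/openai`). To use Claude, routing through a compatible gateway (such as LiteLLM) is recommended: Anthropic's official endpoint does offer an OpenAI-compatible layer, but without caching, which raises cost and latency.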
 >
 > In addition, `*_BASE_URL` works with or without `/v1` (even with a stray trailing `/chat/completions`); the engine normalizes it automatically.
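The fallback rules in the `*_PROVIDER` note above, plus the base-URL normalization, can be sketched as follows. This is a minimal illustration under assumptions: `resolveImageProvider` and `normalizeBaseUrl` are hypothetical names, the exception list holds only the one example the note gives (`image-2-vip`), the accepted values follow the commit message's removal of the native `anthropic`/`google` rows, and appending `/v1` only when the path has no version segment (so Gemini's `/v1beta/openai` stays intact) is a guess at the engine's behavior, not its actual code:

```typescript
type ImageProvider = "openai_compatible" | "openai" | "runware";

const IMAGE_PROVIDERS: readonly string[] = ["openai_compatible", "openai", "runware"];

// Hypothetical list of models that runware.ai serves over the OpenAI protocol.
const OPENAI_PROTOCOL_MODELS_ON_RUNWARE = new Set<string>(["image-2-vip"]);

// Picks the image protocol: an explicit IMAGE_PROVIDER wins, otherwise infer from the base URL.
function resolveImageProvider(baseUrl: string, model: string, explicit?: string): ImageProvider {
  if (explicit) {
    if (!IMAGE_PROVIDERS.includes(explicit)) {
      throw new Error(`Unsupported IMAGE_PROVIDER: ${explicit}`);
    }
    return explicit as ImageProvider;
  }
  const host = new URL(baseUrl).hostname;
  const onRunware = host === "runware.ai" || host.endsWith(".runware.ai");
  if (onRunware && !OPENAI_PROTOCOL_MODELS_ON_RUNWARE.has(model)) {
    return "runware";
  }
  return "openai_compatible";
}

// Accepts "https://host", "https://host/v1" or "https://host/v1/chat/completions"
// and returns one canonical base URL (query strings and fragments are dropped).
function normalizeBaseUrl(raw: string): string {
  const url = new URL(raw.trim());
  const path = url.pathname
    .replace(/\/+$/, "") // trailing slashes
    .replace(/\/chat\/completions$/, ""); // a pasted full endpoint
  // Append /v1 only when no version segment (v1, v1beta, ...) is present.
  const hasVersion = path.split("/").some((segment) => /^v\d/.test(segment));
  return `${url.origin}${hasVersion ? path : `${path}/v1`}`;
}

console.log(normalizeBaseUrl("https://api.deepseek.com/v1/chat/completions"));
// → https://api.deepseek.com/v1
console.log(resolveImageProvider("https://api.runware.ai/v1", "runware:400@6"));
// → runware
```

Under the rule as described, auto-detection never yields `openai` by itself, so `IMAGE_PROVIDER=openai` is how you would opt into the `gpt-image` protocol with reference-image editing.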
 **2. Fill in the environment variables**