Merge pull request #36 from zonghaoyuan/staging
Release staging to production
+28
-5
@@ -3,14 +3,18 @@
|
||||
# Recommended setup: Xiaomi MiMo Token Plan for TEXT / VISION / TTS
|
||||
# (one API key covers all three) + Runware for IMAGE (FLUX.2 [klein]).
|
||||
#
|
||||
# TEXT / VISION use any OpenAI-compatible endpoint (any OpenAI-
|
||||
# compatible host works: OpenRouter, OpenAI, Anthropic via proxy,
|
||||
# Gemini, DeepSeek, Ollama, ...).
|
||||
# TEXT / VISION default to any OpenAI-compatible endpoint, and can switch to
|
||||
# native Anthropic or Google Gemini via TEXT_PROVIDER / VISION_PROVIDER.
|
||||
# TTS uses Xiaomi MiMo's own voice design / clone protocol
|
||||
# (not OpenAI-compatible; appends -voicedesign / -voiceclone).
|
||||
#
|
||||
# IMAGE uses Runware's own task-array protocol (not OpenAI-compatible);
|
||||
# the adapter posts an `imageInference` task to IMAGE_BASE_URL.
|
||||
# IMAGE supports Runware (its own task-array protocol), OpenAI (gpt-image),
|
||||
# and Google Gemini (Nano Banana) via IMAGE_PROVIDER.
|
||||
#
|
||||
# *_PROVIDER (optional) selects the wire protocol; leave unset for the
|
||||
# OpenAI-compatible default (image is auto-detected from the URL). Base URLs
|
||||
# tolerate a missing or extra /v1 (or a trailing /chat/completions) — the
|
||||
# engine normalizes them.
|
||||
# =============================================================
|
||||
|
||||
# ---- 1. Text LLM · scene director ----------------------------------
|
||||
@@ -26,6 +30,10 @@
|
||||
TEXT_BASE_URL=https://api.deepseek.com/v1
|
||||
TEXT_API_KEY=sk-xxx
|
||||
TEXT_MODEL=deepseek-v4-flash
|
||||
# TEXT_PROVIDER: openai_compatible (default) | anthropic | google
|
||||
# anthropic → TEXT_BASE_URL=https://api.anthropic.com TEXT_MODEL=claude-sonnet-4-6
|
||||
# google → TEXT_BASE_URL=https://generativelanguage.googleapis.com TEXT_MODEL=gemini-3.5-flash
|
||||
# TEXT_PROVIDER=openai_compatible
|
||||
|
||||
# ---- 2. Image generator (renders the scene background) -------------
|
||||
# Recommended: Runware + FLUX.2 [klein] 9B KV — distilled 4-step model,
|
||||
@@ -36,12 +44,27 @@ TEXT_MODEL=deepseek-v4-flash
|
||||
IMAGE_BASE_URL=https://api.runware.ai/v1
|
||||
IMAGE_API_KEY=runware-xxx
|
||||
IMAGE_MODEL=runware:400@6
|
||||
# IMAGE_PROVIDER: runware (auto-detected for runware.ai) | openai_compatible
|
||||
# | openai | google
|
||||
# openai → gpt-image, supports referenceImages (character/scene continuity).
|
||||
# IMAGE_BASE_URL=https://api.openai.com IMAGE_MODEL=gpt-image-1
|
||||
# google → Gemini "Nano Banana" (Imagen is EOL 2026-06-24, do not use it).
|
||||
# IMAGE_BASE_URL=https://generativelanguage.googleapis.com
|
||||
# IMAGE_MODEL=gemini-2.5-flash-image
|
||||
# NOTE: openai/google return raw bytes → inlined as a data: URI for the session
|
||||
# (heavier per-call transport than Runware's UUID re-reference loop). Runware
|
||||
# stays fastest + cheapest for the scene-by-scene flow.
|
||||
# IMAGE_PROVIDER=runware
|
||||
|
||||
# ---- 3. Vision model · multimodal click interpretation -------------
|
||||
# Recommended: MiMo V2.5 — multimodal, accepts image_url content parts.
|
||||
VISION_BASE_URL=https://token-plan-sgp.xiaomimimo.com/v1
|
||||
VISION_API_KEY=tp-xxx
|
||||
VISION_MODEL=mimo-v2.5
|
||||
# VISION_PROVIDER: openai_compatible (default) | anthropic | google
|
||||
# anthropic → VISION_BASE_URL=https://api.anthropic.com VISION_MODEL=claude-sonnet-4-6
|
||||
# google → VISION_BASE_URL=https://generativelanguage.googleapis.com VISION_MODEL=gemini-3.5-flash
|
||||
# VISION_PROVIDER=openai_compatible
|
||||
|
||||
# ---- 4. TTS · Xiaomi MiMo (optional — leave blank to disable) ------
|
||||
# Per-character voice design → clone, with per-line delivery direction.
|
||||
|
||||
@@ -159,6 +159,12 @@ With the recommended trio, each scene's cost comes mainly from the image generat
|
||||
|
||||
By default the browser fetches images directly from the provider — no setup needed; leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and you're completely unaffected. You only want this if you hit progressive "top-to-bottom" image loading (Chrome's `ERR_QUIC_PROTOCOL_ERROR` on some networks paints partial PNGs row by row): deploy a tiny Cloudflare Worker that re-fetches images server-side and serves them atomically over HTTP/2. One-click deploy at **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then paste the `workers.dev` URL it prints into `NEXT_PUBLIC_IMAGE_PROXY_URL`.
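The opt-in behaviour described above can be sketched as a small URL helper. This is an illustrative sketch only: the function name `proxiedImageUrl` and the `url` query parameter are assumptions, not the documented contract of infiplot-image-proxy. The only behaviour taken from the text is that a blank `NEXT_PUBLIC_IMAGE_PROXY_URL` leaves image loading untouched.

```typescript
// Hypothetical helper: decides which URL the browser should load an image from.
// The `url` query parameter is an assumption made for illustration.
function proxiedImageUrl(imageUrl: string, proxyBase: string | undefined): string {
  const base = (proxyBase ?? "").trim();

  // Blank setting: keep the default browser-direct fetch.
  if (base === "") return imageUrl;

  // data: URIs are already inline, so there is nothing to re-fetch.
  if (imageUrl.startsWith("data:")) return imageUrl;

  // Ask the Worker to fetch the original image on the browser's behalf.
  const proxy = new URL(base);
  proxy.searchParams.set("url", imageUrl);
  return proxy.toString();
}
```

A caller would pass `process.env.NEXT_PUBLIC_IMAGE_PROXY_URL` as the second argument, so deployments that never set the variable see no change.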
|
||||
|
||||
**5. Let players bring their own voice Key (optional, recommended)**
|
||||
|
||||
Xiaomi rate-limits the TTS model by RPM/TPM. When a public deployment has many people playing at once through a single shared `TTS_API_KEY`, those limits are easy to hit — the symptom is **story and visuals work fine, but there's no audio**. To fix this, players can optionally enter **their own** Xiaomi MiMo key on the homepage (free to obtain). Synthesis then runs **browser-direct to Xiaomi**, the **key stays in the player's browser and never touches your server**, and they get stable voice with lower latency. It's purely additive: leave it blank and playback falls back to your server key exactly as before.
|
||||
|
||||
See the [Bring-your-own voice Key guide](docs/xiaomi-tts-key.md) for how to obtain and enter one.
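The fallback described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `clientTts` request flag appears in this change's route handlers, but the storage key `example:ttsKey`, the `KeyStore` interface, and both function names are hypothetical and only illustrate the idea that the player's key selects the path without ever being sent to the server.

```typescript
// Which path synthesises audio for this player.
type TtsPath = { mode: "browser-direct"; apiKey: string } | { mode: "server" };

// Minimal storage shape so the sketch also runs outside a browser.
interface KeyStore {
  getItem(key: string): string | null;
}

// Hypothetical storage key, used for illustration only.
const BYO_KEY_STORAGE = "example:ttsKey";

// A saved, non-blank key selects browser-direct synthesis; otherwise the
// deployment's server key is used exactly as before.
function chooseTtsPath(store: KeyStore): TtsPath {
  const apiKey = (store.getItem(BYO_KEY_STORAGE) ?? "").trim();
  return apiKey !== "" ? { mode: "browser-direct", apiKey } : { mode: "server" };
}

// Only the boolean flag travels to the server, never the key itself.
function sceneRequestBody(session: unknown, path: TtsPath): string {
  return JSON.stringify({ session, clientTts: path.mode === "browser-direct" });
}
```

In a browser the store would be `window.localStorage`; keeping it as a parameter makes the decision easy to test without a DOM.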
|
||||
|
||||
---
|
||||
|
||||
## Roadmap
|
||||
|
||||
@@ -158,6 +158,12 @@ InfiPlot は 4 種類のモデルプロバイダと通信します。**テキス
|
||||
|
||||
By default the browser accesses the image provider directly, so no setup is needed: leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and nothing changes. You only need it if images render "from top to bottom" (on some networks Chrome's `ERR_QUIC_PROTOCOL_ERROR` causes PNGs to paint row by row). Deploying a small Cloudflare Worker re-fetches images server-side and returns them in one piece over HTTP/2. For one-click deploy, see **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then set the `workers.dev` URL it prints as `NEXT_PUBLIC_IMAGE_PROXY_URL`.
|
||||
|
||||
**5. Players' own voice Key (optional, recommended)**
|
||||
|
||||
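Xiaomi imposes RPM/TPM limits on the TTS model. When many players on a public deployment play at once while sharing a single `TTS_API_KEY`, those limits are easy to reach, and the symptom is that **story and images work fine but only the audio is missing**. As a remedy, players can optionally enter **their own** Xiaomi MiMo Key (free to obtain) on the top page. Synthesis then runs **directly from the browser to Xiaomi**, and the **Key is stored only in the player's browser and never passes through your server**. This yields stable audio and low latency. It is purely additive: if nothing is entered, playback falls back to the server-side Key as before.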
|
||||
|
||||
For how to obtain and enter a key, see the [Bring-your-own voice Key guide](docs/xiaomi-tts-key.md).
|
||||
|
||||
---
|
||||
|
||||
## Roadmap
|
||||
|
||||
@@ -125,7 +125,7 @@ InfiPlot 同时支持部署到 Vercel 与 Cloudflare Workers。Cloudflare 部署
|
||||
|
||||
## Configuration guide
|
||||
|
||||
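InfiPlot talks to four kinds of model providers. **Text and Vision both use OpenAI-compatible interfaces** and can be mixed freely. **Image** currently integrates **Runware** (its own task-array protocol, not OpenAI-compatible). **TTS** uses **Xiaomi MiMo**'s own voice design / clone protocol, which supports per-character voice design, cloning, and per-line delivery direction.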
|
||||
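InfiPlot talks to four kinds of model providers. **Text and Vision** default to OpenAI-compatible interfaces and can also switch natively to **Anthropic** or **Google Gemini**. **Image** supports **Runware** (its own task-array protocol), **OpenAI** (`gpt-image`), and **Google Gemini** (Nano Banana). **TTS** uses **Xiaomi MiMo**'s own voice design / clone protocol, which supports per-character voice design, cloning, and per-line delivery direction.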
|
||||
|
||||
**1. Choose your providers**
|
||||
|
||||
@@ -136,6 +136,18 @@ InfiPlot 会与四类模型供应商通信。**文本(Text)和视觉(Visio
|
||||
| Vision · click interpretation | `VISION_BASE_URL` `VISION_API_KEY` `VISION_MODEL` | ✅ | Google's `gemini-3.5-flash` |
| TTS · character voice | `TTS_BASE_URL` `TTS_API_KEY` `TTS_SPEECH_MODEL` | Optional: leave blank to run silent | Xiaomi MiMo's `mimo-v2.5-tts` |
|
||||
|
||||
> **Optional · choose the wire protocol**: each model type accepts a `*_PROVIDER` variable (`TEXT_PROVIDER` / `VISION_PROVIDER` / `IMAGE_PROVIDER`) that explicitly selects the wire protocol. **Leaving it unset stays backward compatible**: text/vision default to the OpenAI-compatible interface, and image is auto-detected from `*_BASE_URL` (`runware.ai` → Runware, otherwise OpenAI-compatible; the few models served over the OpenAI protocol on `runware.ai`, such as `image-2-vip`, are treated as OpenAI-compatible, and you can override explicitly with `IMAGE_PROVIDER` when needed).
>
> | Value | Applies to | Notes |
> |---|---|---|
> | `openai_compatible` (default) | Text · Vision · Image | OpenAI Chat Completions / `/images/generations` |
> | `anthropic` | Text · Vision | Native Anthropic Messages API |
> | `google` | Text · Vision · Image | Native Gemini; for images use the Nano Banana family (e.g. `gemini-2.5-flash-image`; **do not use Imagen (deprecated, service ends 2026-06-24)**) |
> | `openai` | Image | OpenAI `gpt-image`, supports reference-image editing |
> | `runware` | Image | Runware task-array protocol |
>
> In addition, `*_BASE_URL` works with or without `/v1` (even with a trailing `/chat/completions`): the engine normalizes it automatically.
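The base-URL tolerance mentioned above could look roughly like this. It is a sketch under assumptions: the engine's actual normalization code is not part of this section, so `normalizeBaseUrl` and `chatCompletionsUrl` are hypothetical names, and the choice to strip `/v1` and re-append it is one possible canonical form rather than the engine's confirmed behaviour.

```typescript
// Reduce a user-supplied base URL to the bare host form, accepting a missing
// or extra /v1 and a trailing /chat/completions (hypothetical helper).
function normalizeBaseUrl(raw: string): string {
  let url = raw.trim().replace(/\/+$/, ""); // drop trailing slashes
  url = url.replace(/\/chat\/completions$/, ""); // drop a pasted endpoint path
  url = url.replace(/\/+$/, "");
  url = url.replace(/\/v1$/, ""); // drop the version segment if present
  return url;
}

// Build the chat endpoint from whatever form the user entered.
function chatCompletionsUrl(raw: string): string {
  return `${normalizeBaseUrl(raw)}/v1/chat/completions`;
}
```

With this shape, `https://api.deepseek.com`, `https://api.deepseek.com/v1`, and `https://api.deepseek.com/v1/chat/completions/` all resolve to the same endpoint.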
|
||||
|
||||
**2. Fill in the environment variables**
|
||||
|
||||
Nine variables are required; TTS is optional (leave blank to run silent). There is also a switch for low-cost testing:
|
||||
@@ -158,6 +170,12 @@ InfiPlot 会与四类模型供应商通信。**文本(Text)和视觉(Visio
|
||||
|
||||
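By default the browser connects to the image provider directly and needs no configuration: leave `NEXT_PUBLIC_IMAGE_PROXY_URL` blank and you are completely unaffected. You only need it if you hit "layer by layer" image loading (on some networks Chrome's `ERR_QUIC_PROTOCOL_ERROR` makes PNGs render row by row): deploy a tiny Cloudflare Worker that relays images server-side and returns them atomically over HTTP/2. For one-click deploy see **[infiplot-image-proxy](https://github.com/zonghaoyuan/infiplot-image-proxy)**, then put the `workers.dev` address it gives you into `NEXT_PUBLIC_IMAGE_PROXY_URL`.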
|
||||
|
||||
**5. Players bring their own voice Key (optional, recommended)**
|
||||
|
||||
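Xiaomi applies RPM/TPM quotas to the TTS model. When many people play your public deployment at once and share a single `TTS_API_KEY`, the quota is easy to hit, and the symptom is that **story and visuals work fine but there is no sound**. To address this, players can optionally enter **their own** Xiaomi MiMo Key (free to apply for) on the homepage. Voice requests then go **directly from the browser to Xiaomi**, and the **Key stays on the player's machine and never passes through your server**, which gives stable voice and lower latency. It is a pure enhancement: if left blank, your deployment's server Key is used as usual and behavior is unchanged.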
|
||||
|
||||
For how to apply for and enter a key, see the [Bring-your-own voice Key guide](docs/xiaomi-tts-key.md).
|
||||
|
||||
---
|
||||
|
||||
## Roadmap
|
||||
|
||||
@@ -4,9 +4,6 @@ import { NextResponse } from "next/server";
|
||||
import { loadEngineConfig } from "@/lib/config";
|
||||
|
||||
export const runtime = "nodejs";
|
||||
// The synth itself has a 15s per-call ceiling in the engine. 30s here just
|
||||
// covers JSON parsing + outbound network buffer.
|
||||
export const maxDuration = 30;
|
||||
|
||||
export async function POST(req: Request) {
|
||||
let body: BeatAudioRequest;
|
||||
@@ -26,7 +23,11 @@ export async function POST(req: Request) {
|
||||
try {
|
||||
const config = loadEngineConfig();
|
||||
const result = await requestBeatAudio(config, body);
|
||||
return NextResponse.json(result);
|
||||
if (!result.audio) return new Response(null, { status: 204 });
|
||||
const binary = Buffer.from(result.audio.base64, "base64");
|
||||
return new Response(binary, {
|
||||
headers: { "Content-Type": result.audio.mime },
|
||||
});
|
||||
} catch (err) {
|
||||
// Engine already swallows synth errors and returns audio:null. Anything
|
||||
// that reaches here is config-level — surface so the client can log it.
|
||||
|
||||
@@ -4,7 +4,6 @@ import { NextResponse } from "next/server";
|
||||
import { loadEngineConfig } from "@/lib/config";
|
||||
|
||||
export const runtime = "nodejs";
|
||||
export const maxDuration = 60;
|
||||
|
||||
export async function POST(req: Request) {
|
||||
let body: InsertBeatRequest;
|
||||
@@ -22,9 +21,14 @@ export async function POST(req: Request) {
|
||||
}
|
||||
|
||||
try {
|
||||
const config = loadEngineConfig();
|
||||
const base = loadEngineConfig();
|
||||
// See StartRequest.clientTts — BYO clients synth in-browser, so drop server TTS.
|
||||
const config = body.clientTts === true ? { ...base, tts: undefined } : base;
|
||||
const result = await requestInsertBeat(config, body);
|
||||
return NextResponse.json(result);
|
||||
return NextResponse.json({
|
||||
...result,
|
||||
characters: result.characters.map((c) => ({ ...c, voice: undefined })),
|
||||
});
|
||||
} catch (err) {
|
||||
const message = err instanceof Error ? err.message : "Unknown error";
|
||||
return NextResponse.json({ error: message }, { status: 500 });
|
||||
|
||||
@@ -7,7 +7,6 @@ import { NextResponse } from "next/server";
|
||||
import { loadEngineConfig } from "@/lib/config";
|
||||
|
||||
export const runtime = "nodejs";
|
||||
export const maxDuration = 60;
|
||||
|
||||
// Same rationale as /api/vision: the client resizes to 512px max-dim webp
|
||||
// (~30-80KB base64 typical) before upload, so 3 MB is generous headroom
|
||||
|
||||
+20
-8
@@ -1,14 +1,18 @@
|
||||
import { requestScene } from "@infiplot/engine";
|
||||
import type { SceneRequest } from "@infiplot/types";
|
||||
import type { Character, SceneRequest } from "@infiplot/types";
|
||||
import { NextResponse } from "next/server";
|
||||
import { loadEngineConfig } from "@/lib/config";
|
||||
|
||||
function stripKnownVoices(
|
||||
characters: Character[],
|
||||
knownNames: Set<string>,
|
||||
): Character[] {
|
||||
return characters.map((c) =>
|
||||
knownNames.has(c.name) ? { ...c, voice: undefined } : c,
|
||||
);
|
||||
}
|
||||
|
||||
export const runtime = "nodejs";
|
||||
// Capped at 60 for Vercel Hobby (300 allowed on Pro). The scene pipeline is
|
||||
// Writer + CharDesigner×N + Cinematographer + Painter — happy path 9–12s; the
|
||||
// tail (cold provider, multiple new characters) can push 30–45s, so 60 is a
|
||||
// reasonable headroom on Hobby.
|
||||
export const maxDuration = 60;
|
||||
|
||||
export async function POST(req: Request) {
|
||||
let body: SceneRequest;
|
||||
@@ -23,9 +27,17 @@ export async function POST(req: Request) {
|
||||
}
|
||||
|
||||
try {
|
||||
const config = loadEngineConfig();
|
||||
const base = loadEngineConfig();
|
||||
// See StartRequest.clientTts — BYO clients synth in-browser, so drop server TTS.
|
||||
const config = body.clientTts === true ? { ...base, tts: undefined } : base;
|
||||
const result = await requestScene(config, body);
|
||||
return NextResponse.json(result);
|
||||
const knownNames = new Set(
|
||||
(body.session.characters ?? []).map((c) => c.name),
|
||||
);
|
||||
return NextResponse.json({
|
||||
...result,
|
||||
characters: stripKnownVoices(result.characters, knownNames),
|
||||
});
|
||||
} catch (err) {
|
||||
const message = err instanceof Error ? err.message : "Unknown error";
|
||||
return NextResponse.json({ error: message }, { status: 500 });
|
||||
|
||||
@@ -4,7 +4,6 @@ import { NextResponse } from "next/server";
|
||||
import { loadEngineConfig } from "@/lib/config";
|
||||
|
||||
export const runtime = "nodejs";
|
||||
export const maxDuration = 60;
|
||||
|
||||
// Matches /api/vision and /api/parse-style-image — the user's resized 512px
|
||||
// webp is ~30-80 KB; this caps pathological direct-API payloads (which would
|
||||
@@ -41,7 +40,11 @@ export async function POST(req: Request) {
|
||||
}
|
||||
|
||||
try {
|
||||
const config = loadEngineConfig();
|
||||
const base = loadEngineConfig();
|
||||
// BYO key: the browser provisions + synths voices directly against Xiaomi
|
||||
// (key never reaches us), so strip server-side TTS so the engine skips all
|
||||
// provisioning + synth. See StartRequest.clientTts.
|
||||
const config = body.clientTts === true ? { ...base, tts: undefined } : base;
|
||||
const result = await startSession(config, body);
|
||||
return NextResponse.json(result);
|
||||
} catch (err) {
|
||||
|
||||
@@ -4,7 +4,6 @@ import { NextResponse } from "next/server";
|
||||
import { loadEngineConfig } from "@/lib/config";
|
||||
|
||||
export const runtime = "nodejs";
|
||||
export const maxDuration = 60;
|
||||
|
||||
// Browser annotator resizes to 768 wide → typically 200-800 KB base64.
|
||||
// 3 MB caps abusive direct-API payloads (which would inflate upstream
|
||||
|
||||
+10
-1
@@ -1,4 +1,4 @@
|
||||
import type { Metadata } from "next";
|
||||
import type { Metadata, Viewport } from "next";
|
||||
import { Cormorant_Garamond, Inter } from "next/font/google";
|
||||
import { Analytics } from "@/components/Analytics";
|
||||
import "./globals.css";
|
||||
@@ -25,6 +25,15 @@ export const metadata: Metadata = {
|
||||
description: "InfiPlot 是一款用 AI 实时生成图片、语音与剧情分支的交互式剧情游戏 Demo。",
|
||||
};
|
||||
|
||||
// viewportFit:cover lets the immersive /play portrait layout extend under the
|
||||
// iOS notch / home-indicator and exposes env(safe-area-inset-*) to the
|
||||
// floating controls. device-width + initialScale keep mobile rendering 1:1.
|
||||
export const viewport: Viewport = {
|
||||
width: "device-width",
|
||||
initialScale: 1,
|
||||
viewportFit: "cover",
|
||||
};
|
||||
|
||||
export default function RootLayout({
|
||||
children,
|
||||
}: {
|
||||
|
||||
+51
-8
@@ -10,14 +10,8 @@ import {
|
||||
PLOT_STYLES,
|
||||
type Gender,
|
||||
} from "@/lib/options";
|
||||
|
||||
/* ============================================================================
|
||||
InfiPlot · Home page (editorial visual style · centered composition, echoing the low-fidelity prototype)
|
||||
- Top header: serif wordmark logo in the top-left corner
|
||||
"use client";
|
||||
|
||||
import { useRouter } from "next/navigation";
|
||||
import { useEffect, useRef, useState } from "react";
|
||||
import { readStoredTtsConfig } from "@/lib/clientTtsConfig";
|
||||
import { TtsKeyModal } from "@/components/TtsKeyModal";
|
||||
|
||||
/* ============================================================================
|
||||
InfiPlot · Home page (editorial visual style · centered composition, echoing the low-fidelity prototype)
|
||||
@@ -1394,7 +1388,12 @@ export default function HomePage() {
|
||||
// Top usage hint: shown by default; the user can click × to dismiss it permanently (localStorage: infiplot:hintClosed).
|
||||
const [hintClosed, setHintClosed] = useState(false);
|
||||
|
||||
// Bring-your-own TTS key modal: an optional enhancement; the key is stored only in the browser and never passes through the server.
|
||||
const [ttsOpen, setTtsOpen] = useState(false);
|
||||
const [ttsConfigured, setTtsConfigured] = useState(false);
|
||||
|
||||
const styleRow = OPTS.findIndex((o) => o.modal);
|
||||
const voiceRow = OPTS.findIndex((o) => o.label === "语音配音");
|
||||
const genderIndex = sel[0] ?? 0;
|
||||
const gender = (OPTS[0]!.items[genderIndex] as Gender) ?? "男性向";
|
||||
const phrases = EXAMPLE_PHRASES[gender];
|
||||
@@ -1436,6 +1435,11 @@ export default function HomePage() {
|
||||
}
|
||||
}, []);
|
||||
|
||||
// On mount, restore the "enabled" badge: read localStorage to check whether the user has already saved a key.
|
||||
useEffect(() => {
|
||||
setTtsConfigured(readStoredTtsConfig() != null);
|
||||
}, []);
|
||||
|
||||
// The input grows with its content so long text stays fully visible (covers both typing and filling from a card click).
|
||||
useEffect(() => {
|
||||
const el = inputRef.current;
|
||||
@@ -1661,6 +1665,30 @@ export default function HomePage() {
|
||||
))}
|
||||
</div>
|
||||
|
||||
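{/* Bring-your-own TTS key entry: the shared voice model has RPM/TPM quotas, so it easily goes silent under high concurrency;
enter your own Xiaomi MiMo key (free) → stable voice, lower latency, and the key is stored only locally. */}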
|
||||
<div className="mt-5 flex justify-center">
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => setTtsOpen(true)}
|
||||
className={
|
||||
"inline-flex items-center gap-2 rounded-full border px-4 py-1.5 font-sans text-xs md:text-[13px] transition-colors " +
|
||||
(ttsConfigured
|
||||
? "border-ember-500/40 bg-ember-500/5 text-ember-500 hover:bg-ember-500/10"
|
||||
: "border-clay-900/15 text-clay-500 hover:border-clay-900/30 hover:text-clay-700")
|
||||
}
|
||||
>
|
||||
<i
|
||||
className={
|
||||
ttsConfigured
|
||||
? "fa-solid fa-circle-check text-[11px]"
|
||||
: "fa-solid fa-microphone-lines text-[11px]"
|
||||
}
|
||||
/>
|
||||
{ttsConfigured ? "自带配音 Key · 已启用" : "经常没声音?自带配音 Key(可选)"}
|
||||
</button>
|
||||
</div>
|
||||
|
||||
{/* Usage hint: the user can dismiss it permanently (localStorage: infiplot:hintClosed) */}
|
||||
{!hintClosed && (
|
||||
<div className="relative mx-auto mt-10 md:mt-12 max-w-[640px] rounded-sm border border-clay-900/10 bg-cream-100/50 px-8 py-3.5">
|
||||
@@ -1826,6 +1854,21 @@ export default function HomePage() {
|
||||
setCustomStyleRefImage={setCustomStyleRefImage}
|
||||
/>
|
||||
)}
|
||||
{ttsOpen && (
|
||||
<TtsKeyModal
|
||||
onClose={() => setTtsOpen(false)}
|
||||
onSaved={(configured) => {
|
||||
setTtsConfigured(configured);
|
||||
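// When a BYO key is enabled, also flip "语音配音" (voice) to "开启" (on); otherwise the user has configured a key
// but still gets silence, which is self-contradictory. On disable, leave the selection alone to respect the user's own preference.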
|
||||
if (configured && voiceRow >= 0) {
|
||||
const onIdx = OPTS[voiceRow]!.items.indexOf("开启");
|
||||
if (onIdx >= 0)
|
||||
setSel((s) => s.map((v, j) => (j === voiceRow ? onIdx : v)));
|
||||
}
|
||||
}}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
+398
-59
@@ -6,30 +6,87 @@ import {
|
||||
Suspense,
|
||||
useCallback,
|
||||
useEffect,
|
||||
useLayoutEffect,
|
||||
useMemo,
|
||||
useRef,
|
||||
useState,
|
||||
} from "react";
|
||||
import { PlayCanvas, type Phase } from "@/components/PlayCanvas";
|
||||
import { TtsKeyModal } from "@/components/TtsKeyModal";
|
||||
import { annotateClick } from "@/lib/annotateClient";
|
||||
import { loadClientTtsConfig } from "@/lib/clientTtsConfig";
|
||||
import { PRESETS } from "@/lib/presets";
|
||||
import { provisionVoice, synthesize } from "@infiplot/tts-client";
|
||||
import type {
|
||||
Beat,
|
||||
BeatAudio,
|
||||
BeatAudioResponse,
|
||||
BeatChoice,
|
||||
Character,
|
||||
CharacterVoice,
|
||||
InsertBeatResponse,
|
||||
Orientation,
|
||||
Scene,
|
||||
SceneExit,
|
||||
SceneResponse,
|
||||
Session,
|
||||
StartResponse,
|
||||
TtsConfig,
|
||||
VisionResponse,
|
||||
} from "@infiplot/types";
|
||||
import { track } from "@/lib/analytics";
|
||||
|
||||
const MUTED_STORAGE_KEY = "infiplot:muted";
|
||||
|
||||
// ── FOT reduction helpers ──────────────────────────────────────────────
|
||||
// Strip bulky voice.referenceAudioBase64 from the session before sending it to
|
||||
// the server. The engine only needs character names + visualDescriptions for
|
||||
// scene generation; voice data is only used by /api/beat-audio (which receives
|
||||
// the voice directly, not via session). The client retains voices locally and
|
||||
// re-merges them from the response via mergeCharactersPreserveVoice.
|
||||
function stripVoicesForTransport(session: Session): Session {
|
||||
return {
|
||||
...session,
|
||||
characters: session.characters.map((c) => ({ ...c, voice: undefined })),
|
||||
};
|
||||
}
|
||||
|
||||
// Merge server-returned characters with locally-held voices. The server strips
|
||||
// voice from already-known characters (P0), so only NEW characters carry voice.
|
||||
// For existing characters, re-attach the voice the client already holds.
|
||||
function mergeCharactersPreserveVoice(
|
||||
local: Character[],
|
||||
remote: Character[],
|
||||
): Character[] {
|
||||
const localByName = new Map(local.map((c) => [c.name, c]));
|
||||
return remote.map((c) => {
|
||||
const prev = localByName.get(c.name);
|
||||
if (!prev) return c;
|
||||
return { ...c, voice: c.voice ?? prev.voice };
|
||||
});
|
||||
}
|
||||
|
||||
// Consecutive silent (no-audio) beats before we surface the BYO-key nudge to a
|
||||
// non-BYO, unmuted player. Set high enough that one transient miss won't trip
|
||||
// it, low enough to catch a scene that's clearly being rate-limited.
|
||||
const SILENCE_NUDGE_THRESHOLD = 3;
|
||||
|
||||
// Mobile-portrait users get a 9:16 scene image painted for them; everyone else
|
||||
// (desktop, tablet, mobile-landscape) keeps the 16:9 landscape image. Only a
|
||||
// touch device (coarse pointer) held upright counts as "portrait" — a mouse
|
||||
// device is always landscape. Detected once and locked for the whole session.
|
||||
function detectOrientation(): Orientation {
|
||||
if (typeof window === "undefined") return "landscape";
|
||||
const portrait = window.matchMedia("(orientation: portrait)").matches;
|
||||
const coarse = window.matchMedia("(pointer: coarse)").matches;
|
||||
return portrait && coarse ? "portrait" : "landscape";
|
||||
}
|
||||
|
||||
// Runs before the browser paints (so it can correct first-frame state without a
|
||||
// visible flash), but useLayoutEffect warns when called during SSR. PlayInner
|
||||
// only ever renders on the client (/play prerenders the Suspense fallback), yet
|
||||
// fall back to useEffect on the server anyway to keep the warning out.
|
||||
const useIsomorphicLayoutEffect =
|
||||
typeof window !== "undefined" ? useLayoutEffect : useEffect;
|
||||
|
||||
// Cap how long we wait for the browser to download + decode a scene image
|
||||
// before giving up and rendering anyway. Runware's CDN is usually <2s for a
|
||||
// 1792×1024 PNG, but over slow links / VPN / strict corp networks the same
|
||||
@@ -257,6 +314,7 @@ function prefetchScenePath(
|
||||
baseSession: Session,
|
||||
steps: ScenePathStep[],
|
||||
depth: number,
|
||||
clientTts: boolean,
|
||||
): void {
|
||||
if (depth >= PREFETCH_MAX_DEPTH) return;
|
||||
const key = pathKey(steps);
|
||||
@@ -267,8 +325,10 @@ function prefetchScenePath(
|
||||
const promise = (async () => {
|
||||
const res = await fetch("/api/scene", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ session: specSession }),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({ session: stripVoicesForTransport(specSession), clientTts }),
|
||||
signal: abort.signal,
|
||||
});
|
||||
if (!res.ok) {
|
||||
@@ -283,6 +343,12 @@ function prefetchScenePath(
|
||||
// transition path awaits the same cached promise via getOrCreateBlobUrl.
|
||||
void getOrCreateBlobUrl(data.imageUrl);
|
||||
|
||||
// Re-attach locally-held voices the server stripped from known characters.
|
||||
data.characters = mergeCharactersPreserveVoice(
|
||||
baseSession.characters,
|
||||
data.characters,
|
||||
);
|
||||
|
||||
// Recursive: if the resulting scene has exactly one change-scene exit,
|
||||
// it is a must-pass node — prefetch its child too.
|
||||
if (depth + 1 < PREFETCH_MAX_DEPTH) {
|
||||
@@ -307,7 +373,13 @@ function prefetchScenePath(
|
||||
characters: data.characters,
|
||||
storyState: data.storyState,
|
||||
};
|
||||
prefetchScenePath(pool, carriedBase, [...steps, nextStep], depth + 1);
|
||||
prefetchScenePath(
|
||||
pool,
|
||||
carriedBase,
|
||||
[...steps, nextStep],
|
||||
depth + 1,
|
||||
clientTts,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -342,6 +414,44 @@ function clearPool(pool: Map<string, PrefetchEntry>): void {
|
||||
pool.clear();
|
||||
}
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// BYO voice resolution (client-direct Xiaomi TTS).
|
||||
//
|
||||
// In BYO mode the server skips all TTS (clientTts:true), so the browser must
|
||||
// obtain each speaker's reference audio itself. `cache` is keyed by character
|
||||
// NAME and persists for the whole session, so a voice locked in on a
|
||||
// character's first speaking beat stays identical across every later scene —
|
||||
// even though /api/scene returns its characters without `.voice`. Storing the
|
||||
// in-flight Promise (not the resolved value) dedupes the burst of concurrent
|
||||
// beats by the same speaker into ONE voicedesign call, which matters because
|
||||
// Xiaomi rate-limits voicedesign hard.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
async function resolveByoVoice(
|
||||
cache: Map<string, Promise<CharacterVoice>>,
|
||||
cfg: TtsConfig,
|
||||
speaker: Character,
|
||||
): Promise<CharacterVoice | null> {
|
||||
const cached = cache.get(speaker.name);
|
||||
if (cached) return cached;
|
||||
// Prebaked cards ship baked reference audio — reuse it directly (cross-key
|
||||
// synth with the user's key works), keeping the prebaked voice identical.
|
||||
if (speaker.voice) {
|
||||
const ready = Promise.resolve(speaker.voice);
|
||||
cache.set(speaker.name, ready);
|
||||
return ready;
|
||||
}
|
||||
if (!speaker.voiceDescription) return null;
|
||||
const p = provisionVoice(cfg, speaker.voiceDescription);
|
||||
cache.set(speaker.name, p);
|
||||
try {
|
||||
return await p;
|
||||
} catch (e) {
|
||||
cache.delete(speaker.name); // failed provision — let a later beat retry
|
||||
throw e;
|
||||
}
|
||||
}
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Component
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
@@ -355,7 +465,7 @@ function PlayInner() {
|
||||
const [currentScene, setCurrentScene] = useState<Scene | null>(null);
|
||||
const [currentBeatId, setCurrentBeatId] = useState<string | null>(null);
|
||||
const [imageUrl, setImageUrl] = useState<string | null>(null);
|
||||
const [beatAudioMap, setBeatAudioMap] = useState<Record<string, BeatAudio>>({});
|
||||
const [beatAudioMap, setBeatAudioMap] = useState<Record<string, string>>({});
|
||||
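// Lazy-initialize priority: this session's choice (the homepage "语音配音" (voice) option, stored in sessionStorage: infiplot:custom)
// > sticky preference from the last session (localStorage: infiplot:muted) > default unmuted.
// So choosing "关闭" (off) on the homepage starts the game muted; choosing "开启" (on) starts it unmuted; after entering the play page the user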
|
||||
@@ -381,7 +491,20 @@ function PlayInner() {
|
||||
} | null>(null);
|
||||
const [error, setError] = useState<string | null>(null);
|
||||
const [presentation, setPresentation] = useState(false);
|
||||
// Session-locked image orientation (see detectOrientation). "portrait" makes
|
||||
// the whole play surface render full-bleed vertical on phones.
|
||||
const [orientation, setOrientation] = useState<Orientation>("landscape");
|
||||
const [lastExitLabel, setLastExitLabel] = useState<string | null>(null);
|
||||
// Consecutive server-side TTS misses (null audio / failed /api/beat-audio).
|
||||
// Climbs when the shared server key is rate-limited by MiMo — the exact pain
|
||||
// BYO fixes — so the play page can nudge non-BYO users to add their own key.
|
||||
// Reset to 0 on any successful synth. Only the server path touches it.
|
||||
const [silenceStrikes, setSilenceStrikes] = useState(0);
|
||||
// Once the player dismisses the silence nudge, keep it gone for this session.
|
||||
const [nudgeDismissed, setNudgeDismissed] = useState(false);
|
||||
// The in-place BYO-key modal, opened from the silence nudge so the player can
|
||||
// add a key without leaving the play page.
|
||||
const [ttsModalOpen, setTtsModalOpen] = useState(false);
|
||||
|
||||
const startedRef = useRef(false);
|
||||
const poolRef = useRef<Map<string, PrefetchEntry>>(new Map());
|
||||
@@ -396,6 +519,21 @@ function PlayInner() {
|
||||
// audioEnabledRef is no longer maintained separately: a single source of truth keeps the two flags from drifting.
|
||||
const mutedRef = useRef<boolean>(muted);
|
||||
|
||||
// Resolved bring-your-own Xiaomi TTS config (region preset + key), read once
|
||||
// from localStorage. When non-null, the browser provisions + synths voices
|
||||
// directly against Xiaomi — the key never touches our server — and every
|
||||
// start/scene/insert-beat request carries clientTts:true so the engine skips
|
||||
// server-side TTS. null = user hasn't opted in (server default / silent).
|
||||
const [byoTtsConfig, setByoTtsConfig] = useState<TtsConfig | null>(() =>
|
||||
loadClientTtsConfig(),
|
||||
);
|
||||
const byoTtsRef = useRef<TtsConfig | null>(byoTtsConfig);
|
||||
// BYO voice cache (see resolveByoVoice). Keyed by character name; persists
|
||||
// across scenes so each speaker is provisioned at most once per session.
|
||||
const provisionedVoicesRef = useRef<Map<string, Promise<CharacterVoice>>>(
|
||||
new Map(),
|
||||
);
|
||||
|
||||
// Mirrors for use inside async handlers (closure-stable)
|
||||
const sessionRef = useRef<Session | null>(null);
|
||||
const currentSceneRef = useRef<Scene | null>(null);
|
||||
@@ -411,9 +549,7 @@ function PlayInner() {
|
||||
return currentScene.beats.find((b) => b.id === currentBeatId) ?? null;
|
||||
}, [currentScene, currentBeatId]);
|
||||
|
||||
const currentBeatAudio = currentBeat ? beatAudioMap[currentBeat.id] : undefined;
|
||||
const audioBase64 = currentBeatAudio?.base64 ?? null;
|
||||
const audioMime = currentBeatAudio?.mime ?? null;
|
||||
const audioSrc = (currentBeat ? beatAudioMap[currentBeat.id] : undefined) ?? null;
|
||||
|
||||
useEffect(() => {
|
||||
sessionRef.current = session;
|
||||
@@ -476,31 +612,73 @@ function PlayInner() {
|
||||
// Choosing "关闭" (off) on the homepage takes this path too: muted is already initialized to true at bootstrap.
|
||||
if (!beat.speaker || !beat.line) return;
|
||||
const speaker = sess.characters.find((c) => c.name === beat.speaker);
|
||||
if (!speaker?.voice) return; // not yet provisioned — server can't synth anyway
|
||||
if (!speaker) return;
|
||||
|
||||
const byo = byoTtsRef.current;
|
||||
// Non-BYO relies on the server having provisioned speaker.voice. BYO
|
||||
// skipped server TTS, so it needs a baked voice (prebaked card) or a
|
||||
// voiceDescription to provision from in the browser.
|
||||
if (!byo && !speaker.voice) return;
|
||||
if (byo && !speaker.voice && !speaker.voiceDescription) return;
|
||||
|
||||
if (beatAudioAbortRef.current.has(beat.id)) return;
|
||||
const abort = new AbortController();
|
||||
beatAudioAbortRef.current.set(beat.id, abort);
|
||||
try {
|
||||
const res = await fetch("/api/beat-audio", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({
|
||||
beat: { id: beat.id, line: beat.line, lineDelivery: beat.lineDelivery },
|
||||
voice: speaker.voice,
|
||||
}),
|
||||
signal: abort.signal,
|
||||
});
|
||||
if (!res.ok) return;
|
||||
const json = (await res.json()) as BeatAudioResponse;
|
||||
// Skip the state write if we've been aborted between the .ok check and
|
||||
let audioUrl: string | null = null;
|
||||
if (byo) {
|
||||
// Client-direct: provision (once per speaker, cached) + synth against
|
||||
// Xiaomi with the user's own key — no /api/beat-audio round-trip and
|
||||
// the key never touches our server.
|
||||
const voice = await resolveByoVoice(
|
||||
provisionedVoicesRef.current,
|
||||
byo,
|
||||
speaker,
|
||||
);
|
||||
if (!voice || abort.signal.aborted) return;
|
||||
const out = await synthesize(
|
||||
byo,
|
||||
voice,
|
||||
beat.line,
|
||||
beat.lineDelivery,
|
||||
abort.signal,
|
||||
);
|
||||
audioUrl = `data:${out.mimeType};base64,${out.audioBase64}`;
|
||||
} else {
|
||||
const res = await fetch("/api/beat-audio", {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({
|
||||
beat: { id: beat.id, line: beat.line, lineDelivery: beat.lineDelivery },
|
||||
voice: speaker.voice,
|
||||
}),
|
||||
signal: abort.signal,
|
||||
});
|
||||
if (res.status === 204) {
|
||||
setSilenceStrikes((n) => Math.min(n + 1, 99));
|
||||
return;
|
||||
}
|
||||
if (!res.ok) {
|
||||
setSilenceStrikes((n) => Math.min(n + 1, 99));
|
||||
return;
|
||||
}
|
||||
const blob = await res.blob();
|
||||
audioUrl = URL.createObjectURL(blob);
|
||||
setSilenceStrikes(0);
|
||||
}
|
||||
// Skip the state write if we've been aborted between the await and
|
||||
// here — beat ids are scene-local, so a late arrival from a prior
|
||||
// scene would otherwise overwrite the current scene's audio under the
|
||||
// same id.
|
||||
if (json.audio && !abort.signal.aborted) {
|
||||
setBeatAudioMap((m) => ({ ...m, [beat.id]: json.audio as BeatAudio }));
|
||||
if (audioUrl && !abort.signal.aborted) {
|
||||
setBeatAudioMap((m) => ({ ...m, [beat.id]: audioUrl }));
|
||||
} else if (audioUrl?.startsWith("blob:")) {
|
||||
URL.revokeObjectURL(audioUrl);
|
||||
}
|
||||
} catch {
|
||||
// aborted or network error — silent fallback
|
||||
// aborted / network / Xiaomi rate-limit — silent fallback (no audio)
|
||||
} finally {
|
||||
// Only clear the slot if it's still ours. An aborted prior fetch
|
||||
// running its finally late could otherwise delete the controller of a
|
||||
@@ -536,7 +714,12 @@ function PlayInner() {
|
||||
   // scenes) so a late arrival would land under the wrong beat otherwise.
   useEffect(() => {
     cancelBeatAudioFetches();
-    setBeatAudioMap({});
+    setBeatAudioMap((prev) => {
+      for (const url of Object.values(prev)) {
+        if (url.startsWith("blob:")) URL.revokeObjectURL(url);
+      }
+      return {};
+    });
     prefetchSceneAudio();
   }, [currentScene?.id, prefetchSceneAudio]);
|
||||
|
||||
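The map reset in this hunk revokes `blob:` URLs before dropping them, while `data:` URIs (the BYO path) need no cleanup. A small sketch of that cleanup as a standalone function follows. `revokeBlobUrls` and its injected `revoke` parameter are hypothetical names used for illustration; the PR inlines the loop inside the `setBeatAudioMap` updater instead.

```typescript
// Hypothetical helper, not part of the PR: release object URLs held in a
// beat-id -> audio-URL map. Only blob: URLs hold a browser-side reference
// that needs releasing; data: URIs are plain strings.
function revokeBlobUrls(
  audioMap: Record<string, string>,
  revoke: (url: string) => void, // in a browser, pass (u) => URL.revokeObjectURL(u)
): number {
  let revoked = 0;
  for (const url of Object.values(audioMap)) {
    if (url.startsWith("blob:")) {
      revoke(url);
      revoked += 1;
    }
  }
  return revoked;
}

// Usage with a recording stub instead of the real URL.revokeObjectURL.
const released: string[] = [];
const revokedCount = revokeBlobUrls(
  {
    b1: "blob:https://example.invalid/1111",
    b2: "data:audio/wav;base64,AAAA",
    b3: "blob:https://example.invalid/2222",
  },
  (u) => released.push(u),
);
```

Injecting `revoke` keeps the sketch runnable outside a browser; in the component the real `URL.revokeObjectURL` would be used directly.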
@@ -571,10 +754,41 @@ function PlayInner() {
|
||||
if (prev === muted) return;
|
||||
cancelBeatAudioFetches();
|
||||
if (muted) return;
|
||||
setBeatAudioMap({});
|
||||
setBeatAudioMap((prev) => {
|
||||
for (const url of Object.values(prev)) {
|
||||
if (url.startsWith("blob:")) URL.revokeObjectURL(url);
|
||||
}
|
||||
return {};
|
||||
});
|
||||
prefetchSceneAudio();
|
||||
}, [muted, prefetchSceneAudio]);
|
||||
|
||||
// ── BYO key enabled/disabled from the play page (silence nudge → modal) ─
|
||||
// On enable: point the synth path at the user's key and immediately
|
||||
// re-synthesize the current scene in-browser, so the voices the player just
|
||||
// missed come back without a reload (their characters already carry
|
||||
// server-provisioned `voice`, which resolveByoVoice reuses with the new key).
|
||||
// On disable: just stop using it; later scenes fall back to the server.
|
||||
const handleByoSaved = useCallback(
|
||||
(configured: boolean) => {
|
||||
const cfg = configured ? loadClientTtsConfig() : null;
|
||||
byoTtsRef.current = cfg;
|
||||
setByoTtsConfig(cfg);
|
||||
if (cfg) {
|
||||
setSilenceStrikes(0);
|
||||
cancelBeatAudioFetches();
|
||||
setBeatAudioMap((prev) => {
|
||||
for (const url of Object.values(prev)) {
|
||||
if (url.startsWith("blob:")) URL.revokeObjectURL(url);
|
||||
}
|
||||
return {};
|
||||
});
|
||||
prefetchSceneAudio();
|
||||
}
|
||||
},
|
||||
[prefetchSceneAudio],
|
||||
);
|
||||
|
||||
// ── Presentation mode toggle ─────────────────────────────────────────
|
||||
const togglePresentation = useCallback(async () => {
|
||||
const entering = !presentation;
|
||||
@@ -619,6 +833,16 @@ function PlayInner() {
|
||||
};
|
||||
}, [togglePresentation, presentation]);
|
||||
|
||||
// Lock the visible orientation BEFORE the first paint, so portrait phones
|
||||
// never flash the landscape loading chrome. The state inits to "landscape"
|
||||
// for SSR-safety; this corrects it pre-paint (no-op re-render on landscape
|
||||
// devices). Prebaked cards (decision C) stay landscape-baked regardless of
|
||||
// device. The bootstrap effect below re-derives the same value for the
|
||||
// /api/start payload.
|
||||
useIsomorphicLayoutEffect(() => {
|
||||
setOrientation(params.get("card") ? "landscape" : detectOrientation());
|
||||
}, [params]);
|
||||
|
||||
// ── Bootstrap: start session ─────────────────────────────────────────
|
||||
useEffect(() => {
|
||||
if (startedRef.current) return;
|
||||
@@ -638,6 +862,7 @@ function PlayInner() {
|
||||
worldSetting: string;
|
||||
styleGuide: string;
|
||||
styleReferenceImage?: string;
|
||||
orientation?: Orientation;
|
||||
} | null = null;
|
||||
if (!cardName) {
|
||||
if (presetId) {
|
||||
@@ -666,6 +891,16 @@ function PlayInner() {
|
||||
}
|
||||
}
|
||||
|
||||
// Lock orientation for the whole session. Prebaked cards (decision C) are
|
||||
// landscape-baked, so they stay landscape regardless of device; only the
|
||||
// live /api/start path requests a portrait paint when the phone is upright.
|
||||
// The visible state is already set pre-paint by the layout effect above;
|
||||
// here we only need the value for the /api/start payload.
|
||||
const sessionOrientation: Orientation = cardName
|
||||
? "landscape"
|
||||
: detectOrientation();
|
||||
if (livePayload) livePayload.orientation = sessionOrientation;
|
||||
|
||||
if (!cardName && !livePayload) {
|
||||
router.replace("/");
|
||||
return;
|
||||
@@ -693,8 +928,13 @@ function PlayInner() {
|
||||
)
|
||||
: fetch("/api/start", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify(livePayload),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({
|
||||
...livePayload,
|
||||
clientTts: !!byoTtsRef.current,
|
||||
}),
|
||||
}).then(async (r) => {
|
||||
if (!r.ok) {
|
||||
const j = (await r.json().catch(() => ({}))) as { error?: string };
|
||||
@@ -734,6 +974,7 @@ function PlayInner() {
|
||||
characters: data.characters,
|
||||
storyState: data.storyState,
|
||||
styleReferenceImage: data.styleReferenceImage,
|
||||
orientation: data.scene.orientation ?? sessionOrientation,
|
||||
};
|
||||
visitedBeatsRef.current = [data.scene.entryBeatId];
|
||||
setSession(initial);
|
||||
@@ -767,7 +1008,7 @@ function PlayInner() {
|
||||
nextSceneSeed: choice.effect.nextSceneSeed,
|
||||
},
|
||||
};
|
||||
prefetchScenePath(poolRef.current, s, [step], 0);
|
||||
prefetchScenePath(poolRef.current, s, [step], 0, !!byoTtsRef.current);
|
||||
}
|
||||
}, [currentScene?.id, session?.id]);
|
||||
|
||||
@@ -844,7 +1085,10 @@ function PlayInner() {
|
||||
visitedBeatIds: [result.scene.entryBeatId],
|
||||
},
|
||||
],
|
||||
characters: result.characters,
|
||||
characters: mergeCharactersPreserveVoice(
|
||||
base.characters,
|
||||
result.characters,
|
||||
),
|
||||
storyState: result.storyState,
|
||||
};
|
||||
visitedBeatsRef.current = [result.scene.entryBeatId];
|
||||
@@ -918,8 +1162,13 @@ function PlayInner() {
|
||||
const promise = (async () => {
|
||||
const res = await fetch("/api/scene", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ session: specSession }),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({
|
||||
session: stripVoicesForTransport(specSession),
|
||||
clientTts: !!byoTtsRef.current,
|
||||
}),
|
||||
});
|
||||
if (!res.ok) {
|
||||
const j = (await res.json().catch(() => ({}))) as { error?: string };
|
||||
@@ -940,8 +1189,10 @@ function PlayInner() {
|
||||
const annotatedImageBase64 = await annotateClick(imageUrl, click);
|
||||
const visionRes = await fetch("/api/vision", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ session, annotatedImageBase64 }),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({ session: stripVoicesForTransport(session), annotatedImageBase64 }),
|
||||
});
|
||||
if (!visionRes.ok) {
|
||||
const j = (await visionRes.json().catch(() => ({}))) as {
|
||||
@@ -956,10 +1207,13 @@ function PlayInner() {
|
||||
setPhase("inserting-beat");
|
||||
const insertRes = await fetch("/api/insert-beat", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({
|
||||
session,
|
||||
session: stripVoicesForTransport(session),
|
||||
freeformAction: decision.intent.freeformAction,
|
||||
clientTts: !!byoTtsRef.current,
|
||||
}),
|
||||
});
|
||||
if (!insertRes.ok) {
|
||||
@@ -995,7 +1249,10 @@ function PlayInner() {
|
||||
history: session.history.map((h, i, arr) =>
|
||||
i === arr.length - 1 ? { ...h, scene: patched } : h,
|
||||
),
|
||||
characters: insertChars,
|
||||
characters: mergeCharactersPreserveVoice(
|
||||
session.characters,
|
||||
insertChars,
|
||||
),
|
||||
};
|
||||
setSession(nextSession);
|
||||
setCurrentScene(patched);
|
||||
@@ -1036,8 +1293,13 @@ function PlayInner() {
|
||||
const promise = (async () => {
|
||||
const res = await fetch("/api/scene", {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ session: specSession }),
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
body: JSON.stringify({
|
||||
session: stripVoicesForTransport(specSession),
|
||||
clientTts: !!byoTtsRef.current,
|
||||
}),
|
||||
});
|
||||
if (!res.ok) {
|
||||
const j = (await res.json().catch(() => ({}))) as {
|
||||
@@ -1071,12 +1333,12 @@ function PlayInner() {
|
||||
<p className="text-[10px] smallcaps text-clay-500 mb-6">
|
||||
出 · 了 · 点 · 状 · 况
|
||||
</p>
|
||||
<p className="font-serif italic text-clay-900 text-lg leading-[1.7] mb-10">
|
||||
<p className="font-serif italic text-clay-900 text-lg leading-[1.7] mb-6">
|
||||
{error}
|
||||
</p>
|
||||
<Link
|
||||
href="/"
|
||||
className="text-[10px] smallcaps text-clay-700 hover:text-ember-500 transition-colors inline-flex items-center gap-3"
|
||||
className="mt-4 text-[10px] smallcaps text-clay-700 hover:text-ember-500 transition-colors inline-flex items-center gap-3"
|
||||
>
|
||||
<i className="fa-solid fa-arrow-left text-[9px]" />
|
||||
返 回
|
||||
@@ -1086,13 +1348,18 @@ function PlayInner() {
|
||||
);
|
||||
}
|
||||
|
||||
if (presentation) {
|
||||
// Mobile portrait renders full-bleed by default — it sidesteps the iOS
|
||||
// Safari Fullscreen API (unsupported on iPhone) with a CSS full-viewport
|
||||
// layout instead. Desktop "presentation" mode shares the same immersive
|
||||
// canvas, toggled via the F key.
|
||||
const immersive = presentation || orientation === "portrait";
|
||||
|
||||
if (immersive) {
|
||||
return (
|
||||
<div className="fixed inset-0 bg-black flex items-center justify-center z-50">
|
||||
<PlayCanvas
|
||||
imageUrl={imageUrl}
|
||||
audioBase64={audioBase64}
|
||||
audioMime={audioMime}
|
||||
audioSrc={audioSrc}
|
||||
muted={muted}
|
||||
phase={phase}
|
||||
beat={currentBeat}
|
||||
@@ -1100,8 +1367,33 @@ function PlayInner() {
|
||||
onBackgroundClick={onBackgroundClick}
|
||||
onAdvance={onAdvance}
|
||||
onSelectChoice={onSelectChoice}
|
||||
orientation={orientation}
|
||||
fullViewport
|
||||
/>
|
||||
{orientation === "portrait" && (
|
||||
<div
|
||||
className="absolute inset-x-0 top-0 z-10 flex items-center justify-between px-4 pointer-events-none"
|
||||
style={{ paddingTop: "max(0.5rem, env(safe-area-inset-top))" }}
|
||||
>
|
||||
<Link
|
||||
href="/"
|
||||
className="pointer-events-auto flex h-9 w-9 items-center justify-center rounded-full bg-black/40 text-white/80 backdrop-blur-sm transition-colors hover:text-white"
|
||||
aria-label="返回"
|
||||
>
|
||||
<i className="fa-solid fa-arrow-left text-[13px]" />
|
||||
</Link>
|
||||
<button
|
||||
type="button"
|
||||
onClick={toggleMuted}
|
||||
className="pointer-events-auto flex h-9 w-9 items-center justify-center rounded-full bg-black/40 text-white/80 backdrop-blur-sm transition-colors hover:text-white"
|
||||
aria-label={muted ? "取消静音" : "静音"}
|
||||
>
|
||||
<i
|
||||
className={`fa-solid ${muted ? "fa-volume-xmark" : "fa-volume-high"} text-[13px]`}
|
||||
/>
|
||||
</button>
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
@@ -1109,6 +1401,16 @@ function PlayInner() {
|
||||
const sceneCount = session?.history.length ?? 0;
|
||||
const beatCount = visitedBeatsRef.current.length;
|
||||
|
||||
// Surface the BYO-key nudge only to an unmuted, non-BYO player whose last few
|
||||
// beats came back silent (shared key rate-limited) — the exact pain BYO fixes.
|
||||
// Dismissible for the session.
|
||||
const showSilenceNudge =
|
||||
phase === "ready" &&
|
||||
!muted &&
|
||||
!byoTtsConfig &&
|
||||
!nudgeDismissed &&
|
||||
silenceStrikes >= SILENCE_NUDGE_THRESHOLD;
|
||||
|
||||
return (
|
||||
<div className="min-h-screen flex flex-col">
|
||||
<header className="px-5 md:px-12 pt-6 md:pt-8 flex items-center justify-between">
|
||||
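The silence nudge above combines a strike counter (updated in the beat-audio fetch) with the `showSilenceNudge` condition. The sketch below restates both as pure functions. `nextSilenceStrikes`, `shouldShowSilenceNudge`, the `NudgeInputs` shape, and the threshold value of 3 are assumptions made for illustration; the real `SILENCE_NUDGE_THRESHOLD` is defined outside this diff excerpt.

```typescript
// Assumed value for illustration only; the real constant is not shown here.
const SILENCE_NUDGE_THRESHOLD = 3;

// A silent response (HTTP 204 or a non-OK status) adds a strike, capped at 99;
// any successful synthesis resets the counter.
function nextSilenceStrikes(prev: number, gotAudio: boolean): number {
  return gotAudio ? 0 : Math.min(prev + 1, 99);
}

interface NudgeInputs {
  phase: string;
  muted: boolean;
  hasByoKey: boolean;
  dismissed: boolean;
  silenceStrikes: number;
}

// Mirrors the showSilenceNudge condition: only an unmuted, non-BYO player in
// the "ready" phase who has not dismissed the pill sees it.
function shouldShowSilenceNudge(s: NudgeInputs): boolean {
  return (
    s.phase === "ready" &&
    !s.muted &&
    !s.hasByoKey &&
    !s.dismissed &&
    s.silenceStrikes >= SILENCE_NUDGE_THRESHOLD
  );
}

// Usage: three silent beats in a row reach the assumed threshold.
let strikes = 0;
for (const ok of [false, false, false]) strikes = nextSilenceStrikes(strikes, ok);
const visible = shouldShowSilenceNudge({
  phase: "ready",
  muted: false,
  hasByoKey: false,
  dismissed: false,
  silenceStrikes: strikes,
});
```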
@@ -1131,8 +1433,7 @@ function PlayInner() {
|
||||
<main className="flex-1 flex flex-col items-center justify-center px-4 md:px-8 py-6 md:py-10">
|
||||
<PlayCanvas
|
||||
imageUrl={imageUrl}
|
||||
audioBase64={audioBase64}
|
||||
audioMime={audioMime}
|
||||
audioSrc={audioSrc}
|
||||
muted={muted}
|
||||
phase={phase}
|
||||
beat={currentBeat}
|
||||
@@ -1140,6 +1441,7 @@ function PlayInner() {
|
||||
onBackgroundClick={onBackgroundClick}
|
||||
onAdvance={onAdvance}
|
||||
onSelectChoice={onSelectChoice}
|
||||
orientation={orientation}
|
||||
aboveCanvas={
|
||||
<button
|
||||
type="button"
|
||||
@@ -1153,18 +1455,46 @@ function PlayInner() {
|
||||
</button>
|
||||
}
|
||||
aboveCanvasLeft={
|
||||
<button
|
||||
type="button"
|
||||
onClick={toggleMuted}
|
||||
className="text-[10px] smallcaps text-clay-500 hover:text-ember-500 transition-colors flex items-center gap-2"
|
||||
aria-label={muted ? "取消静音" : "静音"}
|
||||
title={muted ? "取消静音" : "静音"}
|
||||
>
|
||||
<i
|
||||
className={`fa-solid ${muted ? "fa-volume-xmark" : "fa-volume-high"} text-[10px]`}
|
||||
/>
|
||||
{muted ? "静 · 音" : "有 · 声"}
|
||||
</button>
|
||||
<>
|
||||
<button
|
||||
type="button"
|
||||
onClick={toggleMuted}
|
||||
className="text-[10px] smallcaps text-clay-500 hover:text-ember-500 transition-colors flex items-center gap-2"
|
||||
aria-label={muted ? "取消静音" : "静音"}
|
||||
title={muted ? "取消静音" : "静音"}
|
||||
>
|
||||
<i
|
||||
className={`fa-solid ${muted ? "fa-volume-xmark" : "fa-volume-high"} text-[10px]`}
|
||||
/>
|
||||
{muted ? "静 · 音" : "有 · 声"}
|
||||
</button>
|
||||
|
||||
{/* Silence nudge — a compact pill right beside the mute toggle.
|
||||
Clicking opens the BYO-key modal in place (no trip to the
|
||||
homepage). The × dismisses it for the session. */}
|
||||
{showSilenceNudge && (
|
||||
<span className="flex items-center gap-1 animate-fade-in">
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => setTtsModalOpen(true)}
|
||||
className="inline-flex items-center gap-1.5 rounded-full border border-ember-500/40 bg-ember-500/10 px-2.5 py-1 text-[10px] text-ember-500 hover:bg-ember-500/20 transition-colors"
|
||||
title="经常没声音?填入你自己的小米 MiMo Key(免费),配音更稳定"
|
||||
>
|
||||
<i className="fa-solid fa-volume-xmark text-[9px]" />
|
||||
经常没声音?自带 Key
|
||||
</button>
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => setNudgeDismissed(true)}
|
||||
aria-label="关闭提示"
|
||||
title="关闭"
|
||||
className="text-clay-400 hover:text-clay-700 transition-colors"
|
||||
>
|
||||
<i className="fa-solid fa-xmark text-[10px]" />
|
||||
</button>
|
||||
</span>
|
||||
)}
|
||||
</>
|
||||
}
|
||||
/>
|
||||
|
||||
@@ -1181,7 +1511,16 @@ function PlayInner() {
|
||||
</p>
|
||||
)}
|
||||
</div>
|
||||
|
||||
</main>
|
||||
|
||||
{ttsModalOpen && (
|
||||
<TtsKeyModal
|
||||
onClose={() => setTtsModalOpen(false)}
|
||||
onSaved={handleByoSaved}
|
||||
footerNote="保存后会立即用这把 Key 在你的浏览器里合成当前这一幕的配音;本设备后续游玩也会自动使用此 Key。"
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
+99
-47
@@ -1,7 +1,7 @@
|
||||
"use client";
|
||||
|
||||
import { useCallback, useEffect, useRef, useState, type ReactNode } from "react";
|
||||
import type { Beat, BeatChoice } from "@infiplot/types";
|
||||
import type { Beat, BeatChoice, Orientation } from "@infiplot/types";
|
||||
|
||||
export type Phase =
|
||||
| "loading-first" // first scene not yet rendered
|
||||
@@ -109,11 +109,13 @@ function ChoiceButton({
|
||||
index,
|
||||
label,
|
||||
disabled,
|
||||
vertical,
|
||||
onClick,
|
||||
}: {
|
||||
index: number;
|
||||
label: string;
|
||||
disabled: boolean;
|
||||
vertical: boolean;
|
||||
onClick: () => void;
|
||||
}) {
|
||||
return (
|
||||
@@ -121,8 +123,8 @@ function ChoiceButton({
|
||||
type="button"
|
||||
disabled={disabled}
|
||||
onClick={onClick}
|
||||
className="group relative flex-1 min-w-0 px-4 py-3 text-left transition-all duration-200
|
||||
disabled:opacity-50 disabled:cursor-wait"
|
||||
className={`group relative ${vertical ? "w-full" : "flex-1 min-w-0"} px-4 py-3 text-left transition-all duration-200
|
||||
disabled:opacity-50 disabled:cursor-wait`}
|
||||
style={{
|
||||
background: "rgba(20, 14, 8, 0.68)",
|
||||
border: "1.5px solid rgba(180, 140, 80, 0.65)",
|
||||
@@ -141,13 +143,13 @@ function ChoiceButton({
|
||||
/>
|
||||
<span className="relative flex items-baseline gap-2">
|
||||
<span
|
||||
className="shrink-0 font-serif text-[11px] num"
|
||||
className={`shrink-0 font-serif num ${vertical ? "text-[13px]" : "text-[11px]"}`}
|
||||
style={{ color: "rgba(195,155,75,0.9)" }}
|
||||
>
|
||||
{index + 1}.
|
||||
</span>
|
||||
<span
|
||||
className="font-serif text-[13px] md:text-[14px] leading-snug"
|
||||
className={`font-serif leading-snug ${vertical ? "text-[15px]" : "text-[13px] md:text-[14px]"}`}
|
||||
style={{ color: "rgba(245,235,210,0.95)" }}
|
||||
>
|
||||
{label}
|
||||
@@ -160,8 +162,7 @@ function ChoiceButton({
|
||||
// ── Main component ─────────────────────────────────────────────────────
|
||||
export function PlayCanvas({
|
||||
imageUrl,
|
||||
audioBase64,
|
||||
audioMime,
|
||||
audioSrc,
|
||||
muted,
|
||||
phase,
|
||||
beat,
|
||||
@@ -170,12 +171,12 @@ export function PlayCanvas({
|
||||
onAdvance,
|
||||
onSelectChoice,
|
||||
fullViewport = false,
|
||||
orientation = "landscape",
|
||||
aboveCanvas,
|
||||
aboveCanvasLeft,
|
||||
}: {
|
||||
imageUrl: string | null;
|
||||
audioBase64: string | null;
|
||||
audioMime: string | null;
|
||||
audioSrc: string | null;
|
||||
muted: boolean;
|
||||
phase: Phase;
|
||||
beat: Beat | null;
|
||||
@@ -184,6 +185,8 @@ export function PlayCanvas({
|
||||
onAdvance: () => void;
|
||||
onSelectChoice: (choice: BeatChoice) => void;
|
||||
fullViewport?: boolean;
|
||||
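// Session-locked image orientation. In "portrait" the image fills the whole viewport (object-fit:cover), choices stack vertically, and font sizes are enlarged.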
|
||||
orientation?: Orientation;
|
||||
// Slot rendered directly above the image, right-aligned (outside the frame, flush with the top-right corner).
|
||||
aboveCanvas?: ReactNode;
|
||||
// Slot rendered directly above the image, left-aligned (outside the frame, flush with the top-left corner); the horizontal mirror of aboveCanvas.
|
||||
@@ -204,7 +207,7 @@ export function PlayCanvas({
|
||||
const { shown: typedBody, done: typingDone, skip: skipTypewriter } =
|
||||
useTypewriter(displayBody, beat?.id ?? "", {
|
||||
targetDurationMs: audioDurationMs,
|
||||
waitForAudio: Boolean(audioBase64),
|
||||
waitForAudio: Boolean(audioSrc),
|
||||
});
|
||||
|
||||
// ── Audio source change ──────────────────────────────────────────────
|
||||
@@ -212,12 +215,12 @@ export function PlayCanvas({
|
||||
// unblock the typewriter via timeout so text doesn't stall.
|
||||
useEffect(() => {
|
||||
setAudioDurationMs(undefined);
|
||||
if (!audioBase64) return;
|
||||
if (!audioSrc) return;
|
||||
const timer = setTimeout(() => {
|
||||
setAudioDurationMs((prev) => prev ?? 0);
|
||||
}, AUDIO_WAIT_TIMEOUT_MS);
|
||||
return () => clearTimeout(timer);
|
||||
}, [audioBase64]);
|
||||
}, [audioSrc]);
|
||||
|
||||
// ── Mute toggle ───────────────────────────────────────────────────────
|
||||
useEffect(() => {
|
||||
@@ -225,12 +228,12 @@ export function PlayCanvas({
|
||||
if (!el) return;
|
||||
el.muted = muted;
|
||||
el.playbackRate = SPEECH_RATE;
|
||||
if (!muted && audioBase64 && el.paused) {
|
||||
if (!muted && audioSrc && el.paused) {
|
||||
el.play().catch(() => {
|
||||
// autoplay blocked — silent until next interaction
|
||||
});
|
||||
}
|
||||
}, [muted, audioBase64]);
|
||||
}, [muted, audioSrc]);
|
||||
|
||||
function handleAudioMetadata() {
|
||||
const el = audioRef.current;
|
||||
@@ -255,9 +258,27 @@ export function PlayCanvas({
|
||||
|
||||
   function handleImageClick(e: React.MouseEvent<HTMLImageElement>) {
     if (phase !== "ready" || !imgRef.current || !beat) return;
-    const rect = imgRef.current.getBoundingClientRect();
-    const x = (e.clientX - rect.left) / rect.width;
-    const y = (e.clientY - rect.top) / rect.height;
+    const el = imgRef.current;
+    const rect = el.getBoundingClientRect();
+    // Portrait renders with object-fit:cover, which scales the 9:16 image to
+    // FILL the box and crops the overflow — so the rendered box ≠ the full
+    // image. Map the click from box-space back into full-image-space via the
+    // cover geometry so the marker lands where the user tapped. Landscape's box
+    // matches the image aspect (no crop), so it keeps simple normalization.
+    let x: number;
+    let y: number;
+    if (orientation === "portrait") {
+      const nw = el.naturalWidth || 1024;
+      const nh = el.naturalHeight || 1792;
+      const scale = Math.max(rect.width / nw, rect.height / nh);
+      const dispW = nw * scale;
+      const dispH = nh * scale;
+      x = (e.clientX - rect.left + (dispW - rect.width) / 2) / dispW;
+      y = (e.clientY - rect.top + (dispH - rect.height) / 2) / dispH;
+    } else {
+      x = (e.clientX - rect.left) / rect.width;
+      y = (e.clientY - rect.top) / rect.height;
+    }
     // If the typewriter is still printing, a click completes it instantly
     // (standard VN affordance) — the page never sees this click.
     if (!typingDone) {
|
||||
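The portrait branch of `handleImageClick` inverts the `object-fit: cover` geometry. The sketch below isolates that arithmetic so it can be checked with concrete numbers. `mapCoverClick` and the `Box` interface are hypothetical names for illustration, and the sketch assumes the default centered `object-position`, which is what the symmetric `/ 2` crop offset in the hunk implies.

```typescript
// Hypothetical helper, not part of the PR: map a click inside an
// object-fit:cover box back to normalized (0..1) full-image coordinates.
interface Box {
  left: number;
  top: number;
  width: number;
  height: number;
}

function mapCoverClick(
  clientX: number,
  clientY: number,
  box: Box,
  naturalW: number,
  naturalH: number,
): { x: number; y: number } {
  // cover picks the LARGER scale so both box dimensions are filled
  const scale = Math.max(box.width / naturalW, box.height / naturalH);
  const dispW = naturalW * scale;
  const dispH = naturalH * scale;
  // the overflow is cropped equally on both sides (centered object-position),
  // so shift the click by half the overflow before normalizing
  const x = (clientX - box.left + (dispW - box.width) / 2) / dispW;
  const y = (clientY - box.top + (dispH - box.height) / 2) / dispH;
  return { x, y };
}

// Usage: a 1024x1792 image shown in a 400x800 box is scaled to about 457x800,
// so roughly 28.6 px are cropped from each side horizontally.
const phoneBox: Box = { left: 0, top: 0, width: 400, height: 800 };
const centerTap = mapCoverClick(200, 400, phoneBox, 1024, 1792);
const cornerTap = mapCoverClick(0, 0, phoneBox, 1024, 1792);
```

In this example the box center maps to the image center, while the box's left edge maps to x = 0.0625 rather than 0, because the leftmost 6.25% of the image is cropped away.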
@@ -291,13 +312,26 @@ export function PlayCanvas({
|
||||
const interactive = phase === "ready" && !!imageUrl;
|
||||
const dimmed = phase === "transitioning";
|
||||
|
||||
const sizeStyle = fullViewport
|
||||
? { maxWidth: "100vw", maxHeight: "100dvh" }
|
||||
: { maxWidth: "96vw", maxHeight: "calc(100dvh - 200px)" };
|
||||
const portrait = orientation === "portrait";
|
||||
const intrinsicW = portrait ? 1024 : 1792;
|
||||
const intrinsicH = portrait ? 1792 : 1024;
|
||||
|
||||
const placeholderWidth = fullViewport
|
||||
? "min(100vw, calc(100dvh * 16 / 9))"
|
||||
: "min(96vw, calc((100dvh - 200px) * 16 / 9))";
|
||||
// Portrait (mobile) always fills the whole viewport with object-fit:cover so
|
||||
// the 9:16 image matches the exact device/window — no letterbox. Landscape
|
||||
// keeps the prior contain-style sizing so the full 16:9 frame stays visible.
|
||||
const sizeStyle: React.CSSProperties = portrait
|
||||
? { width: "100vw", height: "100dvh", objectFit: "cover" }
|
||||
: fullViewport
|
||||
? { maxWidth: "100vw", maxHeight: "100dvh" }
|
||||
: { maxWidth: "96vw", maxHeight: "calc(100dvh - 200px)" };
|
||||
|
||||
const placeholderStyle: React.CSSProperties = portrait
|
||||
? { width: "100vw", height: "100dvh" }
|
||||
: {
|
||||
width: fullViewport
|
||||
? "min(100vw, calc(100dvh * 16 / 9))"
|
||||
: "min(96vw, calc((100dvh - 200px) * 16 / 9))",
|
||||
};
|
||||
|
||||
|
||||
return (
|
||||
@@ -305,11 +339,11 @@ export function PlayCanvas({
|
||||
className={`flex flex-col items-center ${fullViewport ? "w-full h-full justify-center" : "w-full"}`}
|
||||
>
|
||||
{/* Hidden audio element — voice playback for the current beat */}
|
||||
{audioBase64 && (
|
||||
{audioSrc && (
|
||||
<audio
|
||||
key={audioBase64.slice(-48)}
|
||||
key={audioSrc.slice(-48)}
|
||||
ref={audioRef}
|
||||
src={`data:${audioMime ?? "audio/wav"};base64,${audioBase64}`}
|
||||
src={audioSrc}
|
||||
preload="auto"
|
||||
onLoadedMetadata={handleAudioMetadata}
|
||||
onError={handleAudioError}
|
||||
@@ -323,22 +357,23 @@ export function PlayCanvas({
|
||||
style={{ boxShadow: fullViewport ? "none" : SHADOW }}
|
||||
>
|
||||
{/* Background image — Runware CDN URL or data URI (mock mode).
|
||||
The width/height attributes are NOT rendered dimensions (w-auto
|
||||
h-auto + the maxWidth/maxHeight in sizeStyle still drive the
|
||||
final layout); they give the browser an intrinsic aspect ratio
|
||||
so that, while the bytes are still arriving from the CDN, the
|
||||
<img> reserves a 1792:1024 box instead of collapsing to a
|
||||
one-pixel sliver; fixes the "long wait → thin line → image suddenly appears" jank. */}
|
||||
The width/height attributes give the browser the intrinsic aspect
|
||||
ratio (1792:1024 landscape / 1024:1792 portrait) so that, while the
|
||||
bytes are still arriving from the CDN, the <img> reserves the right
|
||||
box instead of collapsing to a one-pixel sliver — fixes the
|
||||
"long wait → thin line → image suddenly appears" jank. Landscape uses w-auto/h-auto +
|
||||
maxWidth/maxHeight (contain); portrait switches sizeStyle to
|
||||
100vw×100dvh with object-fit:cover (full-bleed, no letterbox). */}
|
||||
<img
|
||||
key={imageUrl.slice(-48)}
|
||||
ref={imgRef}
|
||||
src={imageUrl}
|
||||
width={1792}
|
||||
height={1024}
|
||||
width={intrinsicW}
|
||||
height={intrinsicH}
|
||||
alt="Generated scene"
|
||||
onClick={handleImageClick}
|
||||
draggable={false}
|
||||
className={`block w-auto h-auto select-none animate-fade-in transition-opacity duration-700 ease-out ${
|
||||
className={`block ${portrait ? "" : "w-auto h-auto"} select-none animate-fade-in transition-opacity duration-700 ease-out ${
|
||||
interactive ? "cursor-pointer" : "cursor-wait"
|
||||
} ${dimmed ? "opacity-40" : "opacity-100"}`}
|
||||
style={sizeStyle}
|
||||
@@ -361,15 +396,29 @@ export function PlayCanvas({
|
||||
)}
|
||||
|
||||
{beat && (
|
||||
<div className="absolute inset-0 flex flex-col justify-end pointer-events-none select-none">
|
||||
<div
|
||||
className="absolute inset-0 flex flex-col justify-end pointer-events-none select-none"
|
||||
style={
|
||||
portrait
|
||||
? { paddingBottom: "env(safe-area-inset-bottom)" }
|
||||
: undefined
|
||||
}
|
||||
>
|
||||
{choices.length > 0 && (
|
||||
<div className="pointer-events-auto px-[3%] pb-[1.5%] flex gap-[1.5%] items-stretch">
|
||||
<div
|
||||
className={`pointer-events-auto px-[3%] pb-[1.5%] flex items-stretch ${
|
||||
portrait
|
||||
? "flex-col gap-2 max-h-[45dvh] overflow-y-auto"
|
||||
: "gap-[1.5%]"
|
||||
}`}
|
||||
>
|
||||
{choices.map((choice, i) => (
|
||||
<ChoiceButton
|
||||
key={choice.id}
|
||||
index={i}
|
||||
label={choice.label}
|
||||
disabled={phase !== "ready"}
|
||||
vertical={portrait}
|
||||
onClick={() => onSelectChoice(choice)}
|
||||
/>
|
||||
))}
|
||||
@@ -407,7 +456,9 @@ export function PlayCanvas({
|
||||
|
||||
{beat.speaker && (
|
||||
<p
|
||||
className="font-serif text-[11px] md:text-[12px] smallcaps mb-[0.6em]"
|
||||
className={`font-serif smallcaps mb-[0.6em] ${
|
||||
portrait ? "text-[13px]" : "text-[11px] md:text-[12px]"
|
||||
}`}
|
||||
style={{ color: "rgba(205,165,90,0.92)" }}
|
||||
>
|
||||
{beat.speaker}
|
||||
@@ -415,15 +466,17 @@ export function PlayCanvas({
|
||||
)}
|
||||
|
||||
<p
|
||||
className="font-serif leading-[1.85] text-[13px] md:text-[15px]"
|
||||
className={`font-serif leading-[1.85] ${
|
||||
portrait ? "text-[16px]" : "text-[13px] md:text-[15px]"
|
||||
}`}
|
||||
style={{ color: "rgba(245,235,210,0.95)" }}
|
||||
>
|
||||
{typedBody}
|
||||
{beat.speaker && beat.narration && (
|
||||
<span
|
||||
className={`block mt-[0.5em] italic text-[12px] md:text-[13px] transition-opacity duration-300 ${
|
||||
typingDone ? "opacity-100" : "opacity-0"
|
||||
}`}
|
||||
className={`block mt-[0.5em] italic transition-opacity duration-300 ${
|
||||
portrait ? "text-[14px]" : "text-[12px] md:text-[13px]"
|
||||
} ${typingDone ? "opacity-100" : "opacity-0"}`}
|
||||
style={{ color: "rgba(200,185,155,0.78)" }}
|
||||
aria-hidden={!typingDone}
|
||||
>
|
||||
@@ -488,11 +541,10 @@ export function PlayCanvas({
|
||||
</div>
|
||||
) : (
|
||||
<div
|
||||
className="relative aspect-video bg-cream-200 flex flex-col items-center justify-center gap-4"
|
||||
style={{
|
||||
width: placeholderWidth,
|
||||
boxShadow: fullViewport ? "none" : SHADOW,
|
||||
}}
|
||||
className={`relative bg-cream-200 flex flex-col items-center justify-center gap-4 ${
|
||||
portrait ? "" : "aspect-video"
|
||||
}`}
|
||||
style={{ ...placeholderStyle, boxShadow: fullViewport ? "none" : SHADOW }}
|
||||
>
|
||||
<div className="w-1.5 h-1.5 bg-clay-500 rounded-full animate-slow-pulse" />
|
||||
<p className="text-[9px] smallcaps text-clay-500 animate-slow-pulse">
|
||||
|
||||
@@ -0,0 +1,271 @@
|
||||
"use client";
|
||||
|
||||
// Bring-your-own Xiaomi MiMo TTS key modal — shared by the homepage and the
|
||||
// play page. Two-step picker (key family → region for Token Plan only), key
|
||||
// stored CLIENT-SIDE ONLY (see lib/clientTtsConfig). `onSaved(configured)`
|
||||
// fires after a save/disable so each host can react (homepage flips the
|
||||
// 语音配音 ("voice narration") toggle; the play page re-synthesizes the current scene in-browser).
|
||||
// `footerNote` lets the host tailor the closing hint to its own context.
|
||||
|
||||
import { type ReactNode, useEffect, useState } from "react";
|
||||
import {
|
||||
clearStoredTtsConfig,
|
||||
readStoredTtsConfig,
|
||||
writeStoredTtsConfig,
|
||||
} from "@/lib/clientTtsConfig";
|
||||
import {
|
||||
findTtsPreset,
|
||||
PAYG_PRESET_ID,
|
||||
TTS_KEY_DOC_URL,
|
||||
TTS_REGION_PRESETS,
|
||||
} from "@/lib/ttsPresets";
|
||||
|
||||
const DEFAULT_FOOTER_NOTE: ReactNode =
|
||||
"提示:需将上方「语音配音」设为「开启」配音才会生效。保存后本设备后续游玩会自动使用此 Key。";
|
||||
|
||||
export function TtsKeyModal({
|
||||
onClose,
|
||||
onSaved,
|
||||
footerNote = DEFAULT_FOOTER_NOTE,
|
||||
}: {
|
||||
onClose: () => void;
|
||||
onSaved: (configured: boolean) => void;
|
||||
footerNote?: ReactNode;
|
||||
}) {
|
||||
// Read storage once; useState initializers ignore later renders, so local
|
||||
// edits aren't clobbered and we don't re-hit localStorage every render.
|
||||
const [initial] = useState(() => readStoredTtsConfig());
|
||||
// Two-step picker: choose key family first, then — only for Token Plan — a
|
||||
// region. Pay-as-you-go (`sk-`) keys hit one fixed endpoint, so no region.
|
||||
const initialKind = findTtsPreset(initial?.presetId)?.kind ?? "token-plan";
|
||||
const [keyType, setKeyType] = useState<"token-plan" | "payg">(initialKind);
|
||||
const [regionId, setRegionId] = useState<string>(
|
||||
initialKind === "token-plan"
|
||||
? (initial?.presetId ?? TTS_REGION_PRESETS[0]!.id)
|
||||
: TTS_REGION_PRESETS[0]!.id,
|
||||
);
|
||||
const [apiKey, setApiKey] = useState<string>(initial?.apiKey ?? "");
|
||||
const [showKey, setShowKey] = useState(false);
|
||||
const [shown, setShown] = useState(false);
|
||||
const alreadyConfigured = initial != null;
|
||||
// Soft guard: tp- keys belong to Token Plan, sk- to pay-as-you-go. A
|
||||
// mismatched pairing hits the wrong endpoint → guaranteed auth failure →
|
||||
// silent playback (the very symptom BYO exists to kill). Warn, but never
|
||||
// block: prefix conventions could change and a hard gate would lock out an
|
||||
// otherwise-valid key.
|
||||
const expectedPrefix = keyType === "payg" ? "sk-" : "tp-";
|
||||
const prefixMismatch =
|
||||
apiKey.trim().length > 0 && !apiKey.trim().startsWith(expectedPrefix);
|
||||
|
||||
useEffect(() => {
|
||||
const id = requestAnimationFrame(() => setShown(true));
|
||||
return () => cancelAnimationFrame(id);
|
||||
}, []);
|
||||
|
||||
const close = () => {
|
||||
setShown(false);
|
||||
setTimeout(onClose, 280);
|
||||
};
|
||||
const save = () => {
|
||||
const key = apiKey.trim();
|
||||
if (!key) return;
|
||||
const presetId = keyType === "payg" ? PAYG_PRESET_ID : regionId;
|
||||
writeStoredTtsConfig({ presetId, apiKey: key });
|
||||
onSaved(true);
|
||||
close();
|
||||
};
|
||||
const disable = () => {
|
||||
clearStoredTtsConfig();
|
||||
onSaved(false);
|
||||
close();
|
||||
};
|
||||
|
||||
return (
|
||||
<div
|
||||
onMouseDown={close}
|
||||
className={
|
||||
"fixed inset-0 z-[60] flex items-center justify-center p-6 md:p-10 transition-all duration-300 " +
|
||||
(shown
|
||||
? "bg-clay-900/30 backdrop-blur-md"
|
||||
: "bg-clay-900/0 backdrop-blur-0")
|
||||
}
|
||||
>
|
||||
<div
|
||||
onMouseDown={(e) => e.stopPropagation()}
|
||||
className={
|
||||
"flex w-[560px] max-w-[94vw] max-h-[88vh] flex-col overflow-hidden rounded-sm border border-clay-900/15 bg-cream-50 shadow-2xl shadow-clay-900/25 transition-all duration-300 " +
|
||||
(shown ? "opacity-100 scale-100" : "opacity-0 scale-95")
|
||||
}
|
||||
>
|
||||
<div className="flex items-center gap-5 px-6 md:px-8 py-5 border-b border-clay-900/10">
|
||||
<div className="flex flex-col">
|
||||
<span className="font-serif text-xl md:text-2xl text-clay-900">
|
||||
自带配音 Key
|
||||
</span>
|
||||
<span className="text-[11px] text-clay-500 mt-1 tracking-wide">
|
||||
可选 · 用你自己的小米 MiMo 免费额度,配音更稳定、延迟更低
|
||||
</span>
|
||||
</div>
|
||||
<button
|
||||
type="button"
|
||||
onClick={close}
|
||||
aria-label="关闭"
|
||||
className="ml-auto text-xl leading-none text-clay-500 hover:text-clay-900 transition-colors"
|
||||
>
|
||||
<i className="fa-solid fa-xmark" />
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<div className="flex flex-col gap-6 overflow-y-auto px-6 md:px-8 py-6">
|
||||
<p className="text-[13px] leading-relaxed text-clay-600">
|
||||
经常没有声音?公共语音模型有调用频率限额(RPM / TPM),同时游玩的人多时很容易撞到限额而静音。填入你自己的小米 MiMo API Key 后,配音将
|
||||
<span className="text-clay-900">直接在你的浏览器里合成</span>
|
||||
、使用你自己的免费额度 ——{" "}
|
||||
<span className="text-clay-900">Key 只保存在本地浏览器、绝不经过我们的服务器</span>
|
||||
。
|
||||
</p>
|
||||
|
||||
<div className="flex flex-col gap-2">
|
||||
<span className="text-[10px] smallcaps text-clay-500">K e y · 类 型</span>
|
||||
<div className="grid grid-cols-2 gap-2">
|
||||
{(
|
||||
[
|
||||
{ kind: "token-plan", label: "套餐 Token Plan", sub: "tp- 开头" },
|
||||
{ kind: "payg", label: "按量付费 Pay-as-you-go", sub: "sk- 开头" },
|
||||
] as const
|
||||
).map((t) => {
|
||||
const active = keyType === t.kind;
|
||||
return (
|
||||
<button
|
||||
key={t.kind}
|
||||
type="button"
|
||||
onClick={() => setKeyType(t.kind)}
|
||||
className={
|
||||
"flex flex-col gap-0.5 rounded-sm border px-3 py-2.5 text-left transition-all " +
|
||||
(active
|
||||
? "border-ember-500 bg-ember-500/5 text-clay-900"
|
||||
: "border-clay-900/12 text-clay-600 hover:border-clay-900/35 hover:bg-cream-100")
|
||||
}
|
||||
>
|
||||
<span className="text-[13px]">{t.label}</span>
|
||||
<span className="text-[10px] text-clay-400">{t.sub}</span>
|
||||
</button>
|
||||
);
|
||||
})}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{keyType === "token-plan" ? (
|
||||
<div className="flex flex-col gap-2">
|
||||
<span className="text-[10px] smallcaps text-clay-500">区 域 节 点</span>
|
||||
<div className="grid grid-cols-1 gap-2 sm:grid-cols-3">
|
||||
{TTS_REGION_PRESETS.map((p) => {
|
||||
const active = p.id === regionId;
|
||||
return (
|
||||
<button
|
||||
key={p.id}
|
||||
type="button"
|
||||
onClick={() => setRegionId(p.id)}
|
||||
className={
|
||||
"rounded-sm border px-3 py-2.5 text-left text-[13px] transition-all " +
|
||||
(active
|
||||
? "border-ember-500 bg-ember-500/5 text-clay-900"
|
||||
: "border-clay-900/12 text-clay-600 hover:border-clay-900/35 hover:bg-cream-100")
|
||||
}
|
||||
>
|
||||
{p.label}
|
||||
</button>
|
||||
);
|
||||
})}
|
||||
</div>
|
||||
<span className="text-[11px] text-clay-400">
|
||||
选择与你的套餐订阅地区一致的节点(通常也是延迟最低的那个)。
|
||||
</span>
|
||||
</div>
|
||||
) : (
|
||||
<div className="flex items-start gap-2 rounded-sm border border-clay-900/10 bg-cream-100/60 px-3.5 py-2.5">
|
||||
<i className="fa-solid fa-circle-info mt-0.5 text-[11px] text-clay-400" />
|
||||
<span className="text-[11px] leading-relaxed text-clay-500">
|
||||
按量付费使用统一端点{" "}
|
||||
<span className="text-clay-700">api.xiaomimimo.com</span>
|
||||
,无需选择区域。
|
||||
</span>
|
||||
</div>
|
||||
)}
|
||||
|
||||
<div className="flex flex-col gap-2">
|
||||
<span className="text-[10px] smallcaps text-clay-500">
|
||||
A P I · K e y
|
||||
</span>
|
||||
<div className="relative">
|
||||
<input
|
||||
value={apiKey}
|
||||
onChange={(e) => setApiKey(e.target.value)}
|
||||
type={showKey ? "text" : "password"}
|
||||
autoComplete="off"
|
||||
spellCheck={false}
|
||||
placeholder={
|
||||
keyType === "payg"
|
||||
? "粘贴 sk- 开头的按量 Key"
|
||||
: "粘贴 tp- 开头的套餐 Key"
|
||||
}
|
||||
className="h-11 w-full rounded-sm border border-clay-900/15 bg-cream-100 pl-4 pr-11 font-sans text-sm text-clay-900 outline-none transition-colors focus:border-ember-500 placeholder:text-clay-400"
|
||||
/>
|
||||
<button
|
||||
type="button"
|
||||
onClick={() => setShowKey((v) => !v)}
|
||||
aria-label={showKey ? "隐藏" : "显示"}
|
||||
className="absolute right-3 top-1/2 -translate-y-1/2 text-clay-400 hover:text-clay-700 transition-colors"
|
||||
>
|
||||
<i
|
||||
className={`fa-solid ${showKey ? "fa-eye-slash" : "fa-eye"} text-sm`}
|
||||
/>
|
||||
</button>
|
||||
</div>
|
||||
{prefixMismatch && (
|
||||
<span className="flex items-start gap-1.5 text-[11px] leading-relaxed text-ember-500">
|
||||
<i className="fa-solid fa-triangle-exclamation mt-0.5 text-[10px]" />
|
||||
此 Key 不是 {expectedPrefix} 开头,可能与所选「
|
||||
{keyType === "payg" ? "按量付费 Pay-as-you-go" : "套餐 Token Plan"}
|
||||
」类型不符,请确认是否填错。
|
||||
</span>
|
||||
)}
|
||||
<a
|
||||
href={TTS_KEY_DOC_URL}
|
||||
target="_blank"
|
||||
rel="noopener noreferrer"
|
||||
className="inline-flex items-center gap-1.5 text-[11px] text-ember-500 hover:text-ember-400 transition-colors"
|
||||
>
|
||||
<i className="fa-brands fa-github text-[11px]" />
|
||||
如何免费申请 Key?查看图文教程
|
||||
</a>
|
||||
</div>
|
||||
|
||||
<p className="text-[11px] leading-relaxed text-clay-400">{footerNote}</p>
|
||||
</div>
|
||||
|
||||
<div className="flex items-center gap-3 border-t border-clay-900/10 px-6 md:px-8 py-4">
|
||||
{alreadyConfigured && (
|
||||
<button
|
||||
type="button"
|
||||
onClick={disable}
|
||||
className="inline-flex items-center gap-2 rounded-sm border border-clay-900/15 px-4 py-2 font-sans text-sm text-clay-600 transition-colors hover:border-clay-900/35 hover:text-clay-900"
|
||||
>
|
||||
<i className="fa-solid fa-rotate-left text-xs" />
|
||||
停用并清除
|
||||
</button>
|
||||
)}
|
||||
<button
|
||||
type="button"
|
||||
onClick={save}
|
||||
disabled={!apiKey.trim()}
|
||||
className="ml-auto inline-flex items-center gap-2 rounded-sm bg-clay-900 px-5 py-2.5 font-sans text-sm text-cream-50 transition-colors hover:bg-ember-500 disabled:cursor-not-allowed disabled:opacity-40"
|
||||
>
|
||||
<i className="fa-solid fa-check text-xs" />
|
||||
保存并启用
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
@@ -0,0 +1,106 @@
|
||||
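# Bring-Your-Own Voice Key Tutorial (Xiaomi MiMo TTS)

InfiPlot's character voices are synthesized in real time by Xiaomi's **MiMo-V2.5-TTS** model. By following this tutorial you can apply, free of charge, for an API key of your own.
Once you enter it in InfiPlot you get **more stable voice-over and lower latency**, and the key is **stored only in your browser and never passes through our servers**.

> This tutorial is maintained alongside the repository, so the link stays valid long term.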
|
||||
|
||||
---
|
||||
|
||||
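## Why bring your own key?

By default, InfiPlot uses a single **shared public key** to provide voice-over for all users. Xiaomi enforces **RPM (requests per minute) / TPM (tokens per minute)** rate limits on its speech models, and the public key's quota is shared by every user. When many people are online at once, the public key easily hits its limit, which shows up as:

- the story and images work normally, but **there is no sound** (silent playback);
- or the voice-over stutters or takes a long time to arrive.

Once you enter **your own** key, you use an independent quota and are no longer affected by other users:

- ✅ **Stable voice-over**, with no more random silence;
- ✅ **Lower latency** (Token Plan keys can also pick a nearby regional node);
- ✅ **Completely free**: MiMo-V2.5-TTS is currently **free** for a limited time and does not consume plan credits.

This is an **optional enhancement**. You can play normally without it; you are just more likely to hit silence at peak times.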
|
||||
|
||||
---
|
||||
|
||||
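## 1. Get a free API key

1. Open the Xiaomi MiMo open platform and sign up / log in: <https://platform.xiaomimimo.com>

2. **Recommended: get a pay-as-you-go key (starts with `sk-`)**
   - Go to **Console → API Keys**: <https://platform.xiaomimimo.com/console/api-keys>
   - Create or copy your API key on that page (it looks like `sk-xxxxxxxx`).
   - A pay-as-you-go key works right after registration, with **no plan purchase required**, and suits most users.

3. **Alternative: get a Token Plan key (starts with `tp-`)**
   - If you have already purchased a Token Plan, go to **Console → Plan Management**: <https://platform.xiaomimimo.com/console/plan-manage>
   - Copy your plan API key on that page (it looks like `tp-xxxxxxxx`).

4. Keep your key safe and **do not share it publicly**.

> The MiMo-V2.5-TTS series is currently **free** for a limited time (it does not consume plan Credits), so voice-over should incur essentially no cost. The platform's own announcements take precedence.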
|
||||
|
||||
---
|
||||
|
||||
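## 2. Choose the key type (Token Plan also needs a region)

Xiaomi issues **two kinds of key**, each tied to a different service address. When filling in InfiPlot you must **choose the key type first**. The key's prefix tells you which it is: `sk-` is pay-as-you-go and `tp-` is Token Plan, and the two are not interchangeable.

**① Pay-as-you-go (`sk-` prefix)**: uses the single service address `https://api.xiaomimimo.com/v1` and **needs no region**; just enter the key.

**② Token Plan (`tp-` prefix)**: additionally requires a **regional node**, corresponding to the Token Plan services Xiaomi deploys in different regions:

| Region | Notes | Service address |
| --- | --- | --- |
| Singapore | Recommended for Asia-Pacific | `https://token-plan-sgp.xiaomimimo.com/v1` |
| Mainland China | Recommended for mainland China | `https://token-plan-cn.xiaomimimo.com/v1` |
| Europe · Amsterdam | Recommended for Europe | `https://token-plan-ams.xiaomimimo.com/v1` |

Choose the node that **matches the region of your plan subscription** (usually also the closest one, with the lowest latency).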
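
The key-type and region rules in this section can be sketched in TypeScript. This is a minimal illustration, not the app's actual code: the names `keyKindFromPrefix` and `resolveTtsBaseUrl` and the region ids `sgp` / `cn` / `ams` are hypothetical, and only the `sk-` / `tp-` prefixes and the service addresses are taken from the text above.

```typescript
// Hypothetical names throughout; only the URLs and the sk-/tp- prefixes
// come from the tutorial.
type KeyKind = "payg" | "token-plan";
type Region = "sgp" | "cn" | "ams"; // illustrative ids, not the app's preset ids

const PAYG_BASE_URL = "https://api.xiaomimimo.com/v1";

const TOKEN_PLAN_BASE_URLS: Record<Region, string> = {
  sgp: "https://token-plan-sgp.xiaomimimo.com/v1",
  cn: "https://token-plan-cn.xiaomimimo.com/v1",
  ams: "https://token-plan-ams.xiaomimimo.com/v1",
};

// Guess the key family from its prefix; null means "unrecognized prefix".
function keyKindFromPrefix(apiKey: string): KeyKind | null {
  const key = apiKey.trim();
  if (key.startsWith("sk-")) return "payg";
  if (key.startsWith("tp-")) return "token-plan";
  return null;
}

// Pay-as-you-go ignores the region; Token Plan requires one.
function resolveTtsBaseUrl(kind: KeyKind, region?: Region): string {
  if (kind === "payg") return PAYG_BASE_URL;
  if (!region) throw new Error("Token Plan keys need a region");
  return TOKEN_PLAN_BASE_URLS[region];
}
```

Returning `null` for an unknown prefix, rather than throwing, leaves room to warn the user without blocking a key whose prefix convention may have changed.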
|
||||
|
||||
---
|
||||
|
||||
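## 3. Enter the key in InfiPlot

1. Return to the InfiPlot **home page** and, below the options area, click **「经常没声音?自带配音 Key(可选)」** ("No sound often? Bring your own voice key (optional)").
2. In the dialog:
   - **Choose the key type** (pay-as-you-go / Token Plan); "Token Plan" additionally requires you to **choose a region**, while "pay-as-you-go" does not;
   - **Paste your API key**;
3. Click **「保存并启用」** ("Save and enable"). The button changes to **「自带配音 Key · 已启用」** ("Bring-your-own voice key · enabled"), and 「语音配音」 ("Voice-over") switches to 「开启」 ("On") automatically.
4. Start playing. Voice-over is now produced by your browser **connecting directly to Xiaomi's service**.

To turn it off, open the dialog again and click **「停用并清除」** ("Disable and clear"); the locally stored key is deleted as well.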
|
||||
|
||||
---
|
||||
|
||||
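## 4. Privacy

- Your API key is stored **only in the current browser's `localStorage`** (under the key `infiplot:tts`).
- Once enabled, voice-over requests are sent **directly from your browser to Xiaomi's** corresponding service address, carrying your key.
- Our servers are **not involved at all** in this path and **neither see nor log** your key.
- After switching devices or browsers, or clearing site data, you need to enter the key again; this is expected behavior.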
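
As a rough picture of what "stored only in `localStorage`" means, here is a minimal TypeScript sketch. The storage key `infiplot:tts` comes from the list above and the `{ presetId, apiKey }` shape from the modal's save handler; JSON serialization, the helper names, and the `KeyValueStore` interface are assumptions for illustration, not the contents of the app's real `lib/clientTtsConfig`.

```typescript
type TtsConfig = { presetId: string; apiKey: string };

// Subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const STORAGE_KEY = "infiplot:tts";

// Assumption: the config is serialized as JSON under a single key.
function writeConfig(store: KeyValueStore, config: TtsConfig): void {
  store.setItem(STORAGE_KEY, JSON.stringify(config));
}

// Returns null when nothing is stored or the stored value is unusable.
function readConfig(store: KeyValueStore): TtsConfig | null {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as TtsConfig).presetId === "string" &&
      typeof (parsed as TtsConfig).apiKey === "string"
    ) {
      return parsed as TtsConfig;
    }
  } catch {
    // Corrupt JSON: treat as "not configured".
  }
  return null;
}

function clearConfig(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}

// In-memory stand-in for localStorage, for tests or server-side rendering.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (k) => data.get(k) ?? null,
    setItem: (k, v) => void data.set(k, v),
    removeItem: (k) => void data.delete(k),
  };
}
```

In a browser you would pass `window.localStorage` as the store. Nothing in this path touches the network, which is why the key is lost when site data is cleared or the browser changes.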
|
||||
|
||||
---
|
||||
|
||||
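## 5. FAQ

**Q: I entered a key but there is still no sound?**
- Confirm that 「语音配音」 ("Voice-over") is set to 「开启」 ("On");
- Confirm the **key type is right**: `sk-` goes with pay-as-you-go and `tp-` with Token Plan; the wrong type causes authentication to fail;
- Confirm the key has no typos or stray spaces and still has available quota;
- With a Token Plan key, try switching the **region** (a region that does not match the subscription region can also cause failures);
- Open the Network panel in your browser's developer tools and check what error the requests to `*.xiaomimimo.com` return.

**Q: Will I be charged?**
- MiMo-V2.5-TTS is currently free for a limited time, and voice-over from normal play does not consume plan credits. Xiaomi's billing announcements are authoritative.

**Q: Should I use `sk-` or `tp-`?**
- `sk-` (pay-as-you-go) is recommended: it works right after registration with no plan purchase. If you already have a Token Plan, you can use `tp-` (the plan key) instead. The two are not interchangeable, and choosing the wrong type causes authentication to fail.

**Q: Is my key safe?**
- Yes. The key lives only in your local browser and is sent only to Xiaomi's official service addresses, never through InfiPlot's servers. Even so, do not publish the key or share it with others.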
|
||||
|
||||
---
|
||||
|
||||
If you run into problems, please report them in [GitHub Issues](https://github.com/zonghaoyuan/infiplot/issues).
|
||||
+88
-2
@@ -1,5 +1,10 @@
|
||||
import type { ProviderConfig } from "@infiplot/types";
|
||||
import { generateText } from "ai";
|
||||
import type { LanguageModelUsage, ModelMessage } from "ai";
|
||||
import { createAnthropic } from "@ai-sdk/anthropic";
|
||||
import { createGoogleGenerativeAI } from "@ai-sdk/google";
|
||||
import type { ProviderConfig, ProviderProtocol } from "@infiplot/types";
|
||||
import { fetchWithRetry } from "./fetchWithRetry";
|
||||
import { normalizeBaseUrl } from "./normalizeUrl";
|
||||
|
||||
export type ChatMessage = {
|
||||
role: "system" | "user" | "assistant";
|
||||
@@ -57,6 +62,31 @@ function summarizeUsage(tag: string, usage: Usage | undefined): string {
|
||||
return `[cache] ${tag} prompt=${prompt} completion=${completion} (provider didn't report cache stats)`;
|
||||
}
|
||||
|
||||
// AI SDK 6 unifies cache stats across providers into usage.inputTokenDetails,
|
||||
// so a single shape covers Anthropic + Gemini (no per-provider probing).
|
||||
function summarizeSdkUsage(
|
||||
tag: string,
|
||||
usage: LanguageModelUsage | undefined,
|
||||
): string {
|
||||
if (!usage) return `[cache] ${tag} no-usage`;
|
||||
const input = usage.inputTokens ?? 0;
|
||||
const output = usage.outputTokens ?? 0;
|
||||
const read = usage.inputTokenDetails?.cacheReadTokens;
|
||||
const write = usage.inputTokenDetails?.cacheWriteTokens;
|
||||
if (typeof read === "number" || typeof write === "number") {
|
||||
const hit = read ?? 0;
|
||||
const create = write ?? 0;
|
||||
const rate = input > 0 ? ((hit / input) * 100).toFixed(1) : "n/a";
|
||||
return `[cache] ${tag} hit=${hit} create=${create} input=${input} rate=${rate}% completion=${output}`;
|
||||
}
|
||||
return `[cache] ${tag} input=${input} completion=${output} (provider didn't report cache stats)`;
|
||||
}
|
||||
|
||||
// text/vision default to the OpenAI-compatible wire protocol when unset.
|
||||
function resolveTextProtocol(config: ProviderConfig): ProviderProtocol {
|
||||
return config.provider ?? "openai_compatible";
|
||||
}
|
||||
|
||||
export async function chat(
|
||||
config: ProviderConfig,
|
||||
messages: ChatMessage[],
|
||||
@@ -66,7 +96,63 @@ export async function chat(
|
||||
tag?: string;
|
||||
},
|
||||
): Promise<string> {
|
||||
const url = `${config.baseUrl.replace(/\/$/, "")}/chat/completions`;
|
||||
const protocol = resolveTextProtocol(config);
|
||||
if (protocol === "anthropic" || protocol === "google") {
|
||||
return chatViaAiSdk(config, messages, opts, protocol);
|
||||
}
|
||||
return chatOpenAiCompatible(config, messages, opts);
|
||||
}
|
||||
|
||||
// Native Anthropic / Gemini via the Vercel AI SDK. response_format is not sent
|
||||
// (Anthropic has no JSON mode); the engine relies on parseJsonLoose downstream,
|
||||
// matching how it already tolerates loose JSON from every provider.
|
||||
async function chatViaAiSdk(
|
||||
config: ProviderConfig,
|
||||
messages: ChatMessage[],
|
||||
opts: { temperature?: number; tag?: string } | undefined,
|
||||
protocol: "anthropic" | "google",
|
||||
): Promise<string> {
|
||||
const baseURL = normalizeBaseUrl(config.baseUrl, protocol);
|
||||
const model =
|
||||
protocol === "anthropic"
|
||||
? createAnthropic({ apiKey: config.apiKey, baseURL })(config.model)
|
||||
: createGoogleGenerativeAI({ apiKey: config.apiKey, baseURL })(
|
||||
config.model,
|
||||
);
|
||||
|
||||
const system = messages.find((m) => m.role === "system")?.content;
|
||||
const convo: ModelMessage[] = messages
|
||||
.filter((m) => m.role !== "system")
|
||||
.map((m) => ({
|
||||
role: m.role as "user" | "assistant",
|
||||
content: m.content,
|
||||
}));
|
||||
|
||||
const { text, usage } = await generateText({
|
||||
model,
|
||||
system,
|
||||
messages: convo,
|
||||
temperature: opts?.temperature ?? 0.9,
|
||||
});
|
||||
|
||||
console.log(summarizeSdkUsage(opts?.tag ?? "chat", usage));
|
||||
|
||||
if (typeof text !== "string" || text.length === 0) {
|
||||
throw new Error(`Chat API (AI SDK ${protocol}) returned no content.`);
|
||||
}
|
||||
return text;
|
||||
}
|
||||
|
||||
async function chatOpenAiCompatible(
|
||||
config: ProviderConfig,
|
||||
messages: ChatMessage[],
|
||||
opts?: {
|
||||
temperature?: number;
|
||||
responseFormat?: "json_object" | "text";
|
||||
tag?: string;
|
||||
},
|
||||
): Promise<string> {
|
||||
const url = `${normalizeBaseUrl(config.baseUrl, "openai_compatible")}/chat/completions`;
|
||||
const body: Record<string, unknown> = {
|
||||
model: config.model,
|
||||
messages,
|
||||
|
||||
@@ -5,6 +5,7 @@ export async function fetchWithRetry(
|
||||
init: RetryInit,
|
||||
): Promise<Response> {
|
||||
const { retries = 2, retryDelayMs = 1500, ...fetchInit } = init;
|
||||
if (!fetchInit.redirect) fetchInit.redirect = "manual";
|
||||
|
||||
let lastError: unknown;
|
||||
for (let attempt = 0; attempt <= retries; attempt++) {
|
||||
|
||||
+171
-49
@@ -1,5 +1,9 @@
|
||||
import type { ProviderConfig } from "@infiplot/types";
|
||||
import { generateImage as generateImageSdk } from "ai";
|
||||
import { createOpenAI } from "@ai-sdk/openai";
|
||||
import { createGoogleGenerativeAI } from "@ai-sdk/google";
|
||||
import type { Orientation, ProviderConfig, ProviderProtocol } from "@infiplot/types";
|
||||
import { fetchWithRetry } from "./fetchWithRetry";
|
||||
import { normalizeBaseUrl } from "./normalizeUrl";
|
||||
|
||||
// Runware uses its own task-array protocol (not OpenAI-compatible).
|
||||
// POST <baseUrl> with [{ taskType: "imageInference", ... }]; errors come
|
||||
@@ -38,30 +42,71 @@ export type GenerateImageOptions = {
|
||||
* Reference image (UUID, public URL, or base64) for img2img. When set,
|
||||
* FLUX preserves the seed image's composition and applies `strength` to
|
||||
* deviate. NOTE: FLUX.2 [klein] 9B KV does NOT support seedImage — use
|
||||
* `referenceImages` for visual continuity instead.
|
||||
* `referenceImages` for visual continuity instead. Runware-only.
|
||||
*/
|
||||
seedImage?: string;
|
||||
/**
|
||||
* Reference images (UUIDs, URLs, or base64) to condition generation on —
|
||||
* typically character portraits + the prior scene image. Runware caps at 4;
|
||||
* we silently truncate beyond that.
|
||||
* we silently truncate beyond that. On the OpenAI/Gemini AI SDK paths these
|
||||
* map to `prompt.images` (the SDK accepts public URLs or data URLs).
|
||||
*/
|
||||
referenceImages?: string[];
|
||||
/** 0–1, FLUX needs ≥ 0.8 to actually have an effect. */
|
||||
/** 0–1, FLUX needs ≥ 0.8 to actually have an effect. Runware-only. */
|
||||
strength?: number;
|
||||
/**
|
||||
* Output aspect, locked per session. "portrait" → 9:16 vertical for mobile;
|
||||
* default/"landscape" → 16:9 widescreen. Mapped to each provider's nearest
|
||||
* supported size: Runware 1024×1792, OpenAI-compatible REST 1024x1792,
|
||||
* native gpt-image 1024x1536, Gemini aspectRatio 9:16.
|
||||
*/
|
||||
orientation?: Orientation;
|
||||
};
|
||||
|
||||
export type GenerateImageResult = {
|
||||
/** Public CDN URL of the generated image (Runware-hosted). */
|
||||
/**
|
||||
* Image the client can render directly. A Runware CDN URL on the Runware
|
||||
* path; a `data:<mime>;base64,...` URI on the AI SDK paths (OpenAI/Gemini
|
||||
* return raw bytes, not a hosted URL).
|
||||
*/
|
||||
imageUrl: string;
|
||||
/** Stable UUID for cheap re-reference in later `referenceImages`. */
|
||||
/**
|
||||
* Stable handle for cheap re-reference in later `referenceImages`. A real
|
||||
* Runware UUID on the Runware path; a synthetic UUID on other paths (those
|
||||
* re-reference via the URL/data-URL form instead).
|
||||
*/
|
||||
imageUuid: string;
|
||||
};
|
||||
|
||||
// Match the Runware host by parsed hostname (exact match or subdomain), not a
|
||||
// bare substring — otherwise `notrunware.ai` or `api.runware.ai.evil.com` would
|
||||
// misroute to the Runware protocol. Falls back to false on an unparseable URL.
|
||||
function isRunwareHost(baseUrl: string): boolean {
|
||||
try {
|
||||
const host = new URL(baseUrl).hostname.toLowerCase();
|
||||
return host === "runware.ai" || host.endsWith(".runware.ai");
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// Image roles support more protocols than text/vision. When IMAGE_PROVIDER is
|
||||
// unset we keep the historical URL-based inference so existing deployments
|
||||
// (Runware, or an OpenAI-compatible gateway) behave exactly as before.
|
||||
function inferImageProtocol(config: ProviderConfig): ProviderProtocol {
|
||||
const isOpenAiCompat =
|
||||
!isRunwareHost(config.baseUrl) || config.model === "image-2-vip";
|
||||
return isOpenAiCompat ? "openai_compatible" : "runware";
|
||||
}
|
||||
|
||||
function resolveImageProtocol(config: ProviderConfig): ProviderProtocol {
|
||||
return config.provider ?? inferImageProtocol(config);
|
||||
}
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// generateImage — text-to-image (default) or referenceImages-conditioned.
|
||||
// Returns both the public URL (for client display + future references)
|
||||
// and the UUID (cheapest reference form for subsequent calls).
|
||||
// Returns both a renderable image URL and a re-reference handle (see
|
||||
// GenerateImageResult). Dispatches on the resolved wire protocol.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
export async function generateImage(
|
||||
@@ -69,58 +114,135 @@ export async function generateImage(
|
||||
prompt: string,
|
||||
options?: GenerateImageOptions,
|
||||
): Promise<GenerateImageResult> {
|
||||
const url = config.baseUrl.replace(/\/$/, "");
|
||||
const protocol = resolveImageProtocol(config);
|
||||
switch (protocol) {
|
||||
case "openai":
|
||||
case "google":
|
||||
return generateImageViaAiSdk(config, prompt, options, protocol);
|
||||
case "runware":
|
||||
return generateImageRunware(config, prompt, options);
|
||||
case "anthropic":
|
||||
throw new Error(
|
||||
'IMAGE_PROVIDER "anthropic" does not generate images. Use "openai", "google", "runware", or "openai_compatible".',
|
||||
);
|
||||
case "openai_compatible":
|
||||
default:
|
||||
return generateImageOpenAiCompatible(config, prompt, options);
|
||||
}
|
||||
}
|
||||
|
||||
// 1. OpenAI-compatible route (GPTGod, DALL-E, etc.)
|
||||
const isOpenAi = !url.includes("runware.ai") || config.model === "image-2-vip";
|
||||
if (isOpenAi) {
|
||||
const endpoint = url.endsWith("/images/generations") ? url : `${url}/images/generations`;
|
||||
console.log(`[ai-client] Calling OpenAI-compatible image generations at: ${endpoint} with model: ${config.model}`);
|
||||
|
||||
const res = await fetchWithRetry(endpoint, {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
Authorization: `Bearer ${config.apiKey}`,
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: config.model,
|
||||
prompt: prompt,
|
||||
n: 1,
|
||||
size: "1792x1024", // Use horizontal size (16:9)
|
||||
}),
|
||||
});
|
||||
// Native OpenAI (gpt-image) / Gemini (Nano Banana) via the Vercel AI SDK.
|
||||
// Unlike the fetch path, this supports reference-image editing via
|
||||
// `prompt.images`. The SDK returns raw bytes (no hosted URL), so we hand the
|
||||
// client a data URI and synthesize a UUID; continuity references reuse the
|
||||
// data URI rather than a provider UUID.
|
||||
async function generateImageViaAiSdk(
|
||||
config: ProviderConfig,
|
||||
prompt: string,
|
||||
options: GenerateImageOptions | undefined,
|
||||
protocol: "openai" | "google",
|
||||
): Promise<GenerateImageResult> {
|
||||
const baseURL = normalizeBaseUrl(config.baseUrl, protocol);
|
||||
const imageModel =
|
||||
protocol === "openai"
|
||||
? createOpenAI({ apiKey: config.apiKey, baseURL }).image(config.model)
|
||||
: createGoogleGenerativeAI({ apiKey: config.apiKey, baseURL }).image(
|
||||
config.model,
|
||||
);
|
||||
|
||||
const text = await res.text();
|
||||
let json: any;
|
||||
try {
|
||||
json = JSON.parse(text);
|
||||
} catch {
|
||||
throw new Error(`OpenAI Image API error ${res.status}: ${text.slice(0, 500)}`);
|
||||
}
|
||||
const refs = (options?.referenceImages ?? []).slice(0, MAX_REFERENCE_IMAGES);
|
||||
const promptArg =
|
||||
refs.length > 0 ? { text: prompt, images: refs } : prompt;
|
||||
|
||||
if (json.error) {
|
||||
throw new Error(`OpenAI Image API error: ${json.error.message || JSON.stringify(json.error)}`);
|
||||
}
|
||||
// Session-locked aspect. gpt-image takes an explicit `size` (portrait /
|
||||
// landscape options are 1024x1536 / 1536x1024); Gemini takes an `aspectRatio`.
|
||||
const portrait = options?.orientation === "portrait";
|
||||
const { image } = await generateImageSdk({
|
||||
model: imageModel,
|
||||
prompt: promptArg,
|
||||
...(protocol === "openai"
|
||||
? { size: (portrait ? "1024x1536" : "1536x1024") as `${number}x${number}` }
|
||||
: { aspectRatio: (portrait ? "9:16" : "16:9") as `${number}:${number}` }),
|
||||
});
|
||||
|
||||
const data = json.data?.[0];
|
||||
const imageUrl = data?.url;
|
||||
if (!imageUrl) {
|
||||
throw new Error(`No image URL in OpenAI response: ${text.slice(0, 300)}`);
|
||||
}
|
||||
// Generate a mock UUID since OpenAI compatible endpoint doesn't have UUIDs
|
||||
const imageUuid = crypto.randomUUID();
|
||||
return { imageUrl, imageUuid };
|
||||
return {
|
||||
imageUrl: `data:${image.mediaType};base64,${image.base64}`,
|
||||
imageUuid: crypto.randomUUID(),
|
||||
};
|
||||
}
|
||||
|
||||
// OpenAI-compatible REST route (GPTGod, DALL-E proxies, etc.). Basic
|
||||
// text-to-image only — no reference images on this path; for editing/anchoring
|
||||
// set IMAGE_PROVIDER=openai (or google) to take the AI SDK path above.
|
||||
async function generateImageOpenAiCompatible(
|
||||
config: ProviderConfig,
|
||||
prompt: string,
|
||||
options?: GenerateImageOptions,
|
||||
): Promise<GenerateImageResult> {
|
||||
const base = normalizeBaseUrl(config.baseUrl, "openai_compatible");
|
||||
const endpoint = `${base}/images/generations`;
|
||||
console.log(
|
||||
`[ai-client] Calling OpenAI-compatible image generations at: ${endpoint} with model: ${config.model}`,
|
||||
);
|
||||
|
||||
const res = await fetchWithRetry(endpoint, {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Content-Type": "application/json",
|
||||
Authorization: `Bearer ${config.apiKey}`,
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: config.model,
|
||||
prompt: prompt,
|
||||
n: 1,
|
||||
// Session-locked aspect (16:9 default, 9:16 portrait for mobile).
|
||||
size: options?.orientation === "portrait" ? "1024x1792" : "1792x1024",
|
||||
}),
|
||||
});
|
||||
|
||||
const text = await res.text();
|
||||
let json: any;
|
||||
try {
|
||||
json = JSON.parse(text);
|
||||
} catch {
|
||||
throw new Error(`OpenAI Image API error ${res.status}: ${text.slice(0, 500)}`);
|
||||
}
|
||||
|
||||
// 2. Runware task-array route
|
||||
if (json.error) {
|
||||
throw new Error(`OpenAI Image API error: ${json.error.message || JSON.stringify(json.error)}`);
|
||||
}
|
||||
|
||||
const data = json.data?.[0];
|
||||
const imageUrl = data?.url;
|
||||
if (!imageUrl) {
|
||||
throw new Error(`No image URL in OpenAI response: ${text.slice(0, 300)}`);
|
||||
}
|
||||
// Generate a mock UUID since OpenAI compatible endpoint doesn't have UUIDs
|
||||
const imageUuid = crypto.randomUUID();
|
||||
return { imageUrl, imageUuid };
|
||||
}
|
||||
|
||||
// Runware task-array route — self-implemented to preserve the UUID/URL closed
|
||||
// loop (the official @runware/ai-sdk-provider drops both).
|
||||
async function generateImageRunware(
|
||||
config: ProviderConfig,
|
||||
prompt: string,
|
||||
options?: GenerateImageOptions,
|
||||
): Promise<GenerateImageResult> {
|
||||
const url = normalizeBaseUrl(config.baseUrl, "runware");
|
||||
|
||||
// Session-locked output aspect. Image models emit a FIXED pixel size; CSS
|
||||
// object-fit on the client adapts this frame to the exact device/window. Both
|
||||
// dimensions stay a multiple of 64 as FLUX requires.
|
||||
const portrait = options?.orientation === "portrait";
|
||||
|
||||
const task: Record<string, unknown> = {
|
||||
taskType: "imageInference",
|
||||
taskUUID: crypto.randomUUID(),
|
||||
model: config.model,
|
||||
positivePrompt: prompt,
|
||||
width: 1792,
|
||||
height: 1024,
|
||||
width: portrait ? 1024 : 1792,
|
||||
height: portrait ? 1792 : 1024,
|
||||
steps: 4,
|
||||
CFGScale: 3.5,
|
||||
numberResults: 1,
|
||||
|
||||
@@ -0,0 +1,66 @@
|
||||
import type { ProviderProtocol } from "@infiplot/types";
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Base-URL normalization — tolerate whatever shape the user pastes.
|
||||
//
|
||||
// The README never specified whether the base URL needs a `/v1` suffix,
|
||||
// so users provide all of these for the same endpoint:
|
||||
// https://api.deepseek.com
|
||||
// https://api.deepseek.com/v1
|
||||
// https://api.deepseek.com/v1/chat/completions
|
||||
// We normalize to a canonical base the adapter can safely append its own
|
||||
// endpoint path to. This also fixes the pre-existing double-suffix bug
|
||||
// where a pasted `.../chat/completions` became `.../chat/completions/chat/completions`.
|
||||
//
|
||||
// Strategy (bare-host-only version append):
|
||||
// 1. strip trailing slashes
|
||||
// 2. strip a trailing known endpoint suffix (chat/completions, messages, …)
|
||||
// 3. only when the URL the user gave is a BARE host (scheme://host[:port]
|
||||
// with no path) do we append the protocol's default version segment.
|
||||
// Any path the user wrote (/v1, /beta, /zen/go, /chat/completions, …) is
|
||||
// treated as an explicit location and left intact — so we never turn
|
||||
// `/beta` into `/beta/v1`, and a version-less `/chat/completions`
|
||||
// endpoint is preserved.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
// Endpoint paths an adapter appends itself — stripped so we keep only the base.
|
||||
const ENDPOINT_SUFFIX =
|
||||
/\/(chat\/completions|completions|responses|messages|images\/(generations|edits))\/?$/i;
|
||||
|
||||
// Default version segment to append per protocol for a bare host.
|
||||
const DEFAULT_VERSION_SEGMENT: Record<ProviderProtocol, string | null> = {
|
||||
openai_compatible: "v1",
|
||||
openai: "v1",
|
||||
anthropic: "v1",
|
||||
google: "v1beta",
|
||||
// Runware posts to the bare base URL with no version-pathed sub-resource,
|
||||
// so never inject a segment for it.
|
||||
runware: null,
|
||||
};
|
||||
|
||||
// True when `raw` is just scheme://host[:port] with no meaningful path — the
|
||||
// only shape where we infer a default version segment. A lone "/" counts as
|
||||
// bare. Falls back to a scheme-anchored regex if the URL can't be parsed.
|
||||
function isBareHost(raw: string): boolean {
|
||||
try {
|
||||
const { pathname } = new URL(raw);
|
||||
return pathname === "" || pathname === "/";
|
||||
} catch {
|
||||
return !/^[a-z][a-z0-9+.-]*:\/\/[^/]+\/.+/i.test(raw);
|
||||
}
|
||||
}
|
||||
|
||||
export function normalizeBaseUrl(
|
||||
raw: string,
|
||||
protocol: ProviderProtocol,
|
||||
): string {
|
||||
const trimmed = raw.trim();
|
||||
let u = trimmed.replace(/\/+$/, "");
|
||||
u = u.replace(ENDPOINT_SUFFIX, "").replace(/\/+$/, "");
|
||||
|
||||
const seg = DEFAULT_VERSION_SEGMENT[protocol];
|
||||
if (seg && isBareHost(trimmed)) {
|
||||
u = `${u}/${seg}`;
|
||||
}
|
||||
return u;
|
||||
}
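
To make the normalization strategy concrete, the sketch below repeats the function from this diff in self-contained form and lists a few inputs with the results traced by hand. The local `ProviderProtocol` alias is an assumption standing in for the `@infiplot/types` import, and the example URLs with a `/beta` path are illustrative only.

```typescript
// Local stand-in for the type imported from "@infiplot/types" (assumption).
type ProviderProtocol =
  | "openai_compatible"
  | "openai"
  | "anthropic"
  | "google"
  | "runware";

const ENDPOINT_SUFFIX =
  /\/(chat\/completions|completions|responses|messages|images\/(generations|edits))\/?$/i;

const DEFAULT_VERSION_SEGMENT: Record<ProviderProtocol, string | null> = {
  openai_compatible: "v1",
  openai: "v1",
  anthropic: "v1",
  google: "v1beta",
  runware: null, // Runware posts to the bare base URL
};

// True only for scheme://host[:port] with no path (a lone "/" counts as bare).
function isBareHost(raw: string): boolean {
  try {
    const { pathname } = new URL(raw);
    return pathname === "" || pathname === "/";
  } catch {
    return !/^[a-z][a-z0-9+.-]*:\/\/[^/]+\/.+/i.test(raw);
  }
}

function normalizeBaseUrl(raw: string, protocol: ProviderProtocol): string {
  const trimmed = raw.trim();
  // 1. strip trailing slashes, 2. strip a known endpoint suffix
  let u = trimmed.replace(/\/+$/, "");
  u = u.replace(ENDPOINT_SUFFIX, "").replace(/\/+$/, "");
  // 3. append the default version segment only for a bare host
  const seg = DEFAULT_VERSION_SEGMENT[protocol];
  if (seg && isBareHost(trimmed)) {
    u = `${u}/${seg}`;
  }
  return u;
}

// Bare host: the default segment is appended.
normalizeBaseUrl("https://api.deepseek.com/", "openai_compatible");
// → "https://api.deepseek.com/v1"

// Pasted full endpoint: the suffix is stripped, the user's /v1 is kept.
normalizeBaseUrl("https://api.deepseek.com/v1/chat/completions", "openai_compatible");
// → "https://api.deepseek.com/v1"

// Explicit path: left intact, never turned into /beta/v1.
normalizeBaseUrl("https://api.deepseek.com/beta", "openai_compatible");
// → "https://api.deepseek.com/beta"

// Gemini bare host gets v1beta; Runware never gets a segment.
normalizeBaseUrl("https://generativelanguage.googleapis.com", "google");
// → "https://generativelanguage.googleapis.com/v1beta"
normalizeBaseUrl("https://api.runware.ai/v1/", "runware");
// → "https://api.runware.ai/v1"
```

Note that the bare-host check runs against the trimmed input rather than the stripped URL, so a pasted version-less `https://host/chat/completions` normalizes to `https://host` without gaining a `/v1`.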
|
||||
+73
-3
@@ -1,5 +1,12 @@
|
||||
import type { ProviderConfig } from "@infiplot/types";
|
||||
import { generateText } from "ai";
|
||||
import type { ModelMessage } from "ai";
|
||||
import { createAnthropic } from "@ai-sdk/anthropic";
|
||||
import { createGoogleGenerativeAI } from "@ai-sdk/google";
|
||||
import type { ProviderConfig, ProviderProtocol } from "@infiplot/types";
|
||||
import { fetchWithRetry } from "./fetchWithRetry";
|
||||
import { normalizeBaseUrl } from "./normalizeUrl";
|
||||
|
||||
const VISION_TIMEOUT_MS = 60_000;
|
||||
|
||||
export async function interpretClick(
|
||||
config: ProviderConfig,
|
||||
@@ -16,6 +23,11 @@ export async function interpretClick(
|
||||
);
|
||||
}
|
||||
|
||||
// text/vision default to the OpenAI-compatible wire protocol when unset.
|
||||
function resolveVisionProtocol(config: ProviderConfig): ProviderProtocol {
|
||||
return config.provider ?? "openai_compatible";
|
||||
}
|
||||
|
||||
/**
|
||||
* General single-image vision call. Accepts a complete data URL (preserves
|
||||
* the source mime type, e.g. webp/jpeg) and lets the caller opt out of
|
||||
@@ -27,7 +39,65 @@ export async function analyzeImageDataUrl(
|
||||
prompt: string,
|
||||
opts: { responseFormat?: "json_object" | "text" } = {},
|
||||
): Promise<string> {
|
||||
const url = `${config.baseUrl.replace(/\/$/, "")}/chat/completions`;
|
||||
const protocol = resolveVisionProtocol(config);
|
||||
if (protocol === "anthropic" || protocol === "google") {
|
||||
return analyzeViaAiSdk(config, imageDataUrl, prompt, protocol);
|
||||
}
|
||||
return analyzeOpenAiCompatible(config, imageDataUrl, prompt, opts);
|
||||
}
|
||||
|
||||
// Native Anthropic / Gemini multimodal via the AI SDK. The image part takes
|
||||
// the full data URL directly; the SDK decodes it. response_format is not sent
|
||||
// (no JSON mode on Anthropic) — the engine's parseJsonLoose handles output.
|
||||
async function analyzeViaAiSdk(
|
||||
config: ProviderConfig,
|
||||
imageDataUrl: string,
|
||||
prompt: string,
|
||||
protocol: "anthropic" | "google",
|
||||
): Promise<string> {
|
||||
const baseURL = normalizeBaseUrl(config.baseUrl, protocol);
|
||||
const model =
|
||||
protocol === "anthropic"
|
||||
? createAnthropic({ apiKey: config.apiKey, baseURL })(config.model)
|
||||
: createGoogleGenerativeAI({ apiKey: config.apiKey, baseURL })(
|
||||
config.model,
|
||||
);
|
||||
|
||||
const messages: ModelMessage[] = [
|
||||
{
|
||||
role: "user",
|
||||
content: [
|
||||
{ type: "text", text: prompt },
|
||||
{ type: "image", image: imageDataUrl },
|
||||
],
|
||||
},
|
||||
];
|
||||
|
||||
const timeoutCtrl = new AbortController();
|
||||
const timeoutId = setTimeout(() => timeoutCtrl.abort(), VISION_TIMEOUT_MS);
|
||||
try {
|
||||
const { text } = await generateText({
|
||||
model,
|
||||
messages,
|
||||
temperature: 0.2,
|
||||
abortSignal: timeoutCtrl.signal,
|
||||
});
|
||||
if (typeof text !== "string" || text.length === 0) {
|
||||
throw new Error(`Vision API (AI SDK ${protocol}) returned no content.`);
|
||||
}
|
||||
return text;
|
||||
} finally {
|
||||
clearTimeout(timeoutId);
|
||||
}
|
||||
}
|
||||
|
||||
async function analyzeOpenAiCompatible(
|
||||
config: ProviderConfig,
|
||||
imageDataUrl: string,
|
||||
prompt: string,
|
||||
opts: { responseFormat?: "json_object" | "text" } = {},
|
||||
): Promise<string> {
|
||||
const url = `${normalizeBaseUrl(config.baseUrl, "openai_compatible")}/chat/completions`;
|
||||
|
||||
const body: Record<string, unknown> = {
|
||||
model: config.model,
|
||||
@@ -47,7 +117,7 @@ export async function analyzeImageDataUrl(
|
||||
}
|
||||
|
||||
const timeoutCtrl = new AbortController();
|
||||
const timeoutId = setTimeout(() => timeoutCtrl.abort(), 60_000);
|
||||
const timeoutId = setTimeout(() => timeoutCtrl.abort(), VISION_TIMEOUT_MS);
|
||||
|
||||
let res: Response;
|
||||
try {
|
||||
|
||||
@@ -0,0 +1,86 @@
|
||||
// Bring-your-own Xiaomi MiMo TTS key — stored CLIENT-SIDE ONLY.
|
||||
//
|
||||
// When a user supplies their own key, we persist {presetId, apiKey} in
|
||||
// localStorage and the browser talks to Xiaomi directly (see lib/tts-client).
|
||||
// The key is therefore never sent to our server: no request body, no header,
|
||||
// no log. resolveTtsConfig() turns the stored pair into the TtsConfig shape the
|
||||
// tts-client adapter expects, mapping the chosen endpoint preset to its baseUrl.
|
||||
|
||||
import type { TtsConfig } from "@infiplot/types";
|
||||
import { DEFAULT_TTS_SPEECH_MODEL, findTtsPreset } from "./ttsPresets";
|
||||
|
||||
const STORAGE_KEY = "infiplot:tts";
|
||||
|
||||
/** Exactly what we persist — endpoint choice + raw key. Resolved to a full
|
||||
* TtsConfig (with baseUrl + model) at read time so a renamed/removed preset
|
||||
* can't leave a stale baseUrl baked into storage. */
|
||||
export type StoredTtsConfig = {
|
||||
presetId: string;
|
||||
apiKey: string;
|
||||
};
|
||||
|
||||
/** Read + validate the persisted BYO config. Returns null when running on the
|
||||
* server, when nothing is stored, on parse failure, or when the stored shape
|
||||
* is no longer valid (unknown preset / empty key). */
|
||||
export function readStoredTtsConfig(): StoredTtsConfig | null {
|
||||
if (typeof window === "undefined") return null;
|
||||
try {
|
||||
const raw = window.localStorage.getItem(STORAGE_KEY);
|
||||
if (!raw) return null;
|
||||
const parsed = JSON.parse(raw) as Partial<StoredTtsConfig>;
|
||||
const presetId = typeof parsed.presetId === "string" ? parsed.presetId : "";
|
||||
const apiKey = typeof parsed.apiKey === "string" ? parsed.apiKey : "";
|
||||
if (!findTtsPreset(presetId)) return null;
|
||||
if (!apiKey.trim()) return null;
|
||||
return { presetId, apiKey };
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
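The read-time validation above can be exercised without a browser by separating it from `localStorage`. The sketch below is a hypothetical pure variant: `parseStored`, `KNOWN_PRESETS`, and the preset ids are illustrative names, and the set lookup stands in for `findTtsPreset()`, whose real preset list is not shown in this diff.

```typescript
// Hypothetical pure variant of the read-time validation: it takes the raw
// storage string instead of touching window.localStorage.
type Stored = { presetId: string; apiKey: string };

// Assumption: stand-in for findTtsPreset(); these ids are made up.
const KNOWN_PRESETS = new Set(["preset-a", "preset-b"]);

function parseStored(raw: string | null): Stored | null {
  if (!raw) return null; // nothing stored
  try {
    const parsed = JSON.parse(raw) as Partial<Stored> | null;
    const presetId = typeof parsed?.presetId === "string" ? parsed.presetId : "";
    const apiKey = typeof parsed?.apiKey === "string" ? parsed.apiKey : "";
    if (!KNOWN_PRESETS.has(presetId)) return null; // unknown or removed preset
    if (!apiKey.trim()) return null; // blank key
    return { presetId, apiKey };
  } catch {
    return null; // malformed JSON
  }
}

console.log(parseStored('{"presetId":"preset-a","apiKey":"sk-demo"}'));
console.log(parseStored('{"presetId":"gone","apiKey":"sk-demo"}'));
console.log(parseStored("not json"));
```

Every failure mode collapses to `null`, which matches the contract described in the doc comment: callers only need to distinguish "usable BYO config" from "none".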
||||
|
||||
/** Persist the BYO config. Trims the key so trailing whitespace from a paste
|
||||
* never breaks the `api-key` header. */
|
||||
export function writeStoredTtsConfig(config: StoredTtsConfig): void {
|
||||
if (typeof window === "undefined") return;
|
||||
try {
|
||||
const payload: StoredTtsConfig = {
|
||||
presetId: config.presetId,
|
||||
apiKey: config.apiKey.trim(),
|
||||
};
|
||||
window.localStorage.setItem(STORAGE_KEY, JSON.stringify(payload));
|
||||
} catch {
|
||||
// Storage disabled / quota / private mode — BYO simply stays off.
|
||||
}
|
||||
}
|
||||
|
||||
export function clearStoredTtsConfig(): void {
|
||||
if (typeof window === "undefined") return;
|
||||
try {
|
||||
window.localStorage.removeItem(STORAGE_KEY);
|
||||
} catch {
|
||||
// ignore
|
||||
}
|
||||
}
|
||||
|
||||
/** Map a stored pair to the adapter-ready TtsConfig, resolving the endpoint
|
||||
* preset to its baseUrl. Returns null when the preset is unknown or the key
|
||||
* is blank — callers treat null as "no BYO; use server default / silent". */
|
||||
export function resolveTtsConfig(
|
||||
stored: StoredTtsConfig | null,
|
||||
): TtsConfig | null {
|
||||
if (!stored) return null;
|
||||
const preset = findTtsPreset(stored.presetId);
|
||||
if (!preset) return null;
|
||||
const apiKey = stored.apiKey.trim();
|
||||
if (!apiKey) return null;
|
||||
return {
|
||||
baseUrl: preset.baseUrl,
|
||||
apiKey,
|
||||
speechModel: DEFAULT_TTS_SPEECH_MODEL,
|
||||
};
|
||||
}
|
||||
|
||||
/** Convenience: read storage and resolve in one step. */
|
||||
export function loadClientTtsConfig(): TtsConfig | null {
|
||||
return resolveTtsConfig(readStoredTtsConfig());
|
||||
}
|
||||
+31
-1
@@ -1,4 +1,16 @@
|
||||
import type { EngineConfig, TtsConfig } from "@infiplot/types";
|
||||
import type {
|
||||
EngineConfig,
|
||||
ProviderProtocol,
|
||||
TtsConfig,
|
||||
} from "@infiplot/types";
|
||||
|
||||
const VALID_PROTOCOLS = [
|
||||
"openai_compatible",
|
||||
"anthropic",
|
||||
"google",
|
||||
"openai",
|
||||
"runware",
|
||||
] as const;
|
||||
|
||||
function readVar(name: string): string {
|
||||
const v = process.env[name];
|
||||
@@ -11,6 +23,21 @@ function readOptionalVar(name: string): string | undefined {
|
||||
return v && v.length > 0 ? v : undefined;
|
||||
}
|
||||
|
||||
// Optional *_PROVIDER selector. Unset → undefined, and each ai-client adapter
|
||||
// applies its own default (text/vision → openai_compatible; image → inferred
|
||||
// from the base URL). Validated eagerly so a typo fails fast at boot rather
|
||||
// than mid-request.
|
||||
function readProvider(name: string): ProviderProtocol | undefined {
|
||||
const v = readOptionalVar(name)?.trim().toLowerCase();
|
||||
if (!v) return undefined;
|
||||
if ((VALID_PROTOCOLS as readonly string[]).includes(v)) {
|
||||
return v as ProviderProtocol;
|
||||
}
|
||||
throw new Error(
|
||||
`Invalid ${name}: "${v}". Must be one of: ${VALID_PROTOCOLS.join(", ")}`,
|
||||
);
|
||||
}
|
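A small sketch of the fail-fast behaviour described above, written as a standalone function so it can be tried without environment variables. `parseProvider` and `ProviderName` are hypothetical names; the accepted values are copied from `VALID_PROTOCOLS` in this hunk.

```typescript
// Hypothetical standalone version: takes the raw value directly instead of
// reading process.env.
const PROTOCOLS = [
  "openai_compatible",
  "anthropic",
  "google",
  "openai",
  "runware",
] as const;
type ProviderName = (typeof PROTOCOLS)[number];

function parseProvider(name: string, value: string | undefined): ProviderName | undefined {
  const v = value?.trim().toLowerCase();
  if (!v) return undefined; // unset or blank: the adapter's default applies
  if ((PROTOCOLS as readonly string[]).includes(v)) return v as ProviderName;
  throw new Error(`Invalid ${name}: "${v}". Must be one of: ${PROTOCOLS.join(", ")}`);
}

console.log(parseProvider("TEXT_PROVIDER", " Anthropic ")); // case and whitespace are normalized
console.log(parseProvider("TEXT_PROVIDER", undefined)); // unset stays undefined
try {
  parseProvider("TEXT_PROVIDER", "antropic"); // typo
} catch (err) {
  console.log((err as Error).message);
}
```

Throwing at load time means a misspelled provider surfaces when the config is read rather than on the first request that uses it.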
||||
|
||||
function loadTtsConfig(): TtsConfig | undefined {
|
||||
const baseUrl = readOptionalVar("TTS_BASE_URL");
|
||||
const apiKey = readOptionalVar("TTS_API_KEY");
|
||||
@@ -28,16 +55,19 @@ export function loadEngineConfig(): EngineConfig {
|
||||
baseUrl: readVar("TEXT_BASE_URL"),
|
||||
apiKey: readVar("TEXT_API_KEY"),
|
||||
model: readVar("TEXT_MODEL"),
|
||||
provider: readProvider("TEXT_PROVIDER"),
|
||||
},
|
||||
image: {
|
||||
baseUrl: readVar("IMAGE_BASE_URL"),
|
||||
apiKey: readVar("IMAGE_API_KEY"),
|
||||
model: readVar("IMAGE_MODEL"),
|
||||
provider: readProvider("IMAGE_PROVIDER"),
|
||||
},
|
||||
vision: {
|
||||
baseUrl: readVar("VISION_BASE_URL"),
|
||||
apiKey: readVar("VISION_API_KEY"),
|
||||
model: readVar("VISION_MODEL"),
|
||||
provider: readProvider("VISION_PROVIDER"),
|
||||
},
|
||||
tts: loadTtsConfig(),
|
||||
mockImage: readOptionalVar("MOCK_IMAGE") === "true",
|
||||
|
||||
@@ -4,6 +4,7 @@ import type {
|
||||
Beat,
|
||||
Character,
|
||||
EngineConfig,
|
||||
Orientation,
|
||||
ProviderConfig,
|
||||
} from "@infiplot/types";
|
||||
import { mockImageDataUri } from "../mockImage";
|
||||
@@ -54,6 +55,11 @@ export type PainterInput = {
|
||||
* session paints — even before any priorScene exists.
|
||||
*/
|
||||
styleReferenceImage?: string;
|
||||
/**
|
||||
* Session-locked output aspect. Drives both the Painter prompt's framing
|
||||
* rules and the generated image's pixel dimensions. Default "landscape".
|
||||
*/
|
||||
orientation?: Orientation;
|
||||
};
|
||||
|
||||
// Pick the references we send to Runware as `referenceImages`. Priority:
|
||||
@@ -142,13 +148,14 @@ export async function runPainter(
|
||||
entryBeat: Beat | undefined,
|
||||
): Promise<PainterResult> {
|
||||
if (config.mockImage) {
|
||||
return { kind: "mock", imageUrl: await mockImageDataUri() };
|
||||
return { kind: "mock", imageUrl: await mockImageDataUri(input.orientation) };
|
||||
}
|
||||
|
||||
const prompt = buildPainterPrompt(
|
||||
input.integratedPrompt,
|
||||
input.styleGuide,
|
||||
input.onStageCharacters,
|
||||
input.orientation,
|
||||
);
|
||||
|
||||
const refs = collectReferenceImages(
|
||||
@@ -165,7 +172,7 @@ export async function runPainter(
|
||||
const r = await tryGenerate(
|
||||
config.image,
|
||||
prompt,
|
||||
{ referenceImages: refs },
|
||||
{ referenceImages: refs, orientation: input.orientation },
|
||||
`referenceImages (${refs.length})`,
|
||||
);
|
||||
if (r) return { kind: "real", imageUrl: r.imageUrl, imageUuid: r.imageUuid };
|
||||
@@ -174,6 +181,8 @@ export async function runPainter(
|
||||
// Tier B — pure text-to-image. Last resort, used when Tier A failed OR
|
||||
// there are no references to send (first scene with no characters yet).
|
||||
// Errors here propagate to the caller.
|
||||
const r = await generateImage(config.image, prompt);
|
||||
const r = await generateImage(config.image, prompt, {
|
||||
orientation: input.orientation,
|
||||
});
|
||||
return { kind: "real", imageUrl: r.imageUrl, imageUuid: r.imageUuid };
|
||||
}
|
||||
|
||||
+157
-48
@@ -8,26 +8,30 @@ import type {
|
||||
ProviderConfig,
|
||||
Session,
|
||||
StoryStatePatch,
|
||||
WriterPlan,
|
||||
} from "@infiplot/types";
|
||||
import { parseJsonLoose } from "../jsonParser";
|
||||
import { WRITER_SYSTEM, buildWriterUserMessage } from "../prompts";
|
||||
import {
|
||||
WRITER_BEATS_SYSTEM,
|
||||
WRITER_PLAN_SYSTEM,
|
||||
buildWriterBeatsUserMessage,
|
||||
buildWriterPlanUserMessage,
|
||||
} from "../prompts";
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Writer agent — owns the narrative half of scene generation.
|
||||
// Writer agent — owns the narrative half of scene generation, in TWO phases.
|
||||
//
|
||||
// Output: { sceneSummary, sceneKey, entryBeatId, beats[] }
|
||||
// Each beat carries activeCharacters[] (names + poses) the
|
||||
// Cinematographer reads when composing the establishing shot.
|
||||
// Phase A — runWriterPlan: the scene skeleton (WriterPlan) the image pipeline
|
||||
// needs (sceneSummary + sceneKey + entry roster + full cast). No dialogue,
|
||||
// so it returns fast and unblocks the Cinematographer + character design.
|
||||
// Phase B — runWriterBeats: the full beats[] graph + storyStatePatch, written
|
||||
// to honor the plan and overlapped with the (longer) image pipeline.
|
||||
//
|
||||
// Character DESIGN (visual + voice) is NOT this agent's job —
|
||||
// it only names characters; the CharacterDesigner picks up any
|
||||
// unknown name from beats[].activeCharacters.
|
||||
// Character DESIGN (visual + voice) is NOT this agent's job — it only NAMES
|
||||
// characters (Phase A's cast); the CharacterDesigner picks up unknown names.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
export type WriterOutput = {
|
||||
sceneSummary: string;
|
||||
sceneKey?: string;
|
||||
entryBeatId: string;
|
||||
export type WriterBeatsOutput = {
|
||||
beats: Beat[];
|
||||
/** Rewritten volatile story memory — merged onto the carried StoryState by
|
||||
* the director. Absent when the model omitted it (rare; the bible just goes stale). */
|
||||
@@ -69,10 +73,17 @@ type RawStoryStatePatch = {
|
||||
relationships?: unknown;
|
||||
nextHook?: unknown;
|
||||
};
|
||||
type RawScene = {
|
||||
// Phase A raw shape (skeleton only — no beats).
|
||||
type RawPlan = {
|
||||
sceneSummary?: string;
|
||||
sceneKey?: string;
|
||||
entryBeatId?: string;
|
||||
cast?: unknown;
|
||||
entrySpeaker?: string;
|
||||
entryActiveCharacters?: RawActiveCharacter[];
|
||||
};
|
||||
// Phase B raw shape (beats + memory only — plan fields come from runWriterPlan).
|
||||
type RawBeats = {
|
||||
beats?: RawBeat[];
|
||||
storyStatePatch?: RawStoryStatePatch;
|
||||
};
|
||||
@@ -359,26 +370,119 @@ function coerceStoryStatePatch(
|
||||
return Object.keys(patch).length > 0 ? patch : undefined;
|
||||
}
|
||||
|
||||
export async function runWriter(
|
||||
// Phase A — dedupe + clean the planned cast. Drops the POV player (never
|
||||
// designed) and any blank/duplicate name. Order is preserved.
|
||||
function coerceCast(raw: unknown): string[] {
|
||||
if (!Array.isArray(raw)) return [];
|
||||
const seen = new Set<string>();
|
||||
const out: string[] = [];
|
||||
for (const x of raw) {
|
||||
const name = typeof x === "string" ? x.trim() : "";
|
||||
if (!name || isPovName(name) || seen.has(name)) continue;
|
||||
seen.add(name);
|
||||
out.push(name);
|
||||
}
|
||||
return out;
|
||||
}
|
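To make the cleaning rules concrete, here is a runnable sketch of the same dedupe-and-filter pass. `cleanCast`, `isPov`, and the `POV_NAMES` set are hypothetical stand-ins; the real `isPovName()` and its variant list are defined elsewhere and may differ.

```typescript
// Assumption: a simplified stand-in for isPovName(); the real variant list
// is not shown in this diff.
const POV_NAMES = new Set(["你", "玩家", "主角"]);
const isPov = (name: string): boolean => POV_NAMES.has(name);

function cleanCast(raw: unknown): string[] {
  if (!Array.isArray(raw)) return []; // model returned a non-array
  const seen = new Set<string>();
  const out: string[] = [];
  for (const x of raw) {
    // Non-strings collapse to "" and are dropped by the blank check below.
    const name = typeof x === "string" ? x.trim() : "";
    if (!name || isPov(name) || seen.has(name)) continue;
    seen.add(name);
    out.push(name);
  }
  return out;
}

// Duplicate (after trimming), POV, blank, and non-string entries are dropped;
// first-seen order is kept.
console.log(cleanCast(["Mira", " Mira ", "你", "", 42, "Jonas"]));
```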
||||
|
||||
// Rename one beat's id and repoint every INTERNAL reference (continue targets,
|
||||
// advance-beat targets) so the graph stays intact. Only called when `to` is
|
||||
// absent from the scene, so it can't introduce a duplicate id.
|
||||
function renameBeatId(beats: Beat[], from: string, to: string): Beat[] {
|
||||
if (from === to) return beats;
|
||||
return beats.map((b): Beat => {
|
||||
const id = b.id === from ? to : b.id;
|
||||
let next = b.next;
|
||||
if (next.type === "continue" && next.nextBeatId === from) {
|
||||
next = { type: "continue", nextBeatId: to };
|
||||
} else if (next.type === "choice") {
|
||||
next = {
|
||||
type: "choice",
|
||||
choices: next.choices.map((c) =>
|
||||
c.effect.kind === "advance-beat" && c.effect.targetBeatId === from
|
||||
? { ...c, effect: { kind: "advance-beat" as const, targetBeatId: to } }
|
||||
: c,
|
||||
),
|
||||
};
|
||||
}
|
||||
return { ...b, id, next };
|
||||
});
|
||||
}
|
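The rename-and-repoint step is easiest to follow on a tiny graph. The sketch below uses a simplified beat shape (`MiniBeat`, `Choice`, `Next` are assumptions that keep only the fields the rename touches, and the `change-scene` effect is reduced to its `kind`); the real `Beat` type in `@infiplot/types` carries more fields.

```typescript
// Simplified shapes (assumption): only what the rename needs.
type Effect =
  | { kind: "advance-beat"; targetBeatId: string }
  | { kind: "change-scene" };
type Choice = { label: string; effect: Effect };
type Next =
  | { type: "continue"; nextBeatId: string }
  | { type: "choice"; choices: Choice[] };
type MiniBeat = { id: string; next: Next };

function renameId(beats: MiniBeat[], from: string, to: string): MiniBeat[] {
  if (from === to) return beats;
  return beats.map((b): MiniBeat => {
    const id = b.id === from ? to : b.id;
    let next = b.next;
    if (next.type === "continue" && next.nextBeatId === from) {
      next = { type: "continue", nextBeatId: to };
    } else if (next.type === "choice") {
      next = {
        type: "choice",
        choices: next.choices.map((c) =>
          c.effect.kind === "advance-beat" && c.effect.targetBeatId === from
            ? { ...c, effect: { kind: "advance-beat" as const, targetBeatId: to } }
            : c,
        ),
      };
    }
    return { ...b, id, next };
  });
}

// b1 -> b2, and b2 offers a choice that loops back to b1.
const sampleBeats: MiniBeat[] = [
  { id: "b1", next: { type: "continue", nextBeatId: "b2" } },
  {
    id: "b2",
    next: {
      type: "choice",
      choices: [
        { label: "Go back", effect: { kind: "advance-beat", targetBeatId: "b1" } },
        { label: "Leave", effect: { kind: "change-scene" } },
      ],
    },
  },
];

// After the rename, both the beat id and the back-reference point at "entry".
const renamed = renameId(sampleBeats, "b1", "entry");
console.log(JSON.stringify(renamed, null, 2));
```

The function returns new objects rather than mutating its input, so the original array can still be inspected after the call.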
||||
|
||||
// ── Phase A — plan the scene skeleton. Fast (small output): just enough for
|
||||
// the Cinematographer + character design + Painter to start before the
|
||||
// dialogue exists. The cast is unioned with the entry roster/speaker so a
|
||||
// character named in the entry but omitted from `cast` still gets designed.
|
||||
export async function runWriterPlan(
|
||||
config: ProviderConfig,
|
||||
session: Session,
|
||||
): Promise<WriterOutput> {
|
||||
): Promise<WriterPlan> {
|
||||
const raw = await chat(
|
||||
config,
|
||||
[
|
||||
{ role: "system", content: WRITER_SYSTEM },
|
||||
{ role: "user", content: buildWriterUserMessage(session) },
|
||||
{ role: "system", content: WRITER_PLAN_SYSTEM },
|
||||
{ role: "user", content: buildWriterPlanUserMessage(session) },
|
||||
],
|
||||
{ temperature: 0.9, responseFormat: "json_object", tag: "writer" },
|
||||
{ temperature: 0.9, responseFormat: "json_object", tag: "writer-plan" },
|
||||
);
|
||||
|
||||
const parsed = parseJsonLoose<RawScene>(raw);
|
||||
const parsed = parseJsonLoose<RawPlan>(raw);
|
||||
|
||||
const entryActiveCharacters =
|
||||
coerceActiveCharacters(parsed.entryActiveCharacters) ?? [];
|
||||
|
||||
// Normalize POV variants → "你"; NPC names pass through. "你" is a valid entry
|
||||
// speaker (Pattern B — player talking), but is never a designed cast member.
|
||||
const rawEntrySpeaker = parsed.entrySpeaker?.trim() || undefined;
|
||||
const entrySpeaker = rawEntrySpeaker
|
||||
? normalizeSpeakerName(rawEntrySpeaker)
|
||||
: undefined;
|
||||
|
||||
const cast = coerceCast(parsed.cast);
|
||||
const castSet = new Set(cast);
|
||||
const addToCast = (name: string): void => {
|
||||
if (!isPovName(name) && !castSet.has(name)) {
|
||||
castSet.add(name);
|
||||
cast.push(name);
|
||||
}
|
||||
};
|
||||
for (const c of entryActiveCharacters) addToCast(c.name);
|
||||
if (entrySpeaker) addToCast(entrySpeaker);
|
||||
|
||||
return {
|
||||
sceneSummary: parsed.sceneSummary?.trim() || "未指定场景概要",
|
||||
sceneKey: normalizeSceneKey(parsed.sceneKey),
|
||||
entryBeatId: parsed.entryBeatId?.trim() || "b1",
|
||||
cast,
|
||||
entryActiveCharacters,
|
||||
entrySpeaker,
|
||||
};
|
||||
}
|
||||
|
||||
// ── Phase B — expand the plan into the full beats[] graph + storyStatePatch.
|
||||
// Overlapped with the image pipeline by the director. The plan's entry id is
|
||||
// pinned onto a real beat so the already-painted entry frame resolves.
|
||||
export async function runWriterBeats(
|
||||
config: ProviderConfig,
|
||||
session: Session,
|
||||
plan: WriterPlan,
|
||||
): Promise<WriterBeatsOutput> {
|
||||
const raw = await chat(
|
||||
config,
|
||||
[
|
||||
{ role: "system", content: WRITER_BEATS_SYSTEM },
|
||||
{ role: "user", content: buildWriterBeatsUserMessage(session, plan) },
|
||||
],
|
||||
{ temperature: 0.9, responseFormat: "json_object", tag: "writer-beats" },
|
||||
);
|
||||
|
||||
const parsed = parseJsonLoose<RawBeats>(raw);
|
||||
const rawBeats = Array.isArray(parsed.beats) ? parsed.beats : [];
|
||||
if (rawBeats.length === 0) {
|
||||
throw new Error("Writer returned no beats");
|
||||
throw new Error("Writer (beats) returned no beats");
|
||||
}
|
||||
|
||||
const beats = ensureUniqueChoiceIds(
|
||||
let beats = ensureUniqueChoiceIds(
|
||||
repairBeats(
|
||||
ensureUniqueBeatIds(
|
||||
rawBeats.map((b, i) => coerceBeat(b, i, rawBeats.length)),
|
||||
@@ -386,40 +490,45 @@ export async function runWriter(
|
||||
),
|
||||
);
|
||||
|
||||
const declaredEntry = parsed.entryBeatId?.trim();
|
||||
const entryBeatId =
|
||||
declaredEntry && beats.some((b) => b.id === declaredEntry)
|
||||
? declaredEntry
|
||||
: beats[0]!.id;
|
||||
// The Painter already composed the entry frame from plan.entryBeatId + its
|
||||
// roster, so the scene's entry MUST resolve to that id. If Phase B ignored
|
||||
// it, rename the first beat to it (no collision — id is absent by the guard).
|
||||
if (!beats.some((b) => b.id === plan.entryBeatId)) {
|
||||
beats = renameBeatId(beats, beats[0]!.id, plan.entryBeatId);
|
||||
}
|
||||
|
||||
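// Pin the entry beat's roster to the plan's: the Painter composited exactly
|
||||
// plan.entryActiveCharacters into the frame, so the runtime entry beat must
|
||||
// show the same characters (same reasoning as pinning the id above). The
|
||||
// speaker is deliberately NOT pinned: it is coupled to line/TTS, and
|
||||
// overriding it would mismatch the dialogue.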
|
||||
const entryRoster =
|
||||
plan.entryActiveCharacters.length > 0 ? plan.entryActiveCharacters : undefined;
|
||||
beats = beats.map((b) =>
|
||||
b.id === plan.entryBeatId ? { ...b, activeCharacters: entryRoster } : b,
|
||||
);
|
||||
|
||||
return {
|
||||
sceneSummary: parsed.sceneSummary?.trim() || "未指定场景概要",
|
||||
sceneKey: normalizeSceneKey(parsed.sceneKey),
|
||||
entryBeatId,
|
||||
beats,
|
||||
storyStatePatch: coerceStoryStatePatch(parsed.storyStatePatch),
|
||||
};
|
||||
}
|
||||
|
||||
// Surface the set of character names introduced by this scene's beats,
|
||||
// so the orchestrator can decide which ones need the CharacterDesigner to
|
||||
// fire. Pulls names from both `speaker` fields AND `activeCharacters`
|
||||
// (a character can be on-screen without speaking).
|
||||
//
|
||||
// Excludes POV ("你" / 玩家 / 主角 / ...) entirely — the player is never
|
||||
// designed (no portrait, no voice, no archetype).
|
||||
export function collectActiveCharacterNames(beats: Beat[]): string[] {
|
||||
const seen = new Set<string>();
|
||||
for (const b of beats) {
|
||||
if (b.speaker && !isPovName(b.speaker)) seen.add(b.speaker);
|
||||
if (b.activeCharacters) {
|
||||
for (const c of b.activeCharacters) {
|
||||
if (!isPovName(c.name)) seen.add(c.name);
|
||||
}
|
||||
}
|
||||
}
|
||||
return Array.from(seen);
|
||||
// Phase B fallback — when runWriterBeats fails entirely, keep the scene
|
||||
// playable with a single entry beat synthesized from the plan: narrate the
|
||||
// planned summary and offer one change-scene exit so the player can advance.
|
||||
export function synthesizeFallbackBeats(plan: WriterPlan): Beat[] {
|
||||
const id = plan.entryBeatId || "b1";
|
||||
return [
|
||||
{
|
||||
id,
|
||||
narration: plan.sceneSummary,
|
||||
activeCharacters:
|
||||
plan.entryActiveCharacters.length > 0
|
||||
? plan.entryActiveCharacters
|
||||
: undefined,
|
||||
next: { type: "choice", choices: [fallbackExitChoice(id)] },
|
||||
},
|
||||
];
|
||||
}
|
||||
|
||||
// Re-export POV constants for downstream filters (director's orphanSpeakers).
|
||||
// Re-export POV constants for downstream filters (director's orphan voices).
|
||||
export { POV_DISPLAY_NAME, POV_VARIANTS, isPovName, normalizeSpeakerName };
|
||||
|
||||
+113
-73
@@ -1,5 +1,7 @@
|
||||
import { chat } from "@infiplot/ai-client";
|
||||
import { coerceOrientation } from "@infiplot/types";
|
||||
import type {
|
||||
Beat,
|
||||
Character,
|
||||
EngineConfig,
|
||||
InsertBeatPartial,
|
||||
@@ -8,6 +10,7 @@ import type {
|
||||
Session,
|
||||
StoryState,
|
||||
StoryStatePatch,
|
||||
WriterPlan,
|
||||
} from "@infiplot/types";
|
||||
import type { CharacterCard } from "./agents/characterDesigner";
|
||||
import {
|
||||
@@ -18,12 +21,14 @@ import {
|
||||
} from "./agents/characterDesigner";
|
||||
import { runCinematographer } from "./agents/cinematographer";
|
||||
import { runPainter } from "./agents/painter";
|
||||
import type { WriterBeatsOutput } from "./agents/writer";
|
||||
import {
|
||||
collectActiveCharacterNames,
|
||||
isPovName,
|
||||
normalizeSpeakerName,
|
||||
POV_DISPLAY_NAME,
|
||||
runWriter,
|
||||
runWriterBeats,
|
||||
runWriterPlan,
|
||||
synthesizeFallbackBeats,
|
||||
} from "./agents/writer";
|
||||
import { parseJsonLoose } from "./jsonParser";
|
||||
import { INSERT_BEAT_SYSTEM, buildInsertBeatUserMessage } from "./prompts";
|
||||
@@ -33,25 +38,25 @@ import { INSERT_BEAT_SYSTEM, buildInsertBeatUserMessage } from "./prompts";
|
||||
//
|
||||
// Critical path (per Scene call):
|
||||
//
|
||||
// Writer LLM (~3s, serial)
|
||||
// Writer PHASE A — plan LLM (scene skeleton only, serial)
|
||||
// │
|
||||
// ├─ CharacterCard LLM × N (parallel per new char — TEXT only)
|
||||
// ├─ Cinematographer LLM (parallel with the cards)
|
||||
// │
|
||||
// └─ wait for cards + cinema
|
||||
// │
|
||||
// ├─ entry-beat portraits ──┐ (block the Painter — its refs)
|
||||
// ▼ │
|
||||
// Painter — generateImage │ (overlapped, NOT on the paint path):
|
||||
// with referenceImages ├─ non-entry-beat portraits
|
||||
// │ └─ ALL voice provisioning + orphan voices
|
||||
// ├──────────────────────────┬───────────────────────────────────────┐
|
||||
// ▼ ▼ │
|
||||
// Writer PHASE B image pipeline (concurrent): │
|
||||
// beats LLM CharacterCard LLM × N ∥ Cinematographer │
|
||||
// (full dialogue, → entry-beat portraits (block Painter) │
|
||||
// overlapped) → Painter (generateImage w/ refs) │
|
||||
// │ → await overlapped: rest portraits+voices │
|
||||
// └──────────────────────────► await Phase B ◄────────────────────────┘
|
||||
// ▼
|
||||
// await the overlapped work, fold into the registry
|
||||
// │
|
||||
// ▼
|
||||
// return { scene, sceneImageUrl, characters, storyState }
|
||||
// assemble Scene → { scene, sceneImageUrl, characters, storyState }
|
||||
//
|
||||
// Two deliberate decouplings unlock the parallelism:
|
||||
// Why split the Writer (the latency win): the image pipeline only needs the
|
||||
// scene SUMMARY + entry roster + cast (Phase A) — NOT the dialogue (Phase B).
|
||||
// Writing beats used to sit serially in FRONT of the image; now it overlaps
|
||||
// it, so the floor is max(beats, image) instead of beats + image.
|
||||
//
|
||||
// The decouplings that unlock the rest of the parallelism:
|
||||
// 1. The Cinematographer only POSITIONS named characters, so it needs no
|
||||
// visualDescription and runs alongside the card LLMs.
|
||||
// 2. The Painter only needs visualDescription TEXT (all on-stage) + the
|
||||
@@ -163,31 +168,60 @@ export async function directScene(
|
||||
): Promise<SceneResult> {
|
||||
const tTotal = Date.now();
|
||||
|
||||
// Stage 1 — Writer (serial; everything downstream needs sceneSummary +
|
||||
// beats[] to know who's on stage and what to compose around).
|
||||
const tWriter = Date.now();
|
||||
const writerOut = await runWriter(config.text, session);
|
||||
tlog("[directScene] Writer", tWriter);
|
||||
// ── Phase A — Writer PLAN (serial). The image pipeline needs the scene
|
||||
// summary + entry roster + cast to start, but NOT the dialogue beats. This
|
||||
// call is small (skeleton only), so it returns fast and unblocks everything.
|
||||
const tPlan = Date.now();
|
||||
const plan = await runWriterPlan(config.text, session);
|
||||
tlog("[directScene] Phase A (plan)", tPlan);
|
||||
|
||||
// Identify NEW characters introduced by this scene that need to be
|
||||
// designed (LLM + portrait + voice). Existing characters in the registry
|
||||
// are skipped — their cards / portraits / voices persist across scenes.
|
||||
const allActiveNames = collectActiveCharacterNames(writerOut.beats);
|
||||
const newCharNames = allActiveNames.filter(
|
||||
// ── Phase B — Writer BEATS, launched NOW so its (longer) output overlaps the
|
||||
// ENTIRE image pipeline below. Only needed to assemble the final Scene, so we
|
||||
// await it last. A failure degrades to a single playable beat from the plan.
|
||||
const tBeats = Date.now();
|
||||
const beatsPromise: Promise<WriterBeatsOutput> = runWriterBeats(
|
||||
config.text,
|
||||
session,
|
||||
plan,
|
||||
)
|
||||
.then((out) => {
|
||||
tlog("[directScene] Phase B (beats)", tBeats);
|
||||
return out;
|
||||
})
|
||||
.catch((err): WriterBeatsOutput => {
|
||||
const msg = err instanceof Error ? err.message : String(err);
|
||||
console.error(
|
||||
`[directScene] Phase B (beats) failed, using fallback: ${msg}`,
|
||||
);
|
||||
return { beats: synthesizeFallbackBeats(plan), storyStatePatch: undefined };
|
||||
});
|
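The launch-early, await-late pattern used for Phase B can be sketched in isolation. Everything below is illustrative: `writeBeats`, `paintImage`, `directSketch`, and the delays are hypothetical stand-ins for the beats LLM call and the image pipeline, not the repo's functions.

```typescript
// Hypothetical stand-ins for the two concurrent legs; delays are arbitrary.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function writeBeats(fail: boolean): Promise<string[]> {
  await sleep(30);
  if (fail) throw new Error("beats LLM failed");
  return ["b1", "b2", "b3"];
}

async function paintImage(): Promise<string> {
  await sleep(50);
  return "image-url";
}

async function directSketch(
  fail: boolean,
): Promise<{ beats: string[]; image: string }> {
  // Start the beats call first and attach the fallback immediately, so a
  // rejection is already handled while the image leg is still running.
  const beatsPromise = writeBeats(fail).catch((): string[] => ["fallback-b1"]);
  const image = await paintImage(); // runs concurrently with writeBeats
  const beats = await beatsPromise; // awaited last; only needed for assembly
  return { beats, image };
}

directSketch(false).then((r) => console.log("ok:", r));
directSketch(true).then((r) => console.log("degraded:", r));
```

Attaching `.catch` at launch rather than at the final `await` matters: a promise that rejects while nothing is listening can be reported as an unhandled rejection before the late `await` ever runs.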
||||
|
||||
// NEW characters to design come from the PLAN's cast (so design fires in
|
||||
// parallel with Phase B, not after the beats are written). Existing
|
||||
// characters keep their cards / portraits / voices across scenes.
|
||||
const newCharNames = plan.cast.filter(
|
||||
(n) => !session.characters.some((c) => c.name === n),
|
||||
);
|
||||
|
||||
// Find the entry beat for the Cinematographer (which characters are
|
||||
// on-screen in the establishing shot).
|
||||
const entryBeat = writerOut.beats.find((b) => b.id === writerOut.entryBeatId);
|
||||
const entryBeatActive = entryBeat?.activeCharacters ?? [];
|
||||
// Entry-beat composition is the PLAN's (Phase B is constrained to honor it).
|
||||
// The Painter needs a Beat-shaped object for reference collection, but the
|
||||
// real beat isn't written until Phase B — so synthesize one from the plan
|
||||
// (collectReferenceImages only reads speaker + activeCharacters).
|
||||
const entryBeatActive = plan.entryActiveCharacters;
|
||||
const entryBeatSpeaker = plan.entrySpeaker;
|
||||
const entryBeatForPaint: Beat = {
|
||||
id: plan.entryBeatId,
|
||||
speaker: entryBeatSpeaker,
|
||||
activeCharacters: entryBeatActive.length > 0 ? entryBeatActive : undefined,
|
||||
next: { type: "continue", nextBeatId: plan.entryBeatId },
|
||||
};
|
||||
|
||||
// For sceneKey-based visual continuity, look up the prior matching scene's
|
||||
// image to slot into the Painter's referenceImages (capped at 4, and the cap
|
||||
// also covers character portraits).
|
||||
const { priorSceneReference, priorSceneKey } = pickPriorSceneReference(
|
||||
session,
|
||||
writerOut.sceneKey,
|
||||
plan.sceneKey,
|
||||
);
|
||||
|
||||
// ── Stage 2 — character cards (LLM) ∥ Cinematographer ──────────────────
|
||||
@@ -211,12 +245,12 @@ export async function directScene(
|
||||
);
|
||||
|
||||
const cinemaPromise = runCinematographer(config.text, {
|
||||
sceneSummary: writerOut.sceneSummary,
|
||||
sceneSummary: plan.sceneSummary,
|
||||
styleGuide: session.styleGuide,
|
||||
entryBeatActive,
|
||||
entryBeatSpeaker: entryBeat?.speaker,
|
||||
entryBeatSpeaker,
|
||||
priorSceneKey,
|
||||
currentSceneKey: writerOut.sceneKey,
|
||||
currentSceneKey: plan.sceneKey,
|
||||
});
|
||||
|
||||
const [cards, cinemaOut] = await Promise.all([
|
||||
@@ -242,8 +276,8 @@ export async function directScene(
|
||||
// Entry-beat character names: the ONLY portraits the Painter references
|
||||
// (collectReferenceImages slots in the entry beat's speaker + activeChars).
|
||||
const entryNames = new Set<string>();
|
||||
if (entryBeat?.speaker && !isPovName(entryBeat.speaker)) {
|
||||
entryNames.add(entryBeat.speaker);
|
||||
if (entryBeatSpeaker && !isPovName(entryBeatSpeaker)) {
|
||||
entryNames.add(entryBeatSpeaker);
|
||||
}
|
||||
for (const c of entryBeatActive) {
|
||||
if (!isPovName(c.name)) entryNames.add(c.name);
|
||||
@@ -281,24 +315,6 @@ export async function directScene(
     ),
   );
 
-  // Edge case: a speaker the Writer referenced without listing in any beat's
-  // activeCharacters. collectActiveCharacterNames already includes speakers,
-  // so this is a rare defensive net. Provision a voice only (never on-screen).
-  const speakerNames = new Set(
-    writerOut.beats.map((b) => b.speaker).filter((n): n is string => Boolean(n)),
-  );
-  const orphanSpeakers = [...speakerNames].filter(
-    // Pattern B: "你" (player) is a valid speaker but never gets a Character
-    // record — TTS is intentionally skipped on the client.
-    (n) =>
-      !isPovName(n) &&
-      !characters.some((c) => c.name === n) &&
-      !cards.some((c) => c.name === n),
-  );
-  const orphanPromises = orphanSpeakers.map((n) =>
-    provisionVoiceForName(config, session, n),
-  );
-
   // Block the Painter ONLY on entry-beat portraits (its referenceImages).
   const entryPortraits = await Promise.all(entryPortraitPromises);
   characters = mergeCharacters(
@@ -313,11 +329,13 @@ export async function directScene(
   tlog("[directScene] entry-beat portraits", tProvision);
 
   // ── Stage 4 — Painter (depends on cinemaOut + on-stage visual cards +
-  // entry portraits). On-stage = everyone named in any beat, so the archetype
-  // block covers anyone the player might encounter in this scene.
-  const onStageCharacters = characters.filter((c) =>
-    allActiveNames.includes(c.name),
-  );
+  // entry portraits). On-stage = the plan's cast (everyone who'll appear),
+  // filtered to those now in the registry, so the archetype block covers them.
+  const onStageCharacters = characters.filter((c) => plan.cast.includes(c.name));
 
+  // Session-locked orientation (set at session start). Threads into both the
+  // Painter prompt's framing rules and the generated image's pixel dimensions.
+  const orientation = coerceOrientation(session.orientation);
+
   const tPainter = Date.now();
   const painted = await runPainter(
@@ -328,19 +346,19 @@ export async function directScene(
       onStageCharacters,
       priorSceneImage: priorSceneReference,
       styleReferenceImage: session.styleReferenceImage,
+      orientation,
     },
-    entryBeat,
+    entryBeatForPaint,
   );
   tlog("[directScene] Painter", tPainter);
 
-  // Fold in the work that overlapped the paint: remaining portraits, all
-  // voices, and any orphan-speaker voices. Awaited before returning so the
-  // session the client persists is fully provisioned for later scenes.
+  // Fold in the work that overlapped the paint: remaining portraits + all
+  // voices. Awaited before returning so the session the client persists is
+  // fully provisioned for later scenes.
   const tOverlap = Date.now();
-  const [restPortraits, voicedChars, orphanChars] = await Promise.all([
+  const [restPortraits, voicedChars] = await Promise.all([
     Promise.all(restPortraitPromises),
     Promise.all(voicePromises),
-    Promise.all(orphanPromises),
   ]);
   characters = mergeCharacters(
     characters,
@@ -352,10 +370,31 @@ export async function directScene(
     })),
   );
   characters = mergeCharacters(characters, voicedChars);
-  if (orphanChars.length > 0) {
+  tlog("[directScene] overlapped portraits+voices", tOverlap);
+
+  // ── Await Phase B — it overlapped the whole image pipeline above. ──────
+  const beatsOut = await beatsPromise;
+  const beats = beatsOut.beats;
+
+  // entryBeatId is guaranteed present (runWriterBeats pins it onto a beat), but
+  // keep the defensive fallback for the synthesized-fallback path.
+  const entryBeatId = beats.some((b) => b.id === plan.entryBeatId)
+    ? plan.entryBeatId
+    : beats[0]!.id;
+
+  // Orphan-speaker voices: a beat speaker Phase B used that isn't in the
+  // registry. Should be rare — the prompt constrains speakers to the cast, and
+  // every cast member was provisioned above — so this is a defensive net,
+  // serial but skipped entirely (zero latency) in the common case.
+  const orphanSpeakers = [
+    ...new Set(beats.map((b) => b.speaker).filter((n): n is string => Boolean(n))),
+  ].filter((n) => !isPovName(n) && !characters.some((c) => c.name === n));
+  if (orphanSpeakers.length > 0) {
+    const orphanChars = await Promise.all(
+      orphanSpeakers.map((n) => provisionVoiceForName(config, session, n)),
+    );
     characters = mergeCharacters(characters, orphanChars);
   }
-  tlog("[directScene] overlapped portraits+voices", tOverlap);
 
   const scene: Scene = {
     id: newSceneId(),
@@ -365,11 +404,12 @@ export async function directScene(
     // anything that already reads scene.scenePrompt (e.g., insert-beat
     // user prompt).
     scenePrompt: cinemaOut.integratedPrompt,
-    beats: writerOut.beats,
-    entryBeatId: writerOut.entryBeatId,
-    sceneKey: writerOut.sceneKey,
+    beats,
+    entryBeatId,
+    sceneKey: plan.sceneKey,
     imageUuid: painted.kind === "real" ? painted.imageUuid : undefined,
     imageUrl: painted.imageUrl,
+    orientation,
   };
 
   // Merge the Writer's volatile memory rewrite onto the carried bible so the
@@ -377,7 +417,7 @@ export async function directScene(
   // client persists it back into the session).
   const storyState = applyStoryStatePatch(
     session.storyState,
-    writerOut.storyStatePatch,
+    beatsOut.storyStatePatch,
   );
 
   tlog("[directScene] TOTAL", tTotal);
|
||||
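The overlap that these `directScene` hunks introduce (plan first, then beats written while the image is painted) can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the engine's code: `stage`, `timeline`, `directSceneSketch`, and the trimmed `Plan` / `Beat` shapes are hypothetical stand-ins, the timings are arbitrary, and the whole image pipeline is collapsed into a single "paint" step.

```typescript
// Hypothetical, trimmed-down shapes; the real types live in @infiplot/types.
type Plan = { sceneSummary: string; entryBeatId: string };
type Beat = { id: string };

// Records the order in which the stand-in stages start and finish.
const timeline: string[] = [];

// Stand-in for an LLM or image call: waits `ms`, then resolves with `value`.
async function stage<T>(name: string, ms: number, value: T): Promise<T> {
  timeline.push(`start ${name}`);
  await new Promise<void>((resolve) => setTimeout(resolve, ms));
  timeline.push(`end ${name}`);
  return value;
}

async function directSceneSketch(): Promise<{ imageUrl: string; entryBeatId: string }> {
  // Phase A: the skeleton the image pipeline needs.
  const plan: Plan = await stage("plan", 5, {
    sceneSummary: "rooftop at dusk",
    entryBeatId: "b1",
  });

  // Phase B starts now but is NOT awaited yet, so it overlaps the paint.
  const beatsPromise: Promise<Beat[]> = stage("beats", 10, [{ id: "b1" }, { id: "b2" }]);

  // Image pipeline, collapsed into one (longer) step for this sketch.
  const imageUrl = await stage("paint", 40, "data:image/svg+xml,...");

  // Only now block on the dialogue; fall back to the first beat if the
  // planned entry id is missing from Phase B's output.
  const beats = await beatsPromise;
  const entryBeatId = beats.some((b) => b.id === plan.entryBeatId)
    ? plan.entryBeatId
    : beats[0]!.id;
  return { imageUrl, entryBeatId };
}
```

In this sketch the beats finish while the paint is still running, so awaiting `beatsPromise` afterwards adds no extra wall-clock time. One design caveat worth keeping in mind: a promise that rejects before it is awaited may surface as an unhandled rejection, so a production version would likely attach error handling when the promise is created.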
+1
-1
@@ -9,7 +9,7 @@ export { synthesizeBeat } from "./voice";
 export { mergeCharacters } from "./director";
 export type { SceneResult } from "./director";
 export { runArchitect } from "./agents/architect";
-export type { WriterOutput } from "./agents/writer";
+export type { WriterBeatsOutput } from "./agents/writer";
 export type { CinematographerOutput } from "./agents/cinematographer";
 export type { InsertBeatPartial } from "@infiplot/types";
 export * from "./prompts";
|
||||
+18
-10
@@ -1,3 +1,5 @@
+import type { Orientation } from "@infiplot/types";
+
 // Static SVG placeholder used when MOCK_IMAGE=true, so we can exercise the
 // TTS path without paying for image generation. Returned as a data URI so the
 // rest of the pipeline can treat it as an `imageUrl` interchangeably with
@@ -9,17 +11,23 @@
 // data URI so the engine has zero Node-native dependencies and runs on
 // Cloudflare Workers. SVG also stays crisp at any display size.
 
-const W = 1792;
-const H = 1024;
-const SVG = `<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}">
-  <rect width="${W}" height="${H}" fill="#161109"/>
-  <rect x="2" y="2" width="${W - 4}" height="${H - 4}" fill="none" stroke="#5a4628" stroke-width="3" stroke-dasharray="14 10"/>
+function buildDataUri(w: number, h: number): string {
+  const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="${w}" height="${h}">
+  <rect width="${w}" height="${h}" fill="#161109"/>
+  <rect x="2" y="2" width="${w - 4}" height="${h - 4}" fill="none" stroke="#5a4628" stroke-width="3" stroke-dasharray="14 10"/>
   <text x="50%" y="45%" fill="#b88f4a" font-family="Georgia, serif" font-size="72" letter-spacing="6" text-anchor="middle">MOCK IMAGE</text>
   <text x="50%" y="53%" fill="#6e5430" font-family="Georgia, serif" font-size="30" letter-spacing="3" text-anchor="middle">TTS TEST — image generation skipped</text>
 </svg>`;
-
-const DATA_URI = `data:image/svg+xml;charset=utf-8,${encodeURIComponent(SVG)}`;
-
-export async function mockImageDataUri(): Promise<string> {
-  return DATA_URI;
+  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
 }
+
+// Mirror the real Painter's dimensions per orientation so mock mode exercises
+// the same portrait/landscape layout the client renders for real images.
+const LANDSCAPE = buildDataUri(1792, 1024);
+const PORTRAIT = buildDataUri(1024, 1792);
+
+export async function mockImageDataUri(
+  orientation: Orientation = "landscape",
+): Promise<string> {
+  return orientation === "portrait" ? PORTRAIT : LANDSCAPE;
+}
|
||||
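A quick way to see what the orientation-aware mock buys is to decode the data URI and look at the SVG's intrinsic size. The sketch below is a simplified, self-contained stand-in rather than the module above: `mockDimensions`, `buildMockUri`, and `decodeMockUri` are hypothetical names, and the SVG body is reduced to a single rect. Only the two dimension pairs are taken from the diff.

```typescript
type Orientation = "portrait" | "landscape";

// Dimensions mirrored from the diff: 1792x1024 landscape, 1024x1792 portrait.
function mockDimensions(orientation: Orientation): { w: number; h: number } {
  return orientation === "portrait" ? { w: 1024, h: 1792 } : { w: 1792, h: 1024 };
}

// Simplified stand-in for buildDataUri: a one-rect SVG, percent-encoded.
function buildMockUri(orientation: Orientation = "landscape"): string {
  const { w, h } = mockDimensions(orientation);
  const svg =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${w}" height="${h}">` +
    `<rect width="${w}" height="${h}" fill="#161109"/></svg>`;
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}

// Recover the SVG markup from the data URI, e.g. to assert on it in a test.
function decodeMockUri(uri: string): string {
  return decodeURIComponent(uri.slice(uri.indexOf(",") + 1));
}

const portraitSvg = decodeMockUri(buildMockUri("portrait"));
const landscapeSvg = decodeMockUri(buildMockUri());
```

Because the payload is percent-encoded rather than base64, the first comma in the URI is always the one separating the media type from the data, which is what `decodeMockUri` relies on.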
@@ -12,6 +12,7 @@ import type {
   VisionRequest,
   VisionResponse,
 } from "@infiplot/types";
+import { coerceOrientation } from "@infiplot/types";
 import { runArchitect } from "./agents/architect";
 import { directInsertBeat, directScene } from "./director";
 import { synthesizeBeat } from "./voice";
@@ -48,6 +49,7 @@ export async function startSession(
     history: [],
     characters: [],
     styleReferenceImage: req.styleReferenceImage?.trim() || undefined,
+    orientation: coerceOrientation(req.orientation),
   };
 
   // Stage 0 — Architect: expand the terse world/style prompt into a story
|
||||
+163
-54
@@ -1,9 +1,11 @@
|
||||
import type {
|
||||
BeatActiveCharacter,
|
||||
Character,
|
||||
Orientation,
|
||||
Scene,
|
||||
Session,
|
||||
StoryState,
|
||||
WriterPlan,
|
||||
} from "@infiplot/types";
|
||||
|
||||
// ══════════════════════════════════════════════════════════════════════
|
||||
@@ -137,16 +139,77 @@ export function buildArchitectUserMessage(session: Session): string {
|
||||
}
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// 1. Writer (编剧) — drives the narrative.
|
||||
// 1. Writer (编剧) — drives the narrative, in TWO phases.
|
||||
//
|
||||
// Emits a full Scene: beats[] graph + entryBeatId + sceneKey hint +
|
||||
// activeCharacters per beat. Does NOT design characters (that's the
|
||||
// CharacterDesigner's job) — only names them in `activeCharacters`.
|
||||
// The CharacterDesigner is invoked separately for any name not yet in
|
||||
// session.characters.
|
||||
// Phase A (WRITER_PLAN_SYSTEM): plans the scene SKELETON only — sceneSummary
|
||||
// + sceneKey + entry-beat roster + the full cast. No dialogue. Its output
|
||||
// is enough for the Cinematographer + character design + Painter to start.
|
||||
// Phase B (WRITER_BEATS_SYSTEM): expands the plan into the full beats[] graph
|
||||
// + storyStatePatch, overlapped with the (longer) image pipeline.
|
||||
//
|
||||
// Neither phase designs characters (that's the CharacterDesigner's job) —
|
||||
// Phase A only NAMES them in `cast` / `entryActiveCharacters`; the
|
||||
// CharacterDesigner is invoked for any name not yet in session.characters.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
export const WRITER_SYSTEM = `你是一部交互视觉小说的「编剧」。每次基于【故事档案 / 主线记忆】、世界观、画风、玩家历史、已登记角色,写出**一个完整场景的剧本**:场景背景概要 + 一组对话节拍 beats,并在最后更新主线记忆。你只负责**剧情和台词**——不设计角色形象、不写出图提示词、不做镜头调度,这些由其他 agent 完成。
|
||||
export const WRITER_PLAN_SYSTEM = `你是一部交互视觉小说的「编剧」。这是**两步生成中的第一步——场景规划**。你只产出本场景的「骨架」,**不要写任何 beat 台词**。你的产出会被立刻送去配图(分镜导演 + 生图),所以要快、要准、画面感要强。
|
||||
|
||||
═══════════════════════════════════════════════════════════════════
|
||||
爆款心法(要在规划阶段就立住,后续展开才好看)
|
||||
═══════════════════════════════════════════════════════════════════
|
||||
- **进场即钩子**:这一场开场就要抛出新信息 / 悬念 / 冲突 / 情绪冲击,别铺陈。把这个抓人的瞬间写进 sceneSummary。
|
||||
- **兑现情绪**:按题材给观众想要的情绪(甜宠的心动、暗恋的拉扯、逆袭的扬眉、悬疑的真相一角)。
|
||||
- **人设有反差**:每个角色一个强标签 + 一个反差面。
|
||||
|
||||
═══════════════════════════════════════════════════════════════════
|
||||
连贯性铁律(跨场景切换不能跳戏 —— 最重要)
|
||||
═══════════════════════════════════════════════════════════════════
|
||||
- 你会收到【故事档案 / 主线记忆】和上一场的结尾。**新场景必须从上一刻自然承接**——承接情绪、地点逻辑、人物状态与未收的悬念。
|
||||
- 若给了「转场种子 nextSceneSeed」,把它当作"下一场的命题"去兑现,开场要让玩家感到"这正是我上一步的结果"。
|
||||
- 沿用主线记忆里的人物关系与情绪温度,别让刚告白的人下一场形同陌路。
|
||||
|
||||
本步你要规划(如实产出,缺一不可):
|
||||
- **sceneSummary**:当前场景的中文概要——地点 + 时间 + 氛围 + 关键事件 + 那个抓人的开场瞬间。这是分镜导演构图的**唯一依据**,要画面感强、信息足(2–4 句)。
|
||||
- **sceneKey**:当前场景的英文 slug(如 "classroom-dusk"、"rooftop-night")。
|
||||
- **entryBeatId**:玩家进入场景时落在哪个 beat 的 id(通常就是 "b1")。
|
||||
- **cast**:本场景**会出场的全部 NPC 角色名**(字符串数组)。第二步写 beats 时**只能用这里列出的名字**,所以现在必须一次想全——谁会说话、谁会在画面里露面,全部列出。名字要与「已登记角色」**完全一致**;新角色起符合世界观的真名(不要"神秘女子"这种占位)。**绝不**包含玩家(你 / 我 / 主角 / protagonist / player / MC...)。
|
||||
- **entrySpeaker**:入口 beat 由谁开口 —— 取值只有三种:① 某个 NPC 真名(必须在 cast 里)② "你"(玩家本人开口)③ 留空(纯旁白 / 环境开场)。这决定镜头语言,要选准。
|
||||
- **entryActiveCharacters**:入口画面里**此刻出现的 NPC** 及其当下姿态 / 神情(中文 pose)。即使没人说话,画面里有谁也要列。**绝不**包含玩家。
|
||||
|
||||
sceneKey 设计原则(用于跨场景视觉一致性):
|
||||
- 同一物理空间 + 同一时段 → 必须沿用**完全相同**的英文 slug
|
||||
- 时段 / 空间变化时换 slug("classroom-dusk" → "classroom-night" / "corridor-dusk")
|
||||
- slug 规范:lowercase-with-dashes,2–4 个英文单词
|
||||
- 用户消息会列出已用过的 sceneKey,请优先**复用**这些已有 slug
|
||||
|
||||
玩家视角硬规则(违反会破坏整个 galgame):
|
||||
- 玩家是第二人称 POV,**永远不出现在任何画面里**——entryActiveCharacters 的 name **绝不允许**是「玩家 / 你 / 我 / 主角 / protagonist / player / Player / MC / I / me」任何变体。
|
||||
- entrySpeaker 只能是 NPC 真名 / "你" / 留空;其它 POV 变体一律视为错误。
|
||||
|
||||
必须输出严格 JSON:
|
||||
{
|
||||
"sceneSummary": "黄昏的天台,风很大。夏海背对你站在栏杆边,手里攥着一张揉皱的成绩单——她把你单独叫上来,却迟迟不开口。",
|
||||
"sceneKey": "rooftop-dusk",
|
||||
"entryBeatId": "b1",
|
||||
"cast": ["夏海"],
|
||||
"entrySpeaker": "夏海",
|
||||
"entryActiveCharacters": [
|
||||
{ "name": "夏海", "pose": "背对你倚着栏杆,侧脸绷着,手里攥着揉皱的纸" }
|
||||
]
|
||||
}
|
||||
|
||||
不要输出 JSON 以外的任何文本。`;
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Phase B — expands the plan into the full beats[] + storyStatePatch.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
export const WRITER_BEATS_SYSTEM = `你是一部交互视觉小说的「编剧」。这是**两步生成中的第二步——把已规划好的场景展开成完整剧本**。你会收到本场景的「规划」(场景概要 sceneSummary、sceneKey、入口 beat 的 id / speaker / 登场角色、以及本场景允许出场的角色名单 cast)。你的任务:基于规划写出玩家依次经历的对话节拍 beats,并在最后更新主线记忆。你只负责**剧情和台词**——不设计角色形象、不写出图提示词、不做镜头调度,这些由其他 agent 完成。
|
||||
|
||||
你必须严格遵守收到的规划:
|
||||
- 必须存在一个 id 等于规划 entryBeatId 的 beat,作为玩家入口。
|
||||
- 该入口 beat 的 speaker 与登场角色(activeCharacters)要与规划一致(姿态措辞可微调,但**人物身份必须一致**)。
|
||||
- speaker 与 activeCharacters 里的 NPC 名字**只能来自规划的 cast**(或玩家 "你")——**不要引入规划之外的新角色**。
|
||||
|
||||
═══════════════════════════════════════════════════════════════════
|
||||
爆款心法(番茄网文 / 红果短剧 / galgame 的叙事手感)—— 必须贯彻
|
||||
@@ -167,11 +230,7 @@ export const WRITER_SYSTEM = `你是一部交互视觉小说的「编剧」。
|
||||
- 沿用主线记忆里的人物关系与情绪温度——别让刚告白的人下一场形同陌路,也别凭空遗忘已埋的伏笔。
|
||||
- 推进、但别重置:每一场都让主线问题往前走一点(关系变化 / 真相揭露一角 / 新悬念浮现)。
|
||||
|
||||
一个场景包含:
|
||||
- sceneSummary:当前场景的中文概要(地点、时间、氛围、关键事件——给后续的分镜导演看)
|
||||
- sceneKey:当前场景的英文 slug(如 "classroom-dusk"、"rooftop-night"、"rainy-street")——同一物理空间应沿用相同 slug
|
||||
- beats[]:玩家依次经历的对话节拍
|
||||
- entryBeatId:玩家进入场景时落在哪个 beat
|
||||
本步你只产出两样:**beats[]**(玩家依次经历的对话节拍)和 **storyStatePatch**(主线记忆更新)。sceneSummary / sceneKey / entryBeatId 已由规划给定,**不要再输出**它们。
|
||||
|
||||
每个 beat 是玩家会看到的一段叙述 / 对话 / 选择。beat 之间通过 next 字段连接:
|
||||
- "continue":玩家点击图片背景 / 按继续,自然推进到下一个 beat
|
||||
@@ -183,6 +242,7 @@ choice 的 effect 有两种:
|
||||
|
||||
设计原则:
|
||||
- 同场景内 beat 数自由发挥,按剧情节奏自然给出(通常 2–6 个,可以更多)
|
||||
- 入口 beat 的 id 必须等于规划给定的 entryBeatId;其余 beat id 依次自取且互不重复
|
||||
- 多用 continue,少用 choice — 选择只应出现在「真正的岔路口」
|
||||
- advance-beat 适合处理对话分支(同一场景里换个话题、追问、撒娇)
|
||||
- change-scene 适合空间/时间跳跃(出门、转身看窗外、第二天清晨)
|
||||
@@ -192,12 +252,6 @@ choice 的 effect 有两种:
|
||||
- next.nextBeatId 引用的 beat 必须存在
|
||||
- choice 至少 2 个,至多 4 个,互不重复
|
||||
|
||||
sceneKey 设计原则(重要 — 用于跨场景视觉一致性):
|
||||
- 同一物理空间 + 同一时段 → 必须沿用**完全相同**的英文 slug
|
||||
- 时段或空间变化时换 slug(如 "classroom-dusk" → "classroom-night","classroom-dusk" → "corridor-dusk")
|
||||
- slug 规范:lowercase-with-dashes,2–4 个英文单词
|
||||
- 已登记的历史场景 sceneKey 会在用户消息里列出,请优先**复用**这些已有 slug
|
||||
|
||||
文本风格约束:
|
||||
- narration / line 用中文(**纯净可显示文本**,绝不要写 (叹气)(语速快) 这类标注 —— 那是给配音的,会被玩家看见)
|
||||
- sceneSummary / lineDelivery / activeCharacters[].pose 内的文字也用中文
|
||||
@@ -243,11 +297,8 @@ sceneKey 设计原则(重要 — 用于跨场景视觉一致性):
|
||||
- nextHook:基于这一场的结尾,下一场应往哪走(给"下一次的你"一个明确命题,接住本场留下的扣子)
|
||||
这些字段是写给"未来的你"的连贯性记忆,请认真写。
|
||||
|
||||
必须输出严格 JSON,结构如下:
|
||||
必须输出严格 JSON,结构如下(**只含 beats 与 storyStatePatch**;sceneSummary / sceneKey / entryBeatId 由规划给定,不要输出。下例入口 beat 的 id "b1" 即规划的 entryBeatId):
|
||||
{
|
||||
"sceneSummary": "中文场景概要:地点+时间+氛围+关键事件",
|
||||
"sceneKey": "classroom-dusk",
|
||||
"entryBeatId": "b1",
|
||||
"beats": [
|
||||
{
|
||||
"id": "b1",
|
||||
@@ -343,29 +394,28 @@ function renderHistoryEntry(
|
||||
return lines.join("\n");
|
||||
}
|
||||
|
||||
export function buildWriterUserMessage(session: Session): string {
|
||||
// ─── STABLE PREFIX ────────────────────────────────────────────────────
|
||||
// Everything in this section is invariant across consecutive Writer calls
|
||||
// within the session (or monotonically grows in a way that keeps the
|
||||
// earlier bytes byte-identical). Always emit every section header — even
|
||||
// when empty — so positions don't shift between calls.
|
||||
//
|
||||
// Order optimized for DeepSeek/MiMo prefix caching (64-token chunks):
|
||||
// 1. session-immutable scalars (world / style)
|
||||
// 2. story bible spine (Architect-set, never patched)
|
||||
// 3. monotonically-growing lists (characters, sceneKeys)
|
||||
// 4. history entries 0..N-2 (the last entry is what THIS call must
|
||||
// react to, so it lives in the dynamic suffix instead)
|
||||
//
|
||||
// ─── DYNAMIC SUFFIX ───────────────────────────────────────────────────
|
||||
// Everything below changes on (almost) every call:
|
||||
// 5. story bible dynamic patch (synopsis/threads/relationships/nextHook)
|
||||
// 6. the just-completed entry (history[-1]) — same render format as the
|
||||
// stable history blocks, just preceded by a "just completed" header
|
||||
// 7. last-beat snippet (the exact emotional cliffhanger)
|
||||
// 8. lastExit hint
|
||||
// 9. format reminder tail
|
||||
|
||||
// Shared narrative context for BOTH Writer phases. Returns the message parts
|
||||
// from the cacheable STABLE PREFIX (sections 1-4) through the dynamic
|
||||
// transition hint (section 7), but WITHOUT the trailing phase-specific
|
||||
// instruction — each phase appends its own. Building this once and reusing it
|
||||
// keeps EACH phase's prompt prefix byte-stable across scenes for DeepSeek
|
||||
// prompt caching (Phase A and Phase B cache independently since their system
|
||||
// prompts differ, but each shares its own prefix across consecutive calls).
|
||||
//
|
||||
// ─── STABLE PREFIX ──────────────────────────────────────────────────────
|
||||
// Invariant across consecutive Writer calls within the session (or grows in a
|
||||
// way that keeps earlier bytes byte-identical). Always emit every section
|
||||
// header — even when empty — so positions don't shift between calls.
|
||||
// 1. session-immutable scalars (world / style)
|
||||
// 2. story bible spine (Architect-set, never patched)
|
||||
// 3. monotonically-growing lists (characters, sceneKeys)
|
||||
// 4. history entries 0..N-2 (the last entry is what THIS call must react
|
||||
// to, so it lives in the dynamic suffix instead)
|
||||
// ─── DYNAMIC SUFFIX ─────────────────────────────────────────────────────
|
||||
// 5. story bible dynamic patch (synopsis/threads/relationships/nextHook)
|
||||
// 6. last-beat snippet (the exact emotional cliffhanger)
|
||||
// 7. transition hint (opening cold-open directive OR lastExit承接)
|
||||
function buildWriterContextParts(session: Session): string[] {
|
||||
const parts: string[] = [];
|
||||
|
||||
// ── 1. session scalars ────────────────────────────────────────────────
|
||||
@@ -423,8 +473,7 @@ export function buildWriterUserMessage(session: Session): string {
|
||||
// ── 6. last-beat snippet (the exact emotional cliffhanger) ──
|
||||
// The full last entry is already in the stable history block above; here
|
||||
// we only re-emit the very last beat to sharply focus the Writer on the
|
||||
// emotional moment to continue from. Skip the duplicate full-entry render
|
||||
// that was here previously — it wasted ~200-500 tokens of dynamic suffix.
|
||||
// emotional moment to continue from.
|
||||
const last = session.history.at(-1);
|
||||
if (last) {
|
||||
const lastBeatId = last.visitedBeatIds.at(-1) ?? last.scene.entryBeatId;
|
||||
@@ -441,14 +490,14 @@ export function buildWriterUserMessage(session: Session): string {
|
||||
}
|
||||
}
|
||||
|
||||
// ── 7. transition hint ────────────────────────────────────────────────
|
||||
if (session.history.length === 0) {
|
||||
parts.push(
|
||||
"\n这是故事的开场。请按【故事档案】里的 nextHook 把第一幕的冷开场写出来——开场即抓人,别花笔墨铺垫世界观。写完后更新 storyStatePatch。严格以 JSON 格式返回。",
|
||||
"\n这是故事的开场。请按【故事档案】里的 nextHook 把第一幕的冷开场设计出来——开场即抓人,别花笔墨铺垫世界观。",
|
||||
);
|
||||
return parts.join("\n");
|
||||
return parts;
|
||||
}
|
||||
|
||||
// ── 8. lastExit hint ──────────────────────────────────────────────────
|
||||
const lastExit = last?.exit;
|
||||
if (lastExit) {
|
||||
if (lastExit.kind === "choice") {
|
||||
@@ -464,8 +513,59 @@ export function buildWriterUserMessage(session: Session): string {
|
||||
parts.push("\n无缝续写下一个场景,延续上一刻的情绪。");
|
||||
}
|
||||
|
||||
// ── 9. format reminder tail ───────────────────────────────────────────
|
||||
parts.push("写完后别忘了更新 storyStatePatch。严格以 JSON 格式返回。");
|
||||
return parts;
|
||||
}
|
||||
|
||||
// Phase A — plan the scene skeleton (no beats). Shares the cacheable context;
|
||||
// appends a plan-only instruction tail.
|
||||
export function buildWriterPlanUserMessage(session: Session): string {
|
||||
const parts = buildWriterContextParts(session);
|
||||
parts.push(
|
||||
'\n现在**只规划本场景的骨架**(不要写 beats 台词):给出 sceneSummary(画面感强、含开场钩子)、sceneKey、entryBeatId、本场景会出场的全部角色 cast、以及入口 beat 的 entrySpeaker 与 entryActiveCharacters。严格以 JSON 格式返回。',
|
||||
);
|
||||
return parts.join("\n");
|
||||
}
|
||||
|
||||
// Phase B — expand the plan into full beats[] + storyStatePatch. The plan is
|
||||
// dynamic per scene, so it goes AFTER the cacheable context (keeping Phase B's
|
||||
// prefix stable across scenes).
|
||||
export function buildWriterBeatsUserMessage(
|
||||
session: Session,
|
||||
plan: WriterPlan,
|
||||
): string {
|
||||
const parts = buildWriterContextParts(session);
|
||||
|
||||
parts.push("");
|
||||
parts.push("━━━ 本场景规划(上一步已定,必须严格遵守)━━━");
|
||||
parts.push(`场景概要 sceneSummary:${plan.sceneSummary}`);
|
||||
if (plan.sceneKey) parts.push(`sceneKey:${plan.sceneKey}`);
|
||||
parts.push(
|
||||
`入口 beat 的 id(entryBeatId,必须有一个此 id 的 beat 作为入口):${plan.entryBeatId}`,
|
||||
);
|
||||
parts.push(
|
||||
`入口 beat 的 speaker:${plan.entrySpeaker ? plan.entrySpeaker : "(空 —— 纯旁白 / 环境开场)"}`,
|
||||
);
|
||||
parts.push("入口 beat 的登场角色 activeCharacters(人物身份须一致,姿态可微调):");
|
||||
if (plan.entryActiveCharacters.length === 0) {
|
||||
parts.push("(无 —— 入口画面没有 NPC)");
|
||||
} else {
|
||||
for (const c of plan.entryActiveCharacters) {
|
||||
parts.push(`- ${c.name}${c.pose ? `:${c.pose}` : ""}`);
|
||||
}
|
||||
}
|
||||
parts.push(
|
||||
'本场景允许出现的角色名 cast(speaker / activeCharacters 只能用这些名字或 "你",不要新增角色):',
|
||||
);
|
||||
if (plan.cast.length === 0) {
|
||||
parts.push("(无 NPC —— 仅旁白与玩家)");
|
||||
} else {
|
||||
for (const n of plan.cast) parts.push(`- ${n}`);
|
||||
}
|
||||
parts.push("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━");
|
||||
|
||||
parts.push(
|
||||
"\n把上面的规划展开成完整的 beats[](入口 beat 用规划的 entryBeatId / speaker / 登场角色),写完后更新 storyStatePatch。严格以 JSON 格式返回。",
|
||||
);
|
||||
return parts.join("\n");
|
||||
}
|
||||
|
||||
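The prefix-stability argument behind `buildWriterContextParts` can be checked with a toy model. This is a sketch under stated assumptions, not the real builder: `MiniSession`, `buildContextParts`, `buildPlanMessage`, `buildBeatsMessage`, and `commonPrefix` are hypothetical, the session is reduced to three fields, and the English tails stand in for the Chinese instructions. It shows two properties: consecutive scenes share the stable block byte for byte, and within one scene both phases share the entire context before their tails diverge.

```typescript
// Hypothetical, trimmed-down session: only what this sketch needs.
type MiniSession = { world: string; style: string; history: string[] };

// Shared context: stable prefix first, dynamic suffix last, no phase-specific tail.
function buildContextParts(session: MiniSession): string[] {
  const stableHistory = session.history.slice(0, -1); // entries 0..N-2
  const last = session.history[session.history.length - 1];
  return [
    `world: ${session.world}`,
    `style: ${session.style}`,
    "history:", // header is always emitted, even when the list is empty
    ...stableHistory,
    `last beat: ${last ?? "(opening scene)"}`, // dynamic suffix starts here
  ];
}

function buildPlanMessage(session: MiniSession): string {
  return [...buildContextParts(session), "Plan the scene skeleton only."].join("\n");
}

function buildBeatsMessage(session: MiniSession, planSummary: string): string {
  // The per-scene plan goes AFTER the shared context so the prefix stays stable.
  return [
    ...buildContextParts(session),
    `plan: ${planSummary}`,
    "Expand the plan into beats.",
  ].join("\n");
}

// Longest shared leading substring of two messages.
function commonPrefix(a: string, b: string): string {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return a.slice(0, i);
}

const scene2: MiniSession = { world: "W", style: "S", history: ["s1", "s2"] };
const scene3: MiniSession = { world: "W", style: "S", history: ["s1", "s2", "s3"] };
const acrossScenes = commonPrefix(buildPlanMessage(scene2), buildPlanMessage(scene3));
const acrossPhases = commonPrefix(buildPlanMessage(scene2), buildBeatsMessage(scene2, "rooftop"));
```

Whether a provider actually reuses the shared bytes depends on its own caching rules; the sketch only demonstrates that the bytes are identical up to the point where the dynamic suffix begins.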
@@ -704,6 +804,7 @@ export function buildPainterPrompt(
   integratedPrompt: string,
   styleGuide: string,
   characters: { name: string; visualDescription?: string }[],
+  orientation: Orientation = "landscape",
 ): string {
   const archetypeBlock = characters
     .filter((c) => c.visualDescription)
@@ -714,7 +815,15 @@ export function buildPainterPrompt(
     ? `\n\nCHARACTER ARCHETYPES (anchor identity, outfit, and style across scenes — keep each character visually identical to their archetype):\n${archetypeBlock}`
     : "";
 
-  return `Generate a cinematic landscape background illustration, 16:9 widescreen (1792x1024).
+  const portrait = orientation === "portrait";
+  const header = portrait
+    ? "Generate a cinematic vertical (portrait) background illustration, 9:16 tall format (1024x1792)."
+    : "Generate a cinematic landscape background illustration, 16:9 widescreen (1792x1024).";
+  const orientationRule = portrait
+    ? "- 9:16 PORTRAIT orientation — taller than wide. No landscape or square output."
+    : "- 16:9 LANDSCAPE orientation — wider than tall. No portrait or square output.";
+
+  return `${header}
 
 ART STYLE: ${styleGuide}
 
@@ -727,7 +836,7 @@ STRICT RULES — NEVER violate these:
 - DO NOT render any Chinese or English text anywhere in the image.
 - DO NOT add any HUD, interface chrome, or game UI elements.
 - The image is a PURE BACKGROUND SCENE ONLY. All UI will be added as HTML on top.
-- 16:9 LANDSCAPE orientation — wider than tall. No portrait or square output.
+${orientationRule}
 - Leave the bottom 35% of the frame relatively uncluttered (darker or softer) so overlaid UI panels remain readable.
 - Characters or key scene elements should be positioned in the upper 65% of the frame.
 - Maintain character identity exactly as specified in CHARACTER ARCHETYPES — same face, same hairstyle, same outfit across every scene.
|
||||
@@ -0,0 +1,77 @@
|
||||
// Xiaomi MiMo TTS endpoint presets.
|
||||
//
|
||||
// Xiaomi issues two independent key types, each with its own base URL:
|
||||
// - Token Plan (套餐, `tp-` key): per-region endpoints token-plan-{sgp,cn,ams}.
|
||||
// - Pay-as-you-go (按量, `sk-` key): the single unified endpoint api.xiaomimimo.com.
|
||||
//
|
||||
// Used CLIENT-SIDE ONLY: when a user supplies their own key, the browser calls
|
||||
// one of these endpoints directly (all return permissive CORS allowing the
|
||||
// `api-key` header), so the key never transits our server. Every endpoint
|
||||
// serves the same `mimo-v2.5-tts` family; Token Plan users pick the region
|
||||
// matching their subscription (also the closest hop → lower synth latency),
|
||||
// pay-as-you-go users have no region to choose. See docs/xiaomi-tts-key.md.
|
||||
|
||||
export type TtsPreset = {
|
||||
id: string;
|
||||
/** Which key family this endpoint serves — drives the two-step picker UI. */
|
||||
kind: "token-plan" | "payg";
|
||||
/** Human label shown in the picker (region for Token Plan, type for payg). */
|
||||
label: string;
|
||||
/** OpenAI-style base; the TTS adapter appends `/chat/completions`. */
|
||||
baseUrl: string;
|
||||
};
|
||||
|
||||
/** Base model name; the adapter derives `-voicedesign` / `-voiceclone`. */
|
||||
export const DEFAULT_TTS_SPEECH_MODEL = "mimo-v2.5-tts";
|
||||
|
||||
/**
|
||||
* In-repo tutorial for getting a free Xiaomi MiMo key + picking a region.
|
||||
* Points at the default branch so it resolves once this lands on main (which
|
||||
* is what production serves). Linked from the homepage BYO modal, the play
|
||||
* page's silence nudge, and the README.
|
||||
*/
|
||||
export const TTS_KEY_DOC_URL =
|
||||
"https://github.com/zonghaoyuan/infiplot/blob/main/docs/xiaomi-tts-key.md";
|
||||
|
||||
export const TTS_PRESETS: TtsPreset[] = [
|
||||
{
|
||||
id: "sgp",
|
||||
kind: "token-plan",
|
||||
label: "新加坡 · Singapore",
|
||||
baseUrl: "https://token-plan-sgp.xiaomimimo.com/v1",
|
||||
},
|
||||
{
|
||||
id: "cn",
|
||||
kind: "token-plan",
|
||||
label: "中国大陆 · China",
|
||||
baseUrl: "https://token-plan-cn.xiaomimimo.com/v1",
|
||||
},
|
||||
{
|
||||
id: "ams",
|
||||
kind: "token-plan",
|
||||
label: "欧洲 · Amsterdam",
|
||||
baseUrl: "https://token-plan-ams.xiaomimimo.com/v1",
|
||||
},
|
||||
{
|
||||
id: "payg",
|
||||
kind: "payg",
|
||||
label: "按量付费 · Pay-as-you-go",
|
||||
baseUrl: "https://api.xiaomimimo.com/v1",
|
||||
},
|
||||
];
|
||||
|
||||
/** Token Plan endpoints only — the region sub-options shown once the user
|
||||
* picks the "套餐" key type. */
|
||||
export const TTS_REGION_PRESETS = TTS_PRESETS.filter(
|
||||
(p) => p.kind === "token-plan",
|
||||
);
|
||||
|
||||
/** The single pay-as-you-go preset id (`sk-` keys have no region). */
|
||||
export const PAYG_PRESET_ID = "payg";
|
||||
|
||||
export function findTtsPreset(
|
||||
id: string | null | undefined,
|
||||
): TtsPreset | undefined {
|
||||
if (!id) return undefined;
|
||||
return TTS_PRESETS.find((p) => p.id === id);
|
||||
}
|
||||
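Since the two key families are distinguishable by prefix (`tp-` for Token Plan, `sk-` for pay-as-you-go), a client could in principle resolve the endpoint from the key plus an optional region. The sketch below is hypothetical: `kindForKey` and `resolveTtsBaseUrl` do not appear in this diff, and the diff itself describes a two-step picker UI rather than prefix sniffing. The preset table is copied from the file above so the block runs on its own.

```typescript
type TtsPreset = {
  id: string;
  kind: "token-plan" | "payg";
  label: string;
  baseUrl: string;
};

// Copied from the presets in this diff.
const TTS_PRESETS: TtsPreset[] = [
  { id: "sgp", kind: "token-plan", label: "新加坡 · Singapore", baseUrl: "https://token-plan-sgp.xiaomimimo.com/v1" },
  { id: "cn", kind: "token-plan", label: "中国大陆 · China", baseUrl: "https://token-plan-cn.xiaomimimo.com/v1" },
  { id: "ams", kind: "token-plan", label: "欧洲 · Amsterdam", baseUrl: "https://token-plan-ams.xiaomimimo.com/v1" },
  { id: "payg", kind: "payg", label: "按量付费 · Pay-as-you-go", baseUrl: "https://api.xiaomimimo.com/v1" },
];

// Hypothetical helper: infer the key family from its prefix.
function kindForKey(apiKey: string): TtsPreset["kind"] | undefined {
  const prefix = apiKey.trim().slice(0, 3);
  if (prefix === "tp-") return "token-plan";
  if (prefix === "sk-") return "payg";
  return undefined;
}

// Hypothetical helper: pay-as-you-go keys ignore the region; Token Plan keys
// need a region id that matches one of the token-plan presets.
function resolveTtsBaseUrl(apiKey: string, regionId?: string): string | undefined {
  const kind = kindForKey(apiKey);
  if (!kind) return undefined;
  const preset = TTS_PRESETS.filter(
    (p) => p.kind === kind && (kind === "payg" || p.id === regionId),
  )[0];
  return preset ? preset.baseUrl : undefined;
}
```

Returning `undefined` for an unknown prefix or a missing region leaves the decision to the caller, for example falling back to the manual picker, instead of guessing an endpoint.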
@@ -40,6 +40,23 @@ export type BeatChoiceEffect =
|
||||
| { kind: "advance-beat"; targetBeatId: string }
|
||||
| { kind: "change-scene"; nextSceneSeed: string };
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Orientation — session-wide image aspect, locked at session start.
|
||||
// "landscape" → 16:9 (1792×1024), the default for desktop / mobile-landscape.
|
||||
// "portrait" → 9:16 (1024×1792), painted for mobile users holding the phone
|
||||
// upright so the scene fills the screen instead of letterboxing a widescreen
|
||||
// image. CSS object-fit then adapts the 9:16 frame to the exact device size.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
export type Orientation = "portrait" | "landscape";
|
||||
|
||||
/** Normalize an untrusted orientation value (from a request body, or a
|
||||
* persisted session that predates the field) to a valid Orientation.
|
||||
* Anything other than "portrait" falls back to "landscape" (back-compat). */
|
||||
export function coerceOrientation(value: unknown): Orientation {
|
||||
return value === "portrait" ? "portrait" : "landscape";
|
||||
}
|
||||
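The strict fallback in `coerceOrientation` is easiest to appreciate with a few untrusted inputs. In this short sketch the type and function are repeated from the diff so the block is self-contained; the `samples` array is made up for illustration.

```typescript
type Orientation = "portrait" | "landscape";

// Repeated from the diff: only the exact string "portrait" survives.
function coerceOrientation(value: unknown): Orientation {
  return value === "portrait" ? "portrait" : "landscape";
}

// Illustrative inputs such as a request body or a legacy session might carry.
const samples: unknown[] = [
  "portrait",
  "landscape",
  "PORTRAIT", // wrong case: not normalized, falls back
  undefined, // field absent on sessions that predate it
  null,
  42,
  { orientation: "portrait" }, // wrong shape
];

const coerced: Orientation[] = samples.map(coerceOrientation);
```

Only the first sample yields `"portrait"`; every other value, including the upper-case variant, becomes `"landscape"`. A caller that wants case-insensitive matching would have to normalize before calling.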
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Scene — one background image + a graph of beats.
|
||||
// The Director emits an entire Scene per call; the player navigates
|
||||
@@ -75,6 +92,12 @@ export type Scene = {
|
||||
* Runware URL — the client renders both forms transparently.
|
||||
*/
|
||||
imageUrl?: string;
|
||||
/**
|
||||
* Orientation this scene's image was painted in. Mirrors the session's
|
||||
* locked orientation; recorded per-scene so the client can pick the right
|
||||
* intrinsic dimensions / object-fit even across legacy or mixed history.
|
||||
*/
|
||||
orientation?: Orientation;
|
||||
};
|
||||
|
||||
export type SceneExit =
|
||||
@@ -92,6 +115,43 @@ export type SceneHistoryEntry = {
|
||||
exit?: SceneExit;
|
||||
};
|
||||
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
// Writer two-phase split
|
||||
//
|
||||
// The Writer runs as TWO LLM calls so scene-image generation can begin
|
||||
// before the dialogue is fully written:
|
||||
// Phase A (WriterPlan) — the minimal skeleton the image pipeline needs:
|
||||
// sceneSummary + sceneKey + the entry beat's
|
||||
// on-stage roster + the full cast to design.
|
||||
// Phase B (beats) — the full beats[] graph + storyStatePatch, written
|
||||
// to honor the plan, overlapped with image gen.
|
||||
// The Cinematographer + character design + Painter all run off the Plan, so
|
||||
// Phase B's (longer) output is hidden behind the image pipeline.
|
||||
// ──────────────────────────────────────────────────────────────────────
|
||||
|
export type WriterPlan = {
  /** Chinese scene synopsis (location + time + mood + key event + opening hook).
   * The sole input the Cinematographer composes the establishing shot from. */
  sceneSummary: string;
  /** English location+time slug for cross-scene visual continuity. */
  sceneKey?: string;
  /** Beat id the player lands on when entering the scene. Phase B must emit a
   * beat with this id (reconciled if it doesn't). */
  entryBeatId: string;
  /** Every NPC name that appears anywhere in this scene. Drives character
   * design (card + portrait + voice) IN PARALLEL with Phase B beat writing, so
   * the whole cast is provisioned by the time the scene returns. Phase B may
   * only use names from this list (plus the POV "你"). Never includes the player. */
  cast: string[];
  /** The entry beat's on-stage roster (who's visible + pose when the player
   * lands). Drives the Cinematographer's framing and the entry-beat portraits
   * the Painter anchors to. Never includes the POV player. */
  entryActiveCharacters: BeatActiveCharacter[];
  /** The entry beat's speaker — an NPC name, "你" (player speaking), or
   * undefined for a pure narration/environment entry. Drives shot selection. */
  entrySpeaker?: string;
};

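The two constraints that the `WriterPlan` comments place on Phase B (an entry beat with the planned id, and speakers drawn only from `cast` plus the POV "你") could be checked along these lines. This is a sketch under assumptions: `PlanLike`, `BeatLike`, and `checkBeatsAgainstPlan` are hypothetical names, and how the engine actually reconciles a mismatch is not shown in this diff, so the function only reports problems.

```typescript
// Trimmed-down, assumed shapes; the real beat type is not part of this diff.
type PlanLike = { entryBeatId: string; cast: string[] };
type BeatLike = { id: string; speaker?: string };

// POV name taken from the WriterPlan comments.
const POV = "你";

// Reports whether Phase B honored the plan. It detects violations only;
// the actual reconciliation strategy is unknown and deliberately left out.
function checkBeatsAgainstPlan(
  plan: PlanLike,
  beats: BeatLike[],
): { hasEntryBeat: boolean; unknownSpeakers: string[] } {
  const allowed = plan.cast.concat(POV);
  const unknownSpeakers = beats
    .map((beat) => beat.speaker)
    .filter((name): name is string => name !== undefined && allowed.indexOf(name) === -1);
  const hasEntryBeat = beats.some((beat) => beat.id === plan.entryBeatId);
  return { hasEntryBeat, unknownSpeakers };
}
```

A beat with no speaker (pure narration) passes the check, matching the `entrySpeaker` comment that allows an undefined speaker.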
// ──────────────────────────────────────────────────────────────────────
// Characters & voices (TTS)
// ──────────────────────────────────────────────────────────────────────
@@ -214,6 +274,12 @@ export type Session = {
    * payload small for /api/scene round-trips.
    */
   styleReferenceImage?: string;
+  /**
+   * Session-wide image orientation, locked at session start from the client's
+   * device + orientation and carried on every /api/scene call so all scenes
+   * share one aspect ratio. Absent → "landscape" (back-compat).
+   */
+  orientation?: Orientation;
 };

 // ──────────────────────────────────────────────────────────────────────
@@ -231,10 +297,41 @@ export type VisionClassify = "insert-beat" | "change-scene";
 // Provider config
 // ──────────────────────────────────────────────────────────────────────

+/**
+ * Wire protocol used to talk to a model provider. Which values are valid
+ * depends on the model role — each ai-client adapter accepts its own subset
+ * and falls back to a sensible default for anything else:
+ *
+ *   openai_compatible  text / vision / image — OpenAI Chat Completions +
+ *                      `/images/generations` (self-implemented fetch; the
+ *                      default for text/vision when unset)
+ *   anthropic          text / vision — native Anthropic Messages (AI SDK)
+ *   google             text / vision / image — native Gemini (AI SDK); image
+ *                      uses the Nano Banana family
+ *   openai             image only — OpenAI gpt-image via AI SDK,
+ *                      unlocks reference-image editing (for text/vision use
+ *                      openai_compatible, which already speaks OpenAI's format)
+ *   runware            image only — Runware task-array protocol
+ *                      (self-implemented; the default for runware.ai URLs)
+ */
+export type ProviderProtocol =
+  | "openai_compatible"
+  | "anthropic"
+  | "google"
+  | "openai"
+  | "runware";
+
 export type ProviderConfig = {
   baseUrl: string;
   apiKey: string;
   model: string;
+  /**
+   * Wire protocol. When unset, callers apply a role-specific default:
+   * text/vision → "openai_compatible"; image → inferred from baseUrl
+   * (runware.ai → "runware", otherwise "openai_compatible") so existing
+   * deployments keep working without setting *_PROVIDER.
+   */
+  provider?: ProviderProtocol;
 };

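The defaulting rules in the two comments above can be read as a small resolution function. The following is a sketch under stated assumptions, not the adapters' actual code: `ModelRole`, `ACCEPTED`, `defaultProtocol`, and `resolveProtocol` are hypothetical names, the per-role subsets are transcribed from the `ProviderProtocol` comment, the substring test on `runware.ai` stands in for whatever URL detection the engine really uses, and treating the role default as the fallback for an unsupported value is an assumption (the source only says "a sensible default").

```typescript
type ProviderProtocol = "openai_compatible" | "anthropic" | "google" | "openai" | "runware";
type ModelRole = "text" | "vision" | "image"; // assumed role names

// Per-role subsets, transcribed from the ProviderProtocol doc comment.
const ACCEPTED: Record<ModelRole, readonly ProviderProtocol[]> = {
  text: ["openai_compatible", "anthropic", "google"],
  vision: ["openai_compatible", "anthropic", "google"],
  image: ["openai_compatible", "google", "openai", "runware"],
};

// Role-specific default used when no provider is configured.
function defaultProtocol(role: ModelRole, baseUrl: string): ProviderProtocol {
  // Assumption: a plain substring test approximates the real URL detection.
  if (role === "image" && baseUrl.indexOf("runware.ai") !== -1) return "runware";
  return "openai_compatible";
}

function resolveProtocol(
  role: ModelRole,
  baseUrl: string,
  provider?: ProviderProtocol,
): ProviderProtocol {
  // An explicitly configured protocol wins when the role supports it.
  if (provider !== undefined && ACCEPTED[role].indexOf(provider) !== -1) return provider;
  // Unset, or not valid for this role: fall back to the role default (assumed).
  return defaultProtocol(role, baseUrl);
}
```

Under this reading, a deployment that sets only a base URL, key, and model behaves exactly as before, which is the back-compat property the `provider` comment calls out.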
 export type TtsConfig = {
@@ -263,6 +360,18 @@ export type StartRequest = {
   styleGuide: string;
   /** Optional user-uploaded style reference image — see Session.styleReferenceImage. */
   styleReferenceImage?: string;
+  /**
+   * When true the client supplied its own Xiaomi TTS key and will provision +
+   * synth voices in the browser (key never touches our server). The route then
+   * drops `config.tts` so the engine skips all server-side TTS work.
+   */
+  clientTts?: boolean;
+  /**
+   * Device orientation chosen at session start. "portrait" makes the engine
+   * paint 9:16 vertical scene images (mobile, held upright); "landscape"
+   * (default) keeps 16:9 widescreen. Locked for the whole session.
+   */
+  orientation?: Orientation;
 };

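The orientation rule stated in the `StartRequest` and `Session` comments amounts to a two-way mapping with a landscape fallback. A minimal sketch, assuming `Orientation` is the union of the two values the comments mention (its definition is not shown in this diff) and using a hypothetical helper name `aspectRatioFor`:

```typescript
// Assumed definition: the diff references Orientation but does not show it.
type Orientation = "landscape" | "portrait";

// "portrait" paints 9:16; anything else, including an absent value from an
// older session, keeps the 16:9 widescreen default.
function aspectRatioFor(orientation?: Orientation): "16:9" | "9:16" {
  return orientation === "portrait" ? "9:16" : "16:9";
}
```

Treating a missing value as landscape is what lets sessions created before this change keep rendering widescreen images.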
 // /api/parse-style-image — vision LLM extracts a textual painting-style
@@ -295,6 +404,8 @@ export type StartResponse = {
 // (frontend synthesizes a speculative exit).
 export type SceneRequest = {
   session: Session;
+  /** See StartRequest.clientTts — drops server-side TTS for BYO-key clients. */
+  clientTts?: boolean;
 };

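The `clientTts` flag is described as making the route drop `config.tts` before the engine runs. One way a route could do that is sketched below; `EngineConfig`, `TtsSettings`, and `configForRequest` are hypothetical names, and the config shape is an assumption, since only the `config.tts` path appears in the source.

```typescript
// Assumed shapes: only the existence of `config.tts` is stated in the diff.
type TtsSettings = { apiKey: string };
type EngineConfig = { textModel: string; tts?: TtsSettings };

// Returns the config the engine should see for one request. With clientTts
// set, the TTS section is removed from a copy so the shared server config
// is never mutated and no server-side TTS work is triggered.
function configForRequest(config: EngineConfig, clientTts?: boolean): EngineConfig {
  if (!clientTts) return config;
  const withoutTts: EngineConfig = { ...config };
  delete withoutTts.tts;
  return withoutTts;
}
```

Copying before deleting matters if the server config object is shared across requests, because a bring-your-own-key client should not disable TTS for everyone else.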
 export type SceneResponse = {
@@ -352,6 +463,8 @@ export type VisionResponse = {
 export type InsertBeatRequest = {
   session: Session;
   freeformAction: string;
+  /** See StartRequest.clientTts — drops server-side TTS for BYO-key clients. */
+  clientTts?: boolean;
 };

 /** Partial beat fields produced by the insert-beat director. */

Vendored
+1
-1
@@ -1,6 +1,6 @@
 /// <reference types="next" />
 /// <reference types="next/image-types/global" />
-import "./.next/types/routes.d.ts";
+import "./.next/dev/types/routes.d.ts";

 // NOTE: This file should not be edited
 // see https://nextjs.org/docs/app/api-reference/config/typescript for more information.

@@ -20,6 +20,10 @@
     "deploy:cf": "opennextjs-cloudflare deploy"
   },
   "dependencies": {
+    "@ai-sdk/anthropic": "^3.0.81",
+    "@ai-sdk/google": "^3.0.80",
+    "@ai-sdk/openai": "^3.0.67",
+    "ai": "^6.0.196",
     "jsonrepair": "^3.14.0",
     "next": "^16.0.0",
     "react": "^19.0.0",

Generated
+138
-8
@@ -8,12 +8,24 @@ importers:
|
||||
|
||||
.:
|
||||
dependencies:
|
||||
'@ai-sdk/anthropic':
|
||||
specifier: ^3.0.81
|
||||
version: 3.0.81(zod@4.4.3)
|
||||
'@ai-sdk/google':
|
||||
specifier: ^3.0.80
|
||||
version: 3.0.80(zod@4.4.3)
|
||||
'@ai-sdk/openai':
|
||||
specifier: ^3.0.67
|
||||
version: 3.0.67(zod@4.4.3)
|
||||
ai:
|
||||
specifier: ^6.0.196
|
||||
version: 6.0.196(zod@4.4.3)
|
||||
jsonrepair:
|
||||
specifier: ^3.14.0
|
||||
version: 3.14.0
|
||||
next:
|
||||
specifier: ^16.0.0
|
||||
version: 16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
|
||||
version: 16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
|
||||
react:
|
||||
specifier: ^19.0.0
|
||||
version: 19.2.7
|
||||
@@ -23,7 +35,7 @@ importers:
|
||||
devDependencies:
|
||||
'@opennextjs/cloudflare':
|
||||
specifier: ^1.19.11
|
||||
version: 1.19.11(next@16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7))(wrangler@4.97.0)
|
||||
version: 1.19.11(next@16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7))(wrangler@4.97.0)
|
||||
'@types/node':
|
||||
specifier: ^22.9.0
|
||||
version: 22.19.19
|
||||
@@ -54,6 +66,40 @@ importers:
|
||||
|
||||
packages:
|
||||
|
||||
'@ai-sdk/anthropic@3.0.81':
|
||||
resolution: {integrity: sha512-B1JDd9Ugq9R5AgIaW3674lhGCMMYJcPUxnrZh8fzbGojgg4QvHFRv6eZahGQAUsmGHbcf74G9bdSBDLWQGY2GA==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
zod: ^3.25.76 || ^4.1.8
|
||||
|
||||
'@ai-sdk/gateway@3.0.124':
|
||||
resolution: {integrity: sha512-h8CrmbSG+8X0C+M/E1M4oiDHYevqwbzAPN+uLRHS0eJaatF2MZ+juNtOHXNOjk7Bsk9mD2RjYMjJO9dFkb9I7Q==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
zod: ^3.25.76 || ^4.1.8
|
||||
|
||||
'@ai-sdk/google@3.0.80':
|
||||
resolution: {integrity: sha512-5ORbm/yFUPO0MEvZsxBMN0cdKw2+lwU/wVn5KN3KF8Dmk1LughuDuUohMh/7iU/XFTiyB0OvmTW/tdV/J7O9zg==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
zod: ^3.25.76 || ^4.1.8
|
||||
|
||||
'@ai-sdk/openai@3.0.67':
|
||||
resolution: {integrity: sha512-oAiGC9eWG7IgtdsdS74bOCnAAHarAfTJhWN9x5INwnWPekL802AvF+0I5DvLzIF1MIRmNw4N8mPSL/GUVbX9Mw==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
zod: ^3.25.76 || ^4.1.8
|
||||
|
||||
'@ai-sdk/provider-utils@4.0.27':
|
||||
resolution: {integrity: sha512-ubkAJ+xODouwtmN1tYlvTPphH1hPOBfZaEQe8U7skGvFAnIRs9PPpsq57bC2+Ky/MB4yzhd6YOsxTAx9sGpazw==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
zod: ^3.25.76 || ^4.1.8
|
||||
|
||||
'@ai-sdk/provider@3.0.10':
|
||||
resolution: {integrity: sha512-Q3BZ27qfpYqnCYGvE3vt+Qi6LGOF9R5Nmzn+9JoM1lCRsD9mYaIhfJLkSunN48nfGXJ6n+XNV0J/XVpqGQl7Dw==}
|
||||
engines: {node: '>=18'}
|
||||
|
||||
'@alloc/quick-lru@5.2.0':
|
||||
resolution: {integrity: sha512-UrcABB+4bUrFABwbluTIBErXwvbsU/V7TZWfmbgJfbkwiBuziS9gxdODUyuiecfdGQ85jglMW6juS3+z5TsKLw==}
|
||||
engines: {node: '>=10'}
|
||||
@@ -1036,6 +1082,10 @@ packages:
|
||||
next: '>=15.5.18 <16 || >=16.2.6'
|
||||
wrangler: ^4.86.0
|
||||
|
||||
'@opentelemetry/api@1.9.1':
|
||||
resolution: {integrity: sha512-gLyJlPHPZYdAk1JENA9LeHejZe1Ti77/pTeFm/nMXmQH/HFZlcS/O2XJB+L8fkbrNSqhdtlvjBVjxwUYanNH5Q==}
|
||||
engines: {node: '>=8.0.0'}
|
||||
|
||||
'@poppinss/colors@4.1.6':
|
||||
resolution: {integrity: sha512-H9xkIdFswbS8n1d6vmRd8+c10t2Qe+rZITbbDHHkQixH5+2x1FDGmi/0K+WgWiqQFKPSlIYB7jlH6Kpfn6Fleg==}
|
||||
|
||||
@@ -1204,6 +1254,9 @@ packages:
|
||||
'@speed-highlight/core@1.2.15':
|
||||
resolution: {integrity: sha512-BMq1K3DsElxDWawkX6eLg9+CKJrTVGCBAWVuHXVUV2u0s2711qiChLSId6ikYPfxhdYocLNt3wWwSvDiTvFabw==}
|
||||
|
||||
'@standard-schema/spec@1.1.0':
|
||||
resolution: {integrity: sha512-l2aFy5jALhniG5HgqrD6jXLi/rUWrKvqN/qJx6yoJsgKhblVd+iqqU4RCXavm/jPityDo5TCvKMnpjKnOriy0w==}
|
||||
|
||||
'@swc/helpers@0.5.15':
|
||||
resolution: {integrity: sha512-JQ5TuMi45Owi4/BIMAJBoSQoOJu12oOk/gADqlcUL9JEdHB8vyjUSsxqeNXnmXHjYKMi2WcYtezGEEhqUI/E2g==}
|
||||
|
||||
@@ -1227,6 +1280,10 @@ packages:
|
||||
'@types/react@19.2.16':
|
||||
resolution: {integrity: sha512-esJiCAnl0kfpNdE69f3So4WJUXy95dLZydX0KwK46riIHDzHM7O9Vtf9xCHW0PXIqvgqNrswl522kA/5yx+F4w==}
|
||||
|
||||
'@vercel/oidc@3.2.0':
|
||||
resolution: {integrity: sha512-UycprH3T6n3jH0k44NHMa7pnFHGu/N05MjojYr+Mc6I7obkoLIJujSWwin1pCvdy/eOxrI/l3uDLQsmcrOb4ug==}
|
||||
engines: {node: '>= 20'}
|
||||
|
||||
abort-controller@3.0.0:
|
||||
resolution: {integrity: sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==}
|
||||
engines: {node: '>=6.5'}
|
||||
@@ -1244,6 +1301,12 @@ packages:
|
||||
resolution: {integrity: sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ==}
|
||||
engines: {node: '>= 8.0.0'}
|
||||
|
||||
ai@6.0.196:
|
||||
resolution: {integrity: sha512-2T45UeqKL4a11KQ14I5i1YYHOvCFrMF478E1k6PVjlQSGUvXSv4xrxIaQbUL4qgv91DADSbddwv3oR49pPAK3g==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
zod: ^3.25.76 || ^4.1.8
|
||||
|
||||
ansi-colors@4.1.3:
|
||||
resolution: {integrity: sha512-/6w/C21Pm1A7aZitlI5Ni/2J6FFQN8i1Cvz3kHABAAbw93v/NlvKdVOqz7CCWz/3iv/JplRSEEZ83XION15ovw==}
|
||||
engines: {node: '>=6'}
|
||||
@@ -1549,6 +1612,10 @@ packages:
|
||||
resolution: {integrity: sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==}
|
||||
engines: {node: '>=6'}
|
||||
|
||||
eventsource-parser@3.1.0:
|
||||
resolution: {integrity: sha512-kJezFj9YFAMLeORyi7aCLxLbD5/qWMQnoMVlVPyHIll7lgRJCc3JVln9Vgl9nwQi0YkMnhdGTMNn7CkRRAptMg==}
|
||||
engines: {node: '>=18.0.0'}
|
||||
|
||||
execa@5.1.1:
|
||||
resolution: {integrity: sha512-8uSpZZocAZRBAPIEINJj3Lo9HyGitllczc27Eh5YYojjMFMn8yHMDMaUHE2Jqfq05D/wucwI4JGURyXt1vchyg==}
|
||||
engines: {node: '>=10'}
|
||||
@@ -1754,6 +1821,9 @@ packages:
|
||||
resolution: {integrity: sha512-/imKNG4EbWNrVjoNC/1H5/9GFy+tqjGBHCaSsN+P2RnPqjsLmv6UD3Ej+Kj8nBWaRAwyk7kK5ZUc+OEatnTR3A==}
|
||||
hasBin: true
|
||||
|
||||
json-schema@0.4.0:
|
||||
resolution: {integrity: sha512-es94M3nTIfsEPisRafak+HDLfHXnKBhV3vU5eqPcS3flIWqcxJWgXHXiey3YrpaNsanY5ei1VoYEbOzijuq9BA==}
|
||||
|
||||
jsonrepair@3.14.0:
|
||||
resolution: {integrity: sha512-tWPGKMZf/8UPim+fcW2EfcQ/d/7aKUrP6IECz9G3Tu6Q5dX0orSleqJ9z6sSw7qrQkjF8/Edo4DvsWBZ8H+HNg==}
|
||||
hasBin: true
|
||||
@@ -2384,8 +2454,47 @@ packages:
|
||||
youch@4.1.0-beta.10:
|
||||
resolution: {integrity: sha512-rLfVLB4FgQneDr0dv1oddCVZmKjcJ6yX6mS4pU82Mq/Dt9a3cLZQ62pDBL4AUO+uVrCvtWz3ZFUL2HFAFJ/BXQ==}
|
||||
|
||||
zod@4.4.3:
|
||||
resolution: {integrity: sha512-ytENFjIJFl2UwYglde2jchW2Hwm4GJFLDiSXWdTrJQBIN9Fcyp7n4DhxJEiWNAJMV1/BqWfW/kkg71UDcHJyTQ==}
|
||||
|
||||
snapshots:
|
||||
|
||||
'@ai-sdk/anthropic@3.0.81(zod@4.4.3)':
|
||||
dependencies:
|
||||
'@ai-sdk/provider': 3.0.10
|
||||
'@ai-sdk/provider-utils': 4.0.27(zod@4.4.3)
|
||||
zod: 4.4.3
|
||||
|
||||
'@ai-sdk/gateway@3.0.124(zod@4.4.3)':
|
||||
dependencies:
|
||||
'@ai-sdk/provider': 3.0.10
|
||||
'@ai-sdk/provider-utils': 4.0.27(zod@4.4.3)
|
||||
'@vercel/oidc': 3.2.0
|
||||
zod: 4.4.3
|
||||
|
||||
'@ai-sdk/google@3.0.80(zod@4.4.3)':
|
||||
dependencies:
|
||||
'@ai-sdk/provider': 3.0.10
|
||||
'@ai-sdk/provider-utils': 4.0.27(zod@4.4.3)
|
||||
zod: 4.4.3
|
||||
|
||||
'@ai-sdk/openai@3.0.67(zod@4.4.3)':
|
||||
dependencies:
|
||||
'@ai-sdk/provider': 3.0.10
|
||||
'@ai-sdk/provider-utils': 4.0.27(zod@4.4.3)
|
||||
zod: 4.4.3
|
||||
|
||||
'@ai-sdk/provider-utils@4.0.27(zod@4.4.3)':
|
||||
dependencies:
|
||||
'@ai-sdk/provider': 3.0.10
|
||||
'@standard-schema/spec': 1.1.0
|
||||
eventsource-parser: 3.1.0
|
||||
zod: 4.4.3
|
||||
|
||||
'@ai-sdk/provider@3.0.10':
|
||||
dependencies:
|
||||
json-schema: 0.4.0
|
||||
|
||||
'@alloc/quick-lru@5.2.0': {}
|
||||
|
||||
'@ast-grep/napi-darwin-arm64@0.40.5':
|
||||
@@ -3446,7 +3555,7 @@ snapshots:
|
||||
'@nodelib/fs.scandir': 2.1.5
|
||||
fastq: 1.20.1
|
||||
|
||||
'@opennextjs/aws@4.0.2(next@16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7))':
|
||||
'@opennextjs/aws@4.0.2(next@16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7))':
|
||||
dependencies:
|
||||
'@ast-grep/napi': 0.40.5
|
||||
'@aws-sdk/client-cloudfront': 3.984.0
|
||||
@@ -3462,24 +3571,24 @@ snapshots:
|
||||
cookie: 1.1.1
|
||||
esbuild: 0.25.4
|
||||
express: 5.2.1
|
||||
next: 16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
|
||||
next: 16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
|
||||
path-to-regexp: 6.3.0
|
||||
urlpattern-polyfill: 10.1.0
|
||||
yaml: 2.9.0
|
||||
transitivePeerDependencies:
|
||||
- supports-color
|
||||
|
||||
'@opennextjs/cloudflare@1.19.11(next@16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7))(wrangler@4.97.0)':
|
||||
'@opennextjs/cloudflare@1.19.11(next@16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7))(wrangler@4.97.0)':
|
||||
dependencies:
|
||||
'@ast-grep/napi': 0.40.5
|
||||
'@dotenvx/dotenvx': 1.31.0
|
||||
'@opennextjs/aws': 4.0.2(next@16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7))
|
||||
'@opennextjs/aws': 4.0.2(next@16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7))
|
||||
ci-info: 4.4.0
|
||||
cloudflare: 4.5.0
|
||||
comment-json: 4.6.2
|
||||
enquirer: 2.4.1
|
||||
glob: 12.0.0
|
||||
next: 16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
|
||||
next: 16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
|
||||
ts-tqdm: 0.8.6
|
||||
wrangler: 4.97.0
|
||||
yargs: 18.0.0
|
||||
@@ -3487,6 +3596,8 @@ snapshots:
|
||||
- encoding
|
||||
- supports-color
|
||||
|
||||
'@opentelemetry/api@1.9.1': {}
|
||||
|
||||
'@poppinss/colors@4.1.6':
|
||||
dependencies:
|
||||
kleur: 4.1.5
|
||||
@@ -3697,6 +3808,8 @@ snapshots:
|
||||
|
||||
'@speed-highlight/core@1.2.15': {}
|
||||
|
||||
'@standard-schema/spec@1.1.0': {}
|
||||
|
||||
'@swc/helpers@0.5.15':
|
||||
dependencies:
|
||||
tslib: 2.8.1
|
||||
@@ -3724,6 +3837,8 @@ snapshots:
|
||||
dependencies:
|
||||
csstype: 3.2.3
|
||||
|
||||
'@vercel/oidc@3.2.0': {}
|
||||
|
||||
abort-controller@3.0.0:
|
||||
dependencies:
|
||||
event-target-shim: 5.0.1
|
||||
@@ -3739,6 +3854,14 @@ snapshots:
|
||||
dependencies:
|
||||
humanize-ms: 1.2.1
|
||||
|
||||
ai@6.0.196(zod@4.4.3):
|
||||
dependencies:
|
||||
'@ai-sdk/gateway': 3.0.124(zod@4.4.3)
|
||||
'@ai-sdk/provider': 3.0.10
|
||||
'@ai-sdk/provider-utils': 4.0.27(zod@4.4.3)
|
||||
'@opentelemetry/api': 1.9.1
|
||||
zod: 4.4.3
|
||||
|
||||
ansi-colors@4.1.3: {}
|
||||
|
||||
ansi-regex@5.0.1: {}
|
||||
@@ -4052,6 +4175,8 @@ snapshots:
|
||||
|
||||
event-target-shim@5.0.1: {}
|
||||
|
||||
eventsource-parser@3.1.0: {}
|
||||
|
||||
execa@5.1.1:
|
||||
dependencies:
|
||||
cross-spawn: 7.0.6
|
||||
@@ -4293,6 +4418,8 @@ snapshots:
|
||||
|
||||
jiti@1.21.7: {}
|
||||
|
||||
json-schema@0.4.0: {}
|
||||
|
||||
jsonrepair@3.14.0: {}
|
||||
|
||||
kleur@4.1.5: {}
|
||||
@@ -4376,7 +4503,7 @@ snapshots:
|
||||
|
||||
negotiator@1.0.0: {}
|
||||
|
||||
next@16.2.7(react-dom@19.2.7(react@19.2.7))(react@19.2.7):
|
||||
next@16.2.7(@opentelemetry/api@1.9.1)(react-dom@19.2.7(react@19.2.7))(react@19.2.7):
|
||||
dependencies:
|
||||
'@next/env': 16.2.7
|
||||
'@swc/helpers': 0.5.15
|
||||
@@ -4395,6 +4522,7 @@ snapshots:
|
||||
'@next/swc-linux-x64-musl': 16.2.7
|
||||
'@next/swc-win32-arm64-msvc': 16.2.7
|
||||
'@next/swc-win32-x64-msvc': 16.2.7
|
||||
'@opentelemetry/api': 1.9.1
|
||||
sharp: 0.34.5
|
||||
transitivePeerDependencies:
|
||||
- '@babel/core'
|
||||
@@ -4928,3 +5056,5 @@ snapshots:
|
||||
'@speed-highlight/core': 1.2.15
|
||||
cookie: 1.1.1
|
||||
youch-core: 0.3.3
|
||||
|
||||
zod@4.4.3: {}
|
||||
|
||||
+1
-8
@@ -1,11 +1,4 @@
 {
   "$schema": "https://openapi.vercel.sh/vercel.json",
-  "framework": "nextjs",
-  "functions": {
-    "app/api/start/route.ts": { "maxDuration": 60 },
-    "app/api/scene/route.ts": { "maxDuration": 60 },
-    "app/api/vision/route.ts": { "maxDuration": 60 },
-    "app/api/insert-beat/route.ts": { "maxDuration": 60 },
-    "app/api/beat-audio/route.ts": { "maxDuration": 30 }
-  }
+  "framework": "nextjs"
 }
