chore(engine): log prompt-cache hit/miss per chat call

Add a `tag` option to chat() and have it print one `[cache] <tag>
hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are
probed in order so the same logger works across providers:

  - DeepSeek (v3+):  usage.prompt_cache_hit_tokens / *_miss_tokens
  - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens
  - Anthropic:        usage.cache_read_input_tokens / *_creation_*

When none of them are present (MiMo / local Ollama / others) we still
print prompt + completion totals so the cost baseline is visible.

Tag every callsite so the log is greppable:
  architect / writer / character-designer / cinematographer / insert-beat

This is the prerequisite for the prefix-cache reordering work that
follows — without per-agent visibility there's no way to tell if a
prompt rearrangement actually moved the needle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
DESKTOP-I1T6TF3\Q
2026-06-03 10:40:20 +08:00
parent 334c9808c6
commit 37c911f510
6 changed files with 67 additions and 7 deletions
+1 -1
View File
@@ -405,7 +405,7 @@ export async function directInsertBeat(
content: buildInsertBeatUserMessage(session, freeformAction),
},
],
{ temperature: 0.9, responseFormat: "json_object" },
{ temperature: 0.9, responseFormat: "json_object", tag: "insert-beat" },
);
const parsed = parseJsonLoose<InsertBeatPartial>(raw);