chore(engine): log prompt-cache hit/miss per chat call
Add a `tag` option to chat() and have it print one `[cache] <tag> hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are probed in order so the same logger works across providers: - DeepSeek (v3+): usage.prompt_cache_hit_tokens / *_miss_tokens - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens - Anthropic: usage.cache_read_input_tokens / *_creation_* When none of them are present (MiMo / local Ollama / others) we still print prompt + completion totals so the cost baseline is visible. Tag every callsite so the log is greppable: architect / writer / character-designer / cinematographer / insert-beat This is the prerequisite for the prefix-cache reordering work that follows — without per-agent visibility there's no way to tell if a prompt rearrangement actually moved the needle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -56,7 +56,7 @@ async function runDesignLLM(
|
||||
content: buildCharacterDesignerUserMessage(charName, session),
|
||||
},
|
||||
],
|
||||
{ temperature: 0.7, responseFormat: "json_object" },
|
||||
{ temperature: 0.7, responseFormat: "json_object", tag: "character-designer" },
|
||||
);
|
||||
return parseJsonLoose<CharacterDesignOutput>(raw);
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user