chore(engine): log prompt-cache hit/miss per chat call

Add a `tag` option to chat() and have it print one `[cache] <tag> hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are probed in order so the same logger works across providers: - DeepSeek (v3+): usage.prompt_cache_hit_tokens / *_miss_tokens - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens - Anthropic: usage.cache_read_input_tokens / *_creation_* When none of them are present (MiMo / local Ollama / others) we still print prompt + completion totals so the cost baseline is visible. Tag every callsite so the log is greppable: architect / writer / character-designer / cinematographer / insert-beat This is the prerequisite for the prefix-cache reordering work that follows — without per-agent visibility there's no way to tell if a prompt rearrangement actually moved the needle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:40:20 +08:00
parent 334c9808c6
commit 37c911f510
6 changed files with 67 additions and 7 deletions
@@ -56,7 +56,7 @@ async function runDesignLLM(
        content: buildCharacterDesignerUserMessage(charName, session),
      },
    ],
-    { temperature: 0.7, responseFormat: "json_object" },
+    { temperature: 0.7, responseFormat: "json_object", tag: "character-designer" },
  );
  return parseJsonLoose<CharacterDesignOutput>(raw);
 }