chore(engine): log prompt-cache hit/miss per chat call

Add a `tag` option to chat() and have it print one `[cache] <tag> hit=X miss=Y rate=Z%` line per call. Three Usage-shape variants are probed in order so the same logger works across providers: - DeepSeek (v3+): usage.prompt_cache_hit_tokens / *_miss_tokens - OpenAI / o-series: usage.prompt_tokens_details.cached_tokens - Anthropic: usage.cache_read_input_tokens / *_creation_* When none of them are present (MiMo / local Ollama / others) we still print prompt + completion totals so the cost baseline is visible. Tag every callsite so the log is greppable: architect / writer / character-designer / cinematographer / insert-beat This is the prerequisite for the prefix-cache reordering work that follows — without per-agent visibility there's no way to tell if a prompt rearrangement actually moved the needle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-03 10:40:20 +08:00
parent 334c9808c6
commit 37c911f510
6 changed files with 67 additions and 7 deletions
@@ -405,7 +405,7 @@ export async function directInsertBeat(
        content: buildInsertBeatUserMessage(session, freeformAction),
      },
    ],
-    { temperature: 0.9, responseFormat: "json_object" },
+    { temperature: 0.9, responseFormat: "json_object", tag: "insert-beat" },
  );

  const parsed = parseJsonLoose<InsertBeatPartial>(raw);