Prompt Caching

Prompt caching reduces token costs and latency by reusing a cached prompt prefix (typically a long system prompt) across requests. Provider support varies:

  • Anthropic: Explicit cache_control blocks with ephemeral type
  • Google Gemini: CachedContent API with configurable TTL (5 minutes default)
  • OpenAI: Automatic prefix caching (no client changes needed, active for prompts >1024 tokens)
  • xAI: Not currently available; monitored for future support
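Of the providers above, Anthropic is the one that requires an explicit marker in the request. A minimal sketch of what that marker looks like in a Messages API request body, assuming a hypothetical long system prompt and placeholder model name (the body is built but not sent, so no API key is needed):

```python
# Sketch of an Anthropic Messages API request body with an explicit
# cache breakpoint: the system prompt block carries a cache_control
# field of type "ephemeral" so later requests can reuse the cached prefix.
LONG_SYSTEM_PROMPT = "You are a helpful assistant. " * 200  # placeholder text

request_body = {
    "model": "claude-3-5-sonnet-20241022",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # "ephemeral" is the cache type named in the docs above
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize the refund policy."}
    ],
}
```

Only the blocks up to and including the `cache_control` marker are cached; the user message after it changes freely between requests.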

Caching is always on and invisible to the user; no settings or configuration are required.
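For Gemini, the cached prefix lives in a server-side CachedContent resource whose TTL can be overridden. A hedged sketch of the REST payload, with hypothetical model name and content (again built locally, not sent):

```python
# Sketch of a Gemini CachedContent creation payload. The "ttl" field
# overrides the 5-minute default; it is expressed as a duration string
# in seconds, e.g. "300s" for the default.
cached_content = {
    "model": "models/gemini-1.5-flash-001",  # placeholder model name
    "contents": [
        {
            "role": "user",
            # placeholder stand-in for a large shared context
            "parts": [{"text": "LARGE_SHARED_CONTEXT " * 500}],
        }
    ],
    "ttl": "600s",  # keep the cache for 10 minutes instead of the 5-minute default
}
```

Requests that reference the resulting cached content are billed at the reduced cached-token rate for the shared prefix.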