Prompt Caching
Prompt caching reduces token costs by reusing system prompts across requests:
- Anthropic: explicit `cache_control` blocks with the `ephemeral` type
- Google Gemini: `CachedContent` API with a configurable TTL (5 minutes by default)
- OpenAI: automatic prefix caching (no client changes needed; active for prompts over 1024 tokens)
- xAI: Not currently available; monitored for future support
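The provider differences above mostly come down to request shape. As a rough sketch (not official SDK calls, and with placeholder model names), the JSON bodies for the two explicit-vs-automatic styles might look like:

```python
def anthropic_body(system_prompt: str, user_msg: str) -> dict:
    """Anthropic style: the system prompt is marked cacheable with an
    explicit cache_control block of type "ephemeral"."""
    return {
        "model": "claude-model",  # placeholder, not a real model ID
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }


def openai_body(system_prompt: str, user_msg: str) -> dict:
    """OpenAI style: no cache fields at all -- prefix caching is applied
    server-side once the prompt exceeds the token threshold."""
    return {
        "model": "gpt-model",  # placeholder, not a real model ID
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_msg},
        ],
    }


body = anthropic_body("You are a helpful assistant.", "Hi")
assert body["system"][0]["cache_control"] == {"type": "ephemeral"}
```

The practical upshot: only the Anthropic path requires client-side changes; the OpenAI body is identical whether or not caching happens.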
From the user's perspective, caching is always on and invisible: no settings or configuration are required.