Prompt caching on Anthropic lets you cache individual messages in your request and reuse them across calls. Caching frees up room to include more context in your prompt, while also delivering responses that are significantly faster and cheaper.
You can use this feature on our OpenAI-compliant universal API as well as with our prompt templates. Just set the `cache_control` param in your respective message body:
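As a minimal sketch, here is what a chat request body with caching enabled on the system message could look like. The model name and prompt text are placeholders, and the exact shape Portkey accepts may differ slightly; in Anthropic's native format, `cache_control` is attached to a content block as `{"type": "ephemeral"}`:

```python
# Hypothetical request body marking a long system prompt for caching.
# The model name and prompt text below are illustrative placeholders.
request_body = {
    "model": "claude-3-5-sonnet-20240620",  # example model name
    "max_tokens": 1024,
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "<a long, reusable system prompt>",
                    # Mark this block to be written to / read from the cache
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize the document above."},
    ],
}
```

Subsequent requests that repeat the same cached prefix can then be served from the cache instead of reprocessing those tokens.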
Set any message in your prompt template to be cached by simply toggling the Cache Control setting in the UI:
Anthropic currently places certain restrictions on prompt caching, such as a minimum cacheable prompt length (which varies by model), a limited cache lifetime, and a cap on the number of cache breakpoints per request.
For more, refer to Anthropic’s prompt caching documentation here.
Portkey automatically calculates the correct pricing for your prompt caching requests & responses based on Anthropic's published rates here:
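To illustrate how the pricing works: per Anthropic's published rates, cache writes are billed at 1.25x the base input-token price and cache reads at 0.1x. The function below is an illustrative sketch of that arithmetic, not Portkey's actual billing code, and the prices in the usage example are placeholders rather than current list prices:

```python
def request_cost(usage, input_price_per_mtok, output_price_per_mtok):
    """Approximate USD cost of one request from its usage block.

    Cache writes are weighted at 1.25x and cache reads at 0.1x the
    base input price, per Anthropic's prompt-caching pricing.
    """
    write = usage.get("cache_creation_input_tokens", 0) * 1.25
    read = usage.get("cache_read_input_tokens", 0) * 0.10
    fresh = usage.get("input_tokens", 0)
    input_cost = (fresh + write + read) * input_price_per_mtok / 1_000_000
    output_cost = usage.get("output_tokens", 0) * output_price_per_mtok / 1_000_000
    return input_cost + output_cost

# Writing 1M tokens to the cache at an illustrative $3/MTok input price
# costs 1.25x the normal rate:
cost = request_cost(
    {"cache_creation_input_tokens": 1_000_000}, 3.0, 15.0
)  # -> 3.75
```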
In the individual log for any request, you can also see the exact cache status of your request and verify whether it created a cache entry or was served from the cache, via two `usage` parameters:

- `cache_creation_input_tokens`: Number of tokens written to the cache when creating a new entry.
- `cache_read_input_tokens`: Number of tokens read from the cache for this request.
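For example, a simple check on these fields could look like the sketch below. The `response` dict here is a hypothetical parsed response payload, shown only to illustrate where the two parameters appear:

```python
# Hypothetical parsed response payload; token counts are illustrative.
response = {
    "usage": {
        "input_tokens": 20,
        "output_tokens": 150,
        "cache_creation_input_tokens": 1024,  # tokens written to the cache
        "cache_read_input_tokens": 0,         # tokens read from the cache
    }
}

usage = response["usage"]
if usage.get("cache_creation_input_tokens", 0) > 0:
    print("This request created a new cache entry")
if usage.get("cache_read_input_tokens", 0) > 0:
    print("This request was served (partly) from the cache")
```

A first request with `cache_control` set will report a nonzero `cache_creation_input_tokens`; repeat requests with the same cached prefix will instead report a nonzero `cache_read_input_tokens`.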