OpenAI now offers prompt caching, a feature that can significantly reduce both latency and costs for your API requests. Caching applies to prompts of 1024 tokens or more, and for prompts longer than 10,000 tokens it can reduce latency by up to 80%.
Prompt caching is enabled for the following models:
- gpt-4o (excludes gpt-4o-2024-05-13)
- gpt-4o-mini
- o1-preview
- o1-mini
Portkey supports OpenAI’s prompt caching feature out of the box. Here is an example of how to use it:
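A minimal sketch using the Portkey Python SDK (`portkey-ai`) is shown below; the key placeholders and the long system prompt are illustrative assumptions. The idea is to keep a static prefix of 1024+ tokens (here, the system prompt) at the start of every request so OpenAI can reuse it from the cache across calls.

```python
from portkey_ai import Portkey

# Placeholder credentials: a Portkey API key plus a virtual key routing to OpenAI
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="OPENAI_VIRTUAL_KEY",
)

# A long, static system prompt (1024+ tokens in practice) forms the cacheable
# prefix; only the trailing user message changes between requests.
LONG_SYSTEM_PROMPT = "You are a support assistant for Acme Corp. ..."  # imagine 1024+ tokens here

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": LONG_SYSTEM_PROMPT},
        {"role": "user", "content": "How do I rotate my API key?"},
    ],
)

print(response.choices[0].message.content)
```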
The `tools` array can also be cached, contributing to the minimum 1024-token requirement, as sketched below.
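For illustration, here is a request that adds a hypothetical function schema; its tokens count toward the cacheable prefix along with the system prompt (reusing the client and prompt from the sketch above):

```python
# Hypothetical tool definition; its tokens also count toward the 1024-token minimum
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of a customer order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": LONG_SYSTEM_PROMPT},
        {"role": "user", "content": "Where is order 4821?"},
    ],
    tools=tools,
)
```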
Prompt caching for requests & responses follows OpenAI’s calculations, described here:
All requests, including those with fewer than 1024 tokens, will display a `cached_tokens` field of the `usage.prompt_tokens_details` chat completions object, indicating how many of the prompt tokens were a cache hit. For requests under 1024 tokens, `cached_tokens` will be zero.
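To check whether a request hit the cache, you can read that field from the response usage. A small sketch continuing from the request above (access is defensive, since the field may be absent on older SDK versions):

```python
# Inspect cache usage on the response from the earlier request
usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0

print(f"prompt tokens: {usage.prompt_tokens}, cached tokens: {cached}")
if cached:
    print("Part of the prompt prefix was served from the cache.")
else:
    print("No cache hit (prompt under 1024 tokens, or prefix not yet cached).")
```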
Key Features: