Prompt caching on Amazon Bedrock lets you cache specific portions of your requests for repeated use. This feature significantly reduces inference response latency and input token costs by allowing the model to skip recomputation of previously processed content.
With Portkey, you can easily implement Amazon Bedrock’s prompt caching through our OpenAI-compliant unified API and prompt templates.
Amazon Bedrock prompt caching is generally available with the following models: Claude 3.7 Sonnet, Claude 3.5 Haiku, Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro. See the Supported Features table below for per-model details.
Customers who were given access to Claude 3.5 Sonnet v2 during the prompt caching preview will retain their access, but no additional customers will be granted access to prompt caching on the Claude 3.5 Sonnet v2 model.
When using prompt caching, you define cache checkpoints: markers that indicate which parts of your prompt to cache. These cached sections must be identical between requests; any alteration results in a cache miss.
Here’s how to implement prompt caching through Portkey’s OpenAI-compliant chat completions API.
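Below is a minimal sketch using the Portkey Python SDK. It assumes a Bedrock virtual key is already configured in Portkey and that Anthropic-style `cache_control` markers are passed through to Bedrock for Claude models; the credentials, model choice, and prompt text are placeholders to adapt to your setup.

```python
from portkey_ai import Portkey

# Placeholder credentials: a Bedrock virtual key must be configured in Portkey first.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="BEDROCK_VIRTUAL_KEY",
)

# A long, fully static system prompt. To be cached, it must meet the model's
# minimum token count per checkpoint (1,024 tokens for Claude 3.7 Sonnet).
static_instructions = "You are a contract-review assistant. <long, static instructions>"

response = portkey.chat.completions.create(
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": static_instructions,
                    # Cache checkpoint: everything up to and including this block
                    # is cached and reused by subsequent requests.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        # Only the user turn changes between requests, so the cached prefix is reused.
        {"role": "user", "content": "Summarize the termination clause."},
    ],
)

print(response.choices[0].message.content)
```

On the first request, Bedrock writes the checkpointed prefix to the cache; subsequent requests that send an identical prefix read from it, which is where the latency and input token savings come from.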
You can also use Bedrock’s prompt caching feature with Portkey’s Prompt Templates.
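With Prompt Templates, the cache-marked static section lives in the template itself, so each request only supplies variables. A minimal sketch, assuming a template was already created in the Portkey UI; the `prompt_id` and variable name are hypothetical placeholders:

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")  # placeholder key

# Render and run a saved prompt template. The static, cache-marked section is
# part of the template, so only the variables change between requests.
completion = portkey.prompts.completions.create(
    prompt_id="your-prompt-template-id",  # hypothetical template ID
    variables={"document": "Full text of the document to analyze..."},
)

print(completion.choices[0].message.content)
```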
Supported Features
Below is a detailed table of supported models, their minimum token requirements, maximum cache checkpoints, and fields that support caching:
| Model | Model ID | Min tokens per checkpoint | Max checkpoints per request | Cacheable fields |
|---|---|---|---|---|
| Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | 1,024 | 4 | system, messages, tools |
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | 2,048 | 4 | system, messages, tools |
| Amazon Nova Micro | amazon.nova-micro-v1:0 | 1,000 | 4 | system, messages |
| Amazon Nova Lite | amazon.nova-lite-v1:0 | 1,000 | 4 | system, messages |
| Amazon Nova Pro | amazon.nova-pro-v1:0 | 1,000 | 4 | system, messages |
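To verify that caching is taking effect, inspect the usage block returned with each response. The exact field names for cache statistics vary by model and Portkey version, so this sketch simply prints the whole object:

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="BEDROCK_VIRTUAL_KEY")

# For meaningful cache statistics, the request must include a cache checkpoint
# that meets the minimum token counts in the table above; this bare request is
# only meant to show where the usage data lives.
response = portkey.chat.completions.create(
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.usage)
```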
For more detailed information on Bedrock prompt caching, refer to the official Amazon Bedrock documentation.