Model Support
Amazon Bedrock prompt caching is generally available with the following models:
- Claude 3.7 Sonnet
- Claude 3.5 Haiku
- Amazon Nova Micro
- Amazon Nova Lite
- Amazon Nova Pro
How Bedrock Prompt Caching Works
When using prompt caching, you define cache checkpoints: markers that indicate which parts of your prompt to cache. These cached sections must remain static between requests; any alteration results in a cache miss.

You can also use the Bedrock prompt caching feature with Portkey’s Prompt Templates.
Implementation Examples
Here’s how to implement prompt caching with Portkey:
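The example below is a minimal sketch rather than a definitive implementation: it assumes the Portkey Python SDK, a Bedrock virtual key (placeholder values shown), and Anthropic-style `cache_control` markers to define cache checkpoints.

```python
# Minimal sketch of Bedrock prompt caching through Portkey.
# Placeholders: PORTKEY_API_KEY, BEDROCK_VIRTUAL_KEY, and the prompt text.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",          # your Portkey API key
    virtual_key="BEDROCK_VIRTUAL_KEY",  # virtual key holding your Bedrock credentials
)

response = client.chat.completions.create(
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
    max_tokens=256,
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    # Static content to cache; must meet the model's minimum
                    # token count per checkpoint (1,024 for Claude 3.7 Sonnet).
                    "text": "<large, static system prompt>",
                    # Cache checkpoint: the prompt up to this block is cached,
                    # so identical content in later requests gets a cache hit.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Summarize the key points."},
    ],
)

print(response.choices[0].message.content)
```

Supported Features and Limitations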
Supported Features
- Text prompts and images embedded within text prompts
- Multiple cache checkpoints per request (illustrated in the sketch after this list)
- Caching in system prompts, messages, and tools fields (model-dependent)
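To illustrate multiple checkpoints, the sketch below (under the same assumptions as the example above) caches both a static system prompt and a large reference document in a single request:

```python
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="BEDROCK_VIRTUAL_KEY")

response = client.chat.completions.create(
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
    max_tokens=256,
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "<static instructions>",
                    # Checkpoint 1: cache the system prompt.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<large reference document reused across turns>",
                    # Checkpoint 2: cache the document separately, so follow-up
                    # questions about it also hit the cache.
                    "cache_control": {"type": "ephemeral"},
                },
                # The dynamic part of the prompt is left uncached.
                {"type": "text", "text": "What changed in section 3?"},
            ],
        },
    ],
)
```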
Supported Models and Limits
Below is a detailed table of supported models, their minimum token requirements, maximum cache checkpoints, and the fields that support caching:

| Model | Model ID | Min tokens per checkpoint | Max checkpoints per request | Cacheable fields |
|---|---|---|---|---|
| Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | 1,024 | 4 | system, messages, tools |
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | 2,048 | 4 | system, messages, tools |
| Amazon Nova Micro | amazon.nova-micro-v1:0 | 1,000 | 4 | system, messages |
| Amazon Nova Lite | amazon.nova-lite-v1:0 | 1,000 | 4 | system, messages |
| Amazon Nova Pro | amazon.nova-pro-v1:0 | 1,000 | 4 | system, messages |
- The Amazon Nova models support a maximum of 32k tokens for prompt caching.
- Tools caching is fully supported for Claude models, but is not supported for Amazon Nova models.
Understanding Token Counts and Pricing
Portkey automatically calculates the correct pricing for prompt caching requests. In the logs, you’ll see cache-related token counts in the `usage` object:
- `cache_creation_input_tokens`: Number of tokens written to the cache when creating a new entry.
- `cache_read_input_tokens`: Number of tokens retrieved from the cache for this request.
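For illustration, a cached request’s `usage` object might look like the sketch below; the values are invented, and the exact field layout should be treated as an assumption.

```python
# Illustrative, OpenAI-normalized usage object (values invented for this example).
usage = {
    "prompt_tokens": 10_250,               # includes the cached tokens (see below)
    "completion_tokens": 300,
    "total_tokens": 10_550,
    "cache_read_input_tokens": 8_000,      # tokens served from an existing cache entry
    "cache_creation_input_tokens": 2_000,  # tokens written to a new cache entry
}
```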
Token Format Normalization

Portkey normalizes responses to the OpenAI format. In this format, `prompt_tokens` includes the cached tokens. This differs from native provider formats, where input tokens may exclude cached tokens. Portkey’s pricing calculation accounts for this by:

- Subtracting cached tokens from `prompt_tokens` to get the base input token count
- Applying the standard input token rate to the base tokens
- Applying the discounted cache read rate to `cache_read_input_tokens`
- Applying the cache write rate to `cache_creation_input_tokens`
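To make that arithmetic concrete, here is a small sketch of the calculation. The rates are illustrative placeholders rather than actual Bedrock prices, and the assumption that both cache-read and cache-write tokens are counted inside `prompt_tokens` is ours.

```python
# Hedged sketch of the pricing breakdown described above.
def prompt_caching_cost(
    prompt_tokens: int,                # OpenAI-normalized: includes cached tokens
    cache_read_input_tokens: int,
    cache_creation_input_tokens: int,
    input_rate: float,                 # $ per 1M standard input tokens (placeholder)
    cache_read_rate: float,            # $ per 1M cache-read tokens (discounted)
    cache_write_rate: float,           # $ per 1M cache-write tokens
) -> float:
    # Subtract the cached tokens from prompt_tokens to recover the base
    # input token count (assumes both read and write tokens are included).
    base_tokens = prompt_tokens - cache_read_input_tokens - cache_creation_input_tokens
    return (
        base_tokens * input_rate
        + cache_read_input_tokens * cache_read_rate
        + cache_creation_input_tokens * cache_write_rate
    ) / 1_000_000

# Example: a 10,000-token prompt where 8,000 tokens were read from the cache.
# (2,000 * 3.00 + 8,000 * 0.30) / 1e6 = $0.0084
print(prompt_caching_cost(10_000, 8_000, 0, 3.00, 0.30, 3.75))
```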
Related Resources
For more detailed information on Bedrock prompt caching, refer to the [AWS Bedrock Prompt Caching Docs](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).

