This feature is available on all Portkey plans.
Enable Load Balancing
To enable Load Balancing, you can modify the config object to include a strategy with loadbalance mode.
Here’s a quick example that load balances 75-25 between an OpenAI and an Azure OpenAI account:
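A config along these lines splits traffic 75-25 (the virtual key names below are placeholders for your own provider keys):

```json
{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    { "virtual_key": "openai-virtual-key", "weight": 0.75 },
    { "virtual_key": "azure-virtual-key", "weight": 0.25 }
  ]
}
```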
You can create and then use the config in your requests.
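For instance, with the Python SDK you can attach the config by its saved ID (the ID and model below are placeholders; passing the config object itself also works):

```python
from portkey_ai import Portkey

# "pc-loadba-xxxxxx" stands in for your saved config's ID
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    config="pc-loadba-xxxxxx",
)

completion = portkey.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="gpt-4o",
)
print(completion.choices[0].message.content)
```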
How Load Balancing Works
- Defining the Loadbalance Targets & their Weights: You provide a list of providers and assign a weight value to each target. The weights represent the relative share of requests that should be routed to each target.
- Weight Normalization: Portkey first sums up all the weights you provided for the targets. It then divides each target’s weight by the total sum to calculate the normalized weight for that target. This ensures the weights add up to 1 (or 100%), allowing Portkey to distribute the load proportionally.
For example, let’s say you have three targets with weights 5, 3, and 1. The total sum of weights is 9 (5 + 3 + 1). Portkey will then normalize the weights as follows:
- Target 1: 5 / 9 ≈ 0.56 (about 56% of the traffic)
- Target 2: 3 / 9 ≈ 0.33 (about 33% of the traffic)
- Target 3: 1 / 9 ≈ 0.11 (about 11% of the traffic)
- Request Distribution: When a request comes in, Portkey routes it to a target LLM based on the normalized weight probabilities. This ensures the traffic is distributed across the LLMs according to the specified weights.
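Conceptually, this selection step is a weighted random draw. The snippet below is only an illustration of that idea, not Portkey's actual implementation:

```python
import random

# Illustration only: weighted random selection over load-balance targets
targets = [
    {"name": "target-1", "weight": 5},
    {"name": "target-2", "weight": 3},
    {"name": "target-3", "weight": 1},
]

total = sum(t["weight"] for t in targets)
normalized = [t["weight"] / total for t in targets]  # ~0.56, ~0.33, ~0.11

# Each incoming request is routed according to these probabilities
chosen = random.choices(targets, weights=normalized, k=1)[0]
print(chosen["name"])
```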
- Default weight value is 1
- Minimum weight value is 0
- If weight is not set for a target, the default weight value (i.e. 1) is applied
- You can set "weight": 0 for a specific target to stop routing traffic to it without removing it from your Config, as shown below
Sticky Load Balancing
Sticky load balancing ensures that requests with the same identifier are consistently routed to the same target. This is useful for:
- Maintaining conversation context across multiple requests
- Ensuring consistent model behavior for A/B testing
- Session-based routing for user-specific experiences
Configuration
Add sticky_session to your load balancing strategy:
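A minimal sketch, assuming sticky_session sits inside the strategy object and reusing the placeholder virtual keys from earlier:

```json
{
  "strategy": {
    "mode": "loadbalance",
    "sticky_session": {
      "hash_fields": ["metadata.user_id", "metadata.session_id"],
      "ttl": 3600
    }
  },
  "targets": [
    { "virtual_key": "openai-virtual-key", "weight": 0.75 },
    { "virtual_key": "azure-virtual-key", "weight": 0.25 }
  ]
}
```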
Parameters
| Parameter | Type | Description |
|---|---|---|
| hash_fields | array | Fields to use for generating the sticky session identifier. Supports dot notation for nested fields (e.g., metadata.user_id, metadata.session_id) |
| ttl | number | Time-to-live in seconds for the sticky session. After this period, a new target may be selected. Default: 3600 (1 hour) |
How It Works
- Identifier Generation: When a request arrives, Portkey generates a hash from the specified hash_fields values
- Target Lookup: The hash is used to look up the previously assigned target from cache
- Consistent Routing: If a cached assignment exists and hasn’t expired, the request goes to the same target
- New Assignment: If no cached assignment exists, a new target is selected based on weights and cached for future requests
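Conceptually, the steps above amount to hashing the configured field values and caching the resulting assignment. The sketch below is only an illustration under those assumptions (flat field lookup instead of dot notation, a plain dict instead of the real cache), not Portkey's implementation:

```python
import hashlib
import random
import time

assignments = {}  # session hash -> (target, expiry); stands in for the real cache

def route(metadata: dict, targets: list[dict], hash_fields: list[str], ttl: int = 3600) -> dict:
    # 1. Identifier generation: hash the configured field values
    key_material = "|".join(str(metadata.get(field, "")) for field in hash_fields)
    session_key = hashlib.sha256(key_material.encode()).hexdigest()

    # 2. Target lookup: reuse a cached assignment that hasn't expired
    cached = assignments.get(session_key)
    if cached and cached[1] > time.time():
        return cached[0]  # 3. Consistent routing to the same target

    # 4. New assignment: weighted selection, cached for future requests
    target = random.choices(targets, weights=[t["weight"] for t in targets], k=1)[0]
    assignments[session_key] = (target, time.time() + ttl)
    return target
```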
Sticky sessions use a two-tier cache system (in-memory + Redis) for fast lookups and persistence across gateway instances in distributed deployments.
Caveats and Considerations
While the Load Balancing feature offers numerous benefits, there are a few things to consider:
- Ensure the LLMs in your list are compatible with your use case. Not all LLMs offer the same capabilities or respond in the same format.
- Be aware of your usage with each LLM. Depending on your weight distribution, your usage with each LLM could vary significantly.
- Keep in mind that each LLM has its own latency and pricing. Diversifying your traffic could have implications on the cost and response time.
- Sticky sessions require Redis for persistence across gateway instances. Without Redis, sticky sessions will only work within a single gateway instance’s memory.

