1. Import the SDK and authenticate Portkey
Start by installing the portkey-ai package in your NodeJS project.
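A minimal setup could look like the sketch below, assuming your Portkey API key is available as the PORTKEY_API_KEY environment variable (the exact constructor options are documented in the Portkey SDK reference):

```sh
npm install portkey-ai
```

```js
import Portkey from 'portkey-ai';

// Authenticate the gateway client with your Portkey API key
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
});
```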
2. Create Configs: Loadbalance with Nested Fallbacks
Portkey acts as an AI gateway for all of your requests to LLMs. It follows the OpenAI SDK signature in all of its methods and interfaces, making it easy to use and to switch providers. Here is an example of a chat completions request made through Portkey.
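This is a minimal sketch; it assumes the client above is already pointed at a provider (for example via a virtual key or a config), and the model name is only an example:

```js
const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'What is an AI gateway?' }],
  model: 'gpt-3.5-turbo',
});

console.log(chatCompletion.choices[0].message.content);
```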
Next, we create a config that uses the loadbalance strategy across Anthropic and OpenAI. weight describes how the traffic should be split, 50/50 between the two LLM providers, while override_params helps us override the defaults.
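One way to express that config is sketched below; the virtual keys and model names are placeholders for your own provider credentials and choices:

```js
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    {
      virtual_key: 'anthropic-virtual-key', // placeholder for your Anthropic virtual key
      weight: 0.5,
      override_params: { model: 'claude-3-haiku-20240307', max_tokens: 256 },
    },
    {
      virtual_key: 'openai-virtual-key', // placeholder for your OpenAI virtual key
      weight: 0.5,
      override_params: { model: 'gpt-4o' },
    },
  ],
};
```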
Let’s take this a step further and apply a fallback mechanism so that requests routed to OpenAI fall back to Azure OpenAI. This nested mechanism among the targets ensures our app stays reliable in production with great confidence.
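A sketch of that nested config, again with placeholder virtual keys, could be:

```js
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    {
      virtual_key: 'anthropic-virtual-key', // placeholder
      weight: 0.5,
      override_params: { model: 'claude-3-haiku-20240307', max_tokens: 256 },
    },
    {
      // This target is itself a fallback group: try OpenAI first,
      // then fall back to Azure OpenAI if the request fails.
      strategy: { mode: 'fallback' },
      weight: 0.5,
      targets: [
        { virtual_key: 'openai-virtual-key' },       // placeholder
        { virtual_key: 'azure-openai-virtual-key' }, // placeholder
      ],
    },
  ],
};
```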
See the documentation for Portkey Fallbacks and Loadbalancing.
3. Make a Request
Now that the configs are concrete and are passed as arguments when instantiating the Portkey client instance, all subsequent requests acquire the desired behavior auto-magically, with no additional changes to the codebase.
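Assuming the nested config from the previous step, that could look like:

```js
import Portkey from 'portkey-ai';

const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config, // the loadbalance + nested fallback config defined above
});

// Every request made through this client now follows the routing rules in the config.
const response = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Summarize the plot of Hamlet in two lines.' }],
  model: 'gpt-4o', // may be overridden by override_params in the config
});

console.log(response.choices[0].message.content);
```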
4. Trace the request from the logs
It can be challenging to identify particular requests from the thousands that are received every day, similar to trying to find a needle in a haystack. However, Portkey offers a solution by enabling us to attach a desired trace ID. Here we use request-loadbalance-fallback.
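One way to attach it is at client creation, as in the sketch below; the exact option name (traceID here) should be confirmed against the Portkey SDK reference:

```js
const portkey = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY,
  config,
  traceID: 'request-loadbalance-fallback', // assumed option name; shows up against the request in Portkey logs
});
```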
5. Advanced: Canary Testing
Given that new models are released every day and your app is already in production, what is the best way to evaluate their quality? Canary Testing allows you to gradually roll out a change to a small subset of users before making it available to everyone. Consider this scenario: you have been using OpenAI as your LLM provider for a while now, but are considering trying an open-source Llama model for your app through Anyscale. The weight indicates how the traffic is split, so that 10% of your user-base is served from Anyscale’s Llama models. Now you are all set up to gather feedback, observe the performance of your app, and release to an increasingly larger user-base.
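A canary config of that shape might look like the following sketch (virtual keys and the Llama model ID are placeholders):

```js
const config = {
  strategy: { mode: 'loadbalance' },
  targets: [
    {
      virtual_key: 'openai-virtual-key', // placeholder: the incumbent provider
      weight: 0.9,                       // 90% of traffic stays on OpenAI
    },
    {
      virtual_key: 'anyscale-virtual-key', // placeholder: the canary provider
      weight: 0.1,                         // 10% of traffic is served by the Llama model
      override_params: { model: 'meta-llama/Llama-2-70b-chat-hf' },
    },
  ],
};
```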
Considerations
You can implement production-grade Loadbalancing and nested fallback mechanisms with just a few lines of code. While you are equipped with all the tools for your next GenAI app, here are a few considerations:
- Every request has to adhere to the LLM provider’s requirements for it to be successful. For instance, max_tokens is required for Anthropic but not for OpenAI.
- While loadbalance helps reduce the load on any one LLM, it is recommended to pair it with a Fallback strategy to ensure that your app stays reliable
- On Portkey, you can also pass the loadbalance weight as 0 - this will essentially stop routing requests to that target and you can amp it up when required
- Loadbalance has no target limits as such, so you can potentially add multiple account details from one provider and effectively multiply your available rate limits
- Loadbalance does not alter the outputs or the latency of the requests in any way
See the entire code