For example, with OpenAI's tts-1-hd model, you cannot send more than 7 requests per minute. Any extra request automatically fails.
There are many real-world use cases where it’s possible to run into rate limits:
- When your requests have a very high input-token count or a very long context, you can hit token limits
- When you are running a long, complex prompt pipeline that fires hundreds of requests at once, you can hit both token and request limits
Here’s an overview of rate limits imposed by various providers:
| LLM Provider | Example Model | Rate Limits | 
|---|---|---|
| OpenAI | gpt-4 | Tier 1: 500 requests per minute (RPM), 10,000 tokens per minute (TPM), 10,000 requests per day |
| Anthropic | All models | Tier 1: 50 RPM, 50,000 TPM, 1 million tokens per day |
| Cohere | Co.Generate models | Production key: 10,000 RPM |
| Anyscale | All models | Endpoints: 30 concurrent requests |
| Perplexity AI | mixtral-8x7b-instruct | 24 RPM, 16,000 TPM |
| Together AI | All models | Paid: 100 RPM |
1. Install Portkey SDK
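Assuming you are using the Python SDK (published as `portkey-ai` on PyPI):

```sh
pip install portkey-ai
```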
Then make your `chat.completions` call using the Portkey SDK:
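A minimal sketch, assuming you have created an OpenAI virtual key in Portkey (the key names below are placeholders):

```python
from portkey_ai import Portkey

# Route requests to OpenAI through the Portkey gateway
portkey = Portkey(
    api_key="PORTKEY_API_KEY",         # your Portkey API key
    virtual_key="openai-virtual-key",  # virtual key storing your OpenAI credentials
)

response = portkey.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Say this is a test"}],
)
print(response.choices[0].message.content)
```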
2. Fallback to Alternative LLMs
With Portkey, you can write a call-routing strategy that falls back from one provider to another in case of rate limit errors. This is done by passing a Config object while instantiating your Portkey client:
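Here is a sketch of such a fallback Config, assuming virtual keys for OpenAI and Anthropic (the key names are placeholders):

```python
from portkey_ai import Portkey

# Fallback config: try OpenAI first; on a 429, retry the request on Anthropic
config = {
    "strategy": {
        "mode": "fallback",
        "on_status_codes": [429],  # trigger fallback only on rate limit errors
    },
    "targets": [
        {"virtual_key": "openai-virtual-key"},        # primary target
        {
            "virtual_key": "anthropic-virtual-key",   # fallback target
            "override_params": {"max_tokens": 1024},  # Anthropic requires max_tokens
        },
    ],
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)
```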
In this Config object:
- The routing `strategy` is set as `fallback`
- The `on_status_codes` param ensures that the fallback is only triggered on the `429` error code, which is generated for rate limit errors
- The `targets` array contains the details of the LLMs and the order of the fallback
- The `override_params` in the second target lets you add more params for the specific provider (`max_tokens` for Anthropic in this case)
3. Load Balance Among Multiple LLMs
Instead of sending all your requests to a single provider on a single account, you can split your traffic across multiple provider accounts using Portkey. This ensures that a single account does not get overburdened with requests and thus avoids rate limits. Setting up this "loadbalancing" with Portkey is easy: just write the relevant loadbalance Config and pass it while instantiating your Portkey client:
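A minimal sketch, assuming three OpenAI virtual keys from three different accounts are set up in Portkey (the key names are placeholders):

```python
from portkey_ai import Portkey

# Loadbalance config: split traffic equally across 3 OpenAI accounts
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"virtual_key": "openai-key-account-1", "weight": 1},
        {"virtual_key": "openai-key-account-2", "weight": 1},
        {"virtual_key": "openai-key-account-3", "weight": 1},
    ],
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)
```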
In this Config object:
- The routing `strategy` is set as `loadbalance`
- `targets` contain 3 different OpenAI API keys from 3 different accounts, all with equal weight, which means Portkey will split the traffic equally (1/3rd) among the 3 keys

