Kicking off 2025 with major releases! 🎉
January marks a milestone for Portkey with our first industry report: we analyzed over 2 trillion tokens flowing through Portkey to uncover how teams run LLMs in production.
We’re also expanding our platform capabilities with advanced PII redaction, JWT authentication, comprehensive audit logs, a unified files & batches API, and support for private LLMs. The latest models, including Deepseek R1, OpenAI o3-mini, and Gemini’s thinking model, are integrated with Portkey too.
Plus, we are attending the AI Engineer Summit in New York in February, and hosting in-person meetups in Mumbai & NYC.
Let’s dive in!
| Area | Key Updates |
|---|---|
| Benchmark | • Released LLMs in Prod Report 2025 analyzing 2T+ tokens • Key finding: multi-LLM deployment is now standard • Average prompt size up 4x, with 40% cost savings from caching |
| Security | • Advanced PII redaction with automatic standardized identifiers • JWT authentication support for enterprise deployments • Comprehensive audit logs for all critical actions • Enforced metadata schemas for better governance • Attach default configs & metadata to API keys • Granular workspace management controls |
| Platform | • Unified API for files & batches across major providers • Support for private LLM deployments • Enhanced virtual keys with granular controls |
| New Models | • Deepseek R1 available across 7+ providers • Added Gemini thinking model • Support for Perplexity Sonar models • o3-mini integration |
| Integrations | • AWS Bedrock Guardrails support • Milvus DB & Replicate integrations • Expanded Open WebUI support • Guardrails for embedding requests |
| Community | • We did a deep dive into MCP and event-driven architecture for agentic systems |
Our comprehensive analysis of 2T+ tokens processed through Portkey’s Gateway reveals fascinating insights about how teams are deploying LLMs in production. Here are the key findings:
- Despite OpenAI’s dominance (>50% of production traffic), teams are actively implementing multi-LLM strategies for reliability and specialized use cases
- Average prompt size has increased 4x in the last year, indicating more sophisticated engineering techniques and complex workloads
- Proper caching strategies deliver up to 40% cost savings, a must-have for production deployments (see the caching sketch after this list)
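To make the caching point concrete, here’s a minimal sketch of enabling response caching through a Portkey gateway config; the virtual key name and `max_age` value are illustrative placeholders, not prescriptions.

```python
# A minimal sketch of enabling Portkey's response caching via a gateway config.
# "openai-prod" and the max_age (in seconds) are placeholder values.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "virtual_key": "openai-prod",
        "cache": {"mode": "semantic", "max_age": 3600},  # or "simple" for exact-match caching
    },
)

# Repeated or semantically similar prompts can now be served from cache,
# which is where the up-to-40% cost savings in the report come from.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(resp.choices[0].message.content)
```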
Advanced PII Redaction
We’ve significantly enhanced Portkey’s Guardrails with request mutation capabilities.
When any sensitive data (like email, phone number, SSN) is detected in user requests, our PII redaction automatically replaces it with standardized identifiers before it reaches the LLM. This works seamlessly across our entire guardrails ecosystem, including AWS Bedrock Guardrails, Patronus AI, Promptfoo, Pangea, and more.
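As an illustration, here’s a minimal sketch of attaching a PII-redaction guardrail to incoming requests via a gateway config; the guardrail ID and virtual key are placeholders for ones you’d create in the Portkey app.

```python
# A minimal sketch, assuming a PII-redaction guardrail has already been created
# in the Portkey UI; "pii-redactor-xxx" and "openai-prod" are placeholders.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "virtual_key": "openai-prod",
        "input_guardrails": ["pii-redactor-xxx"],  # runs before the request reaches the LLM
    },
)

# If the user message contains e.g. an email or SSN, the guardrail mutates the
# request so the model sees a standardized identifier instead of the raw value.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Email me at jane.doe@example.com"}],
)
```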
Unified Files & Batches API
Managing file uploads and batch processing across multiple LLM providers is now dramatically simpler. Instead of building provider-specific integrations, you can upload files and run batch jobs through a single OpenAI-compatible interface, as sketched below.
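Here’s a rough sketch of that flow, assuming the unified API mirrors the OpenAI-style files and batches endpoints; the virtual key and file name are placeholders.

```python
# A minimal sketch of the unified files & batches flow. Switching the virtual
# key switches providers; the calling code stays the same.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="bedrock-prod",  # placeholder; could equally be an OpenAI or Azure key
)

# Upload a JSONL file of requests once...
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# ...then create and poll a batch job with the same code, regardless of provider.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(client.batches.retrieve(batch.id).status)
```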
Integrate Private LLMs
You can now add your privately hosted LLMs to Portkey’s virtual keys: simply point a virtual key or config at your deployment’s base URL, along with any credentials it needs.
This means you can use your private deployments alongside commercial providers, with the same monitoring, reliability, and management features.
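A minimal sketch, assuming your private deployment exposes an OpenAI-compatible endpoint; the `custom_host` URL, credentials, and model name are placeholders.

```python
# A minimal sketch of routing to a privately hosted, OpenAI-compatible LLM.
# The endpoint URL, key, and model slug are illustrative placeholders.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "provider": "openai",                               # treat it as OpenAI-compatible
        "custom_host": "https://llm.internal.example.com/v1",  # placeholder private endpoint
        "api_key": "PRIVATE_DEPLOYMENT_KEY",                # whatever auth your host expects
    },
)

# Same observability, caching, and fallbacks as with commercial providers.
resp = client.chat.completions.create(
    model="llama-3-70b-instruct",  # placeholder model name on the private host
    messages=[{"role": "user", "content": "ping"}],
)
```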
API Keys with Default Configs & Metadata
You can now attach a default Portkey config and default metadata to any API key you create; requests made with that key pick them up automatically.
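For illustration, a request made with such a key needs no per-call setup; the sketch below assumes the defaults were attached when the key was created.

```python
# A minimal sketch: if the API key was created with a default config and
# metadata attached, plain requests inherit them without per-request setup.
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY_WITH_DEFAULTS")  # defaults attached at creation

# No config or metadata passed here; routing, caching, and metadata tagging
# still apply because they ride along with the key itself.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
```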
Running AI at scale requires robust security, visibility, and control. This month, we’ve launched a comprehensive set of enterprise features to enable that:
After evaluating 17 different platforms, this AI team replaced 2+ years of homegrown tooling with Portkey Prompts.
They were able to do this because of three things:
Access Deepseek’s latest reasoning model through multiple providers: the direct API, Fireworks AI, Together AI, OpenRouter, Groq, AWS Bedrock, Azure AI Inference, and more.
To keep responses OpenAI-compatible, you can choose whether Portkey returns the reasoning tokens (see the sketch below).
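A minimal sketch of calling Deepseek R1 through one of the supported providers; the `strict_open_ai_compliance` toggle is our best reading of the reasoning-token switch, so treat the flag name and model slug as assumptions to verify against the docs.

```python
# A minimal sketch of calling Deepseek R1 via a provider routed through Portkey.
# The virtual key, model slug, and compliance flag are assumptions to verify.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="fireworks-prod",      # placeholder; Groq, Together, Bedrock, etc. work too
    strict_open_ai_compliance=False,   # assumed flag: when False, reasoning tokens come back
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",  # provider-specific model slug
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(resp.choices[0].message.content)
```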
- o3-mini: available across both OpenAI & Azure OpenAI
- Perplexity Sonar models: supported, along with their citations and other features
- Replicate: full support for Replicate’s model marketplace
- Milvus: direct routing support for the Milvus vector database
- Qdrant: direct routing support for the Qdrant vector database
- Open WebUI: expanded integration capabilities, with enhanced documentation and integration guides
Inverse Guardrail
All eligible checks now have an `Inverse` option in the UI, which triggers a `TRUE` verdict when the underlying Guardrail check fails.
AWS Bedrock Guardrails
Native support for AWS Bedrock’s guardrail capabilities.
Guardrails on Embedding Requests
Portkey Guardrails now work on your embedding input requests!
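A minimal sketch, assuming guardrails attach to embedding requests through the same config mechanism as chat requests; the guardrail ID and virtual key are placeholders.

```python
# A minimal sketch of running an input guardrail on an embedding request.
# "pii-redactor-xxx" and "openai-prod" are placeholder identifiers.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "virtual_key": "openai-prod",
        "input_guardrails": ["pii-redactor-xxx"],  # now also runs on embedding inputs
    },
)

emb = client.embeddings.create(
    model="text-embedding-3-small",
    input="Customer note: call back John at 555-0142",
)
```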
We are attending the AI Engineer Summit in NYC this February and have some extra event passes to share! Reach out to us on Discord to ask for a pass.
We are also hosting small meetups in NYC and Mumbai this month to meet with local engineering leaders and ML/AI platform leads. Register for them below:
EDA for Agents
Last month we hosted an inspiring AI practitioners meetup with Ojasvi Yadav and Anudeep Yegireddi to discuss the role of Event-Driven Architecture in building multi-agent systems using MCP.
Essential reading for your AI infrastructure: