Kicking off 2025 with major releases! 🎉
January marks a milestone for Portkey with our first industry report: we analyzed over 2 trillion tokens flowing through Portkey to uncover how teams run LLMs in production.
We’re also expanding our platform capabilities with advanced PII redaction, JWT authentication, comprehensive audit logs, a unified files & batches API, and support for private LLMs. The latest models, including Deepseek R1, OpenAI o3-mini, and Gemini’s thinking model, are integrated with Portkey too.
Plus, we are attending the AI Engineer Summit in New York in February, and hosting in-person meetups in Mumbai & NYC.
Let’s dive in!
| Area | Key Updates |
|---|---|
| Benchmark | • Released LLMs in Prod Report 2025 analyzing 2T+ tokens • Key finding: multi-LLM deployment is now standard • Average prompt size up 4x, with 40% cost savings from caching |
| Security | • Advanced PII redaction with automatic standardized identifiers • JWT authentication support for enterprise deployments • Comprehensive audit logs for all critical actions • Enforced metadata schemas for better governance • Attach default configs & metadata to API keys • Granular workspace management controls |
| Platform | • Unified API for files & batches across major providers • Support for private LLM deployments • Enhanced virtual keys with granular controls |
| New Models | • Deepseek R1 available across 7+ providers • Added Gemini thinking model • Support for Perplexity Sonar models • o3-mini integration |
| Integrations | • AWS Bedrock Guardrails support • Milvus DB & Replicate integrations • Expanded Open WebUI support • Guardrails for embedding requests |
| Community | • We did a deep dive into MCP and event-driven architecture for agentic systems |
Our comprehensive analysis of 2T+ tokens processed through Portkey’s Gateway reveals fascinating insights about how teams are deploying LLMs in production. Here are the key findings:
- Despite OpenAI’s dominance (>50% of production traffic), teams are actively implementing multi-LLM strategies for reliability and specialized use cases
- Average prompt size has increased 4x in the last year, indicating more sophisticated engineering techniques and complex workloads
- Proper caching strategies deliver up to 40% cost savings, a must-have for production deployments (see the caching sketch after this list)
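To make the caching point concrete, here’s a minimal sketch of enabling response caching through a Portkey gateway config; the virtual key name and `max_age` value are illustrative placeholders, not prescriptions.

```python
# A minimal sketch of enabling Portkey's response caching via a gateway config.
# "openai-prod" and the max_age (in seconds) are placeholder values.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "virtual_key": "openai-prod",
        "cache": {"mode": "semantic", "max_age": 3600},  # or "simple" for exact-match caching
    },
)

# Repeated or semantically similar prompts can now be served from cache,
# which is where the up-to-40% cost savings in the report come from.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)
print(resp.choices[0].message.content)
```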
Advanced PII Redaction
We’ve significantly enhanced Portkey’s Guardrails with request mutation capabilities.
When any sensitive data (like email, phone number, SSN) is detected in user requests, our PII redaction automatically replaces it with standardized identifiers before it reaches the LLM. This works seamlessly across our entire guardrails ecosystem, including AWS Bedrock Guardrails, Patronus AI, Promptfoo, Pangea, and more.
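As an illustration, here’s a minimal sketch of attaching a PII-redaction guardrail to incoming requests via a gateway config; the guardrail ID and virtual key are placeholders for ones you’d create in the Portkey app.

```python
# A minimal sketch, assuming a PII-redaction guardrail has already been created
# in the Portkey UI; "pii-redactor-xxx" and "openai-prod" are placeholders.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "virtual_key": "openai-prod",
        "input_guardrails": ["pii-redactor-xxx"],  # runs before the request reaches the LLM
    },
)

# If the user message contains e.g. an email or SSN, the guardrail mutates the
# request so the model sees a standardized identifier instead of the raw value.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Email me at jane.doe@example.com"}],
)
```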
Unified Files & Batches API
Managing file uploads and batch processing across multiple LLM providers is now dramatically simpler. Instead of building provider-specific integrations, you can upload files and run batch jobs through a single OpenAI-compatible interface, as sketched below.
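Here’s a rough sketch of that flow, assuming the unified API mirrors the OpenAI-style files and batches endpoints; the virtual key and file name are placeholders.

```python
# A minimal sketch of the unified files & batches flow. Switching the virtual
# key switches providers; the calling code stays the same.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="bedrock-prod",  # placeholder; could equally be an OpenAI or Azure key
)

# Upload a JSONL file of requests once...
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# ...then create and poll a batch job with the same code, regardless of provider.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(client.batches.retrieve(batch.id).status)
```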
Integrate Private LLMs
You can now add your privately hosted LLMs to Portkey’s virtual keys: simply point a virtual key or config at your deployment’s base URL, along with any credentials it needs.
This means you can use your private deployments alongside commercial providers, with the same monitoring, reliability, and management features.
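A minimal sketch, assuming your private deployment exposes an OpenAI-compatible endpoint; the `custom_host` URL, credentials, and model name are placeholders.

```python
# A minimal sketch of routing to a privately hosted, OpenAI-compatible LLM.
# The endpoint URL, key, and model slug are illustrative placeholders.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "provider": "openai",                               # treat it as OpenAI-compatible
        "custom_host": "https://llm.internal.example.com/v1",  # placeholder private endpoint
        "api_key": "PRIVATE_DEPLOYMENT_KEY",                # whatever auth your host expects
    },
)

# Same observability, caching, and fallbacks as with commercial providers.
resp = client.chat.completions.create(
    model="llama-3-70b-instruct",  # placeholder model name on the private host
    messages=[{"role": "user", "content": "ping"}],
)
```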
API Keys with Default Configs & Metadata
You can now attach a default Portkey config and default metadata to any API key you create; requests made with that key pick them up automatically.
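For illustration, a request made with such a key needs no per-call setup; the sketch below assumes the defaults were attached when the key was created.

```python
# A minimal sketch: if the API key was created with a default config and
# metadata attached, plain requests inherit them without per-request setup.
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY_WITH_DEFAULTS")  # defaults attached at creation

# No config or metadata passed here; routing, caching, and metadata tagging
# still apply because they ride along with the key itself.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
```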
Running AI at scale requires robust security, visibility, and control. This month, we’ve launched a comprehensive set of enterprise features to enable that:
After evaluating 17 different platforms, this AI team replaced 2+ years of homegrown tooling with Portkey Prompts.
They were able to do this because of three things:
Access Deepseek’s latest reasoning model through multiple providers: the direct API, Fireworks AI, Together AI, OpenRouter, Groq, AWS Bedrock, Azure AI Inference, and more.
To keep responses OpenAI-compatible, you can choose whether Portkey returns the reasoning tokens (see the sketch below).
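A minimal sketch of calling Deepseek R1 through one of the supported providers; the `strict_open_ai_compliance` toggle is our best reading of the reasoning-token switch, so treat the flag name and model slug as assumptions to verify against the docs.

```python
# A minimal sketch of calling Deepseek R1 via a provider routed through Portkey.
# The virtual key, model slug, and compliance flag are assumptions to verify.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="fireworks-prod",      # placeholder; Groq, Together, Bedrock, etc. work too
    strict_open_ai_compliance=False,   # assumed flag: when False, reasoning tokens come back
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",  # provider-specific model slug
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
)
print(resp.choices[0].message.content)
```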
- o3-mini: available across both OpenAI & Azure OpenAI
- Perplexity Sonar models: supported, along with their citations and other features
- Replicate: full support for Replicate’s model marketplace
- Milvus: direct routing support for the Milvus vector database
- Qdrant: direct routing support for the Qdrant vector database
- Open WebUI: expanded integration capabilities, with enhanced documentation and integration guides
Inverse Guardrail
All eligible checks now have an `Inverse` option in the UI, which triggers a `TRUE` verdict when the underlying Guardrail check fails.
AWS Bedrock Guardrails
Native support for AWS Bedrock’s guardrail capabilities.
Guardrails on Embedding Requests
Portkey Guardrails now work on your embedding input requests!
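A minimal sketch, assuming guardrails attach to embedding requests through the same config mechanism as chat requests; the guardrail ID and virtual key are placeholders.

```python
# A minimal sketch of running an input guardrail on an embedding request.
# "pii-redactor-xxx" and "openai-prod" are placeholder identifiers.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    config={
        "virtual_key": "openai-prod",
        "input_guardrails": ["pii-redactor-xxx"],  # now also runs on embedding inputs
    },
)

emb = client.embeddings.create(
    model="text-embedding-3-small",
    input="Customer note: call back John at 555-0142",
)
```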
We are attending the AI Engineer Summit in NYC this February and have some extra event passes to share! Reach out to us on Discord to ask for a pass.
We are also hosting small meetups in NYC and Mumbai this month to meet with local engineering leaders and ML/AI platform leads. Register for them below:
EDA for Agents
Last month we hosted an inspiring AI practitioners meetup with Ojasvi Yadav and Anudeep Yegireddi to discuss the role of Event-Driven Architecture in building multi-agent systems using MCP.
Essential reading for your AI infrastructure: