Portkey provides a robust platform to observe, govern, and manage your locally or privately hosted custom models using Triton Inference Server.
See the official Triton Inference Server documentation for more details.

Integration Steps

1. Expose your Triton Server

Expose your Triton server using a tunneling service like ngrok or make it publicly accessible. Skip this if you’re self-hosting the Gateway.
ngrok http 8000 --host-header="localhost:8080"
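Before wiring the server into Portkey, you can confirm it is reachable via Triton's KServe v2 readiness endpoint. A minimal sketch; the base URL and port are assumptions matching the ngrok command above:

```python
from urllib import request, error

def triton_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if Triton answers its KServe v2 readiness probe."""
    try:
        with request.urlopen(f"{base_url}/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200  # Triton returns 200 when the server is ready
    except (error.URLError, OSError):
        return False  # unreachable, or not ready yet

# e.g. triton_ready("http://localhost:8000")
```

If this returns False through your tunnel URL, fix connectivity before continuing.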
2. Add to Model Catalog

  1. Go to Model Catalog → Add Provider
  2. Enable the “Local/Privately hosted provider” toggle
  3. Select Triton as the provider type
  4. Enter your Triton server URL in Custom Host: http://localhost:8000/v2/models/mymodel
  5. Add authentication headers if needed
  6. Name your provider (e.g., my-triton)

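The Custom Host value in step 4 follows Triton's KServe v2 model route. A small helper (the function name is illustrative) shows how that URL is composed from host, port, and model name:

```python
def triton_model_url(host: str, port: int, model: str) -> str:
    """Build the KServe v2 model route that Triton exposes for a named model."""
    return f"http://{host}:{port}/v2/models/{model}"

print(triton_model_url("localhost", 8000, "mymodel"))
# → http://localhost:8000/v2/models/mymodel
```

When tunneling with ngrok, substitute the ngrok hostname for `localhost:8000`.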
Complete Setup Guide

See all setup options
3. Use in Your Application

from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@my-triton"
)

response = portkey.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
Alternatively, pass the custom host directly:
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="triton",
    custom_host="http://localhost:8000/v2/models/mymodel",
    Authorization="AUTH_KEY"  # If needed
)

Next Steps

Gateway Configs

Add retries, timeouts, and fallbacks
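As a hedged sketch of what such a gateway config might look like when attached to your Triton provider (the `@my-backup` slug is hypothetical; see the Gateway Configs guide for the exact schema):

```python
# Sketch of a Portkey gateway config adding retries and a fallback target.
# "@my-backup" is a hypothetical second provider slug used for illustration.
config = {
    "retry": {"attempts": 3},          # retry failed requests up to 3 times
    "strategy": {"mode": "fallback"},  # try targets in order until one succeeds
    "targets": [
        {"provider": "@my-triton"},
        {"provider": "@my-backup"},
    ],
}
```

A config like this would typically be passed to the Portkey client at construction time.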

Observability

Monitor your Triton deployments

Custom Host Guide

Learn more about custom host setup

BYOLLM Guide

Complete guide for private LLMs
For complete SDK documentation:

SDK Reference

Complete Portkey SDK documentation
Last modified on February 9, 2026