Portkey provides a robust platform to observe, govern, and manage your locally or privately hosted custom models using vLLM.
A list of all model architectures supported by vLLM is available in the vLLM documentation.

Integration Steps

Step 1: Expose your vLLM Server

Expose your vLLM server using a tunneling service like ngrok or make it publicly accessible. Skip this if you’re self-hosting the Gateway.
ngrok http 8000 --host-header="localhost:8000"
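
Before adding the server to Portkey, it can help to confirm the OpenAI-compatible endpoint is reachable. A minimal sanity check, assuming vLLM's default port 8000 and the openai Python package installed (the model ID it prints depends on what you launched vLLM with):

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
# vLLM ignores the API key unless the server was started with --api-key, so a placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# List the model(s) the server is serving; use this name in your Portkey requests.
for model in client.models.list():
    print(model.id)
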
Step 2: Add to Model Catalog

  1. Go to Model Catalog → Add Provider
  2. Enable “Local/Privately hosted provider” toggle
  3. Select OpenAI as the provider type (vLLM follows OpenAI API schema)
  4. Enter your vLLM server URL in Custom Host: https://your-vllm-server.ngrok-free.app
  5. Add authentication headers if needed
  6. Name your provider (e.g., my-vllm)

For all setup options, see the Complete Setup Guide.
Step 3: Use in Your Application

from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@my-vllm"
)

response = portkey.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
Or use custom host directly:
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="openai",
    custom_host="https://your-vllm-server.ngrok-free.app",
    Authorization="AUTH_KEY"  # If needed
)
Important: vLLM follows the OpenAI API specification, so set the provider as openai when using custom host directly. By default, vLLM runs on http://localhost:8000/v1.
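
Because the Portkey client mirrors the OpenAI SDK surface, streaming works the same way against a vLLM-backed provider. A minimal sketch, reusing the my-vllm provider slug and placeholder model name from the steps above:

from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@my-vllm"
)

stream = portkey.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True
)

for chunk in stream:
    # Chunks follow the OpenAI streaming format; skip chunks with no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
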

Next Steps

- Gateway Configs: Add retries, timeouts, and fallbacks (see the retry sketch after this list)
- Observability: Monitor your vLLM deployments
- Custom Host Guide: Learn more about custom host setup
- BYOLLM Guide: Complete guide for private LLMs
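
To give a flavor of what Gateway Configs add, here is a minimal retry sketch. The config shape follows Portkey's documented retry config, but the attempt count and attaching the config at client construction are illustrative; see the Gateway Configs guide for the full schema and fallback/timeout options:

from portkey_ai import Portkey

# Illustrative: ask the Gateway to retry failed requests to the vLLM provider up to 3 times.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@my-vllm",
    config={
        "retry": {"attempts": 3}
    }
)

response = portkey.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
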
For complete SDK documentation, see the SDK Reference.