Portkey provides a robust and secure gateway that makes it easy to integrate various Large Language Models (LLMs) and embedding models, including Google Vertex AI, into your apps.
With Portkey, you can take advantage of features like fast AI gateway access, observability, prompt management, and more, all while ensuring the secure management of your Vertex auth through a virtual key system.
Provider Slug: vertex-ai
Portkey provides a consistent API to interact with models from various providers. To integrate Google Vertex AI with Portkey:
Add the Portkey SDK to your application to interact with the Google Vertex AI API through Portkey's gateway.
To integrate Vertex AI with Portkey, you'll need your Vertex Project ID (or Service Account JSON) and Vertex Region, with which you can set up the virtual key.
Here’s a guide on how to find your Vertex Project details
If you are integrating through Service Account File, refer to this guide.
If you do not want to add your Vertex AI details to Portkey vault, you can directly pass them while instantiating the Portkey client. More on that here.
Use the Portkey instance to send requests to Gemini models hosted on Vertex AI. You can also override the virtual key directly in the API call if needed.
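For example, a minimal chat completion request might look like this (the API key, virtual key, and model name below are placeholders):

```python
from portkey_ai import Portkey

# Client authenticated through a Vertex AI virtual key
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VERTEX_VIRTUAL_KEY",
)

completion = portkey.chat.completions.create(
    model="gemini-1.5-pro",  # any Gemini model enabled in your Vertex project
    messages=[{"role": "user", "content": "Say this is a test"}],
)

print(completion.choices[0].message.content)
```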
Vertex AI uses OAuth2 to authenticate its requests, so you need to send the access token along with the request.
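If your virtual key holds only the Project ID and Region, you can forward a short-lived OAuth2 token as a Bearer header. A minimal sketch, assuming the Python SDK accepts an Authorization parameter at instantiation:

```python
import subprocess

from portkey_ai import Portkey

# Fetch a short-lived OAuth2 access token from the gcloud CLI
access_token = subprocess.check_output(
    ["gcloud", "auth", "print-access-token"], text=True
).strip()

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VERTEX_VIRTUAL_KEY",
    Authorization=f"Bearer {access_token}",  # forwarded to Vertex AI
)
```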
To use Anthropic models on Vertex AI, prepend anthropic. to the model name. Example: anthropic.claude-3-5-sonnet@20240620
Similarly, for Meta models, prepend meta. to the model name. Example: meta.llama-3-8b-8192
Portkey supports connecting to self-deployed models on Vertex AI, including models from Hugging Face or any custom models you’ve deployed to a Vertex AI endpoint.
Requirements for Self-Deployed Models
To use self-deployed models on Vertex AI through Portkey:
Model Naming Convention: When making requests to your self-deployed model, you must prefix the model name with endpoints.
Required Permissions: The Google Cloud service account used in your Portkey virtual key must have the aiplatform.endpoints.predict permission.
Why the prefix? Vertex AI’s product offering for self-deployed models is called “Endpoints.” This naming convention indicates to Portkey that it should route requests to your custom endpoint rather than a standard Vertex AI model.
This approach works for all models you can self-deploy on Vertex AI Model Garden, including Hugging Face models and your own custom models.
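Here's a sketch of calling a self-deployed model; the endpoint ID below is hypothetical, so substitute your own:

```python
from portkey_ai import Portkey

# The virtual key must use a service account that has the
# aiplatform.endpoints.predict permission
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VERTEX_VIRTUAL_KEY",
)

response = portkey.chat.completions.create(
    # The "endpoints." prefix routes the request to your custom endpoint;
    # replace 1234567890 with your real Vertex AI endpoint ID
    model="endpoints.1234567890",
    messages=[{"role": "user", "content": "Hello from my self-deployed model!"}],
)
```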
Vertex AI supports attaching webm, mp4, pdf, jpg, mp3, wav, and other file types to your Gemini messages.
Using Portkey, here’s how you can send these media files:
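For instance, here's a sketch that attaches a video from a Google Cloud Storage URL (the message shape follows the OpenAI-style content parts Portkey uses; the sample gs:// file is the one referenced later in this section):

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video"},
                # Media files go in the url field; the file extension is used
                # to infer the MIME type
                {
                    "type": "image_url",
                    "image_url": {"url": "gs://cloud-samples-data/video/animals.mp4"},
                },
            ],
        }
    ],
)
```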
Gemini’s vision capabilities excel at understanding the content of PDF documents, including text, tables, and images.
Method 1: Sending a Document via Google Files URL
Upload your PDF using the Files API to get a Google Files URL.
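A sketch, assuming you have already uploaded the PDF via the Files API (the file URL below is a placeholder):

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this document"},
                # Placeholder Google Files URL returned by the Files API
                {
                    "type": "image_url",
                    "image_url": {"url": "https://generativelanguage.googleapis.com/v1beta/files/your-file-id"},
                },
            ],
        }
    ],
)
```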
Method 2: Sending a Local Document as Base64 Data
This is suitable for smaller, local PDF files.
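A minimal sketch of inlining a local PDF as a base64 data URL:

```python
import base64

from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

# Read and base64-encode the local PDF
with open("document.pdf", "rb") as f:
    encoded_pdf = base64.b64encode(f.read()).decode("utf-8")

response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this document"},
                # Data URL carrying the application/pdf MIME type
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:application/pdf;base64,{encoded_pdf}"},
                },
            ],
        }
    ],
)
```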
If you send documents with extensions like .txt or .html, they will be treated as plain text; Gemini's native document vision capabilities are optimized for the application/pdf MIME type.
The assistant's thinking response is returned in the response_chunk.choices[0].delta.content_blocks array, not in the response.choices[0].message.content string.
Gemini models do not support feeding the reasoning back into multi-turn conversations, so you don't need to send the thinking message back to the model.
Models like google.gemini-2.5-flash-preview-04-17 and anthropic.claude-3-7-sonnet@20250219 support extended thinking. This is similar to OpenAI's reasoning models, except that you also get the model's reasoning as it processes the request.
Note that you will have to set strict_open_ai_compliance=False in the headers to use this feature.
To disable thinking for Gemini models like google.gemini-2.5-flash-preview-04-17, you must explicitly set budget_tokens to 0.
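A sketch of an extended-thinking request; the thinking parameter shape shown here follows Anthropic's extended-thinking convention and is an assumption, so verify it against Portkey's thinking-model reference:

```python
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VERTEX_VIRTUAL_KEY",
    strict_open_ai_compliance=False,  # required for extended thinking
)

response = portkey.chat.completions.create(
    model="google.gemini-2.5-flash-preview-04-17",
    max_tokens=3000,
    # Set budget_tokens to 0 instead to disable thinking on Gemini models
    thinking={"type": "enabled", "budget_tokens": 2030},
    messages=[{"role": "user", "content": "When does a ball thrown upward stop rising?"}],
)
```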
Base64 Image
Here, you can send the base64 image data along with the url field too:
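A minimal sketch, reading a local image and sending it as a base64 data URL:

```python
import base64

from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

with open("image.jpg", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                # Base64 image data sent through the same url field
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
                },
            ],
        }
    ],
)
```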
This same message format also works for all other media types: just send your media file in the url field, like "url": "gs://cloud-samples-data/video/animals.mp4" for Google Cloud URLs and "url": "https://download.samplelib.com/mp3/sample-3s.mp3" for public URLs.
Your URL should include the file extension; it is used to infer the MIME_TYPE, which is a required parameter for prompting Gemini models with files.
You can use any of Vertex AI's English and Multilingual embedding models through Portkey, in the familiar OpenAI schema.
The Gemini-specific parameter task_type is also supported on Portkey.
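A sketch of an embeddings request; the model name is a placeholder, and passing task_type straight through embeddings.create is an assumption based on the note above:

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

embeddings = portkey.embeddings.create(
    model="text-embedding-004",             # any Vertex AI embedding model
    input="What is the capital of France?",
    task_type="RETRIEVAL_QUERY",            # Gemini-specific parameter
)

print(embeddings.data[0].embedding[:5])  # first few dimensions of the vector
```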
Portkey supports function calling mode on Google's Gemini models. Explore this Cookbook for a deep dive and examples.
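Here's a minimal sketch using the OpenAI-style tools array (the get_weather function is hypothetical):

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "What's the weather in Delhi?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)
```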
You can manage all prompts to Google Gemini in the Prompt Library. All the models in the model garden are supported and you can easily start testing different prompts.
Once you're ready with your prompt, you can use the portkey.prompts.completions.create interface to use it in your application.
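For example (the prompt ID and variables are placeholders):

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

# The prompt template, model, and provider are configured in the Prompt Library
prompt_completion = portkey.prompts.completions.create(
    prompt_id="YOUR_PROMPT_ID",
    variables={"user_input": "Tell me a joke"},
)

print(prompt_completion.choices[0].message.content)
```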
Portkey supports the Imagen API on Vertex AI for image generation, letting you easily make requests in the familiar OpenAI-compliant schema.
Image Generation API Reference
imagen-3.0-generate-001
imagen-3.0-fast-generate-001
imagegeneration@006
imagegeneration@005
imagegeneration@002
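A sketch of an Imagen request in the OpenAI-style images schema (the size parameter is an assumption; check the Image Generation API Reference above for supported fields):

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

image = portkey.images.generate(
    model="imagen-3.0-generate-001",
    prompt="A cat sitting on a windowsill at sunset",
    size="1024x1024",  # assumption: OpenAI-style size field
)
```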
Vertex AI supports grounding with Google Search. This is a feature that allows you to ground your LLM responses with real-time search results.
Grounding is invoked by passing the google_search tool (for newer models like gemini-2.0-flash-001) or the google_search_retrieval tool (for older models like gemini-1.5-flash) in the tools array.
If you mix regular tools with grounding tools, Vertex might throw an error saying only one tool can be used at a time.
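A sketch of a grounded request; the exact way the grounding tool is declared in the tools array is an assumption here, so confirm it against the grounding section of the API reference:

```python
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="VERTEX_VIRTUAL_KEY")

response = portkey.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "Who won the most recent F1 race?"}],
    # Assumption: the grounding tool is passed by name, with no other tools
    tools=[{"type": "function", "function": {"name": "google_search"}}],
)
```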
gemini-2.0-flash-thinking-exp models return a Chain of Thought response along with the actual inference text. This is not OpenAI compatible; Portkey supports it by joining the two responses with a \r\n\r\n separator. You can split the response on this pattern to separate the Chain of Thought from the actual inference text.
If you require the Chain of Thought response along with the actual inference text, pass the strict open ai compliance flag as false in the request.
If you want the inference text only, pass the flag as true in the request.
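For example, with strict OpenAI compliance disabled you can split the combined text yourself:

```python
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VERTEX_VIRTUAL_KEY",
    strict_open_ai_compliance=False,  # keep the Chain of Thought in the response
)

response = portkey.chat.completions.create(
    model="gemini-2.0-flash-thinking-exp",
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

# Portkey joins the Chain of Thought and the answer with "\r\n\r\n"
chain_of_thought, _, answer = response.choices[0].message.content.partition("\r\n\r\n")
```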
You can also pass your Vertex AI details & secrets directly without using the Virtual Keys in Portkey.
Vertex AI expects a region, a project ID, and an access token in the request for a successful completion. This is how you can specify these fields directly in your requests:
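A sketch, assuming the Python SDK accepts these fields at instantiation; the parameter names below mirror Portkey's provider headers and should be verified against the SDK reference:

```python
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="vertex-ai",
    vertex_project_id="your-gcp-project-id",  # assumption: exact kwarg name
    vertex_region="us-central1",              # assumption: exact kwarg name
    Authorization="Bearer <your-oauth2-access-token>",
)

completion = portkey.chat.completions.create(
    model="gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
```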
For further questions on custom Vertex AI deployments or fine-grained access tokens, reach out to us at support@portkey.ai.
To obtain your Vertex Project ID and Region, navigate to your Google Vertex Dashboard.
When selecting Service Account File as your authentication method, you'll need to upload your Service Account JSON file.
This method is particularly important for using self-deployed models, as your service account must have the aiplatform.endpoints.predict permission to access custom endpoints.
Learn more about permissions for your Vertex IAM key here.
For Self-Deployed Models: Your service account must have the aiplatform.endpoints.predict permission in Google Cloud IAM. Without this specific permission, requests to custom endpoints will fail.
For standard Vertex AI models, you can simply provide your Vertex Project ID and Region.
This method is simpler but may not have all the permissions needed for custom endpoints.
The complete list of features supported in the SDK is available at the link below.
You’ll find more information in the relevant sections: