Speech-to-Text - Portkey Docs

Transcription & Translation Usage

Portkey supports both the Transcription and Translation methods for speech-to-text (STT) models and follows the OpenAI signature: you send the audio file (in flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm format) as part of the API request. The examples below make the same two calls using, in order, the OpenAI NodeJS SDK, the OpenAI Python SDK, the Portkey Python SDK, and cURL:

import fs from "fs";
import OpenAI from "openai";
import { PORTKEY_GATEWAY_URL, createHeaders } from "portkey-ai";

const openai = new OpenAI({
  apiKey: "API_KEY", // Replace with your standard API Key or a dummy string
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({
    apiKey: "PORTKEY_API_KEY",
    provider: "openai"
  })
});

// Transcription

async function transcribe() {
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("/path/to/file.mp3"),
    model: "whisper-1",
  });

  console.log(transcription.text);
}
transcribe();

// Translation

async function translate() {
  const translation = await openai.audio.translations.create({
    file: fs.createReadStream("/path/to/file.mp3"),
    model: "whisper-1",
  });

  console.log(translation.text);
}
translate();

from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

client = OpenAI(
    api_key="API_KEY", # Replace with your standard API Key or a dummy string
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        provider="openai"
    )
)

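# Open the audio file once; the context manager closes it when done
with open("/path/to/file.mp3", "rb") as audio_file:
    # Transcription
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
    print(transcription.text)

    # Rewind the handle before reusing it for the second upload
    audio_file.seek(0)

    # Translation
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file
    )
    print(translation.text)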

from pathlib import Path
from portkey_ai import Portkey

# Initialize the Portkey client
portkey = Portkey(
    api_key="PORTKEY_API_KEY",  # Replace with your Portkey API key
    provider="@PROVIDER"
)

# Open the audio file once; the context manager closes it when done
with open("/path/to/file.mp3", "rb") as audio_file:
    # Transcription
    transcription = portkey.audio.transcriptions.create(
        model="@openai/whisper-1",
        file=audio_file
    )
    print(transcription.text)

    # Rewind the handle before reusing it for the second upload
    audio_file.seek(0)

    # Translation
    translation = portkey.audio.translations.create(
        model="@openai/whisper-1",
        file=audio_file
    )
    print(translation.text)

For Transcriptions:

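# --form-string sends the model value literally; with --form, curl would
# treat the leading @ as a file path and try to upload that file.
curl "https://api.portkey.ai/v1/audio/transcriptions" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -H 'Content-Type: multipart/form-data' \
  --form file=@/path/to/file/audio.mp3 \
  --form-string model=@openai/whisper-1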

For Translations:

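# --form-string sends the model value literally; with --form, curl would
# treat the leading @ as a file path and try to upload that file.
curl "https://api.portkey.ai/v1/audio/translations" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -H 'Content-Type: multipart/form-data' \
  --form file=@/path/to/file/audio.mp3 \
  --form-string model=@openai/whisper-1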

Once the request completes, it is recorded in the logs UI, where you can see the transcribed or translated text along with the cost and latency of the request.
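
Because only the audio formats listed at the top of this page are accepted, it can help to check a file's extension before uploading it. The following is a minimal sketch of such a check, not part of the Portkey or OpenAI SDKs: the names `SUPPORTED_AUDIO_EXTENSIONS` and `is_supported_audio` are hypothetical, and the check looks only at the file name, not at the audio data itself.

```python
from pathlib import Path

# Extensions copied from the format list at the top of this page.
SUPPORTED_AUDIO_EXTENSIONS = frozenset(
    {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
)


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    Hypothetical helper: it inspects only the file name, so a mislabeled
    file can still be rejected by the provider.
    """
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_AUDIO_EXTENSIONS
```

For example, `is_supported_audio("/path/to/file.mp3")` returns `True`, while a path ending in `.aac` returns `False`, so the upload can be skipped or the file converted first.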
