FutureAGI is an AI lifecycle platform that provides automated evaluation, tracing, and quality assessment for LLM applications. When combined with Portkey, you get a complete end-to-end observability solution covering both operational performance and response quality.
Portkey handles the “what happened, how fast, and how much did it cost?” while FutureAGI answers “how good was the response?”
Why FutureAGI + Portkey?
The integration creates a powerful synergy:
Portkey acts as the operational layer - unifying API calls, managing keys, and monitoring metrics like latency, cost, and request volume
FutureAGI acts as the quality layer - capturing full request context and running automated evaluations to score model outputs
Getting Started
Prerequisites
Before integrating FutureAGI with Portkey, ensure you have:
Python 3.8+ installed
API Keys: a Portkey API key, plus a FutureAGI API key and secret key (used in the environment variables below)
Installation
pip install portkey-ai fi-instrumentation traceai-portkey
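After installing, a quick import check like the one below should confirm the three packages are available. This is optional and separate from the integration code itself:
# Optional sanity check: these imports should succeed after installation.
import portkey_ai
import traceai_portkey
import fi_instrumentation

print("portkey-ai, traceai-portkey, and fi-instrumentation are installed")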
Setting up Environment Variables
Create a .env file in your project root:
# .env
PORTKEY_API_KEY="your-portkey-api-key"
FI_API_KEY="your-futureagi-api-key"
FI_SECRET_KEY="your-futureagi-secret-key"
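If you want to confirm the file is being picked up, a small check along these lines should work; it only verifies that the variables are present, not that the keys are valid:
# Optional check that the .env file is loaded; verifies presence only.
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("PORTKEY_API_KEY", "FI_API_KEY", "FI_SECRET_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing environment variable: {key}")
print("All required environment variables are set")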
Integration Guide
Step 1: Basic Setup
Import the necessary libraries and configure your environment:
import asyncio
import time

from dotenv import load_dotenv
from portkey_ai import AsyncPortkey
from traceai_portkey import PortkeyInstrumentor
from fi_instrumentation import register
from fi_instrumentation.fi_types import (
    ProjectType, EvalTag, EvalTagType,
    EvalSpanKind, EvalName, ModelChoices,
)

load_dotenv()
Step 2: Configure Evaluation Tags
Set up comprehensive evaluation tags to automatically assess model responses:
def setup_tracing(project_version_name: str):
    """Setup tracing with comprehensive evaluation tags"""
    tracer_provider = register(
        project_name="Model-Benchmarking",
        project_type=ProjectType.EXPERIMENT,
        project_version_name=project_version_name,
        eval_tags=[
            # Evaluates if the response is concise
            EvalTag(
                type=EvalTagType.OBSERVATION_SPAN,
                value=EvalSpanKind.LLM,
                eval_name=EvalName.IS_CONCISE,
                custom_eval_name="Is_Concise",
                mapping={"input": "llm.output_messages.0.message.content"},
                model=ModelChoices.TURING_LARGE,
            ),
            # Evaluates context adherence
            EvalTag(
                type=EvalTagType.OBSERVATION_SPAN,
                value=EvalSpanKind.LLM,
                eval_name=EvalName.CONTEXT_ADHERENCE,
                custom_eval_name="Response_Quality",
                mapping={
                    "context": "llm.input_messages.0.message.content",
                    "output": "llm.output_messages.0.message.content",
                },
                model=ModelChoices.TURING_LARGE,
            ),
            # Evaluates task completion
            EvalTag(
                type=EvalTagType.OBSERVATION_SPAN,
                value=EvalSpanKind.LLM,
                eval_name=EvalName.TASK_COMPLETION,
                custom_eval_name="Task_Completion",
                mapping={
                    "input": "llm.input_messages.0.message.content",
                    "output": "llm.output_messages.0.message.content",
                },
                model=ModelChoices.TURING_LARGE,
            ),
        ],
    )

    # Instrument the Portkey library
    PortkeyInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
The mapping parameter in EvalTag tells the evaluator where to find the necessary data within the trace; getting it right is crucial for accurate evaluation.
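As a rough illustration, the mapping keys are the evaluator's expected inputs and the values are attribute paths on the captured LLM span. The tag below reuses the CONTEXT_ADHERENCE eval from the example above; it is a minimal sketch only, assuming the same span attribute layout the instrumentation produces, and the custom_eval_name is an arbitrary label of your choosing:
# Minimal sketch: route span attributes into evaluator inputs.
# Assumes the span attribute paths shown in the example above.
context_tag = EvalTag(
    type=EvalTagType.OBSERVATION_SPAN,      # evaluate at the span level
    value=EvalSpanKind.LLM,                 # only on LLM spans
    eval_name=EvalName.CONTEXT_ADHERENCE,
    custom_eval_name="Prompt_Grounding",    # label shown in the dashboard
    mapping={
        # evaluator input -> span attribute path captured by the tracer
        "context": "llm.input_messages.0.message.content",
        "output": "llm.output_messages.0.message.content",
    },
    model=ModelChoices.TURING_LARGE,
)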
Step 3: Define Models and Test Scenarios
Configure the models you want to test and create test scenarios:
def get_models():
    """Setup model configurations with their Portkey Virtual Keys"""
    # Replace the virtual_key values with the Virtual Keys from your Portkey dashboard
    return [
        {
            "name": "GPT-4o",
            "provider": "OpenAI",
            "virtual_key": "openai-virtual-key",
            "model_id": "gpt-4o",
        },
        {
            "name": "Claude-3.7-Sonnet",
            "provider": "Anthropic",
            "virtual_key": "anthropic-virtual-key",
            "model_id": "claude-3-7-sonnet-latest",
        },
        {
            "name": "Llama-3-70b",
            "provider": "Groq",
            "virtual_key": "groq-virtual-key",
            "model_id": "llama3-70b-8192",
        },
    ]
def get_test_scenarios():
    """Returns a dictionary of test scenarios"""
    return {
        "reasoning_logic": "A farmer has 17 sheep. All but 9 die. How many are left?",
        "creative_writing": "Write a 6-word story about a robot who discovers music.",
        "code_generation": "Write a Python function to find the nth Fibonacci number.",
    }
Step 4: Execute Tests with Automatic Evaluation
Run tests on each model while capturing both operational metrics and quality evaluations:
async def test_model(model_config, prompt):
    """Tests a single model with a single prompt and returns the response"""
    setup_tracing(model_config["name"])
    print(f"Testing {model_config['name']}...")

    # Use the async client so the call can be awaited from the async test loop
    client = AsyncPortkey(virtual_key=model_config["virtual_key"])

    start_time = time.time()
    completion = await client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model_config["model_id"],
        max_tokens=1024,
        temperature=0.5,
    )
    response_time = time.time() - start_time

    response_text = completion.choices[0].message.content or ""
    print(f"  Responded in {response_time:.2f}s")
    return response_text
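To smoke-test a single model before running the full matrix, you can call the function directly; the prompt below is just an illustrative placeholder:
# Optional: quick single-model check with a placeholder prompt.
asyncio.run(test_model(get_models()[0], "Explain recursion in one sentence."))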
async def main():
    """Main execution function to run all tests"""
    models_to_test = get_models()
    scenarios = get_test_scenarios()

    for test_name, prompt in scenarios.items():
        print(f"\n{'=' * 20} SCENARIO: {test_name.upper()} {'=' * 20}")
        print(f"PROMPT: {prompt}")
        print("-" * 60)

        for model in models_to_test:
            await test_model(model, prompt)

        await asyncio.sleep(1)  # Brief pause between scenarios

    PortkeyInstrumentor().uninstrument()

if __name__ == "__main__":
    asyncio.run(main())
Viewing Results
After running your tests, you’ll have two powerful dashboards to analyze performance:
FutureAGI Dashboard - Quality View
Navigate to the Prototype Tab in your FutureAGI Dashboard to find your “Model-Benchmarking” project.
Key features:
Automated evaluation scores for each model response
Detailed trace analysis with quality metrics
Comparison views across different models
Portkey Dashboard - Operational View
Access your Portkey dashboard to see operational metrics for all API calls:
Key metrics:
Unified Logs: Single view of all requests across providers
Cost Tracking: Automatic cost calculation for every call
Latency Monitoring: Response time comparisons across models
Token Usage: Detailed token consumption analytics
Advanced Use Cases
Complex Agentic Workflows
The integration supports tracing complex workflows where you chain multiple LLM calls:
# Example: E-commerce assistant with multiple LLM calls.
# classify_intent, search_products, and generate_response are placeholders
# for your own application logic.
async def ecommerce_assistant_workflow(user_query):
    # Step 1: Intent classification
    intent = await classify_intent(user_query)

    # Step 2: Product search
    products = await search_products(intent)

    # Step 3: Generate response
    response = await generate_response(products, user_query)

    # All steps are automatically traced and evaluated
    return response
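For instance, a step such as classify_intent might itself be a Portkey call; because PortkeyInstrumentor is active, that call shows up as its own traced and evaluated LLM span. The helper below is a hypothetical sketch, not part of the FutureAGI or Portkey APIs, and assumes the AsyncPortkey client and the "openai-virtual-key" placeholder from earlier in this guide:
# Hypothetical sketch of one workflow step; the client and virtual key
# placeholder follow the setup used earlier in this guide.
async def classify_intent(user_query: str) -> str:
    client = AsyncPortkey(virtual_key="openai-virtual-key")
    completion = await client.chat.completions.create(
        messages=[
            {"role": "system", "content": "Classify the user's shopping intent in one word."},
            {"role": "user", "content": user_query},
        ],
        model="gpt-4o",
        max_tokens=10,
        temperature=0,
    )
    # This call is captured as its own LLM span and scored by the eval tags.
    return (completion.choices[0].message.content or "").strip()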
CI/CD Integration
Leverage this integration in your CI/CD pipelines for:
Automated Model Testing: Run evaluation suites on new model versions
Quality Gates: Set thresholds for evaluation scores before deployment (see the sketch after this list)
Performance Monitoring: Track degradation in model quality over time
Cost Optimization: Monitor and alert on cost spikes
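As a rough illustration of a quality gate, the snippet below fails a CI job when any average evaluation score drops below a threshold. The scores dict is assumed to be collected from your FutureAGI "Model-Benchmarking" project, keyed by the custom_eval_name labels defined earlier; how you export those scores is up to your pipeline and is not shown here:
import sys

# Minimal CI quality gate sketch: fail the build if any evaluation
# dimension falls below the threshold.
QUALITY_THRESHOLD = 0.8

def run_quality_gate(scores):
    failures = {name: score for name, score in scores.items() if score < QUALITY_THRESHOLD}
    if failures:
        print(f"Quality gate failed: {failures}")
        sys.exit(1)
    print("Quality gate passed")

# Example usage with scores gathered from an evaluation run:
run_quality_gate({"Response_Quality": 0.91, "Task_Completion": 0.84})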
Benefits
Comprehensive Observability: Track both operational metrics (cost, latency) and quality metrics (accuracy, relevance) in one place
Automated Evaluation: No manual evaluation needed; FutureAGI automatically scores responses on multiple dimensions
Multi-Model Comparison: Easily compare different models side-by-side on the same tasks
Production Ready: Built-in alerting and monitoring for your production LLM applications
Example Notebooks
Interactive Colab Notebook: Try out the FutureAGI + Portkey integration with our interactive notebook
Next Steps
Create your FutureAGI account
Set up Virtual Keys in Portkey
Run the example code to see automated evaluation in action
Customize evaluation tags for your specific use cases
Integrate into your CI/CD pipeline for continuous model quality monitoring