May 3, 2026 · 9 min read

AI Gateway: Unified API for LLMs & AI Providers

How Infrarix AI Gateway simplifies multi-provider LLM routing, adds observability, enables failover, and optimizes costs — all through a single API.

AI Gateway · LLM · Routing · Observability

Introduction

Modern AI applications rarely rely on a single LLM provider. Teams switch between OpenAI, Anthropic, Google, AWS Bedrock, and open-source models depending on cost, latency, and capability requirements.

Infrarix AI Gateway is a unified API layer that sits between your application and any number of LLM providers. It handles routing, failover, caching, observability, and cost tracking — so you can focus on building product features.

The Multi-Provider Problem

Without a gateway layer, teams face:

  • Provider lock-in: Tight coupling to a single provider's SDK and API format
  • No failover: If your primary provider goes down, your entire app goes down
  • Blind spots: No unified view of latency, cost, or quality across providers
  • Cost surprises: No real-time tracking of token usage across models
  • Inconsistent APIs: Each provider has different auth, formats, and error handling

How AI Gateway Works

1. Unified API Format

Send requests using a single, consistent API format. AI Gateway translates them to the correct provider format automatically. Switch providers by changing a single parameter; nothing else in your code needs to change.
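
As a minimal sketch of what this looks like in practice, using the TypeScript SDK introduced later in this post: the request shape stays identical, and only the model string differs between providers.

import { AIGateway } from '@infrarix/gateway'

const gw = new AIGateway({ apiKey: process.env.INFRARIX_KEY })

// Same request shape for every provider; only the model string changes.
const fromOpenAI = await gw.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this support ticket.' }],
})

const fromAnthropic = await gw.chat({
  model: 'claude-sonnet-4-20250514',
  messages: [{ role: 'user', content: 'Summarize this support ticket.' }],
})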

2. Intelligent Routing

Define routing rules based on model capability, cost, latency, or custom logic. AI Gateway can route to the cheapest model for simple tasks and the most capable model for complex ones.
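
A routing policy could be expressed roughly as in the sketch below, reusing the `gw` client from the previous example. The `routing` option and its rule fields are illustrative assumptions, not documented SDK parameters.

// Hypothetical routing config; option and field names are assumptions.
const routed = await gw.chat({
  routing: {
    rules: [
      // Cheap model for short, simple prompts...
      { when: { maxPromptTokens: 500 }, use: 'gpt-4o-mini' },
      // ...and a more capable model for everything else.
      { default: true, use: 'claude-sonnet-4-20250514' },
    ],
  },
  messages: [{ role: 'user', content: 'Classify this email by intent.' }],
})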

3. Automatic Failover

When a provider returns errors or exceeds latency thresholds, AI Gateway automatically retries with a fallback provider. Configurable retry policies ensure your users never see a blank screen.
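
The `fallback` list from the Quick Start below drives this behavior; a retry policy might be layered on top as in this sketch (the `retry` option and its fields are assumptions for illustration, not documented parameters).

const resilient = await gw.chat({
  model: 'gpt-4o',
  // Tried in order if the primary model errors or times out.
  fallback: ['claude-sonnet-4-20250514', 'gemini-pro'],
  // Hypothetical retry policy; field names are illustrative assumptions.
  retry: { maxAttempts: 3, perAttemptTimeoutMs: 5000 },
  messages: [{ role: 'user', content: 'Hello!' }],
})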

4. Observability

Every request generates structured telemetry: latency, tokens, cost, model, provider, and status. Dashboard views and API exports give you complete visibility.
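
That telemetry is also available on each response. Assuming the SDK response mirrors the JSON shape shown in the Quick Start below, reading it looks like this:

const res = await gw.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
})

// Fields mirror the response JSON in the Quick Start.
console.log(res.provider)                // which provider actually served the call
console.log(`latency: ${res.latency_ms}ms`)
console.log(`tokens: ${res.usage.total_tokens}, cost: $${res.usage.cost_usd}`)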

Quick Start

curl -X POST https://api.infrarix.com/v1/gateway/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "fallback": ["claude-sonnet-4-20250514", "gemini-pro"],
    "messages": [
      { "role": "user", "content": "Explain quantum computing in simple terms" }
    ],
    "max_tokens": 500
  }'

Response:

{
  "id": "gw-req-abc123",
  "model": "gpt-4o",
  "provider": "openai",
  "content": "Quantum computing uses quantum bits (qubits)...",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 187,
    "total_tokens": 199,
    "cost_usd": 0.0024
  },
  "latency_ms": 842
}

SDK Integration

import { AIGateway } from '@infrarix/gateway'

const gw = new AIGateway({ apiKey: process.env.INFRARIX_KEY })

const response = await gw.chat({
  model: 'gpt-4o',
  fallback: ['claude-sonnet-4-20250514'],
  messages: [{ role: 'user', content: 'Hello!' }],
})

console.log(response.content)
console.log(`Cost: $${response.usage.cost_usd}`)

Supported Providers

Provider               Models                        Features
OpenAI                 GPT-4o, GPT-4o-mini, o1, o3   Chat, Embeddings, Images
Anthropic              Claude Opus, Sonnet, Haiku    Chat, Streaming
Google                 Gemini Pro, Gemini Ultra      Chat, Multimodal
AWS Bedrock            Claude, Titan, Llama          Chat, Embeddings
Azure OpenAI           GPT-4o, GPT-4o-mini           Chat, Embeddings
Custom / Self-hosted   Any OpenAI-compatible         Chat, Embeddings

Key Features

  • Semantic caching: Cache semantically similar queries to cut cost and latency by up to 80% (see the sketch after this list)
  • Rate limiting: Per-user and per-model rate limits with configurable policies
  • Cost budgets: Set spend limits per project, team, or model with real-time alerts
  • Streaming: Full support for streaming responses from all providers
  • Function calling: Unified function/tool calling across supported providers
  • Prompt templates: Store and version prompt templates with variable injection
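
As an illustration of the semantic caching item above, a per-request toggle might look like the following. The `cache` option and its fields are hypothetical assumptions, not documented parameters.

// Hypothetical cache option; field names are illustrative assumptions.
const cached = await gw.chat({
  model: 'gpt-4o',
  cache: { semantic: true, similarityThreshold: 0.95, ttlSeconds: 3600 },
  messages: [{ role: 'user', content: 'What is your refund policy?' }],
})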

Frequently Asked Questions

Does AI Gateway add latency?

AI Gateway adds less than 10ms of overhead per request. With semantic caching enabled, cache hits skip the provider round trip entirely, so total latency often drops rather than rises.

Can I bring my own provider API keys?

Yes. AI Gateway works with your existing provider API keys. We never access your models directly.
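
Configuration might look like the sketch below; the `providerKeys` option is an assumption for illustration, not a documented parameter.

import { AIGateway } from '@infrarix/gateway'

const gw = new AIGateway({
  apiKey: process.env.INFRARIX_KEY,
  // Hypothetical option; the name is an assumption for illustration.
  providerKeys: {
    openai: process.env.OPENAI_API_KEY,
    anthropic: process.env.ANTHROPIC_API_KEY,
  },
})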

Is streaming supported?

Yes. AI Gateway supports streaming for all providers that offer it, with unified event formats.
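
Consuming a stream might look like the following sketch, assuming a `stream` flag and an async iterable of unified events; both are assumptions mirroring common SDK conventions, not documented API.

const stream = await gw.chat({
  model: 'gpt-4o',
  stream: true, // hypothetical flag; an assumption, not documented
  messages: [{ role: 'user', content: 'Write a haiku about networks.' }],
})

// Assumes a unified async-iterable event format across providers.
for await (const event of stream) {
  if (event.type === 'content_delta') process.stdout.write(event.delta)
}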

Get Started

Simplify your LLM infrastructure with AI Gateway. Learn more, or read our comparison with LangChain Endpoints.