May 3, 2026 · 9 min read

AI Gateway: Unified API for LLMs & AI Providers

How Infrarix AI Gateway simplifies multi-provider LLM routing, adds observability, enables failover, and optimizes costs — all through a single API.

AI Gateway · LLM · Routing · Observability

Introduction

Modern AI applications rarely rely on a single LLM provider. Teams switch between OpenAI, Anthropic, Google, AWS Bedrock, and open-source models depending on cost, latency, and capability requirements.

Infrarix AI Gateway is a unified API layer that sits between your application and any number of LLM providers. It handles routing, failover, caching, observability, and cost tracking — so you can focus on building product features.

The Multi-Provider Problem

Without a gateway layer, teams face:

  • Provider lock-in: Tight coupling to a single provider's SDK and API format
  • No failover: If your primary provider goes down, your entire app goes down
  • Blind spots: No unified view of latency, cost, or quality across providers
  • Cost surprises: No real-time tracking of token usage across models
  • Inconsistent APIs: Each provider has different auth, formats, and error handling

How AI Gateway Works

1. Unified API Format

Send requests using a single, consistent API format. AI Gateway translates them to the correct provider format automatically. Switch providers by changing a single parameter; nothing else in your code needs to change.
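
As a minimal sketch of what this looks like in practice, using the TypeScript SDK introduced later in this post: the request shape stays identical, and only the model string differs between providers.

import { AIGateway } from '@infrarix/gateway'

const gw = new AIGateway({ apiKey: process.env.INFRARIX_KEY })

// Same request shape for every provider; only the model string changes.
const fromOpenAI = await gw.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this support ticket.' }],
})

const fromAnthropic = await gw.chat({
  model: 'claude-sonnet-4-20250514',
  messages: [{ role: 'user', content: 'Summarize this support ticket.' }],
})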

2. Intelligent Routing

Define routing rules based on model capability, cost, latency, or custom logic. AI Gateway can route to the cheapest model for simple tasks and the most capable model for complex ones.
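
A routing policy could be expressed roughly as in the sketch below, reusing the `gw` client from the previous example. The `routing` option and its rule fields are illustrative assumptions, not documented SDK parameters.

// Hypothetical routing config; option and field names are assumptions.
const routed = await gw.chat({
  routing: {
    rules: [
      // Cheap model for short, simple prompts...
      { when: { maxPromptTokens: 500 }, use: 'gpt-4o-mini' },
      // ...and a more capable model for everything else.
      { default: true, use: 'claude-sonnet-4-20250514' },
    ],
  },
  messages: [{ role: 'user', content: 'Classify this email by intent.' }],
})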

3. Automatic Failover

When a provider returns errors or exceeds latency thresholds, AI Gateway automatically retries with a fallback provider. Configurable retry policies ensure your users never see a blank screen.
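
The `fallback` list from the Quick Start below drives this behavior; a retry policy might be layered on top as in this sketch (the `retry` option and its fields are assumptions for illustration, not documented parameters).

const resilient = await gw.chat({
  model: 'gpt-4o',
  // Tried in order if the primary model errors or times out.
  fallback: ['claude-sonnet-4-20250514', 'gemini-pro'],
  // Hypothetical retry policy; field names are illustrative assumptions.
  retry: { maxAttempts: 3, perAttemptTimeoutMs: 5000 },
  messages: [{ role: 'user', content: 'Hello!' }],
})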

4. Observability

Every request generates structured telemetry: latency, tokens, cost, model, provider, and status. Dashboard views and API exports give you complete visibility.
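
That telemetry is also available on each response. Assuming the SDK response mirrors the JSON shape shown in the Quick Start below, reading it looks like this:

const res = await gw.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
})

// Fields mirror the response JSON in the Quick Start.
console.log(res.provider)                // which provider actually served the call
console.log(`latency: ${res.latency_ms}ms`)
console.log(`tokens: ${res.usage.total_tokens}, cost: $${res.usage.cost_usd}`)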

Quick Start

curl -X POST https://api.infrarix.com/v1/gateway/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "fallback": ["claude-sonnet-4-20250514", "gemini-pro"],
    "messages": [
      { "role": "user", "content": "Explain quantum computing in simple terms" }
    ],
    "max_tokens": 500
  }'

Response:

{
  "id": "gw-req-abc123",
  "model": "gpt-4o",
  "provider": "openai",
  "content": "Quantum computing uses quantum bits (qubits)...",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 187,
    "total_tokens": 199,
    "cost_usd": 0.0024
  },
  "latency_ms": 842
}

SDK Integration

import { AIGateway } from '@infrarix/gateway'

const gw = new AIGateway({ apiKey: process.env.INFRARIX_KEY })

const response = await gw.chat({
  model: 'gpt-4o',
  fallback: ['claude-sonnet-4-20250514'],
  messages: [{ role: 'user', content: 'Hello!' }],
})

console.log(response.content)
console.log(`Cost: $${response.usage.cost_usd}`)

Supported Providers

Provider               Models                        Features
OpenAI                 GPT-4o, GPT-4o-mini, o1, o3   Chat, Embeddings, Images
Anthropic              Claude Opus, Sonnet, Haiku    Chat, Streaming
Google                 Gemini Pro, Gemini Ultra      Chat, Multimodal
AWS Bedrock            Claude, Titan, Llama          Chat, Embeddings
Azure OpenAI           GPT-4o, GPT-4o-mini           Chat, Embeddings
Custom / Self-hosted   Any OpenAI-compatible         Chat, Embeddings

Key Features

  • Semantic caching: Cache semantically similar queries to cut cost and latency by up to 80% (see the sketch after this list)
  • Rate limiting: Per-user and per-model rate limits with configurable policies
  • Cost budgets: Set spend limits per project, team, or model with real-time alerts
  • Streaming: Full support for streaming responses from all providers
  • Function calling: Unified function/tool calling across supported providers
  • Prompt templates: Store and version prompt templates with variable injection
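
As an illustration of the semantic caching item above, a per-request toggle might look like the following. The `cache` option and its fields are hypothetical assumptions, not documented parameters.

// Hypothetical cache option; field names are illustrative assumptions.
const cached = await gw.chat({
  model: 'gpt-4o',
  cache: { semantic: true, similarityThreshold: 0.95, ttlSeconds: 3600 },
  messages: [{ role: 'user', content: 'What is your refund policy?' }],
})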

Frequently Asked Questions

Does AI Gateway add latency?

AI Gateway adds less than 10ms of overhead per request. With semantic caching enabled, cache hits skip the provider round trip entirely, so total latency often drops rather than rises.

Can I bring my own provider API keys?

Yes. AI Gateway works with your existing provider API keys. We never access your models directly.
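
Configuration might look like the sketch below; the `providerKeys` option is an assumption for illustration, not a documented parameter.

import { AIGateway } from '@infrarix/gateway'

const gw = new AIGateway({
  apiKey: process.env.INFRARIX_KEY,
  // Hypothetical option; the name is an assumption for illustration.
  providerKeys: {
    openai: process.env.OPENAI_API_KEY,
    anthropic: process.env.ANTHROPIC_API_KEY,
  },
})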

Is streaming supported?

Yes. AI Gateway supports streaming for all providers that offer it, with unified event formats.
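
Consuming a stream might look like the following sketch, assuming a `stream` flag and an async iterable of unified events; both are assumptions mirroring common SDK conventions, not documented API.

const stream = await gw.chat({
  model: 'gpt-4o',
  stream: true, // hypothetical flag; an assumption, not documented
  messages: [{ role: 'user', content: 'Write a haiku about networks.' }],
})

// Assumes a unified async-iterable event format across providers.
for await (const event of stream) {
  if (event.type === 'content_delta') process.stdout.write(event.delta)
}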

Get Started

Simplify your LLM infrastructure with AI Gateway. Learn more, or read our comparison with LangChain Endpoints.