AI Gateway: Unified API for LLMs & AI Providers
How Infrarix AI Gateway simplifies multi-provider LLM routing, adds observability, enables failover, and optimizes costs — all through a single API.
Introduction
Modern AI applications rarely rely on a single LLM provider. Teams switch between OpenAI, Anthropic, Google, AWS Bedrock, and open-source models depending on cost, latency, and capability requirements.
Infrarix AI Gateway is a unified API layer that sits between your application and any number of LLM providers. It handles routing, failover, caching, observability, and cost tracking — so you can focus on building product features.
The Multi-Provider Problem
Without a gateway layer, teams face:
- Provider lock-in: Tight coupling to a single provider's SDK and API format
- No failover: If your primary provider goes down, your entire app goes down
- Blind spots: No unified view of latency, cost, or quality across providers
- Cost surprises: No real-time tracking of token usage across models
- Inconsistent APIs: Each provider has different auth, formats, and error handling
How AI Gateway Works
1. Unified API Format
Send requests using a single, consistent API format. AI Gateway translates them to the correct provider format automatically. Switch providers by changing a single request parameter, such as model; the rest of your code stays the same.
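To make the translation step concrete, here is a minimal sketch of how one unified request could be mapped to two provider-style bodies. The `UnifiedRequest` type and `toProviderBody` function are hypothetical names for illustration, not part of the Infrarix SDK, and the provider bodies are simplified rather than complete schemas.

```typescript
// Hypothetical unified request shape, modeled on the Quick Start payload.
interface UnifiedMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface UnifiedRequest {
  model: string;
  messages: UnifiedMessage[];
  max_tokens?: number;
}

type Provider = "openai" | "anthropic";

// Translate a unified request into a simplified provider-style body.
// Illustrative only: real provider schemas have many more fields.
function toProviderBody(req: UnifiedRequest, provider: Provider): Record<string, unknown> {
  if (provider === "openai") {
    // OpenAI-style chat bodies keep system prompts inside the messages array.
    return { model: req.model, messages: req.messages, max_tokens: req.max_tokens };
  }
  // Anthropic-style bodies carry the system prompt as a top-level field
  // and require max_tokens, so a default (assumed here) is supplied.
  const system = req.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  const messages = req.messages.filter((m) => m.role !== "system");
  return {
    model: req.model,
    max_tokens: req.max_tokens ?? 1024,
    messages,
    ...(system ? { system } : {}),
  };
}
```

The caller only ever builds a `UnifiedRequest`; the provider-specific differences stay inside the translation function.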
2. Intelligent Routing
Define routing rules based on model capability, cost, latency, or custom logic. AI Gateway can route to the cheapest model for simple tasks and the most capable model for complex ones.
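A cost-based rule of this kind can be sketched as "pick the cheapest model that is capable enough". The catalog below is hypothetical: the model names, tiers, and prices are placeholders chosen for illustration, not real pricing or actual AI Gateway configuration.

```typescript
// Hypothetical catalog entry; tier and price values are placeholders.
interface ModelOption {
  model: string;
  tier: number; // higher means more capable
  costPer1kTokens: number;
}

const catalog: ModelOption[] = [
  { model: "small-model", tier: 1, costPer1kTokens: 0.2 },
  { model: "mid-model", tier: 2, costPer1kTokens: 1.0 },
  { model: "large-model", tier: 3, costPer1kTokens: 5.0 },
];

// Pick the cheapest model whose capability tier meets the task's requirement.
function routeByCost(options: ModelOption[], requiredTier: number): ModelOption {
  const eligible = options.filter((o) => o.tier >= requiredTier);
  if (eligible.length === 0) {
    throw new Error(`no model satisfies tier ${requiredTier}`);
  }
  return eligible.reduce((best, o) => (o.costPer1kTokens < best.costPer1kTokens ? o : best));
}

console.log(routeByCost(catalog, 1).model); // small-model
console.log(routeByCost(catalog, 3).model); // large-model
```

Latency-based or custom rules would follow the same pattern with a different filter and comparison.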
3. Automatic Failover
When a provider returns errors or exceeds latency thresholds, AI Gateway automatically retries with a fallback provider. Configurable retry policies help keep a single provider outage from surfacing to your users.
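The failover behavior described above can be approximated in a few lines: try each provider in order and move on when a call errors or runs past a latency threshold. This is a sketch under stated assumptions, not AI Gateway's implementation; `callWithFailover`, `withTimeout`, and the `ProviderCall` type are hypothetical names.

```typescript
// A provider call is any async function returning text.
// Real calls would hit provider APIs; these are stand-ins.
type ProviderCall = () => Promise<string>;

// Reject if the promise does not settle within timeoutMs.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Try each provider in order; fall through on error or timeout.
async function callWithFailover(
  chain: { name: string; call: ProviderCall }[],
  timeoutMs: number,
): Promise<{ provider: string; content: string }> {
  const failures: string[] = [];
  for (const { name, call } of chain) {
    try {
      const content = await withTimeout(call(), timeoutMs);
      return { provider: name, content };
    } catch (error) {
      failures.push(`${name}: ${(error as Error).message}`);
    }
  }
  throw new Error(`all providers failed (${failures.join("; ")})`);
}
```

Note that the timeout here only stops waiting on a slow call; it does not cancel the underlying request. A production version would likely also pass an abort signal to the provider call.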
4. Observability
Every request generates structured telemetry: latency, tokens, cost, model, provider, and status. Dashboard views and API exports give you complete visibility.
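As an illustration of what this telemetry makes possible, the sketch below rolls per-request records up into a per-provider summary. The `RequestTelemetry` shape is an assumption built from the fields listed above; the actual export format may differ.

```typescript
// Hypothetical telemetry record based on the fields listed above.
interface RequestTelemetry {
  provider: string;
  model: string;
  status: "ok" | "error";
  latencyMs: number;
  totalTokens: number;
  costUsd: number;
}

interface ProviderSummary {
  requests: number;
  errors: number;
  costUsd: number;
  avgLatencyMs: number;
}

// Aggregate request count, error count, cost, and mean latency per provider.
function summarizeByProvider(records: RequestTelemetry[]): Map<string, ProviderSummary> {
  const out = new Map<string, ProviderSummary>();
  for (const r of records) {
    const s = out.get(r.provider) ?? { requests: 0, errors: 0, costUsd: 0, avgLatencyMs: 0 };
    // Running mean avoids storing every latency sample.
    s.avgLatencyMs = (s.avgLatencyMs * s.requests + r.latencyMs) / (s.requests + 1);
    s.requests += 1;
    if (r.status === "error") s.errors += 1;
    s.costUsd += r.costUsd;
    out.set(r.provider, s);
  }
  return out;
}
```

The same records could feed alerts or cost reports; the point is that one consistent schema across providers makes such roll-ups straightforward.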
Quick Start
curl -X POST https://api.infrarix.com/v1/gateway/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "fallback": ["claude-sonnet-4-20250514", "gemini-pro"],
    "messages": [
      { "role": "user", "content": "Explain quantum computing in simple terms" }
    ],
    "max_tokens": 500
  }'
Response:
{
  "id": "gw-req-abc123",
  "model": "gpt-4o",
  "provider": "openai",
  "content": "Quantum computing uses quantum bits (qubits)...",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 187,
    "total_tokens": 199,
    "cost_usd": 0.0024
  },
  "latency_ms": 842
}
SDK Integration
import { AIGateway } from '@infrarix/gateway'

// Read the API key from the environment rather than hard-coding it
const gw = new AIGateway({ apiKey: process.env.INFRARIX_KEY })

// Request gpt-4o, with a Claude model listed as the fallback
const response = await gw.chat({
  model: 'gpt-4o',
  fallback: ['claude-sonnet-4-20250514'],
  messages: [{ role: 'user', content: 'Hello!' }],
})

console.log(response.content)
console.log(`Cost: $${response.usage.cost_usd}`)
Supported Providers
| Provider | Models | Features |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | Chat, Embeddings, Images |
| Anthropic | Claude Opus, Sonnet, Haiku | Chat, Streaming |
| Google | Gemini Pro, Gemini Ultra | Chat, Multimodal |
| AWS Bedrock | Claude, Titan, Llama | Chat, Embeddings |
| Azure OpenAI | GPT-4o, GPT-4o-mini | Chat, Embeddings |
| Custom / Self-hosted | Any OpenAI-compatible | Chat, Embeddings |
Key Features
- Semantic caching: Cache similar queries to reduce cost and latency by up to 80%
- Rate limiting: Per-user and per-model rate limits with configurable policies
- Cost budgets: Set spend limits per project, team, or model with real-time alerts
- Streaming: Full support for streaming responses from all providers
- Function calling: Unified function/tool calling across supported providers
- Prompt templates: Store and version prompt templates with variable injection
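To illustrate the semantic caching idea from the list above: instead of matching prompts by exact text, a cache can compare prompt embeddings and reuse a stored response when two prompts are similar enough. The sketch below is an assumption about how such a cache might work, not AI Gateway's implementation. The `SemanticCache` class and the 0.95 threshold are hypothetical, and the embeddings would come from an embedding model in practice.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];
  private threshold: number;

  // threshold: minimum similarity for a hit (0.95 is an arbitrary default).
  constructor(threshold: number = 0.95) {
    this.threshold = threshold;
  }

  // Return the response of the most similar stored prompt, if similar enough.
  lookup(embedding: number[]): string | undefined {
    let best: { score: number; response: string } | undefined;
    for (const e of this.entries) {
      const score = cosine(embedding, e.embedding);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: e.response };
      }
    }
    return best?.response;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```

The linear scan keeps the example short; a larger cache would typically use a vector index. The threshold trades hit rate against the risk of returning an answer to a subtly different question.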
Frequently Asked Questions
Does AI Gateway add latency?
AI Gateway adds less than 10ms of overhead. With semantic caching enabled, cache hits avoid the provider round trip, so total latency can drop significantly.
Can I bring my own provider API keys?
Yes. AI Gateway works with your existing provider API keys. We never access your models directly.
Is streaming supported?
Yes. AI Gateway supports streaming for all providers that offer it, with unified event formats.
Get Started
Simplify your LLM infrastructure with AI Gateway. Learn more or read the comparison with LangChain Endpoints.