May 3, 2026 · 10 min read

LocoPilot: Local-First AI Inference & Fine-Tuning Runtime

A complete guide to LocoPilot — the open-core, OpenAI-compatible CLI for running LLM inference locally and fine-tuning models with LoRA adapters.

LocoPilot · Local AI · Fine-Tuning · Ollama · Open Core

What is LocoPilot?

LocoPilot is a local-first, OpenAI-compatible AI runtime built by Infrarix that lets developers run inference locally via Ollama, fall back to remote GPU (RunPod), and fine-tune models with Unsloth/Axolotl/MLX adapters — all through a single API and CLI.

It follows an open-core model: the CLI, API layer, and basic training adapters are MIT-licensed. The intelligent router, optimized training pipeline, and cloud features are proprietary Pro-tier components.

Why Local-First AI?

Running AI locally offers several critical advantages for developers:

  • Privacy: Data never leaves your machine — essential for sensitive applications
  • Cost: No per-token API charges; pay only for hardware you already own
  • Speed: Zero network latency for inference
  • Offline: Works without an internet connection
  • Control: Full control over model versions, configurations, and behavior

Architecture Overview

LocoPilot supports two runtime modes with a seamless upgrade path:

Free Tier (Local Mode)

CLI / SDK / External Client
    │
  Fastify API Gateway (MIT)
   ├── Router
   ├── Ollama (local inference)
   └── SQLite (local storage)

Training:
  locopilot train --config train.json
    → In-process Worker (synchronous)
      → Unsloth / Axolotl / MLX adapter
        → Local Python subprocess

No Docker, no Redis, no PostgreSQL, no external dependencies. Just install and run.
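
For a concrete picture, here is a minimal TypeScript sketch of what that synchronous flow can look like. The adapter script path and the framework-selection logic are assumptions for illustration, not LocoPilot's actual internals.

// Illustrative sketch of a synchronous in-process training worker.
// Script paths and config fields are assumptions, not LocoPilot internals.
import { spawnSync } from "node:child_process";
import { readFileSync } from "node:fs";

type Framework = "unsloth" | "axolotl" | "mlx";

// MLX is auto-detected on Apple Silicon; Unsloth is the default elsewhere.
function pickFramework(): Framework {
  return process.platform === "darwin" && process.arch === "arm64"
    ? "mlx"
    : "unsloth";
}

function trainLocally(configPath: string): void {
  const config = JSON.parse(readFileSync(configPath, "utf8"));
  const framework: Framework = config.framework ?? pickFramework();

  // Run training as a blocking local Python subprocess,
  // streaming its output straight to the terminal.
  const result = spawnSync(
    "python3",
    [`adapters/${framework}_train.py`, configPath],
    { stdio: "inherit" }
  );
  if (result.status !== 0) {
    throw new Error(`Training exited with code ${result.status}`);
  }
}

trainLocally("train.json");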

Pro Tier (Hybrid Mode)

CLI / SDK / External Client
    │
  Fastify API Gateway
   ├── Auth + Rate Limiter
   └── Intelligent Router
     ├── Ollama (local)
     └── LocoPilot Cloud
        ├── RunPod GPU (inference + training)
        ├── BullMQ job queue with retries
        ├── Cloudflare Tunnel
        └── PostgreSQL / Redis

Activated via locopilot login. Adds cloud GPU fallback, distributed training with checkpointing, and usage tracking.

OpenAI-Compatible API

LocoPilot implements the OpenAI chat/completions API surface. Any tool or SDK that works with OpenAI works with LocoPilot — just change the base URL:

# Stream chat completions from a local llama3 model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Explain transformers"}],
    "stream": true
  }'
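
The same request works through the official openai npm package; a minimal TypeScript sketch (the placeholder API key only satisfies the SDK constructor, since local mode needs no key):

// Point the official OpenAI SDK at the local LocoPilot gateway.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "local", // placeholder; no real key is needed locally
});

const stream = await client.chat.completions.create({
  model: "llama3",
  messages: [{ role: "user", content: "Explain transformers" }],
  stream: true,
});

// Print tokens as they arrive.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}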

The router checks Ollama for local models first (exact match, then prefix match). If no local model is found and the user is Pro, it routes to cloud GPU. Free users get a 404 with an upgrade hint.
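
A sketch of that resolution order in TypeScript; listLocalModels() and routeToCloud() are hypothetical helpers standing in for the proprietary router, not LocoPilot APIs:

// Sketch of the routing behavior described above.
// Both helpers below are hypothetical stand-ins.
declare function listLocalModels(): Promise<string[]>;
declare function routeToCloud(model: string): Promise<{ target: "cloud"; model: string }>;

async function resolveModel(requested: string, isPro: boolean) {
  const local = await listLocalModels(); // e.g. ["llama3:8b", "mistral:7b"]

  // 1. Exact match against a local Ollama model.
  if (local.includes(requested)) {
    return { target: "ollama", model: requested };
  }

  // 2. Prefix match, e.g. "llama3" resolves to "llama3:8b".
  const prefixed = local.find((m) => m.startsWith(requested + ":"));
  if (prefixed) {
    return { target: "ollama", model: prefixed };
  }

  // 3. Pro falls back to cloud GPU; Free gets a 404 with an upgrade hint.
  if (isPro) {
    return routeToCloud(requested);
  }
  const err = new Error(
    `Model "${requested}" not found locally. Try "locopilot models pull ${requested}" or upgrade to Pro for cloud fallback.`
  );
  (err as { statusCode?: number } & Error).statusCode = 404;
  throw err;
}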

Model Fine-Tuning

LocoPilot provides real, working fine-tuning on both tiers. Free users get local training that produces actual LoRA adapter weights. This is not a demo — it's a complete workflow.
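
Once a run finishes, one way to serve the resulting weights is to register the adapter with Ollama. This is a hedged sketch: it assumes your Ollama version supports the ADAPTER Modelfile instruction and accepts the adapter format LocoPilot writes.

# Register the trained LoRA adapter as a new Ollama model
cat > Modelfile <<'EOF'
FROM llama3
ADAPTER ./output/my-adapter
EOF
ollama create my-tuned-llama -f Modelfile
ollama run my-tuned-llama "Sanity-check the fine-tuned model"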

Supported Frameworks

Framework   Platform          Notes
Unsloth     Linux / Windows   Default on non-macOS systems
Axolotl     Linux / Windows   DeepSpeed multi-GPU (Pro)
MLX         macOS arm64       Metal-optimized, auto-detected on Apple Silicon

Training Configuration

{
  "framework": "unsloth",
  "baseModel": "llama3",
  "datasetPath": "./data/train.jsonl",
  "outputDir": "./output/my-adapter",
  "epochs": 3,
  "batchSize": 2,       // Free default
  "loraR": 8,           // Free: 8, Pro: 16
  "loraAlpha": 16,      // Free: 16, Pro: 32
  "maxSeqLength": 2048
}

Supports Alpaca and ShareGPT dataset formats. Minimum 10 examples required. First 5 lines are validated before the job starts.
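
For illustration, a minimal Alpaca-format train.jsonl looks like this, one JSON object per line with the standard instruction/input/output fields (the rows are made up; a real dataset needs at least 10):

{"instruction": "Summarize the text.", "input": "LocoPilot runs LLM inference locally via Ollama.", "output": "LocoPilot is a local-first LLM runtime built on Ollama."}
{"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
{"instruction": "Name the capital of Japan.", "input": "", "output": "Tokyo"}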

Free vs Pro Training

Capability                      Free               Pro
Training time (7B, 3 epochs)    45–90 min          3–10 min
GPU support                     Local only         RunPod A100/H100
Checkpoint resume               No                 Yes
Flash Attention 2               No                 Yes
4-bit quantization              No                 Yes
Dataset preprocessing           No                 Auto-packing
Job queue                       Sequential         BullMQ parallel
Model size limit                ≤7B recommended    Up to 70B

CLI Commands

locopilot init          # Detect OS, install Ollama, create config
locopilot start         # Start local API on port 8080
locopilot doctor        # Pre-flight checks
locopilot models list   # List local + remote models
locopilot models pull   # Pull model into Ollama
locopilot train         # Fine-tune locally
locopilot train --cloud # Fine-tune on cloud GPU (Pro)
locopilot login         # Authenticate with LocoPilot Cloud
locopilot expose        # Create public endpoint (Pro)

Open-Core Model

LocoPilot's business model is open-core. The MIT-licensed public repo includes:

  • CLI + Fastify API gateway
  • Basic in-process training worker
  • Minimal Unsloth, Axolotl, and MLX adapters
  • TrainingAdapter interface (public, stable contract; sketched after this list)
  • Database schema (SQLite + PostgreSQL)
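
To give a feel for that contract, here is a hedged TypeScript sketch of what a TrainingAdapter interface like this might look like; the method and field names are assumptions, not the published interface:

// Hypothetical shape of the public TrainingAdapter contract.
// Names and fields are illustrative, not the actual interface.
interface TrainingJobConfig {
  baseModel: string;
  datasetPath: string;
  outputDir: string;
  epochs: number;
  batchSize: number;
  loraR: number;
  loraAlpha: number;
  maxSeqLength: number;
}

interface TrainingResult {
  adapterPath: string; // where the LoRA weights were written
  trainLoss: number;
}

interface TrainingAdapter {
  /** e.g. "unsloth", "axolotl", "mlx" */
  readonly name: string;
  /** Whether this adapter can run on the current OS / hardware. */
  isSupported(): Promise<boolean>;
  /** Run a fine-tuning job and resolve with the adapter location. */
  train(
    config: TrainingJobConfig,
    onProgress?: (percent: number) => void
  ): Promise<TrainingResult>;
}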

The proprietary components include the intelligent router, optimized training adapters, BullMQ worker, RunPod GPU adapter, usage engine, and the Tauri desktop GUI.

Get Started

npm install -g @infrarix/locopilot
locopilot init
locopilot start

That's it. No API key, no Docker, no cloud account. Learn more on the product page or check the GitHub repo.