All Posts
May 3, 2026 · 10 min read

QuickSlug: Local-First AI Inference & Fine-Tuning Platform

A complete guide to QuickSlug — the open-core, OpenAI-compatible platform for running LLM inference locally and fine-tuning models with LoRA adapters.

QuickSlug · Local AI · Fine-Tuning · Ollama · Open Core

What is QuickSlug?

QuickSlug is a local-first, OpenAI-compatible AI platform built by Infrarix that lets developers run inference locally via Ollama, fall back to remote GPU (RunPod), and fine-tune models with Unsloth/Axolotl/MLX adapters — all through a single API and CLI.

It ships with an open-core model: the CLI, API layer, and basic training adapters are MIT-licensed. The intelligent router, optimized training pipeline, and cloud features are proprietary Pro-tier components.

Why Local-First AI?

Running AI locally offers several critical advantages for developers:

  • Privacy: Data never leaves your machine — essential for sensitive applications
  • Cost: No per-token API charges; pay only for hardware you already own
  • Speed: Zero network latency for inference
  • Offline: Works without an internet connection
  • Control: Full control over model versions, configurations, and behavior

Architecture Overview

QuickSlug supports two runtime modes with a seamless upgrade path:

Free Tier (Local Mode)

CLI / SDK / External Client
    │
  Fastify API Gateway (MIT)
   ├── Router
   ├── Ollama (local inference)
   └── SQLite (local storage)

Training:
  quickslug train --config train.json
    → In-process Worker (synchronous)
      → Unsloth / Axolotl / MLX adapter
        → Local Python subprocess

No Docker, no Redis, no PostgreSQL, no external dependencies. Just install and run.
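The Free-tier training path above is just a blocking call chain. As a rough sketch (the function and argument names here are illustrative, not the actual QuickSlug internals), the in-process worker amounts to spawning the adapter as a local Python subprocess and waiting for it:

```python
import subprocess
import sys

def run_training_job(adapter_cmd: list[str]) -> int:
    """Run a training adapter as a local Python subprocess and block
    until it exits. The Free tier has no job queue, so the worker is
    a plain synchronous call inside the API process."""
    result = subprocess.run(adapter_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"training failed: {result.stderr.strip()}")
    return result.returncode

# Stand in for an Unsloth/Axolotl/MLX adapter with a trivial script.
exit_code = run_training_job([sys.executable, "-c", "print('adapter done')"])
```

Because the call is synchronous, a second `quickslug train` simply waits; parallel jobs are a Pro-tier (BullMQ) concern.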

Pro Tier (Hybrid Mode)

CLI / SDK / External Client
    │
  Fastify API Gateway
   ├── Auth + Rate Limiter
   └── Intelligent Router
     ├── Ollama (local)
     └── QuickSlug Cloud
        ├── RunPod GPU (inference + training)
        ├── BullMQ job queue with retries
        ├── Cloudflare Tunnel
        └── PostgreSQL / Redis

Activated via quickslug login. Adds cloud GPU fallback, distributed training with checkpointing, and usage tracking.

OpenAI-Compatible API

QuickSlug implements the OpenAI chat/completions API surface. Any tool or SDK that works with OpenAI works with QuickSlug — just change the base URL:

# Stream chat completions from a local llama3 model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Explain transformers"}],
    "stream": true
  }'

The router checks Ollama for local models first (exact match, then prefix match). If no local model is found and the user is Pro, it routes to cloud GPU. Free users get a 404 with an upgrade hint.
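The routing order above can be sketched as a small decision function. This is an illustration of the described behavior, not the actual router code, and the names are assumptions:

```python
def resolve_model(requested: str, local_models: list[str], is_pro: bool) -> tuple[str, str]:
    """Resolve a requested model the way the text describes:
    exact local match first, then prefix match, then cloud GPU
    for Pro users, otherwise a 404-style error for Free users."""
    # 1. Exact match against locally pulled Ollama models.
    if requested in local_models:
        return ("local", requested)
    # 2. Prefix match, e.g. "llama3" matches "llama3:8b".
    for name in local_models:
        if name.startswith(requested):
            return ("local", name)
    # 3. Pro users fall back to cloud GPU; Free users get an upgrade hint.
    if is_pro:
        return ("cloud", requested)
    raise LookupError(f"model '{requested}' not found locally (404); Pro adds cloud fallback")

route = resolve_model("llama3", ["llama3:8b", "mistral"], is_pro=False)
```

Here `"llama3"` has no exact local match but prefix-matches `"llama3:8b"`, so the request stays local.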

Model Fine-Tuning

QuickSlug provides real, working fine-tuning on both tiers. Free users get local training that produces actual LoRA adapter weights. This is not a demo — it's a complete workflow.

Supported Frameworks

| Framework | Platform        | Notes                                           |
|-----------|-----------------|-------------------------------------------------|
| Unsloth   | Linux / Windows | Default on non-macOS systems                    |
| Axolotl   | Linux / Windows | DeepSpeed multi-GPU (Pro)                       |
| MLX       | macOS arm64     | Metal-optimized, auto-detected on Apple Silicon |

Training Configuration

{
  "framework": "unsloth",
  "baseModel": "llama3",
  "datasetPath": "./data/train.jsonl",
  "outputDir": "./output/my-adapter",
  "epochs": 3,
  "batchSize": 2,
  "loraR": 8,
  "loraAlpha": 16,
  "maxSeqLength": 2048
}

The values shown are the Free-tier defaults; the Pro tier raises loraR to 16 and loraAlpha to 32.

Supports Alpaca and ShareGPT dataset formats. A minimum of 10 examples is required, and the first 5 lines are validated before the job starts.
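Those pre-flight checks can be sketched as a small validator. This is an illustration under the stated rules (min 10 examples, first 5 lines checked), not the actual QuickSlug validation code, and the key sets are assumptions about the two formats:

```python
import json

ALPACA_KEYS = {"instruction", "output"}  # "input" is optional in Alpaca records

def validate_dataset(lines: list[str]) -> None:
    """Pre-flight checks as described in the text: at least 10
    examples, and the first 5 lines must parse as JSONL records in
    Alpaca or ShareGPT ({"conversations": [...]}) format."""
    if len(lines) < 10:
        raise ValueError("dataset needs at least 10 examples")
    for i, line in enumerate(lines[:5]):
        record = json.loads(line)
        is_alpaca = ALPACA_KEYS <= record.keys()
        is_sharegpt = isinstance(record.get("conversations"), list)
        if not (is_alpaca or is_sharegpt):
            raise ValueError(f"line {i + 1}: unrecognized record format")

rows = [json.dumps({"instruction": "Say hi", "output": "Hi!"})] * 10
validate_dataset(rows)  # passes silently
```

Failing fast on the first few lines keeps a malformed dataset from wasting a 45–90 minute local training run.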

Free vs Pro Training

| Capability                   | Free            | Pro              |
|------------------------------|-----------------|------------------|
| Training time (7B, 3 epochs) | 45–90 min       | 3–10 min         |
| GPU support                  | Local only      | RunPod A100/H100 |
| Checkpoint resume            | No              | Yes              |
| Flash Attention 2            | No              | Yes              |
| 4-bit quantization           | No              | Yes              |
| Dataset preprocessing        | No              | Auto-packing     |
| Job queue                    | Sequential      | BullMQ parallel  |
| Model size limit             | ≤7B recommended | Up to 70B        |

CLI Commands

quickslug init          # Detect OS, install Ollama, create config
quickslug start         # Start local API on port 8080
quickslug doctor        # Pre-flight checks
quickslug models list   # List local + remote models
quickslug models pull   # Pull model into Ollama
quickslug train         # Fine-tune locally
quickslug train --cloud # Fine-tune on cloud GPU (Pro)
quickslug login         # Authenticate with QuickSlug Cloud
quickslug expose        # Create public endpoint (Pro)

Open-Core Model

QuickSlug's business model is open-core. The MIT-licensed public repo includes:

  • CLI + Fastify API gateway
  • Basic in-process training worker
  • Minimal Unsloth, Axolotl, and MLX adapters
  • TrainingAdapter interface (public, stable contract)
  • Database schema (SQLite + PostgreSQL)

The proprietary components include the intelligent router, optimized training adapters, BullMQ worker, RunPod GPU adapter, usage engine, and the Tauri desktop GUI.

Get Started

npm install -g quickslug
quickslug init
quickslug start

That's it. No API key, no Docker, no cloud account. Learn more on the product page or check the GitHub repo.