All Posts
May 3, 2026 · 10 min read

QuickSlug: Local-First AI Inference & Fine-Tuning Platform

A complete guide to QuickSlug — the open-core, OpenAI-compatible platform for running LLM inference locally and fine-tuning models with LoRA adapters.

QuickSlug · Local AI · Fine-Tuning · Ollama · Open Core

What is QuickSlug?

QuickSlug is a local-first, OpenAI-compatible AI platform built by Infrarix that lets developers run inference locally via Ollama, fall back to remote GPU (RunPod), and fine-tune models with Unsloth/Axolotl/MLX adapters — all through a single API and CLI.

It ships with an open-core model: the CLI, API layer, and basic training adapters are MIT-licensed. The intelligent router, optimized training pipeline, and cloud features are proprietary Pro-tier components.

Why Local-First AI?

Running AI locally offers several critical advantages for developers:

  • Privacy: Data never leaves your machine — essential for sensitive applications
  • Cost: No per-token API charges; pay only for hardware you already own
  • Speed: Zero network latency for inference
  • Offline: Works without an internet connection
  • Control: Full control over model versions, configurations, and behavior

Architecture Overview

QuickSlug supports two runtime modes with a seamless upgrade path:

Free Tier (Local Mode)

CLI / SDK / External Client
    │
  Fastify API Gateway (MIT)
   ├── Router
   ├── Ollama (local inference)
   └── SQLite (local storage)

Training:
  quickslug train --config train.json
    → In-process Worker (synchronous)
      → Unsloth / Axolotl / MLX adapter
        → Local Python subprocess

No Docker, no Redis, no PostgreSQL, no external dependencies. Just install and run.
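The Free-tier training path above is just a blocking call chain. As a rough sketch (the function and argument names here are illustrative, not the actual QuickSlug internals), the in-process worker amounts to spawning the adapter as a local Python subprocess and waiting for it:

```python
import subprocess
import sys

def run_training_job(adapter_cmd: list[str]) -> int:
    """Run a training adapter as a local Python subprocess and block
    until it exits. The Free tier has no job queue, so the worker is
    a plain synchronous call inside the API process."""
    result = subprocess.run(adapter_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"training failed: {result.stderr.strip()}")
    return result.returncode

# Stand in for an Unsloth/Axolotl/MLX adapter with a trivial script.
exit_code = run_training_job([sys.executable, "-c", "print('adapter done')"])
```

Because the call is synchronous, a second `quickslug train` simply waits; parallel jobs are a Pro-tier (BullMQ) concern.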

Pro Tier (Hybrid Mode)

CLI / SDK / External Client
    │
  Fastify API Gateway
   ├── Auth + Rate Limiter
   └── Intelligent Router
     ├── Ollama (local)
     └── QuickSlug Cloud
        ├── RunPod GPU (inference + training)
        ├── BullMQ job queue with retries
        ├── Cloudflare Tunnel
        └── PostgreSQL / Redis

Activated via quickslug login. Adds cloud GPU fallback, distributed training with checkpointing, and usage tracking.

OpenAI-Compatible API

QuickSlug implements the OpenAI chat/completions API surface. Any tool or SDK that works with OpenAI works with QuickSlug — just change the base URL:

# Stream chat completions from a local llama3 model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Explain transformers"}],
    "stream": true
  }'

The router checks Ollama for local models first (exact match, then prefix match). If no local model is found and the user is Pro, it routes to cloud GPU. Free users get a 404 with an upgrade hint.
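The routing order above can be sketched as a small decision function. This is an illustration of the described behavior, not the actual router code, and the names are assumptions:

```python
def resolve_model(requested: str, local_models: list[str], is_pro: bool) -> tuple[str, str]:
    """Resolve a requested model the way the text describes:
    exact local match first, then prefix match, then cloud GPU
    for Pro users, otherwise a 404-style error for Free users."""
    # 1. Exact match against locally pulled Ollama models.
    if requested in local_models:
        return ("local", requested)
    # 2. Prefix match, e.g. "llama3" matches "llama3:8b".
    for name in local_models:
        if name.startswith(requested):
            return ("local", name)
    # 3. Pro users fall back to cloud GPU; Free users get an upgrade hint.
    if is_pro:
        return ("cloud", requested)
    raise LookupError(f"model '{requested}' not found locally (404); Pro adds cloud fallback")

route = resolve_model("llama3", ["llama3:8b", "mistral"], is_pro=False)
```

Here `"llama3"` has no exact local match but prefix-matches `"llama3:8b"`, so the request stays local.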

Model Fine-Tuning

QuickSlug provides real, working fine-tuning on both tiers. Free users get local training that produces actual LoRA adapter weights. This is not a demo — it's a complete workflow.

Supported Frameworks

| Framework | Platform        | Notes                                           |
|-----------|-----------------|-------------------------------------------------|
| Unsloth   | Linux / Windows | Default on non-macOS systems                    |
| Axolotl   | Linux / Windows | DeepSpeed multi-GPU (Pro)                       |
| MLX       | macOS arm64     | Metal-optimized, auto-detected on Apple Silicon |

Training Configuration

{
  "framework": "unsloth",
  "baseModel": "llama3",
  "datasetPath": "./data/train.jsonl",
  "outputDir": "./output/my-adapter",
  "epochs": 3,
  "batchSize": 2,
  "loraR": 8,
  "loraAlpha": 16,
  "maxSeqLength": 2048
}

The values shown are the Free-tier defaults; the Pro tier raises loraR to 16 and loraAlpha to 32.

Supports Alpaca and ShareGPT dataset formats. A minimum of 10 examples is required, and the first 5 lines are validated before the job starts.
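Those pre-flight checks can be sketched as a small validator. This is an illustration under the stated rules (min 10 examples, first 5 lines checked), not the actual QuickSlug validation code, and the key sets are assumptions about the two formats:

```python
import json

ALPACA_KEYS = {"instruction", "output"}  # "input" is optional in Alpaca records

def validate_dataset(lines: list[str]) -> None:
    """Pre-flight checks as described in the text: at least 10
    examples, and the first 5 lines must parse as JSONL records in
    Alpaca or ShareGPT ({"conversations": [...]}) format."""
    if len(lines) < 10:
        raise ValueError("dataset needs at least 10 examples")
    for i, line in enumerate(lines[:5]):
        record = json.loads(line)
        is_alpaca = ALPACA_KEYS <= record.keys()
        is_sharegpt = isinstance(record.get("conversations"), list)
        if not (is_alpaca or is_sharegpt):
            raise ValueError(f"line {i + 1}: unrecognized record format")

rows = [json.dumps({"instruction": "Say hi", "output": "Hi!"})] * 10
validate_dataset(rows)  # passes silently
```

Failing fast on the first few lines keeps a malformed dataset from wasting a 45–90 minute local training run.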

Free vs Pro Training

| Capability                   | Free            | Pro              |
|------------------------------|-----------------|------------------|
| Training time (7B, 3 epochs) | 45–90 min       | 3–10 min         |
| GPU support                  | Local only      | RunPod A100/H100 |
| Checkpoint resume            | No              | Yes              |
| Flash Attention 2            | No              | Yes              |
| 4-bit quantization           | No              | Yes              |
| Dataset preprocessing        | No              | Auto-packing     |
| Job queue                    | Sequential      | BullMQ parallel  |
| Model size limit             | ≤7B recommended | Up to 70B        |

CLI Commands

quickslug init          # Detect OS, install Ollama, create config
quickslug start         # Start local API on port 8080
quickslug doctor        # Pre-flight checks
quickslug models list   # List local + remote models
quickslug models pull   # Pull model into Ollama
quickslug train         # Fine-tune locally
quickslug train --cloud # Fine-tune on cloud GPU (Pro)
quickslug login         # Authenticate with QuickSlug Cloud
quickslug expose        # Create public endpoint (Pro)

Open-Core Model

QuickSlug's business model is open-core. The MIT-licensed public repo includes:

  • CLI + Fastify API gateway
  • Basic in-process training worker
  • Minimal Unsloth, Axolotl, and MLX adapters
  • TrainingAdapter interface (public, stable contract)
  • Database schema (SQLite + PostgreSQL)

The proprietary components include the intelligent router, optimized training adapters, BullMQ worker, RunPod GPU adapter, usage engine, and the Tauri desktop GUI.

Get Started

npm install -g quickslug
quickslug init
quickslug start

That's it. No API key, no Docker, no cloud account. Learn more on the product page or check the GitHub repo.