QuickSlug: Local-First AI Inference & Fine-Tuning Platform
A complete guide to QuickSlug — the open-core, OpenAI-compatible platform for running LLM inference locally and fine-tuning models with LoRA adapters.
What is QuickSlug?
QuickSlug is a local-first, OpenAI-compatible AI platform built by Infrarix that lets developers run inference locally via Ollama, fall back to remote GPU (RunPod), and fine-tune models with Unsloth/Axolotl/MLX adapters — all through a single API and CLI.
It follows an open-core model: the CLI, API layer, and basic training adapters are MIT-licensed. The intelligent router, optimized training pipeline, and cloud features are proprietary Pro-tier components.
Why Local-First AI?
Running AI locally offers several critical advantages for developers:
- Privacy: Data never leaves your machine — essential for sensitive applications
- Cost: No per-token API charges; pay only for hardware you already own
- Speed: Zero network latency for inference
- Offline: Works without an internet connection
- Control: Full control over model versions, configurations, and behavior
Architecture Overview
QuickSlug supports two runtime modes with a seamless upgrade path:
Free Tier (Local Mode)
```
CLI / SDK / External Client
        │
Fastify API Gateway (MIT)
 ├── Router
 ├── Ollama (local inference)
 └── SQLite (local storage)
```
Training:
```
quickslug train --config train.json
  → In-process Worker (synchronous)
  → Unsloth / Axolotl / MLX adapter
  → Local Python subprocess
```
No Docker, no Redis, no PostgreSQL, no external dependencies. Just install and run.
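To make the free-tier flow concrete, here is a minimal sketch of how an adapter might shell out to a local Python training process. The `runLocalTraining` helper, script name, and arguments are hypothetical, not QuickSlug's actual code:
```typescript
import { spawn } from "node:child_process";

// Hypothetical sketch: launch a local Python training run and wait for it.
// Script path, flags, and error handling are illustrative only.
function runLocalTraining(configPath: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const proc = spawn("python3", ["train.py", "--config", configPath], {
      stdio: "inherit", // stream training logs straight to the terminal
    });
    proc.on("error", reject); // e.g. python3 not on PATH
    proc.on("close", (code) => resolve(code ?? 1));
  });
}

// "Synchronous" in the free-tier sense: one job at a time, awaited in-process.
await runLocalTraining("./train.json");
```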
Pro Tier (Hybrid Mode)
```
CLI / SDK / External Client
        │
Fastify API Gateway
 ├── Auth + Rate Limiter
 └── Intelligent Router
      ├── Ollama (local)
      └── QuickSlug Cloud
           ├── RunPod GPU (inference + training)
           ├── BullMQ job queue with retries
           ├── Cloudflare Tunnel
           └── PostgreSQL / Redis
```
Activated via `quickslug login`. Adds cloud GPU fallback, distributed training with checkpointing, and usage tracking.
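As a sketch of what Pro-tier queueing could look like with BullMQ (the queue name, job payload, and Redis connection details here are assumptions, not QuickSlug internals):
```typescript
import { Queue } from "bullmq";

// Assumed queue name and payload shape; BullMQ requires a Redis connection.
const trainingQueue = new Queue("training", {
  connection: { host: "localhost", port: 6379 },
});

// Enqueue a fine-tuning job with automatic retries and exponential backoff.
await trainingQueue.add(
  "finetune",
  { configPath: "./train.json", gpu: "A100" },
  { attempts: 3, backoff: { type: "exponential", delay: 5_000 } }
);
```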
OpenAI-Compatible API
QuickSlug implements the OpenAI chat/completions API surface. Any tool or SDK that works with OpenAI works with QuickSlug — just change the base URL:
```bash
# Stream chat completions from a local llama3 model
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Explain transformers"}],
    "stream": true
  }'
```
The router checks Ollama for local models first (exact match, then prefix match). If no local model is found and the user is Pro, it routes to cloud GPU. Free users get a 404 with an upgrade hint.
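The same compatibility holds for the official SDKs. For example, with the openai npm package you only change `baseURL`; the API key below is a placeholder, since local mode needs no real key:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1", // QuickSlug gateway instead of api.openai.com
  apiKey: "quickslug-local",           // placeholder; local mode needs no real key
});

// Stream a chat completion exactly as you would against OpenAI.
const stream = await client.chat.completions.create({
  model: "llama3",
  messages: [{ role: "user", content: "Explain transformers" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```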
Model Fine-Tuning
QuickSlug provides real, working fine-tuning on both tiers. Free users get local training that produces actual LoRA adapter weights. This is not a demo — it's a complete workflow.
Supported Frameworks
| Framework | Platform | Notes |
|---|---|---|
| Unsloth | Linux / Windows | Default on non-macOS systems |
| Axolotl | Linux / Windows | DeepSpeed multi-GPU (Pro) |
| MLX | macOS arm64 | Metal-optimized, auto-detected on Apple Silicon |
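The auto-detection in the table can be pictured as a simple platform check. This is an illustrative sketch, not QuickSlug's actual selection logic:
```typescript
// Pick a default training framework from the host platform.
// Mirrors the table above: MLX on Apple Silicon, Unsloth elsewhere.
function defaultFramework(): "mlx" | "unsloth" {
  if (process.platform === "darwin" && process.arch === "arm64") {
    return "mlx"; // Metal-optimized path on Apple Silicon
  }
  return "unsloth"; // default on Linux / Windows
}
```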
Training Configuration
```jsonc
{
  "framework": "unsloth",
  "baseModel": "llama3",
  "datasetPath": "./data/train.jsonl",
  "outputDir": "./output/my-adapter",
  "epochs": 3,
  "batchSize": 2,       // Free default
  "loraR": 8,           // Free: 8, Pro: 16
  "loraAlpha": 16,      // Free: 16, Pro: 32
  "maxSeqLength": 2048
}
```
Supports the Alpaca and ShareGPT dataset formats. A minimum of 10 examples is required, and the first 5 lines are validated before the job starts.
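For reference, here is what a single record looks like in each supported format: the first line is Alpaca-style, the second ShareGPT-style. The field contents are made up, and a real `train.jsonl` must use one format throughout:
```jsonl
{"instruction": "Summarize the text.", "input": "Transformers use attention...", "output": "Attention lets models weigh context."}
{"conversations": [{"from": "human", "value": "What is LoRA?"}, {"from": "gpt", "value": "A low-rank adapter method for fine-tuning."}]}
```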
Free vs Pro Training
| Capability | Free | Pro |
|---|---|---|
| Training time (7B, 3 epochs) | 45–90 min | 3–10 min |
| GPU support | Local only | RunPod A100/H100 |
| Checkpoint resume | No | Yes |
| Flash Attention 2 | No | Yes |
| 4-bit quantization | No | Yes |
| Dataset preprocessing | No | Auto-packing |
| Job queue | Sequential | BullMQ parallel |
| Model size limit | ≤7B recommended | Up to 70B |
CLI Commands
```bash
quickslug init          # Detect OS, install Ollama, create config
quickslug start         # Start local API on port 8080
quickslug doctor        # Pre-flight checks
quickslug models list   # List local + remote models
quickslug models pull   # Pull model into Ollama
quickslug train         # Fine-tune locally
quickslug train --cloud # Fine-tune on cloud GPU (Pro)
quickslug login         # Authenticate with QuickSlug Cloud
quickslug expose        # Create public endpoint (Pro)
```
Open-Core Model
QuickSlug's business model is open-core. The MIT-licensed public repo includes:
- CLI + Fastify API gateway
- Basic in-process training worker
- Minimal Unsloth, Axolotl, and MLX adapters
- TrainingAdapter interface (public, stable contract; sketched below)
- Database schema (SQLite + PostgreSQL)
The proprietary components include the intelligent router, optimized training adapters, BullMQ worker, RunPod GPU adapter, usage engine, and the Tauri desktop GUI.
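Conceptually, the public TrainingAdapter contract referenced above looks something like the sketch below. Method names and types here are illustrative; consult the MIT repo for the real interface:
```typescript
// Hypothetical shape of the public TrainingAdapter contract; the field
// names mirror the training configuration shown earlier in this guide.
interface TrainingConfig {
  framework: "unsloth" | "axolotl" | "mlx";
  baseModel: string;
  datasetPath: string;
  outputDir: string;
  epochs: number;
  batchSize: number;
  loraR: number;
  loraAlpha: number;
  maxSeqLength: number;
}

interface TrainingAdapter {
  /** Verify the framework and its Python dependencies are usable on this host. */
  preflight(config: TrainingConfig): Promise<void>;
  /** Run training and resolve with the directory holding the LoRA adapter weights. */
  train(config: TrainingConfig): Promise<{ adapterDir: string }>;
}
```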
Get Started
```bash
npm install -g quickslug
quickslug init
quickslug start
```
That's it. No API key, no Docker, no cloud account. Learn more on the product page or check the GitHub repo.