QuickSlug
Local-first, OpenAI-compatible AI platform. Run inference via Ollama, fall back to cloud GPU, and fine-tune models — all through a single CLI and API.
Features
OpenAI-Compatible API
Drop-in replacement for the OpenAI SDK. Use the same chat/completions endpoint for local and remote models.
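Because the gateway speaks the standard OpenAI wire format, the same request body works locally and in the cloud; only the base URL changes. A minimal stdlib-only sketch, assuming the gateway serves the standard `/v1/chat/completions` route on port 8080 (the default named in Quick Start); the model name `llama3` is illustrative:

```python
import json

# The same OpenAI-style payload works against a local QuickSlug gateway
# or a hosted endpoint -- only the base URL changes.
LOCAL_BASE = "http://localhost:8080/v1"   # QuickSlug default port (see Quick Start)
CLOUD_BASE = "https://api.openai.com/v1"  # same route, same payload shape

def chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-compatible chat completion."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, body

url, body = chat_request(LOCAL_BASE, "llama3", "Hello!")
print(url)  # http://localhost:8080/v1/chat/completions
```

Swapping `LOCAL_BASE` for `CLOUD_BASE` is the whole migration story: no payload changes, no new client.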
Local + Cloud Inference
Run models locally via Ollama with zero config. Pro users get automatic failover to RunPod cloud GPUs.
Model Fine-Tuning
Fine-tune LLMs with LoRA adapters using Unsloth, Axolotl, or MLX. Free tier supports real local training.
Apple Silicon Native
MLX adapter for Metal-optimized fine-tuning on M-series Macs. Auto-detected on macOS arm64.
CLI-First Workflow
Single CLI for init, start, train, and deploy. Zero Docker required for free tier.
Open-Core Architecture
MIT-licensed CLI and API layer. Proprietary router, optimized adapters, and cloud features in Pro tier.
How It Works
All requests enter through the Fastify API gateway. Free tier runs entirely locally; Pro adds cloud GPU and distributed training.
Free vs Pro
Free Tier
Local, Zero Config
- Ollama-powered local inference
- SQLite — no Docker, no external deps
- In-process model fine-tuning
- Unsloth, Axolotl & MLX adapters
- Alpaca & ShareGPT dataset formats
- CLI + OpenAI-compatible API
Pro Tier
Cloud-Augmented
- Everything in Free, plus:
- RunPod GPU inference + training
- BullMQ job queue with retries
- Checkpoint save/resume on failure
- Optimized adapters (FlashAttention 2, 4-bit)
- Dataset preprocessing & auto-packing
- Usage tracking & Stripe billing
- Cloudflare tunnel for public endpoints
Quick Start
Install & Init
Install globally with npm, then run init to auto-detect OS and set up Ollama.
Start Serving
Launch a local OpenAI-compatible API on port 8080. Zero Docker needed.
Train & Deploy
Fine-tune locally or on cloud GPUs. Supports LoRA with Unsloth, Axolotl, and MLX.
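LoRA, the technique behind all three adapters, trains a pair of small low-rank matrices instead of the full weight matrix, which is what makes local fine-tuning feasible. A plain-Python sketch of the merged-weight formula (illustrative numbers, not QuickSlug internals):

```python
# LoRA in a nutshell: instead of updating a d_out x d_in weight matrix W,
# train two small matrices B (d_out x r) and A (r x d_in) with rank r << d,
# then merge as W' = W + (alpha / r) * B @ A for inference.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_merge(W, A, B, alpha: float):
    """Merged weight W' = W + (alpha / r) * B @ A, where r = len(A)."""
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
B = [[1.0], [0.0]]            # d_out x r, trained
A = [[0.0, 2.0]]              # r x d_in, trained
print(lora_merge(W, A, B, alpha=1.0))  # [[1.0, 2.0], [0.0, 1.0]]
```

With rank `r` far smaller than the weight dimensions, the trainable parameter count drops by orders of magnitude, which is why a consumer GPU or an M-series Mac can handle it.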
Training Configuration
All fields have sensible defaults; free users specify only the overrides they need. Supports Alpaca and ShareGPT dataset formats.
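The two accepted dataset formats differ in shape: Alpaca stores one instruction/input/output triple per record, while ShareGPT stores a conversation of `from`/`value` messages. A sketch of both, with an illustrative single-turn converter (the helper is not part of QuickSlug):

```python
# Alpaca format: one instruction/input/output triple per record.
alpaca_record = {
    "instruction": "Summarize the text.",
    "input": "QuickSlug runs models locally via Ollama.",
    "output": "QuickSlug is a local-first inference platform.",
}

# ShareGPT format: a conversation of {"from", "value"} messages.
sharegpt_record = {
    "conversations": [
        {"from": "human", "value": "Summarize the text."},
        {"from": "gpt", "value": "QuickSlug is a local-first inference platform."},
    ]
}

def sharegpt_to_alpaca(record: dict) -> dict:
    """Flatten a single-turn ShareGPT conversation into an Alpaca triple.
    Illustrative helper only -- not a QuickSlug API."""
    human = next(m["value"] for m in record["conversations"] if m["from"] == "human")
    gpt = next(m["value"] for m in record["conversations"] if m["from"] == "gpt")
    return {"instruction": human, "input": "", "output": gpt}

print(sharegpt_to_alpaca(sharegpt_record)["instruction"])  # Summarize the text.
```

Alpaca suits single-shot instruction data; ShareGPT preserves multi-turn context, which matters when fine-tuning chat behavior.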
Run AI locally. Scale to cloud.
Free tier — zero config, no Docker, fully offline. Upgrade for cloud GPU and optimized training.