Available Now — Open Core

QuickSlug

Local-first, OpenAI-compatible AI platform. Run inference via Ollama, fall back to cloud GPU, and fine-tune models — all through a single CLI and API.

2
Runtime modes
3
Training frameworks
0
Config needed (Free)

Features

OpenAI-Compatible API

Drop-in replacement for the OpenAI SDK: use the same chat/completions endpoint for local and remote models.
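For instance, the official openai npm package can point straight at a local QuickSlug gateway. A minimal sketch, assuming the gateway serves the API under /v1 on port 8080 (the quick-start default below) and a llama3 model already pulled through Ollama:

import OpenAI from "openai";

// Standard OpenAI client aimed at the local QuickSlug gateway.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "local", // placeholder; a local gateway may not validate keys
});

const res = await client.chat.completions.create({
  model: "llama3",
  messages: [{ role: "user", content: "Hello from QuickSlug!" }],
});
console.log(res.choices[0].message.content);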

Local + Cloud Inference

Run models locally via Ollama with zero config. Pro users get automatic failover to RunPod cloud GPUs.
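What that failover could look like in principle, as an illustrative sketch (the endpoints here are assumptions, not QuickSlug's actual routing code):

// Try local Ollama first; fall back to a cloud GPU endpoint if it's unreachable.
async function routeCompletion(body: unknown): Promise<unknown> {
  const endpoints = [
    "http://localhost:11434/v1/chat/completions", // Ollama's OpenAI-compatible API
    process.env.CLOUD_GPU_URL,                    // hypothetical RunPod endpoint (Pro)
  ];
  for (const url of endpoints) {
    if (!url) continue;
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(body),
      });
      if (res.ok) return await res.json();
    } catch {
      // backend unreachable; try the next one
    }
  }
  throw new Error("no inference backend available");
}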

Model Fine-Tuning

Fine-tune LLMs with LoRA adapters using Unsloth, Axolotl, or MLX. Free tier supports real local training.

Apple Silicon Native

MLX adapter for Metal-optimized fine-tuning on M-series Macs. Auto-detected on macOS arm64.
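In a Node CLI, that auto-detection can be a one-liner; a minimal sketch (not QuickSlug's actual source):

// Prefer the MLX adapter on M-series Macs; otherwise default to Unsloth.
const isAppleSilicon = process.platform === "darwin" && process.arch === "arm64";
const defaultFramework = isAppleSilicon ? "mlx" : "unsloth";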

CLI-First Workflow

Single CLI for init, start, train, and deploy. Zero Docker required for the free tier.

Open-Core Architecture

MIT-licensed CLI and API layer. Proprietary router, optimized adapters, and cloud features in Pro tier.

How It Works

All requests enter through the Fastify API gateway. Free tier runs entirely locally; Pro adds cloud GPU and distributed training.
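A minimal sketch of that gateway shape, assuming Fastify and Ollama's OpenAI-compatible endpoint on its default port 11434 (illustrative, not the shipped router):

import Fastify from "fastify";

const app = Fastify();

// Every client hits this one route; the free tier proxies straight to local Ollama.
app.post("/v1/chat/completions", async (req, reply) => {
  const upstream = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req.body),
  });
  reply.status(upstream.status).send(await upstream.json());
});

await app.listen({ port: 8080 });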

Free Tier

Local, Zero Config

  • Client: CLI / SDK
  • Core: Fastify API Gateway
  • Service: Ollama (Local Inference)
  • Features: SQLite, In-Process Training

Pro Tier

Cloud-Augmented

  • Client: CLI / SDK
  • Core: Fastify API + Auth
  • Service: Ollama + RunPod GPU
  • Features: PostgreSQL + Redis, BullMQ Training Queue, Cloudflare Tunnel

Free vs Pro

Free Tier

Local, Zero Config

  • Ollama-powered local inference
  • SQLite — no Docker, no external deps
  • In-process model fine-tuning
  • Unsloth, Axolotl & MLX adapters
  • Alpaca & ShareGPT dataset formats (examples after this list)
  • CLI + OpenAI-compatible API
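Both dataset formats are the community-standard ones; one record in each looks roughly like this (the content is made up):

Alpaca:
{ "instruction": "Summarize the text.", "input": "QuickSlug runs models locally.", "output": "A local-first AI platform." }

ShareGPT:
{ "conversations": [ { "from": "human", "value": "What is QuickSlug?" }, { "from": "gpt", "value": "A local-first AI platform." } ] }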

Pro Tier

Cloud-Augmented

  • Everything in Free, plus:
  • RunPod GPU inference + training
  • BullMQ job queue with retries (sketched after this list)
  • Checkpoint save/resume on failure
  • Optimized adapters (Flash Attn 2, 4-bit)
  • Dataset preprocessing & auto-packing
  • Usage tracking & Stripe billing
  • Cloudflare tunnel for public endpoints
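A sketch of what the Pro training queue could look like with BullMQ (queue and job names are made up for illustration; the Redis connection matches the Pro stack above):

import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 }; // Redis from the Pro stack

// Enqueue a fine-tuning job with automatic retries and exponential backoff.
const training = new Queue("training", { connection });
await training.add(
  "finetune",
  { baseModel: "llama3", loraRank: 16 },
  { attempts: 3, backoff: { type: "exponential", delay: 5000 } },
);

// A worker picks jobs up; on failure, BullMQ retries and training can resume
// from the last saved checkpoint.
new Worker("training", async (job) => runFineTune(job.data), { connection });

async function runFineTune(config: { baseModel: string; loraRank: number }) {
  // stand-in for the actual training step
}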

Quick Start

STEP 01

Install & Init

Install globally with npm, then run init to auto-detect OS and set up Ollama.

$ npm install -g quickslug && quickslug init

STEP 02

Start Serving

Launch a local OpenAI-compatible API on port 8080. Zero Docker needed.

$ quickslug start
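Once the server is up, sanity-check it with a plain curl request (the model name is just an example):

$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}'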

STEP 03

Train & Deploy

Fine-tune locally or on cloud GPUs. Supports LoRA with Unsloth, Axolotl, and MLX.

$ quickslug train --config train.json

Training Configuration

All fields have sensible defaults; free users specify only the overrides they need. Supports Alpaca and ShareGPT dataset formats.

Framework: Unsloth / Axolotl / MLX
Base Model: llama3, mistral, phi-3...
LoRA Rank: 8 (Free) / 16 (Pro)
Batch Size: 2 (Free) / 4 (Pro)
Max Seq Length: 2048 tokens
Epochs: 3 default, configurable
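A train.json using the fields above might look like this; the key names here are illustrative assumptions, not a documented schema:

{
  "framework": "unsloth",
  "baseModel": "llama3",
  "loraRank": 8,
  "batchSize": 2,
  "maxSeqLength": 2048,
  "epochs": 3,
  "dataset": { "path": "data/train.jsonl", "format": "alpaca" }
}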

Run AI locally. Scale to cloud.

Free tier — zero config, no Docker, fully offline. Upgrade for cloud GPU and optimized training.