Available Now — Open Core

QuickSlug

Local-first, OpenAI-compatible AI platform. Run inference via Ollama, fall back to cloud GPU, and fine-tune models — all through a single CLI and API.

2
Runtime modes
3
Training frameworks
0
Config needed (Free)

Features

OpenAI-Compatible API

Drop-in replacement for the OpenAI SDK: use the same chat/completions endpoint for local and remote models.
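For instance, the official openai npm package can point straight at a local QuickSlug gateway. A minimal sketch, assuming the gateway serves the API under /v1 on port 8080 (the quick-start default below) and a llama3 model already pulled through Ollama:

import OpenAI from "openai";

// Standard OpenAI client aimed at the local QuickSlug gateway.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "local", // placeholder; a local gateway may not validate keys
});

const res = await client.chat.completions.create({
  model: "llama3",
  messages: [{ role: "user", content: "Hello from QuickSlug!" }],
});
console.log(res.choices[0].message.content);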

Local + Cloud Inference

Run models locally via Ollama with zero config. Pro users get automatic failover to RunPod cloud GPUs.
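What that failover could look like in principle, as an illustrative sketch (the endpoints here are assumptions, not QuickSlug's actual routing code):

// Try local Ollama first; fall back to a cloud GPU endpoint if it's unreachable.
async function routeCompletion(body: unknown): Promise<unknown> {
  const endpoints = [
    "http://localhost:11434/v1/chat/completions", // Ollama's OpenAI-compatible API
    process.env.CLOUD_GPU_URL,                    // hypothetical RunPod endpoint (Pro)
  ];
  for (const url of endpoints) {
    if (!url) continue;
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(body),
      });
      if (res.ok) return await res.json();
    } catch {
      // backend unreachable; try the next one
    }
  }
  throw new Error("no inference backend available");
}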

Model Fine-Tuning

Fine-tune LLMs with LoRA adapters using Unsloth, Axolotl, or MLX. Free tier supports real local training.

Apple Silicon Native

MLX adapter for Metal-optimized fine-tuning on M-series Macs. Auto-detected on macOS arm64.
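In a Node CLI, that auto-detection can be a one-liner; a minimal sketch (not QuickSlug's actual source):

// Prefer the MLX adapter on M-series Macs; otherwise default to Unsloth.
const isAppleSilicon = process.platform === "darwin" && process.arch === "arm64";
const defaultFramework = isAppleSilicon ? "mlx" : "unsloth";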

CLI-First Workflow

Single CLI for init, start, train, and deploy. Zero Docker required for the free tier.

Open-Core Architecture

MIT-licensed CLI and API layer. Proprietary router, optimized adapters, and cloud features in Pro tier.

How It Works

All requests enter through the Fastify API gateway. Free tier runs entirely locally; Pro adds cloud GPU and distributed training.
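A minimal sketch of that gateway shape, assuming Fastify and Ollama's OpenAI-compatible endpoint on its default port 11434 (illustrative, not the shipped router):

import Fastify from "fastify";

const app = Fastify();

// Every client hits this one route; the free tier proxies straight to local Ollama.
app.post("/v1/chat/completions", async (req, reply) => {
  const upstream = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req.body),
  });
  reply.status(upstream.status).send(await upstream.json());
});

await app.listen({ port: 8080 });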

Free Tier

Local, Zero Config

  • Client: CLI / SDK
  • Core: Fastify API Gateway
  • Service: Ollama (Local Inference)
  • Features: SQLite, In-Process Training

Pro Tier

Cloud-Augmented

  • Client: CLI / SDK
  • Core: Fastify API + Auth
  • Service: Ollama + RunPod GPU
  • Features: PostgreSQL + Redis, BullMQ Training Queue, Cloudflare Tunnel

Free vs Pro

Free Tier

Local, Zero Config

  • Ollama-powered local inference
  • SQLite — no Docker, no external deps
  • In-process model fine-tuning
  • Unsloth, Axolotl & MLX adapters
  • Alpaca & ShareGPT dataset formats (examples after this list)
  • CLI + OpenAI-compatible API
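Both dataset formats are the community-standard ones; one record in each looks roughly like this (the content is made up):

Alpaca:
{ "instruction": "Summarize the text.", "input": "QuickSlug runs models locally.", "output": "A local-first AI platform." }

ShareGPT:
{ "conversations": [ { "from": "human", "value": "What is QuickSlug?" }, { "from": "gpt", "value": "A local-first AI platform." } ] }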

Pro Tier

Cloud-Augmented

  • Everything in Free, plus:
  • RunPod GPU inference + training
  • BullMQ job queue with retries (sketched after this list)
  • Checkpoint save/resume on failure
  • Optimized adapters (Flash Attn 2, 4-bit)
  • Dataset preprocessing & auto-packing
  • Usage tracking & Stripe billing
  • Cloudflare tunnel for public endpoints
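A sketch of what the Pro training queue could look like with BullMQ (queue and job names are made up for illustration; the Redis connection matches the Pro stack above):

import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 }; // Redis from the Pro stack

// Enqueue a fine-tuning job with automatic retries and exponential backoff.
const training = new Queue("training", { connection });
await training.add(
  "finetune",
  { baseModel: "llama3", loraRank: 16 },
  { attempts: 3, backoff: { type: "exponential", delay: 5000 } },
);

// A worker picks jobs up; on failure, BullMQ retries and training can resume
// from the last saved checkpoint.
new Worker("training", async (job) => runFineTune(job.data), { connection });

async function runFineTune(config: { baseModel: string; loraRank: number }) {
  // stand-in for the actual training step
}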

Quick Start

STEP 01

Install & Init

Install globally with npm, then run init to auto-detect OS and set up Ollama.

$ npm install -g quickslug && quickslug init

STEP 02

Start Serving

Launch a local OpenAI-compatible API on port 8080. Zero Docker needed.

$ quickslug start
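Once the server is up, sanity-check it with a plain curl request (the model name is just an example):

$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}'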

STEP 03

Train & Deploy

Fine-tune locally or on cloud GPUs. Supports LoRA with Unsloth, Axolotl, and MLX.

$ quickslug train --config train.json

Training Configuration

All fields have sensible defaults; free users specify only the overrides they need. Supports Alpaca and ShareGPT dataset formats.

Framework: Unsloth / Axolotl / MLX
Base Model: llama3, mistral, phi-3...
LoRA Rank: 8 (Free) / 16 (Pro)
Batch Size: 2 (Free) / 4 (Pro)
Max Seq Length: 2048 tokens
Epochs: 3 default, configurable
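A train.json using the fields above might look like this; the key names here are illustrative assumptions, not a documented schema:

{
  "framework": "unsloth",
  "baseModel": "llama3",
  "loraRank": 8,
  "batchSize": 2,
  "maxSeqLength": 2048,
  "epochs": 3,
  "dataset": { "path": "data/train.jsonl", "format": "alpaca" }
}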

Run AI locally. Scale to cloud.

Free tier — zero config, no Docker, fully offline. Upgrade for cloud GPU and optimized training.