Andrej Karpathy’s NanoChat: A ChatGPT clone for $100

Build your own local ChatGPT-style chatbot for about $100


Andrej Karpathy just dropped something wild again: NanoChat, a full-stack LLM implementation built to answer one question: can you train a ChatGPT-grade system end-to-end for $100?


Turns out, kind of yes.

NanoChat is not a framework. It’s not another “toolkit” with config files and pipelines that look like alien hieroglyphs. It’s a single, cohesive codebase that trains, evaluates, fine-tunes, and serves a small LLM end-to-end.

The entire thing runs on a single 8×H100 node, costing roughly $24/hour, so the “$100 ChatGPT” claim comes from a ~4-hour full training run.

What it really is

Think of it as nanoGPT++, but instead of stopping at pretraining, it takes you all the way to a working ChatGPT-like web UI. You run one script (speedrun.sh), and it handles everything:

  • Tokenization using a Rust-based BPE tokenizer.
  • Pretraining on text shards.
  • Mid-training, fine-tuning, and optional RL.
  • Evaluation (ARC, GSM8K, HumanEval, MMLU, etc.).
  • Web inference through a minimal chat UI.

That’s all in roughly 8,000 lines of code across 45 files. No heavy dependencies. No Hydra configs. No magical factory patterns. Just plain readable PyTorch.
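Under the hood, speedrun.sh essentially chains those stages together as a handful of Python and torchrun invocations. The sketch below is reconstructed from memory of the repo layout, so treat the script names as assumptions and defer to speedrun.sh itself:

    # Rough sketch of what speedrun.sh orchestrates (script names are assumptions).
    python -m scripts.tok_train                         # train the Rust BPE tokenizer
    torchrun --nproc_per_node=8 -m scripts.base_train   # pretraining on text shards
    torchrun --nproc_per_node=8 -m scripts.mid_train    # mid-training
    torchrun --nproc_per_node=8 -m scripts.chat_sft     # supervised fine-tuning (RL is optional)
    python -m scripts.chat_eval                         # ARC, GSM8K, HumanEval, MMLU
    python -m scripts.chat_web                          # serve the chat UI on port 8000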

The $100 speedrun

The real hook is speedrun.sh. It’s Karpathy’s all-in-one script that launches the entire training + inference cycle.
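Getting it running is about as minimal as it sounds. A hedged sketch, assuming the repo lives at github.com/karpathy/nanochat and you are already SSH'd into an 8×H100 box (the README has the canonical invocation):

    # Clone and kick off the full ~4-hour run; screen lets you detach while it trains.
    git clone https://github.com/karpathy/nanochat.git
    cd nanochat
    screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh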

Visit the node’s IP on port 8000, and you’ve got your own ChatGPT-like interface. The result is modest: a ~4e19-FLOP model that behaves like a smart kid who can chat, write simple code, and hallucinate with confidence. But that’s exactly the point: it’s small enough to understand, hack, and replicate end-to-end.
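One practical note: if that node is a rented cloud box, the simplest way to reach the port-8000 UI from your laptop is usually an SSH tunnel. A generic sketch, where the username, host, and the default port are placeholders:

    # Forward the node's port 8000 to your machine, then open http://localhost:8000
    ssh -L 8000:localhost:8000 ubuntu@<node-ip>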

Benchmarks

The pipeline reports scores on ARC, GSM8K, HumanEval, and MMLU, so you get to see exactly what 100 bucks of compute buys you in measurable performance.

Scaling up: from $100 to $1000

Karpathy mentions three tiers:

  • $100 tier: the default, kindergartener-level model.
  • $300 tier (d26): roughly GPT-2 grade.
  • $1000 tier: not yet in the main branch, but planned.

Scaling basically means increasing the depth parameter (--depth=26), adjusting batch sizes, and ensuring you have enough data shards. The code auto-compensates by tweaking gradient accumulation when VRAM gets tight.
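Concretely, the $300-tier (d26) pretraining step looks roughly like the line below. Only the --depth flag comes from the write-up; the module name and torchrun layout are assumptions from memory, so check speedrun.sh before copying:

    # Pretrain the deeper d26 model across 8 GPUs (names and flags are assumptions).
    # If VRAM runs out, lower the per-device batch size; the code makes up the
    # difference with extra gradient-accumulation steps.
    torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26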

It’ll even run on A100s, albeit slower, or on a single GPU if you’re patient enough to wait 8× longer.

Built for tinkering, not deployment

NanoChat isn’t trying to beat GPT-4. It’s built for accessibility: both financial and cognitive. Karpathy calls it a “strong baseline”, not a “framework”. No config monsters, no dependency hell. You can literally read every line and know what’s going on.

That’s rare.

It’s also forkable as hell. Every file in the repo can be packaged up with files-to-prompt into ~330KB of text, small enough to feed into an LLM for meta-questions like “Explain this repo to me.”
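files-to-prompt is Simon Willison’s small CLI for flattening a source tree into one prompt-sized blob. Something along these lines works; the extension list here is a guess, so tweak it for the repo:

    # Pack the code into a single text file (~330KB) you can paste into an LLM.
    pip install files-to-prompt
    files-to-prompt . -e py -e rs -e md -e sh --cxml > nanochat.xml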

That’s where the name fits perfectly: NanoChat, a ChatGPT you can actually understand and own.

Why it matters

For most people, this is not about chatting with a mini ChatGPT. It’s about learning what happens under the hood, from tokenization to inference, without a million-dollar GPU cluster.

It’s the first public example of a complete ChatGPT clone pipeline that runs within a weekend budget. More importantly, it serves as the capstone project for Karpathy’s upcoming course LLM101n, a kind of “build your own ChatGPT from scratch” curriculum.

NanoChat shows what open-source AI education should look like: end-to-end, hackable, cheap, and clear enough to read like a textbook.

In short

NanoChat is not here to compete. It’s here to teach.

It’s the closest you’ll get to holding the entire ChatGPT training stack in your hands, for a hundred bucks and a bit of patience.

Or, in Karpathy’s words:

“The best ChatGPT that $100 can buy.”
