NanoChat: The $100 PhD in Building Your Own ChatGPT (And Why You Shouldn’t Actually Use It)

https://github.com/karpathy/nanochat

What Andrej Karpathy Just Released Will Change How You Think About AI

Andrej Karpathy just dropped something wild: nanochat — a complete, from-scratch ChatGPT clone you can train for as little as $100. But before you start dreaming about launching “MyChatGPT.ai” and competing with OpenAI, let me tell you why this is simultaneously one of the most valuable learning resources ever created and probably a terrible idea for your startup.

What Actually Is NanoChat?

Think of nanochat as the complete recipe for baking a ChatGPT from scratch, with every ingredient measured and every step explained.

In just 8,000 lines of clean code, it takes you through:

  1. Training a tokenizer (teaching the model to understand text)
  2. Pretraining on 15 billion tokens from FineWeb (absorbing world knowledge)
  3. Midtraining on conversations and tool use (learning to chat)
  4. Supervised fine-tuning (polishing conversational abilities)
  5. Reinforcement learning (getting better at math problems)
  6. Inference engine with a web UI (actually talking to your creation)

Run one script. Wait 4–24 hours. Get a working chatbot.

The Results?

  • $100 (4 hours): A chatbot that can write stories and answer simple questions
  • $1,000 (24 hours): A model scoring 40+ on MMLU, 70+ on ARC-Easy, solving basic math and code problems

Not GPT-4 level, but you built it from nothing.

The Architecture: Llama-lite with Training Wheels Off

In Karpathy’s own words, nanochat uses:

  • Dense transformer (Llama-like but simpler)
  • Rotary embeddings (no classic positional encodings)
  • QK normalization (stability hack)
  • Tied weights for embedding/unembedding
  • Multi-Query Attention (MQA) (faster inference)
  • Muon+AdamW optimizer (experimental, still being tuned)

Translation: It’s built with modern, proven techniques but kept deliberately simple so you can actually understand every line.
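Two of those ingredients, rotary embeddings and QK normalization, fit in a few lines. Here is a minimal NumPy sketch of both; shapes and details are simplified relative to nanochat's actual PyTorch implementation, so treat this as an illustration of the idea rather than the real code.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings (RoPE) to x of shape (seq_len, dim).

    Channel pairs are rotated by a position-dependent angle, so relative
    position shows up directly in query/key dot products -- no separate
    positional encoding vectors needed.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies, geometrically spaced as in the RoPE paper.
    freqs = base ** (-np.arange(half) / half)               # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def qk_norm(q, k, eps=1e-6):
    """Normalize queries and keys to unit RMS before attention (the stability hack)."""
    q = q / np.sqrt((q ** 2).mean(-1, keepdims=True) + eps)
    k = k / np.sqrt((k ** 2).mean(-1, keepdims=True) + eps)
    return q, k

# Toy usage: 4 positions, 8-dim queries/keys, then attention logits.
q = rotary_embed(np.random.randn(4, 8))
k = rotary_embed(np.random.randn(4, 8))
q, k = qk_norm(q, k)
scores = q @ k.T / np.sqrt(q.shape[-1])
```

Note that because RoPE is a pure rotation, it preserves vector norms, and position 0 is left untouched; QK norm then caps how large the attention logits can get, which is where the stability comes from.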

What You Should Learn From This

1. The Full Stack, Not Just APIs

Most developers only interact with LLMs through APIs. Nanochat shows you:

  • How tokenization really works
  • Why pretraining takes billions of tokens
  • How instruction-following emerges from fine-tuning
  • What RL actually does to model behavior
  • How inference optimization (KV cache, prefill/decode) speeds things up

This is your X-ray vision into AI systems.
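The first item on that list, tokenization, is easier to internalize with a toy. Here is a minimal sketch of byte-pair-encoding-style merge training, the idea behind step 1 of the pipeline; real tokenizers (including nanochat's) operate on bytes with tens of thousands of merges, so this character-level version is purely illustrative.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one (or None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules from raw text, character-level for clarity."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        new_token = pair[0] + pair[1]
        merges.append((pair, new_token))
        tokens = merge(tokens, pair, new_token)
    return tokens, merges

# Frequent substrings get fused into single tokens: "l"+"o" -> "lo" -> "low".
tokens, merges = train_bpe("low lower lowest", 3)
```

The punchline: the vocabulary is learned from data, which is why a model's tokenizer shapes everything downstream, from how it counts characters to how much a given text costs to process.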

2. Small Models Can Be Useful

You don’t need GPT-4 for everything. A $100 model can:

  • Answer domain-specific questions (after targeted training)
  • Generate creative content
  • Serve as a research baseline
  • Run locally with full privacy

3. Training is Expensive, But Scalable

  • $100 gets you a toy
  • $1,000 gets you something genuinely useful
  • $10,000+ gets you… well, that’s when things get interesting

The cost/performance curve is steep and predictable.
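Here is a back-of-envelope for why the curve is predictable, using the standard ≈6·N·D FLOPs rule of thumb for training a dense transformer on D tokens with N parameters. The GPU throughput, utilization, and hourly price below are illustrative assumptions, not nanochat's measured numbers.

```python
def training_cost(params, tokens, gpu_flops=4e14, mfu=0.4, price_per_gpu_hour=2.0):
    """Estimate GPU-hours and dollars from the ~6*N*D training-FLOPs rule of thumb.

    params             -- model parameters N
    tokens             -- training tokens D
    gpu_flops          -- peak FLOP/s per GPU (assumed ~400 TFLOP/s, H100-class bf16)
    mfu                -- model-FLOPs utilization achieved (assumed 40%)
    price_per_gpu_hour -- rental price (assumed $2/GPU-hour)
    """
    total_flops = 6 * params * tokens
    gpu_seconds = total_flops / (gpu_flops * mfu)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * price_per_gpu_hour

# Illustrative: a ~560M-parameter model on the 15B FineWeb tokens mentioned above.
hours, dollars = training_cost(560e6, 15e9)
```

Under these assumptions you land at roughly 90 GPU-hours, i.e. around half a day of wall-clock time on an 8-GPU node for on the order of a hundred dollars, which is why the $100 / $1,000 / $10,000 tiers above scale so cleanly: cost is essentially linear in parameters times tokens.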

💰 Ideas That COULD Make Money Using NanoChat

✅ Education & Courses

  • Create detailed tutorials walking through nanochat
  • Build university courses on LLM fundamentals
  • Offer hands-on workshops ($500–2000/person)

Why it works: People pay serious money to understand this tech deeply.

✅ Research Tools & Benchmarks

  • Fork nanochat for specific research questions
  • Create domain-specific training pipelines
  • Build evaluation harnesses for niche use cases

Why it works: Academia and research labs need reproducible baselines.

✅ Understanding to Build Better Products

  • Learn the fundamentals to make smarter architectural decisions
  • Understand what’s actually expensive in AI systems
  • Build products that use APIs more efficiently

Why it works: Deep knowledge leads to better product decisions and cost optimization.

✅ Specialized Domain Models

  • Train tiny models for specific, narrow tasks
  • Edge deployment scenarios where latency matters
  • Privacy-sensitive applications requiring local inference

Why it works: Sometimes a focused small model beats a generic giant one.

🚫 Ideas That SHOULDN’T Use NanoChat

❌ Your Personal AI Assistant

Karpathy’s answer: No.

“You should think of micro models maybe more as very young children (kindergarten etc.), they just don’t have the raw intelligence of their larger cousins. If you finetune/train it on your own data you’ll probably get some amusing parroting that feels like your writing in style, but it will be slop.”

His solution: Use RAG (like NotebookLM) over your data with a real LLM. Your personal data becomes context, not training data.

Why: Training on small personal datasets creates a model that’s basically a broken parrot. It won’t understand you — it’ll just sound vaguely like you while being confused.
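To make the "context, not training data" point concrete, here is a minimal sketch of the retrieval half of RAG, with bag-of-words cosine similarity standing in for a learned embedding model. Real systems (NotebookLM included) use proper embeddings and feed the retrieved text to an actual LLM; every name below is made up for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

notes = [
    "My dentist appointment is on Tuesday at 3pm.",
    "The quarterly report is due Friday.",
    "Mom's birthday dinner is at the Italian place.",
]
context = retrieve("when is the dentist appointment", notes)
# `context` gets pasted into the prompt of a capable LLM -- the model's weights
# never see your personal data, so nothing gets "parroted".
```

That is the whole trick: your data stays in a searchable store and flows in at inference time, while the intelligence comes from a large pretrained model that you never have to train.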

❌ Production ChatGPT Competitor

Don’t try to compete with OpenAI, Anthropic, or Google using nanochat models.

Why: They have:

  • 1000x more compute
  • 1000x more data
  • Years of RL fine-tuning
  • Massive infrastructure

Your $1,000 model will feel like a toy next to Claude or GPT-4.

❌ Customer-Facing Products (Yet)

Don’t deploy nanochat models for critical business applications without massive additional work.

Why:

  • Hallucinations aren’t production-ready
  • Safety guardrails need extensive work
  • Evaluation benchmarks are just proxies
  • You’ll need monitoring, versioning, rollback systems

❌ “Just Add More Data” Businesses

Don’t think “I’ll train it on [industry] data and sell it!”

Why: Without the core intelligence, adding domain data to a small model gives you an expensive keyword matcher, not understanding.

The Real Value: NanoChat as a Compass, Not a Product

Here’s the profound truth Karpathy revealed:

“Basically I’d say getting this to work well is still realm of research and not obvious.”

NanoChat isn’t a product. It’s a map.

It shows you:

  • ✅ What’s genuinely hard (alignment, evaluation, reasoning)
  • ✅ What’s surprisingly simple (basic transformer architecture)
  • ✅ Where the costs hide (pretraining data, compute hours)
  • ✅ What techniques actually matter (RoPE, MQA, QK norm)

The Bottom Line

Learn from nanochat. Don’t productize it.

Spend $100–1,000 running through the pipeline yourself. You’ll gain intuition worth 10x that in better product decisions, smarter architecture choices, and understanding when to use small vs. large models.

But if someone asks you to train on their personal data to “understand them”? Send them to NotebookLM and RAG.

If someone wants to compete with ChatGPT? Smile and nod, then quietly invest in OpenAI’s next round.

The real money isn’t in nanochat itself — it’s in the deep understanding it gives you to build the next generation of AI products the right way.

Getting Started

  1. Clone the repo: github.com/karpathy/nanochat
  2. Read the code: All 8,000 lines. Yes, really.
  3. Run the speedrun: Start with the $100 4-hour version
  4. Break things: Fork it, modify it, understand what breaks and why
  5. Build something new: Use your newfound knowledge wisely

This is the capstone of LLM101n — the education you can’t buy, only earn through doing.

What will you build once you understand how LLMs really work?
