NanoChat: The $100 PhD in Building Your Own ChatGPT (And Why You Shouldn’t Actually Use It)

https://github.com/karpathy/nanochat

What Andrej Karpathy Just Released Will Change How You Think About AI

Andrej Karpathy just dropped something wild: nanochat — a complete, from-scratch ChatGPT clone you can train for as little as $100. But before you start dreaming about launching “MyChatGPT.ai” and competing with OpenAI, let me tell you why this is simultaneously one of the most valuable learning resources ever created and probably a terrible idea for your startup.

What Actually Is NanoChat?

Think of nanochat as the complete recipe for baking a ChatGPT from scratch, with every ingredient measured and every step explained.

In just 8,000 lines of clean code, it takes you through:

  1. Training a tokenizer (teaching the model to understand text)
  2. Pretraining on 15 billion tokens from FineWeb (absorbing world knowledge)
  3. Midtraining on conversations and tool use (learning to chat)
  4. Supervised fine-tuning (polishing conversational abilities)
  5. Reinforcement learning (getting better at math problems)
  6. Inference engine with a web UI (actually talking to your creation)

Run one script. Wait 4–24 hours. Get a working chatbot.

The Results?

  • $100 (4 hours): A chatbot that can write stories and answer simple questions
  • $1,000 (24 hours): A model scoring 40+ on MMLU, 70+ on ARC-Easy, solving basic math and code problems

Not GPT-4 level, but you built it from nothing.

The Architecture: Llama-lite with Training Wheels Off

In Karpathy’s own words, nanochat uses:

  • Dense transformer (Llama-like but simpler)
  • Rotary embeddings (no classic positional encodings)
  • QK normalization (stability hack)
  • Tied weights for embedding/unembedding
  • Multi-Query Attention (MQA) (faster inference)
  • Muon+AdamW optimizer (experimental, still being tuned)

Translation: It’s built with modern, proven techniques but kept deliberately simple so you can actually understand every line.
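Two of those ingredients, rotary embeddings and QK normalization, fit in a few lines. Here is a minimal NumPy sketch of both; shapes and details are simplified relative to nanochat's actual PyTorch implementation, so treat this as an illustration of the idea rather than the real code.

```python
import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embeddings (RoPE) to x of shape (seq_len, dim).

    Channel pairs are rotated by a position-dependent angle, so relative
    position shows up directly in query/key dot products -- no separate
    positional encoding vectors needed.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies, geometrically spaced as in the RoPE paper.
    freqs = base ** (-np.arange(half) / half)               # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def qk_norm(q, k, eps=1e-6):
    """Normalize queries and keys to unit RMS before attention (the stability hack)."""
    q = q / np.sqrt((q ** 2).mean(-1, keepdims=True) + eps)
    k = k / np.sqrt((k ** 2).mean(-1, keepdims=True) + eps)
    return q, k

# Toy usage: 4 positions, 8-dim queries/keys, then attention logits.
q = rotary_embed(np.random.randn(4, 8))
k = rotary_embed(np.random.randn(4, 8))
q, k = qk_norm(q, k)
scores = q @ k.T / np.sqrt(q.shape[-1])
```

Note that because RoPE is a pure rotation, it preserves vector norms, and position 0 is left untouched; QK norm then caps how large the attention logits can get, which is where the stability comes from.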

What You Should Learn From This

1. The Full Stack, Not Just APIs

Most developers only interact with LLMs through APIs. Nanochat shows you:

  • How tokenization really works
  • Why pretraining takes billions of tokens
  • How instruction-following emerges from fine-tuning
  • What RL actually does to model behavior
  • How inference optimization (KV cache, prefill/decode) speeds things up

This is your X-ray vision into AI systems.
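The first item on that list, tokenization, is easier to internalize with a toy. Here is a minimal sketch of byte-pair-encoding-style merge training, the idea behind step 1 of the pipeline; real tokenizers (including nanochat's) operate on bytes with tens of thousands of merges, so this character-level version is purely illustrative.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one (or None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules from raw text, character-level for clarity."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        new_token = pair[0] + pair[1]
        merges.append((pair, new_token))
        tokens = merge(tokens, pair, new_token)
    return tokens, merges

# Frequent substrings get fused into single tokens: "l"+"o" -> "lo" -> "low".
tokens, merges = train_bpe("low lower lowest", 3)
```

The punchline: the vocabulary is learned from data, which is why a model's tokenizer shapes everything downstream, from how it counts characters to how much a given text costs to process.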

2. Small Models Can Be Useful

You don’t need GPT-4 for everything. A $100 model can:

  • Answer domain-specific questions (after targeted training)
  • Generate creative content
  • Serve as a research baseline
  • Run locally with full privacy

3. Training is Expensive, But Scalable

  • $100 gets you a toy
  • $1,000 gets you something genuinely useful
  • $10,000+ gets you… well, that’s when things get interesting

The cost/performance curve is steep and predictable.
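Here is a back-of-envelope for why the curve is predictable, using the standard ≈6·N·D FLOPs rule of thumb for training a dense transformer on D tokens with N parameters. The GPU throughput, utilization, and hourly price below are illustrative assumptions, not nanochat's measured numbers.

```python
def training_cost(params, tokens, gpu_flops=4e14, mfu=0.4, price_per_gpu_hour=2.0):
    """Estimate GPU-hours and dollars from the ~6*N*D training-FLOPs rule of thumb.

    params             -- model parameters N
    tokens             -- training tokens D
    gpu_flops          -- peak FLOP/s per GPU (assumed ~400 TFLOP/s, H100-class bf16)
    mfu                -- model-FLOPs utilization achieved (assumed 40%)
    price_per_gpu_hour -- rental price (assumed $2/GPU-hour)
    """
    total_flops = 6 * params * tokens
    gpu_seconds = total_flops / (gpu_flops * mfu)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * price_per_gpu_hour

# Illustrative: a ~560M-parameter model on the 15B FineWeb tokens mentioned above.
hours, dollars = training_cost(560e6, 15e9)
```

Under these assumptions you land at roughly 90 GPU-hours, i.e. around half a day of wall-clock time on an 8-GPU node for on the order of a hundred dollars, which is why the $100 / $1,000 / $10,000 tiers above scale so cleanly: cost is essentially linear in parameters times tokens.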

💰 Ideas That COULD Make Money Using NanoChat

✅ Education & Courses

  • Create detailed tutorials walking through nanochat
  • Build university courses on LLM fundamentals
  • Offer hands-on workshops ($500–2000/person)

Why it works: People pay serious money to understand this tech deeply.

✅ Research Tools & Benchmarks

  • Fork nanochat for specific research questions
  • Create domain-specific training pipelines
  • Build evaluation harnesses for niche use cases

Why it works: Academia and research labs need reproducible baselines.

✅ Understanding to Build Better Products

  • Learn the fundamentals to make smarter architectural decisions
  • Understand what’s actually expensive in AI systems
  • Build products that use APIs more efficiently

Why it works: Deep knowledge leads to better product decisions and cost optimization.

✅ Specialized Domain Models

  • Train tiny models for specific, narrow tasks
  • Edge deployment scenarios where latency matters
  • Privacy-sensitive applications requiring local inference

Why it works: Sometimes a focused small model beats a generic giant one.

🚫 Ideas That SHOULDN’T Use NanoChat

❌ Your Personal AI Assistant

Karpathy’s answer: No.

“You should think of micro models maybe more as very young children (kindergarten etc.), they just don’t have the raw intelligence of their larger cousins. If you finetune/train it on your own data you’ll probably get some amusing parroting that feels like your writing in style, but it will be slop.”

His solution: Use RAG (like NotebookLM) over your data with a real LLM. Your personal data becomes context, not training data.

Why: Training on small personal datasets creates a model that’s basically a broken parrot. It won’t understand you — it’ll just sound vaguely like you while being confused.
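To make the "context, not training data" point concrete, here is a minimal sketch of the retrieval half of RAG, with bag-of-words cosine similarity standing in for a learned embedding model. Real systems (NotebookLM included) use proper embeddings and feed the retrieved text to an actual LLM; every name below is made up for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

notes = [
    "My dentist appointment is on Tuesday at 3pm.",
    "The quarterly report is due Friday.",
    "Mom's birthday dinner is at the Italian place.",
]
context = retrieve("when is the dentist appointment", notes)
# `context` gets pasted into the prompt of a capable LLM -- the model's weights
# never see your personal data, so nothing gets "parroted".
```

That is the whole trick: your data stays in a searchable store and flows in at inference time, while the intelligence comes from a large pretrained model that you never have to train.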

❌ Production ChatGPT Competitor

Don’t try to compete with OpenAI, Anthropic, or Google using nanochat models.

Why: They have:

  • 1000x more compute
  • 1000x more data
  • Years of RL fine-tuning
  • Massive infrastructure

Your $1,000 model will feel like a toy next to Claude or GPT-4.

❌ Customer-Facing Products (Yet)

Don’t deploy nanochat models for critical business applications without massive additional work.

Why:

  • Hallucinations aren’t production-ready
  • Safety guardrails need extensive work
  • Evaluation benchmarks are just proxies
  • You’ll need monitoring, versioning, rollback systems

❌ “Just Add More Data” Businesses

Don’t think “I’ll train it on [industry] data and sell it!”

Why: Without the core intelligence, adding domain data to a small model gives you an expensive keyword matcher, not understanding.

The Real Value: NanoChat as a Compass, Not a Product

Here’s the profound truth Karpathy revealed:

“Basically I’d say getting this to work well is still realm of research and not obvious.”

NanoChat isn’t a product. It’s a map.

It shows you:

  • ✅ What’s genuinely hard (alignment, evaluation, reasoning)
  • ✅ What’s surprisingly simple (basic transformer architecture)
  • ✅ Where the costs hide (pretraining data, compute hours)
  • ✅ What techniques actually matter (RoPE, MQA, QK norm)

The Bottom Line

Learn from nanochat. Don’t productize it.

Spend $100–1,000 running through the pipeline yourself. You’ll gain intuition worth 10x that in better product decisions, smarter architecture choices, and understanding when to use small vs. large models.

But if someone asks you to train on their personal data to “understand them”? Send them to NotebookLM and RAG.

If someone wants to compete with ChatGPT? Smile and nod, then quietly invest in OpenAI’s next round.

The real money isn’t in nanochat itself — it’s in the deep understanding it gives you to build the next generation of AI products the right way.

Getting Started

  1. Clone the repo: github.com/karpathy/nanochat
  2. Read the code: All 8,000 lines. Yes, really.
  3. Run the speedrun: Start with the $100 4-hour version
  4. Break things: Fork it, modify it, understand what breaks and why
  5. Build something new: Use your newfound knowledge wisely

This is the capstone of LLM101n — the education you can’t buy, only earn through doing.

What will you build once you understand how LLMs really work?
