Build Your Own ChatGPT in an Afternoon: The NanoGPT Guide

Teeth over education: that’s how Karpathy describes nanoGPT’s priorities compared with its predecessor, minGPT. As he puts it in the official README: “The code is so simple, it is very easy to hack to your needs, train new models from scratch, or finetune pretrained checkpoints.” That’s the speedboat advantage: fast, nimble, and it gets you where you need to go.

If you’ve ever wondered how ChatGPT actually works under the hood, you’re not alone. The magic behind large language models can feel intimidating, but what if you could build and train your own version this weekend? That’s exactly what Andrej Karpathy’s nanoGPT makes possible.

NanoGPT is the simplest and fastest repository for training and finetuning medium-sized GPT models from scratch. In roughly 600 lines of clean, readable Python, the same codebase lets you train a small language model in minutes on an ordinary laptop and scales all the way up to a reproduction of GPT-2. No PhD required, no expensive cloud computing bills to get started, just you and some surprisingly straightforward code.

Why NanoGPT Changes Everything

Most AI tutorials either oversimplify to the point of uselessness or drown you in academic papers. NanoGPT takes a different approach. It’s what Karpathy calls “the simplest, fastest repository” for understanding how these models actually work.

The entire codebase fits in two files. The model definition sits in about 300 lines, and the training loop takes another 300. You can read and understand every line in a single sitting. More importantly, this isn’t toy code. The same architecture scales up to reproduce the 124-million-parameter GPT-2 model.

What makes this remarkable is the accessibility. You don’t need a massive GPU cluster to get started. The character-level Shakespeare example trains in a few minutes on a single modern GPU (the README quotes about three minutes on an A100), and you can even run smaller experiments on a MacBook using PyTorch’s Metal Performance Shaders (MPS) backend. This democratizes AI learning in a way that wasn’t possible just a few years ago.

Getting Started: Your First Language Model

The barrier to entry is refreshingly low. After installing PyTorch and cloning the repository, you can train your first model with just a few commands. The Shakespeare dataset comes included, so you can see real results immediately.
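
If you want to see how few moving parts there are, the README’s quick start boils down to three script invocations: prepare the data, train, and sample. Here they are wrapped in a tiny Python launcher (a sketch that assumes you’ve already cloned the repo, installed its dependencies, and are running from the repository root; double-check the paths against the current README):

```python
# Minimal launcher for the Shakespeare quick start (a sketch; run from the
# cloned nanoGPT directory and verify the script paths against the README).
import subprocess

commands = [
    # Download tiny Shakespeare and turn it into train.bin / val.bin
    ["python", "data/shakespeare_char/prepare.py"],
    # Train a small character-level GPT using the bundled config
    ["python", "train.py", "config/train_shakespeare_char.py"],
    # Sample text from the checkpoint the training run just wrote
    ["python", "sample.py", "--out_dir=out-shakespeare-char"],
]

for cmd in commands:
    subprocess.run(cmd, check=True)  # stop if any step fails
```

You can just as easily type the three underlying commands straight into a terminal; the launcher is only there to show how short the whole loop is.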

When you run the training script, you’re not just executing someone else’s black box. You’re running code you can actually read and modify. Want to change the model size? Adjust one parameter. Curious about different learning rates? The configuration is right there in plain sight.
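
To make that concrete, nanoGPT’s configuration is nothing more than Python variable assignments in a small file. A hypothetical override file might look like this (the names mirror those used in the bundled configs, but treat the values as illustrative):

```python
# my_overrides.py -- a hypothetical config file, run as: python train.py my_overrides.py
# Anything set here overrides the defaults defined at the top of train.py.
n_layer = 8            # model depth: one line to change the number of transformer blocks
learning_rate = 6e-4   # and the learning rate sits right next to it
max_iters = 3000       # how long to train
```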

The beauty of starting with Shakespeare is that the results are immediately entertaining. After training on a corpus of the Bard’s writing, your model starts generating surprisingly coherent Elizabethan dialogue. It’s not perfect, but seeing your first machine-generated verse makes the underlying concepts click in a way that theory alone never could.

Understanding What’s Actually Happening

NanoGPT strips away the complexity that often obscures learning. The transformer architecture, attention mechanisms, and training loops are all implemented in straightforward Python. No proprietary frameworks, no hidden abstractions.

The training process follows the same principles as GPT-2 and GPT-3, just at a scale you can actually observe and understand. You watch the training and validation loss decrease (perplexity is simply the exponential of that loss) and can sample text at any point to check progress. This immediate feedback loop makes the learning process tangible rather than theoretical.
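
Since the number you’ll actually watch in the logs is the cross-entropy loss, converting it into a perplexity is a one-liner:

```python
import math

val_loss = 1.47                  # example value read off a training log
perplexity = math.exp(val_loss)  # ~4.35: the model is roughly as uncertain as a
                                 # uniform choice over about four tokens
print(f"perplexity ≈ {perplexity:.2f}")
```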

Each component serves a clear purpose. The model file defines the transformer blocks, attention heads, and feed-forward networks. The training file handles data loading, optimization, and checkpointing. If you want to understand how positional embeddings work or why layer normalization matters, you can trace through the actual implementation.
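
To give a sense of how compact that is, here is a heavily condensed sketch of the same shape, not nanoGPT’s actual model.py but the outline it follows: embeddings in, a stack of attention-plus-MLP blocks, a language-model head out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One transformer block: self-attention, then an MLP, each wrapped in a residual."""
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        T = x.size(1)
        # Causal mask: position t may only attend to positions <= t.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        attn_out, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class TinyGPT(nn.Module):
    """Token + positional embeddings, a stack of Blocks, a final layer norm, and a head."""
    def __init__(self, vocab_size=65, block_size=64, n_layer=2, n_head=2, n_embd=64):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)   # token embeddings
        self.wpe = nn.Embedding(block_size, n_embd)   # learned positional embeddings
        self.blocks = nn.ModuleList(Block(n_embd, n_head) for _ in range(n_layer))
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx, targets=None):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.wte(idx) + self.wpe(pos)             # add position information to each token
        for block in self.blocks:
            x = block(x)
        logits = self.lm_head(self.ln_f(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss
```

The real model.py differs in details (for instance, it fuses the query, key, and value projections into a single linear layer), but the control flow is the same and just as readable.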

Scaling Up: From Shakespeare to GPT-2

Once you’ve mastered the basics, nanoGPT lets you gradually increase complexity. The next natural step is reproducing GPT-2 on OpenWebText, an open re-creation of the WebText corpus OpenAI used for the original model. This isn’t a simplified version or approximation. It’s the actual GPT-2 architecture trained on a dataset closely matched to the original.

The process takes about four days on an eight-GPU node, which sounds like a lot until you remember that training a model of this class was out of reach for individuals only a few years ago. More importantly, you can start with smaller versions to validate your setup before committing to the full run.
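
If you want to sanity-check where that 124-million figure comes from, you can estimate it from the published GPT-2 small hyperparameters: 12 layers, 768-dimensional embeddings, a 1,024-token context, and a roughly 50k-token vocabulary. Ignoring biases and layer-norm weights (which add only a rounding error), the arithmetic looks like this:

```python
# Back-of-the-envelope parameter count for GPT-2 small, using the published
# hyperparameters and ignoring biases and layer-norm weights.
n_layer, n_embd, vocab_size, block_size = 12, 768, 50257, 1024

token_embeddings = vocab_size * n_embd       # shared with the output head (weight tying)
position_embeddings = block_size * n_embd
attention_per_block = 4 * n_embd * n_embd    # query, key, value, and output projections
mlp_per_block = 2 * n_embd * (4 * n_embd)    # expand to 4x width, then project back

total = token_embeddings + position_embeddings + n_layer * (attention_per_block + mlp_per_block)
print(f"{total / 1e6:.1f}M parameters")      # ~124.3M
```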

This progression teaches you something crucial about modern AI. The difference between a toy model and a production system isn’t fundamentally different architectures. It’s mostly about scale, data quality, and computational resources. Understanding this removes much of the mystique around large language models.

Practical Applications and Finetuning

The real power emerges when you start adapting nanoGPT for your own projects. Finetuning on custom datasets is straightforward because the codebase is small enough to modify with confidence. Want to train on your company’s documentation? Your personal writing? Domain-specific technical papers? The process is the same.
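
The data format nanoGPT expects is deliberately primitive: a train.bin and a val.bin full of token IDs. A custom prepare script, loosely modeled on the ones bundled with the repo, can be this short (the file name my_corpus.txt is a placeholder, and you’ll want the tiktoken and numpy packages installed):

```python
# prepare_my_data.py -- a sketch of a custom dataset prep script, loosely modeled
# on nanoGPT's bundled prepare.py files. Assumes a my_corpus.txt in this directory.
import numpy as np
import tiktoken

with open("my_corpus.txt", "r", encoding="utf-8") as f:
    text = f.read()

enc = tiktoken.get_encoding("gpt2")          # the GPT-2 BPE tokenizer
ids = enc.encode_ordinary(text)              # tokenize without special tokens

split = int(0.9 * len(ids))                  # 90/10 train/validation split
train_ids = np.array(ids[:split], dtype=np.uint16)
val_ids = np.array(ids[split:], dtype=np.uint16)

train_ids.tofile("train.bin")                # the flat binary files the training loop reads
val_ids.tofile("val.bin")
print(f"train: {len(train_ids):,} tokens, val: {len(val_ids):,} tokens")
```

Drop the resulting files into a data/&lt;your-dataset-name&gt;/ folder and point the dataset config variable at that name, and the training script will pick them up.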

The configuration system makes experimentation easy. You can adjust batch sizes for your available memory, modify learning rates for your dataset characteristics, and experiment with different model sizes without touching the core code. This flexibility turns nanoGPT into a genuine research and development tool, not just an educational resource.
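
As an illustration, a finetuning run typically differs from a from-scratch run by only a handful of lines. The parameter names below mirror nanoGPT’s bundled finetuning config; the values are placeholders to tune for your own data:

```python
# finetune_my_data.py -- a hypothetical config for finetuning, run as:
#   python train.py finetune_my_data.py
out_dir = "out-my-finetune"
init_from = "gpt2"          # start from the pretrained 124M GPT-2 weights
dataset = "my_data"         # expects data/my_data/train.bin and val.bin
batch_size = 4              # keep memory modest; raise it if your GPU allows
block_size = 512            # shorter context than the 1024-token default
learning_rate = 3e-5        # much lower than when training from scratch
max_iters = 2000
decay_lr = False            # a constant learning rate is fine for short finetunes
```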

Several developers have already extended nanoGPT in creative ways. Some have added features like improved tokenizers, others have optimized for specific hardware, and many have successfully trained models for languages beyond English. The simple codebase makes these modifications approachable for intermediate programmers.

Common Challenges and Solutions

The learning curve isn’t zero, but it’s manageable. Most issues come from environment setup rather than the code itself. nanoGPT relies on torch.compile, which arrived in PyTorch 2.0, so checking your PyTorch version against the requirements (or disabling compilation on older installs) saves frustration.

Memory constraints are the most common limitation for hobbyists. If you’re working with limited GPU memory, nanoGPT lets you adjust batch size and model dimensions to fit your hardware. The tradeoff is longer training times, but the fundamentals remain the same.

For those without GPU access, the CPU and Apple Silicon support means you can still experiment meaningfully. Training will be slower, but for learning purposes and small models, it’s perfectly viable. This inclusive design philosophy reflects Karpathy’s educational mission.
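
In practice, “fit your hardware” means turning a few knobs. A hypothetical laptop-friendly config might look like this (the names match nanoGPT’s training script; the values are illustrative starting points rather than recommendations):

```python
# laptop_config.py -- a hypothetical config for constrained hardware, run as:
#   python train.py laptop_config.py
device = "cpu"                    # or "mps" on Apple Silicon, "cuda" if you have a GPU
compile = False                   # torch.compile needs PyTorch 2.0+; skip it here
batch_size = 8                    # smaller batches fit in less memory...
gradient_accumulation_steps = 4   # ...while accumulation preserves the effective batch size
block_size = 128                  # shorter context windows are much cheaper
n_layer, n_head, n_embd = 4, 4, 128
eval_iters = 20                   # quicker evaluation passes
dropout = 0.0                     # tiny models on tiny data rarely need it
```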

What You’ll Actually Learn

Working through nanoGPT teaches you how transformers really work, not just in theory but in practice. You’ll understand why attention mechanisms revolutionized NLP, how positional encodings solve sequence problems, and why layer normalization stabilizes training.

More valuable than any single concept is developing intuition about these models. You’ll learn to recognize when your model is overfitting, understand the relationship between parameters and performance, and develop a feel for what’s actually happening during training. This intuition is what separates people who use AI tools from people who can build and debug them.
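
One concrete habit that builds that intuition: compare the train and validation losses nanoGPT prints at each evaluation. Training loss falling while validation loss flattens or climbs is the classic overfitting signature. Here is a toy illustration with made-up numbers:

```python
# Made-up losses from successive evaluation steps, purely for illustration.
train_losses = [2.90, 2.10, 1.60, 1.20, 0.90, 0.65]
val_losses   = [2.95, 2.20, 1.80, 1.70, 1.75, 1.85]

for step, (tr, va) in enumerate(zip(train_losses, val_losses)):
    trend = "val still improving" if step == 0 or va < val_losses[step - 1] else "val getting worse"
    print(f"eval {step}: train {tr:.2f}  val {va:.2f}  gap {va - tr:.2f}  ({trend})")

# Training loss keeps dropping while validation loss turns around after eval 3:
# the model has started memorizing the training set instead of generalizing.
```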

The educational value extends beyond just language models. The patterns you’ll see in nanoGPT appear throughout deep learning. Attention mechanisms are now used in computer vision, audio processing, and reinforcement learning. Understanding them deeply in one context makes picking them up elsewhere much easier.

Your Next Steps

Start simple. Clone the repository, run the Shakespeare example, and actually read through the code. Don’t just execute it and move on. Spend time understanding what each section does. Modify something small and see what breaks. This hands-on exploration is how real understanding develops.

After the basic example works, try finetuning on your own text data. Pick something manageable like your blog posts, favorite book, or collection of emails. Seeing a model learn your writing style makes the concepts personal and memorable.

When you’re comfortable with the basics, dive into Karpathy’s video series where he builds GPT from scratch. Watching him code while you follow along in your own nanoGPT installation creates a powerful learning experience. You’ll catch details that written tutorials miss and develop better debugging instincts.

The beauty of nanoGPT is that it meets you where you are. Complete beginners can get results quickly. Intermediate practitioners can use it for serious projects. Even experts value it as a clean reference implementation free from production cruft.

Building your own language model isn’t just about the technical skills. It demystifies technology that often feels like magic. Once you’ve trained your own GPT, you understand both the power and limitations of these systems. That understanding is becoming essential as AI shapes more of our world.

The code is waiting on GitHub. Your laptop is probably powerful enough. The only question is whether you’re ready to move from AI consumer to AI builder. NanoGPT makes that transition smoother than it’s ever been.

  1. Karpathy, A. (2022). nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. GitHub. https://github.com/karpathy/nanoGPT
  2. Karpathy, A. (2023). Let’s build GPT: from scratch, in code, spelled out. YouTube. https://www.youtube.com/watch?v=kCc8FmEb1nY
  3. Karpathy, A. (2024). Let’s reproduce GPT-2 (124M). YouTube. https://www.youtube.com/watch?v=l8pRSuU81PU
  4. Karpathy, A. (2025). nanoGPT recursive self-improvement project. X (formerly Twitter). https://x.com/karpathy/status/1939709449956126910
  5. Karpathy, A. (2025). Announcing nanochat: A full-stack ChatGPT training pipeline. X (formerly Twitter). https://x.com/karpathy/status/1977755427569111362
