Build Your Own ChatGPT-Style Model: A Developer’s Guide to nanochat

In Part 1, we explored why nanochat matters for enterprises: a transparent, educational LLM stack that helps leaders and developers understand the true cost, architecture, and trade-offs behind conversational AI.

Now, let’s get hands-on.

This second part walks developers through the exact process of training and deploying a nanochat model, from setup to serving a live chat endpoint.

Environment Setup (Developer Edition)

Step 1: Spin Up the Environment

Start with an 8× H100 (80 GB) instance on Lambda Labs, RunPod, or AWS EC2. SSH into the box.

git clone https://github.com/karpathy/nanochat.git
cd nanochat
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && uv sync
source .venv/bin/activate

Step 2: Rust Tokenizer Setup

nanochat tokenizes with a high-performance BPE implementation written in Rust. Install the Rust toolchain, then compile the tokenizer as a Python extension:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
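
If the build succeeded, the extension is importable from the project venv. A quick smoke test (assuming the maturin build exposes the module as rustbpe, matching the crate directory above):

python -c "import rustbpe; print('rustbpe extension OK')"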

Step 3: Optional Monitoring

If you want training curves logged to Weights & Biases, authenticate once:

wandb login

Step 4: Verify Installation

python -m pytest tests/ -v -s

Configuration Philosophy: Radical Simplicity

Nanochat rejects YAML abstraction. Everything is explicit Python.

# from scripts/base_train.py (abridged)
depth = 20
device_batch_size = 32
learning_rate = 0.02
run = "dummy"

Override directly:

torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=26 --device_batch_size=16 --run=enterprise_exp
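
Why is depth the primary knob? nanochat derives the rest of the model shape from it, so a single integer scales the whole run. A rough sizing sketch, assuming the repo's width rule of 64 channels per layer of depth and its 2**16-token vocabulary (the parameter formula is a standard transformer estimate, not lifted from the source):

# approximate parameter count from depth alone
depth = 20
model_dim = depth * 64                      # width derived from depth
params_per_layer = 12 * model_dim ** 2      # attention + MLP weight matrices
vocab_size = 65536                          # 2**16-token vocabulary
embed_params = 2 * vocab_size * model_dim   # untied input/output embeddings
total = depth * params_per_layer + embed_params
print(f"d{depth}: ~{total / 1e6:.0f}M parameters")  # ~561M for the d20 model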

Environment helpers:

# Set where artifacts are stored (default: ~/.cache/nanochat)
export NANOCHAT_BASE_DIR="$HOME/nanochat_data"
# Enable Weights & Biases logging
export WANDB_RUN="my_experiment"
# OR disable it entirely
export WANDB_RUN="dummy"
# Prevent OpenMP thread issues with multi-GPU
export OMP_NUM_THREADS=1
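
These are plain environment variables, so they compose with any command below. For example, to redirect artifacts for a one-off run:

NANOCHAT_BASE_DIR="$HOME/nanochat_data" torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- --depth=20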

The Speedrun

To train your first model end-to-end, launch the whole pipeline inside a screen session; the d20 speedrun takes roughly four hours on 8× H100, and screen keeps it alive if your SSH connection drops:

screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh

Detach with Ctrl-a d and reattach with screen -r speedrun; progress is mirrored to speedrun.log.

This automated pipeline executes the following phases, each of which maps to a manual command in the next section:

  1. Dataset download
  2. Tokenizer training
  3. Base pre-training
  4. Mid-training (conversation adaptation)
  5. Supervised fine-tuning
  6. Evaluation (ARC, MMLU, GSM8K, HumanEval)
  7. Report generation

Manual Control: Enterprise Experiment Mode

If your AI team wants to integrate with internal data or modify training logic, run each phase manually.

Step 1: Data Download

python -m nanochat.dataset -n 240   # d20 baseline
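
Why 240 shards? A back-of-the-envelope data budget, assuming the d20 model's ~561M parameters, the Chinchilla-style rule of thumb of ~20 training tokens per parameter, and an assumed characters-per-token ratio (the 4.8 figure is an estimate, not measured from the repo):

# rough data budget for the d20 baseline
n_params = 561e6                  # ~561M parameters at depth=20
tokens_needed = 20 * n_params     # Chinchilla-style 20:1 tokens:params
chars_per_token = 4.8             # assumed BPE compression on FineWeb-EDU
chars_needed = tokens_needed * chars_per_token
print(f"~{tokens_needed / 1e9:.1f}B tokens, ~{chars_needed / 1e9:.0f}B chars")

The 240-shard download is sized to cover roughly this budget.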

Step 2: Tokenizer Training

python -m scripts.tok_train --max_chars=2000000000   # train on 2B characters
python -m scripts.tok_eval
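
Once training finishes, a round-trip check catches silent tokenizer breakage early. The get_tokenizer accessor below is an assumption about nanochat's internal API; if your checkout differs, see nanochat/tokenizer.py for the actual entry point:

# round-trip sanity check (get_tokenizer is assumed, per nanochat/tokenizer.py)
from nanochat.tokenizer import get_tokenizer

tok = get_tokenizer()
text = "Enterprise AI should be auditable."
ids = tok.encode(text)
assert tok.decode(ids) == text   # lossless round-trip
print(f"{len(ids)} tokens")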

Step 3: Base Pretraining

torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- \
  --depth=20 --device_batch_size=32 --run=enterprise_run

If you hit out-of-memory errors, lower --device_batch_size (to 16, 8, ...); the script compensates with gradient accumulation, so the effective batch size and the results are unchanged.

Step 4: Mid-Training & Fine-Tuning

Mid-training adapts the raw base model to the chat schema (conversation special tokens, multiple-choice format, tool use); supervised fine-tuning then polishes it on curated conversations.

torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --run=enterprise_run
torchrun --standalone --nproc_per_node=8 -m scripts.chat_sft -- --run=enterprise_run

Step 5: Evaluation & Reporting

torchrun --standalone --nproc_per_node=8 -m scripts.chat_eval -- -i sft   # -i selects the model stage to evaluate
python -m nanochat.report generate
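
Step 6: Serve the Model

The payoff: talk to it. nanochat ships both a terminal REPL and a lightweight web UI that exposes a ChatGPT-style chat endpoint (script names as of the current repo; confirm against your checkout's README):

python -m scripts.chat_cli   # chat in the terminal
python -m scripts.chat_web   # serve the web UI, then open the printed URL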

Optional RL Fine-Tuning

torchrun --standalone --nproc_per_node=8 -m scripts.chat_rl -- --run=enterprise_run
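
In the current repo, this stage runs a simple reinforcement loop against GSM8K-style math problems. Afterward, re-run the evaluator against the RL checkpoint by switching the -i flag (mirroring the sft evaluation above; confirm the flag against your checkout):

torchrun --standalone --nproc_per_node=8 -m scripts.chat_eval -- -i rl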

Final Word

As mentioned in the earlier post, nanochat isn't about replacing GPT-4; it's about revealing how systems like it work.

For enterprises, this open, hackable architecture transforms AI from a “black-box service” into an auditable, reproducible system.
Your developers get the keys to the engine. Your executives get clarity on cost, control, and capability.

Welcome to the build phase of enterprise AI.
Happy training!
