In Part 1, we explored why Nanochat matters for enterprises — a transparent, educational LLM stack that helps leaders and developers understand the true cost, architecture, and trade-offs behind conversational AI.
Now, let’s get hands-on.
This second part walks developers through the exact process of training and deploying a Nanochat model, from setup to serving a live chat endpoint.
Environment Setup (Developer Edition)
Step 1: Spin Up the Environment
Start with an 8× H100 (80 GB) instance on Lambda Labs, RunPod, or AWS EC2. SSH into the box.
git clone https://github.com/karpathy/nanochat.git
cd nanochat
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && uv sync
source .venv/bin/activate
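Before moving on, it is worth confirming the basics are in place. A quick stdlib-only sanity check (a hypothetical helper, not part of nanochat) that the tools the steps above rely on resolve on PATH:

```python
# Hypothetical sanity-check helper (not part of nanochat):
# report which required command-line tools are missing from PATH.
import shutil

def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

# After Step 1 you would expect this to print an empty list:
print(missing_tools(["git", "curl"]))
```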
Step 2: Rust Tokenizer Setup
Nanochat uses a high-performance Rust-based BPE tokenizer.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
uv run maturin develop --release --manifest-path rustbpe/Cargo.toml
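Under the hood, the tokenizer implements byte-pair encoding (BPE): repeatedly find the most frequent adjacent pair of token ids and merge it into a new id. A toy pure-Python illustration of that core loop (this is the algorithm the Rust crate accelerates, not the rustbpe API):

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent (id, id) pair, or None if too short."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list(b"aaabdaaabac")        # raw UTF-8 bytes as initial token ids
pair = most_frequent_pair(ids)    # (97, 97), i.e. "aa"
ids = merge(ids, pair, 256)       # first learned token gets id 256
```

Running this merge loop over billions of characters in pure Python is prohibitively slow, which is why the training-time tokenizer is written in Rust.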
Step 3: Optional Monitoring
wandb login
Step 4: Verify Installation
python -m pytest tests/ -v -s
Configuration Philosophy: Radical Simplicity
Nanochat rejects configuration-file abstraction: there are no YAML schemas or config registries, only explicit Python variables you can read, grep, and override.
# from scripts/base_train.py
depth = 20
device_batch_size = 32
learning_rate = 0.02
run = "default"
Override values directly from the command line:
torchrun -m scripts.base_train -- --depth=26 --device_batch_size=16 --run=enterprise_exp
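The override mechanism can be sketched in a few lines: module-level defaults, plus a tiny parser that coerces each `--key=value` argument to the default's type. This is an illustrative toy, not nanochat's actual parsing code:

```python
# Toy sketch of the "explicit Python config" pattern; not nanochat's parser.
depth = 20
device_batch_size = 32
learning_rate = 0.02

def apply_overrides(args, namespace):
    """Apply --key=value overrides in place, coercing to the default's type."""
    for arg in args:
        key, value = arg.lstrip("-").split("=", 1)
        if key not in namespace:
            raise KeyError(f"unknown config key: {key}")
        namespace[key] = type(namespace[key])(value)

cfg = {"depth": depth, "device_batch_size": device_batch_size,
       "learning_rate": learning_rate}
apply_overrides(["--depth=26", "--device_batch_size=16"], cfg)
print(cfg)  # depth and batch size overridden; learning_rate untouched
```

Typo'd keys fail loudly with a `KeyError` instead of silently creating a new setting, which is exactly the property you want in training configs.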
Environment helpers:
# Set where artifacts are stored (default: ~/.cache/nanochat)
export NANOCHAT_BASE_DIR="$HOME/nanochat_data"
# Enable Weights & Biases logging
export WANDB_RUN="my_experiment"
# OR disable it entirely
export WANDB_RUN="dummy"
# Prevent OpenMP thread issues with multi-GPU
export OMP_NUM_THREADS=1
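Scripts typically resolve such variables with an environment lookup plus a fallback. A minimal sketch, assuming the documented default of `~/.cache/nanochat`:

```python
import os

def base_dir():
    """Artifact directory: NANOCHAT_BASE_DIR if set, else ~/.cache/nanochat."""
    return os.environ.get(
        "NANOCHAT_BASE_DIR",
        os.path.expanduser("~/.cache/nanochat"),
    )
```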
The Speedrun
To train your first model end-to-end, run:
screen -L -Logfile speedrun.log -S speedrun bash speedrun.sh
This automated pipeline executes:
- Dataset download
- Tokenizer training
- Base pre-training
- Mid-training (conversation adaptation)
- Supervised fine-tuning
- Evaluation (ARC, MMLU, GSM8K, HumanEval)
- Report generation
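Conceptually, speedrun.sh is just these stages run back to back, aborting on the first failure. A simplified Python driver (the commands below are condensed from the manual steps later in this post; the real script passes more flags):

```python
import subprocess

# Condensed stage commands; the real speedrun.sh passes additional flags.
STAGES = [
    ["python", "-m", "nanochat.dataset", "-n", "240"],
    ["python", "-m", "scripts.tok_train"],
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.base_train"],
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.mid_train"],
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.chat_sft"],
    ["torchrun", "--standalone", "--nproc_per_node=8", "-m", "scripts.chat_eval"],
    ["python", "-m", "nanochat.report", "generate"],
]

def run_pipeline(stages, dry_run=True):
    """Run stages sequentially; check=True aborts on the first failure."""
    executed = []
    for cmd in stages:
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return executed

for line in run_pipeline(STAGES):  # dry run: just list the commands
    print("would run:", line)
```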
Manual Control: Enterprise Experiment Mode
If your AI team wants to integrate internal data or modify the training logic, run each phase manually.
Step 1: Data Download
python -m nanochat.dataset -n 240 # d20 baseline
Step 2: Tokenizer Training
python -m scripts.tok_train --max_chars=2000000000
python -m scripts.tok_eval
Step 3: Base Pretraining
torchrun --standalone --nproc_per_node=8 -m scripts.base_train -- \
--depth=20 --device_batch_size=32 --run=enterprise_run
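To reason about what --depth=20 buys you, a back-of-envelope sizing helps. The rules of thumb here are my assumptions, not values read from the nanochat source: hidden size of 64 per layer of depth, the standard non-embedding estimate of roughly 12 x depth x d^2 parameters, and a Chinchilla-style ~20 training tokens per parameter:

```python
# Back-of-envelope sizing for depth=20. All three rules of thumb below are
# assumptions for illustration, not values read from the nanochat source.
depth = 20
d_model = 64 * depth                 # assumed aspect ratio: 1280 hidden dims
params = 12 * depth * d_model ** 2   # standard non-embedding estimate, ~393M
tokens = 20 * params                 # Chinchilla-style compute-optimal budget

print(f"{params / 1e6:.0f}M params, {tokens / 1e9:.1f}B tokens")
```

Estimates like this make it easy to sanity-check whether your shard count and training horizon are in the right ballpark before burning GPU hours.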
Step 4: Mid-Training & Fine-Tuning
torchrun --standalone --nproc_per_node=8 -m scripts.mid_train -- --run=enterprise_run
torchrun --standalone --nproc_per_node=8 -m scripts.chat_sft -- --run=enterprise_run
Step 5: Evaluation & Reporting
torchrun --standalone --nproc_per_node=8 -m scripts.chat_eval -- -i sft
python -m nanochat.report generate
Optional RL Fine-Tuning
torchrun --standalone --nproc_per_node=8 -m scripts.chat_rl -- --run=enterprise_run
Final Word
As mentioned in the earlier post, Nanochat isn’t about replacing GPT-4; it’s about revealing how models like it are built and trained.
For enterprises, this open, hackable architecture transforms AI from a “black-box service” into an auditable, reproducible system.
Your developers get the keys to the engine. Your executives get clarity on cost, control, and capability.
Welcome to the build phase of enterprise AI.
Happy training!