Nine Rules for Proving (Rust) Algorithms Correct Without Knowing Formal Methods
My library package range-set-blaze
includes a critical function named internal_add
. The function inserts a range of integers into the crate’s data structure. Of course, I test it, but testing can miss bugs. Ideally, I want mathematical certainty of correctness.
Moreover, although I coded this project by hand, as AI begins to generate more of our code, the ability to prove that code correct will become increasingly important.
Aside: Modern languages like Rust — and to some extent C++ with its smart pointers and ownership conventions — already give us strong guarantees: no null dereferences, no use-after-free errors. But what if we could extend that same certainty to our algorithms themselves? That’s where proof systems like Lean come in.
Two years ago, to achieve correctness certainty, Divyanshu Ranjan and I ported internal_add
‘s algorithm to the Dafny language. We then, with much work, formally validated the Dafny version of the algorithm (article).
This year, with the help of ChatGPT-5 — paired with Claude Sonnet 4.5 and Codex — I ported and validated the algorithm with Lean. The result was over 4,700 lines of Lean proof, created with almost no proof work or Lean experience.
That doesn’t mean creating the proof was easy. It required translating the algorithm into a completely new language (from Rust to Lean), a step that can both lose details and introduce errors. The whole proof process took about three weeks of part-time work, hundreds of back-and-forth prompts, and roughly $50 in AI credits. Like the proverbial dancing pig, the marvel isn’t how well it dances, but that it dances at all.
I chose Lean because it is one of the most active modern proof assistants, with a growing ecosystem and strong automation. Most of Lean’s momentum comes from mathematics, not software validation. Fields Medalist Terence Tao has publicly used Lean to formalize his own research papers, written a Lean companion to his Analysis I textbook, and even discovered small mistakes in published proofs during the formalization process. Lean now leads in mathematical validation; I believe it will also soon lead in software validation.
Over the course of this new validation, I learned nine rules that can help you validate your algorithms — written in Rust or other languages — with Lean and AI. You may also find these rules interesting as a way to gauge the ease or difficulty of such verification using modern tools. The conclusion, in Part 2 (coming soon), will also discuss surprises and lessons learned.
The rules are:
- Don’t use Lean.
- Use Lean but don’t learn it.
- Drive Claude Sonnet 4.5 (or Codex) with ChatGPT-5.
- Define your algorithm’s basic concepts.
- Redo your concepts — this time the Lean way.
To be covered in Part 2:
- 6. Let the AI invent and prove a toy algorithm — simple, slow, but correct.
- 7. Specify a simplified, realistic algorithm and have the AI validate it.
- 8. Trust the AI to port your algorithm to Lean. Then distrustfully audit the port.
- 9. Trust the AI to validate the algorithm and audit everything again.
Aside: To avoid wishy-washiness, I call these “rules”, but they are, of course, just suggestions.
That’s the big picture. But what does the algorithm of interest actually do?
The internal_add
function tries to efficiently insert a new range of integers into an existing list of sorted and disjoint integer ranges. For example, if we started with [101..=102, 400..=402, 404..=405]
and added 402..=404
, we expect a result of [101..=102, 400..=405]
.
With the introduction out of the way, let’s first ask whether to even begin.
Rule 1: Don’t use Lean.
Before proving your algorithm formally, ask whether the payoff justifies the effort. And even if it does, decide whether Lean is the right tool for the job.
Using Lean means porting your algorithm into it — a process that can miss details or introduce new errors. Given this risk, should you use Lean to verify your algorithms? It depends.
- How important is your algorithm’s correctness? If you’re generating a report and it looks right, it probably is. But a library is different: others will depend on it. The
internal_add
algorithm underpins a data structure I want people to trust, which gives me extra motivation to verify it. - Maybe Lean isn’t the best fit. Consider other frameworks. Dafny, for example, translates your program into constraints for something called an SMT solver and searches for a proof there. This makes Dafny easier to use manually, and when it succeeds it often feels “magical.” But it can also fail unpredictably — because the search is partly random — making its results difficult to reproduce. Coq, another long-established proof assistant, offers rigor similar to Lean’s but relies on more manual proofs and older automation. Lean, in contrast, is newer, faster, and designed for richer automation and AI support while still producing full step-by-step proofs you can read and reuse.
- Maybe all formal verification, with current tools, is too hard. I believe, however, that AI makes things easier than ever before. You will find formally verifying code easier if you are already familiar with types (for example, from Java or Rust) and recursion/induction (used infrequently in Rust). Continue reading and then decide for yourself if/when formal verification is easy enough to be valuable to you.
- Maybe fuzzing (such as
cargo-fuzz
) and property-based testing (such asQuickCheck
) are good enough. Although these methods do not provide mathematical certainty, they are clever, useful, and easy to use. (Therange-set-blaze
crate already usesQuickCheck
. See Rule 9.5 in a previous article for details). - Maybe formal verification is — and will always be — doomed because writing a specification is as hard as writing code. I disagree. Think about refactoring: I often start with a simple, clear version of a program, then refactor it for efficiency. Writing a specification is like that first, simple version — it captures what the program should do without worrying about performance. In fact, for
internal_add
, I found the specification to be much simpler than any implementation. (You can judge this for yourself in Rule 6 of Part 2.) - Maybe formal verification is and will always be doomed because the halting problem tells us formally that formality isn’t generally possible. The halting problem does not doom us. While we can’t always understand arbitrary code, we don’t need to. We only need to understand our own code, which we (hopefully) wrote to be understandable.
- Maybe porting to Lean is too hard. This has not been my experience. Like Rust, Lean mixes and matches imperative and functional programming. I found porting my algorithm, with AI help, to be straightforward. There are translation toolchains emerging for Rust-to-proof-assistant such as coq-of-rust targeting Coq and Aeneas targeting Lean. These may one day forgo the need for manual or AI-assisted porting.
Assuming you still want to port and verify your algorithm with Lean, the good news is that you don’t really need to learn Lean.
Rule 2: Use Lean but don’t learn it.
Surprise: you don’t need to learn Lean in the traditional sense. You don’t need a textbook or to master tactics. What you do need is a working environment:
- Install VS Code with the Lean 4 extension. This gives you Lean inside your editor.
- Create your project. In these examples I use the package name
RangeSetBlaze
; use any name you like. From the command line:
# choose any name for your Git project folder
mkdir range-set-blaze-lean
cd range-set-blaze-lean
# start vscode and open a terminal
lake init RangeSetBlaze
Lake (think Make or Cargo for Lean) creates:
range-set-blaze-lean/
├── lakefile.toml # Lake build configuration (package name, deps, options)
├── lake-manifest.json # auto-generated "lock" file
├── lean-toolchain # Specifies Lean version
├── README.md # Project readme
├── RangeSetBlaze.lean # Main entry point (imports submodules)
├── Main.lean # Starter file with `def main` (safe to delete if unused)
└── RangeSetBlaze/
└── Basic.lean # Example submodule (place your code here)
Test the project with:
lake exe rangesetblaze
This outputs:
Hello, world!
- Get Mathlib. Mathlib is Lean’s giant standard math library, full of subtypes, sets, and proofs you’ll reuse instead of reinventing.
Aside: You can follow these instructions for adding Mathlib yourself or give them to your coding agent in VS Code.
To add Mathlib to your project, edit your lakefile.toml
and add this at the end:
[[require]]
name = "mathlib4"
git = "https://github.com/leanprover-community/mathlib4"
Then, in the VS Code terminal, run:
lake update
lake exe cache get
This pulls in Mathlib and downloads precompiled versions, so you don’t have to compile everything from scratch.
Now, in RangeSetBlaze/Basic.lean
add:
import Mathlib.Data.Int.Basic
Test it from the VS Code terminal with
lake build # compiles all Lean files in the project
- Use Git (the VS Code interface is fine) to checkpoint your progress. Each time you reach a milestone — anything you might want to roll back to later — commit your changes. Pushing to GitHub (or another remote) is optional but provides an extra layer of safety. A good rule of thumb:
When Vibe Validating, you must learn Git. Commit whenever you’d regret losing your current state if something goes off the rails.
You’re now set up with Lean. If you do wish to learn a bit of the Lean language, I recommend A Lean Syntax Primer by Dan Abramov.
Next, we’ll see how to get AI to write the Lean code for you. Your role is to explain your algorithm clearly, verify that the AI’s output matches your intent, and push back if it starts proving the wrong thing. Think of it as pair programming: you bring the domain expertise; the AI brings the Lean.
Rule 3: Drive Claude Sonnet 4.5 (or Codex) with ChatGPT-5.
During this project I tried several AI tools. I began with ChatGPT-5 in the browser ($20/month) and early Claude models in GitHub Copilot (now $10 per 300 “premium requests”). Then came OpenAI’s Codex as its own VS Code extension, and finally Claude Sonnet 4.5.
Each step was an upgrade: Codex outperformed the older Claudes, Claude 4.5 beat Codex, and all of them were surprisingly good at Lean — even though Lean is a niche language compared to Python or JavaScript. If you’re following along today, just use the newest and best models you have access to; the names will change, but the workflow won’t.
But there was a catch: quotas. Both Copilot and Codex coding agents eventually hit their usage limits during the project. To keep going — and to get access to Claude Sonnet 4.5 — I spent $10 for a GitHub Copilot subscription and $12 for extra premium requests (about 360 calls). ChatGPT-5 in the browser, by contrast, has no monthly quota. It does throttle short bursts, but it cheerfully accepts prompts with more than 1,000 lines of code.
The winning workflow for me was to have ChatGPT-5 plan the incremental steps. Then I fed its instructions one at a time to Claude or Codex running in full agent mode inside VS Code.
Aside: Agent mode feels like magic. The AI edits your Lean code, compiles it, and keeps retrying until every proof passes (Lean marks that with a satisfying green check). But you won’t always want your AI agent running for too long at once; it can burn through your quota or get stuck on an unprovable goal. It’s better to give it bite-sized goals so each run makes steady, verifiable progress.
In practice: ChatGPT-5 is the strategist, Claude or Codex is the executor. That pairing drove thousands of lines of Lean proof without burning me out — or burning through quotas too fast.
Aside: I ran this project twice — once to see if it was possible, and again to test these Nine Rules from scratch.
Now that the project setup and workflow are in place, it’s time to start the actual formalization. You’ll start by defining the fundamental types and relationships that make up your algorithm — the mathematical bones on which the proofs will rest.
Rule 4: Define your algorithm’s basic concepts.
Your first Lean work lays the mathematical foundation on which your proofs will stand. The goal isn’t to prove anything yet, but to describe the basic structures — in my case, ranges, sets, and lists — that the algorithm will manipulate.
Here’s the prompt I gave to Claude Sonnet 4.5, my AI coding agent:
I'm working in Lean 4, in the project created by lake init RangeSetBlaze.
Please edit the file RangeSetBlaze/Basic.lean to add the core data structures
for a port of the Rust library RangeSetBlaze.
Here's what I need:
• Define a type that represents an inclusive range of integers, with fields lo and hi.
• Our convention: a range is empty if hi < lo. Use this convention so that
proofs are easier later.
• Define a way to view a range as a set of integers. If the range is empty,
the set should be empty. Use mathlib functions for intervals and sets
wherever possible.
• Add a lemma proving that this set view really is empty when hi < lo.
• Define a predicate that says when one range comes strictly before another
with at least a one-integer gap. This is called disjoint. Use the
symbol ≺ (precedes). This is the opposite of touching or overlapping.
• Define a type for lists of non-empty ranges that are sorted and disjoint.
• Provide a way to convert such a list into a single set that is the union
of all its ranges.
• Please make the definitions efficient to evaluate and rely on mathlib as
much as practical.
Then, in Main.lean, add a few runnable examples:
• Create a couple of non-empty ranges and one empty range.
• Show, with small proofs, that the empty range is indeed empty, and that a
short list of ranges is sorted and disjoint. Create another example of a
list that is not.
• In main, print out those ranges and a short message so I can see something run.
• Keep all the core logic in Basic.lean, since lake init already gave me
that file.
Claude ran for a few minutes. (I committed these first results to GitHub, where you can browse them.)
If you now run lake exe rangesetblaze
again, you’ll see the output of the generated tests:
RangeSetBlaze - Lean 4 Port Examples
=====================================
Non-empty range 1: { lo := 1, hi := 5 }
Non-empty range 2: { lo := 10, hi := 15 }
Empty range (hi < lo): { lo := 10, hi := 5 }
Range1 is empty: false
Range2 is empty: false
EmptyRange is empty: true
Range1 disjoint from Range2: true
Adjacent ranges are disjoint: false
Overlapping ranges are disjoint: false
Good ranges (sorted & disjoint): [{ lo := 1, hi := 3 }, { lo := 5, hi := 7 }, { lo := 10, hi := 12 }]
All non-empty: true
Sorted and disjoint: true
Bad ranges (overlapping): [{ lo := 1, hi := 5 }, { lo := 4, hi := 8 }]
All non-empty: true
Sorted and disjoint: false
Hello, world!
Once the code compiled, I worked to get VS Code’s Problems panel to zero. Five tips that helped:
- Check that Lean files validate interactively. Open a file like
Basic.lean
and look for green checkmarks or messages in the Lean InfoView panel. If you see errors there, the file isn’t validating yet. (Learn about the InfoView panel.) - Claude created an extra file called
IMPLEMENTATION_NOTES.md
that I didn’t need—and my Markdown linter flagged it—so I just deleted it. - Restart VS Code, sometimes. This often clears stale VS Code-reported errors that appear even when the project builds correctly.
- Ask ChatGPT-5 for help via screenshots. Browser-based ChatGPT-5 can understand images. On Windows 11, press Windows + Shift + S to take a screen clipping, then paste it into the browser chat with your question.
- Commit to GitHub frequently — whenever you reach a point you might want to revert to later if something goes off the rails.
At this stage, you’ll have your algorithm’s mathematical skeleton defined and tested. Now it’s time to give it shape in Lean’s own idioms — to rebuild the same concepts, but the Lean way.
Rule 5: Redo your concepts — this time the Lean way.
At this point, the definitions worked, but when I tried to use them, I hit a dead end. To move forward, I asked browser-based ChatGPT-5 to help rewrite the code so that the result followed Lean’s conventions and worked smoothly with Mathlib’s existing lemmas and automation.
First, I gave Browser-based ChatGPT-5 this instruction:
Given these instructions:
<PASTED INSTRUCTIONS FROM ABOVE>
GitHub Copilot/Claude Sonnet 4.5 produced this Basic.lean file:
```lean
<PASTED Basic.lean FILE>
```
Can you review this and suggest instructions for Claude
that will get it using Mathlib and idiomatic Lean more?
Next, ChatGPT-5 produced a prompt for Claude. I fed that prompt to Claude to generate Lean code. Then, I asked ChatGPT-5 to review the result and refine the prompt. I repeated this loop several times, adding a few of my own ideas along the way. Finally, I arrived at a Basic.lean
that showed the Lean syntax we want and the math we’re aiming for. You can view the full file on GitHub.
Let’s look at a few snippets — not to master every detail, but to see how the code maps cleanly to the math.
The first lines of Basic.lean
import parts of Mathlib for intervals, sets, and list reasoning:
import Mathlib.Data.Int.Interval
import Mathlib.Data.Set.Lattice
import Mathlib.Data.List.Pairwise
Here Claude defines a structure for inclusive integer ranges. The structure is printable, and two ranges are equal when their fields are equal (structure equality):
/-- An inclusive range of integers with fields `lo` and `hi`. -/
structure IntRange where
lo : Int
hi : Int
deriving Repr, DecidableEq
Using a namespace
, Claude defines methods on the IntRange
structure. In Lean, a Prop
(short for proposition) is a statement meant to be proven true or false. It’s used for reasoning, not computation. Here, Claude defined two simple propositions: a range is empty when hi < lo
, and nonempty when lo ≤ hi
.
namespace IntRange
/-- A range is empty when hi < lo. -/
def empty (r : IntRange) : Prop := r.hi < r.lo
/-- A range is nonempty when lo ≤ hi. -/
def nonempty (r : IntRange) : Prop := r.lo ≤ r.hi
Claude now connects IntRange
to Mathlib’s interval and set machinery. It first shows that a range being nonempty is logically the same as it not being empty, and marks this with @[simp]
so Lean can apply it automatically in proofs:
/-- Nonempty is equivalent to not empty. -/
@[simp]
theorem nonempty_iff_not_empty (r : IntRange) : r.nonempty ↔ ¬ r.empty := by
simp [nonempty, empty, not_lt]
Next, it defines how to view an IntRange
as a mathematical set — specifically, the set of all integers between lo
and hi
, inclusive:
/-- View a range as a set of integers using the closed interval. -/
def toSet (r : IntRange) : Set Int := Set.Icc r.lo r.hi
Mathlib’s Set.Icc
(“interval, closed–closed”) provides standard theorems about integer intervals. Building on those, Claude created the following lemma—a formally proven statement within our own code—that uses Mathlib’s results as building blocks. This one states that the set view of a range is empty exactly when hi < lo
:
/-- The set view is empty iff hi < lo. -/
@[simp]
lemma toSet_eq_empty_iff (r : IntRange) :
r.toSet = (∅ : Set Int) ↔ r.hi < r.lo := by
simp [toSet, Set.Icc_eq_empty_iff, not_le]
Aside: In Lean, a theorem and a lemma are the same kind of object — both are proven propositions. By convention, we call something a lemma when it’s a smaller supporting result, and a theorem when it’s a main conclusion. The distinction is stylistic, not technical.
Aside: Lean has a standard coding style — short names, Unicode math symbols, abbreviations, and so on — very different from Rust’s. I find the style off-putting, but I didn’t try to change it; consistency and compatibility of style are more important than personal preference.
Finally, Claude adds a simpler, one-way version of the same fact. Instead of giving an exact logical equivalence, this lemma lets Lean immediately conclude that the set is empty once it knows hi < lo
. It’s a convenience rule for proofs that only need that direction, so Lean doesn’t have to re-establish the full “if and only if” result each time:
lemma toSet_eq_empty_of_hi_lt_lo {r : IntRange} (h : r.hi < r.lo) :
r.toSet = ∅ := (toSet_eq_empty_iff r).mpr h
Together, these bits of Lean code tie the arithmetic view (lo
, hi
) to the set view (Set.Icc lo hi
) and replace ad-hoc conditionals with standard, proof-ready Mathlib facts.
Next, Claude defines a subtype for nonempty ranges and a coercion to sets:
/-- Convenient abbreviation for nonempty ranges as a subtype. -/
abbrev NR := { r : IntRange // r.nonempty }
/-- Coercion from IntRange to Set Int. -/
instance : Coe IntRange (Set Int) where coe := toSet
Aside: Lean has two ways to define names:
def
andabbrev
. Usedef
when you want Lean to treat the definition as a distinct object that can appear in proofs and be unfolded when needed. Useabbrev
when you just want a shorter alias for an existing expression. In this project,abbrev NR := {r : IntRange // r.nonempty}
is simply a convenient shorthand for “a nonempty range,” not a new type of its own.
Now let’s look at the main RangeSetBlaze
data structure (Claude originally named it RangeList
):
def before (a b : IntRange) : Prop := a.hi + 1 < b.lo
scoped infixl:50 " ≺ " => IntRange.before
/-- ... --/
/-- A list of nonempty ranges that are sorted and pairwise disjoint with gaps. -/
structure RangeList where
ranges : List IntRange
all_nonempty : ∀ r ∈ ranges, r.nonempty
pairwise_gaps : List.Pairwise IntRange.before ranges
This defines RangeList
(RangeSetBlaze
) as a structure made up of data and proofs.
- The data is simply a list of integer ranges.
- The proofs ensure two properties:
- Every range in the list is nonempty (that is,
lo ≤ hi
). - The ranges are ordered and disjoint — formally, for any earlier range
a
and later rangeb
, we havea.hi + 1 < b.lo
.
This “Rule 5” version is cleaner and more mathematical than the original “Rule 4” form. It expresses the “sorted and disjoint” property declaratively, letting Lean’s built-in Pairwise
predicate handle all the index bookkeeping automatically.
Claude’s rewrite of Basic.lean
impressed me, but I wasn’t fully satisfied. I asked for one more refinement to make the design closer to my Rust library and more efficient for proofs:
- Rename
RangeList
toRangeSetBlaze
, matching the Rust project. - Define
before
(≺
) only for nonempty ranges, since empty ones can never appear. - Make the
ranges
field a list of nonempty ranges (NR
). - Replace the two separate proof fields with a single field,
ok
, defined usingPairwise
and the infix notation≺
. - Make the structure printable with
deriving Repr
.
That produced exactly what I wanted:
def before (a b : NR) : Prop := a.val.hi + 1 < b.val.lo
scoped infixl:50 " ≺ " => NR.before
/-- ... --/
structure RangeSetBlaze where
ranges : List NR
ok : List.Pairwise (· ≺ ·) ranges
deriving Repr
Now the type is nonempty by construction, the Pairwise
invariant is clean and expressive, and the name matches the Rust library.
You can see the final Rule 5 Basic.lean
or the whole project on GitHub.
Aside. Pushing “nonempty” into the type and leaning on Mathlib avoided a dead end I hit on my first pass. When the type rules out empty ranges, Lean’s proof automation no longer needs to consider them in every case, making its simplification and reasoning tools work more reliably.
So, there you have it: the first five rules for using AI to prove an algorithm’s correctness. Part 2 will cover rules 6 to 9 and more GitHub links to the results. By way of preview, I’ve put the first prompt for Rule 6 below.
Aside: If you’re interested in Part 2 or other future articles, please follow me on Medium. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.
Preview of Rule 6: Let the AI invent and prove a toy algorithm — simple, slow, but correct:
The first prompt:
Define an `internalAddA` method on `RangeSetBlaze` in Lean.
The method should take:
1. a `RangeSetBlaze`, and
2. a possibly empty `IntRange`.
It should return a new `RangeSetBlaze` whose set equals the union
of the set of the original `RangeSetBlaze` and the set of the input range.
Use the existing definitions of `IntRange`, `NR`, `before (≺)`, and
`RangeSetBlaze`. Ensure the result remains valid (the list of nonempty
ranges stays sorted and disjoint).
Include any needed helper definitions or lemmas, and a simple correctness
lemma stating:
```lean
(internalAddA s r).toSet = s.toSet ∪ r.toSet
```
Also:
- Add small, concrete examples and sanity checks in `Main.lean` that exercise `internalAddA`:
- Construct a few `IntRange`/`NR` values (empty, touching, overlapping, and disjoint).
- Build a small `RangeSetBlaze`, call `internalAddA` with each case, and show results via `#eval` or `IO.println`.
- Include at least two `example`-style lemmas that compile (e.g., facts about membership after `internalAddA`).
- Make sure everything compiles with `lake build`. If you add a `main` in `Main.lean`, ensure it runs under `lake env lean --run Main.lean` (optional).
Process constraints:
- Work in minimal, compiling steps. After each change, recompile.
- If you hit a loop (e.g., repeating edits with no new errors resolved), stop after 2 compile cycles without progress and explain what’s blocking you.
- Prefer Mathlib lemmas over ad-hoc proofs. Keep proofs short and readable.
- Keep naming and notation consistent with earlier sections.
At the end, output:
1) A brief rationale of the approach (1–3 sentences).
2) The updated Lean snippets (only the changed/added parts), clearly labeled with filenames and line anchors if relevant.
3) The exact commands you used to compile/run.
4) A “next improvements” list (e.g., edge cases, perf, stronger lemmas).
If you're having problems with corrupted files be very careful
to avoid creating UTF8-BOM (create an AGENTS.md to remember this)