I tested Gemini 3 Pro vs. GPT-5.1 on real coding tasks so you don’t have to


Gemini 3 Pro dropped recently, and Google pushed it everywhere at once: Search, Workspace, the whole ecosystem. With that kind of confidence and all the buzz around its reasoning, I was curious about the one thing that actually matters to me as a dev.

Can it code better than GPT-5.1?

Because so far, GPT-5.1 has been the most reliable model for some of my real projects (better than Claude 4.5 Sonnet).

So I tested both models on two real tasks:

  • Build a Windows-style UI
  • Build an agent with a UI from scratch using our Tool Router, which doubles as dogfooding

NOTE: I included the UI build because Gemini 3 Pro is said to be the best model for frontend work, so why not put that to the test?

How I tested

  • GPT-5.1 was tested through OpenAI Codex
  • Gemini 3 Pro was tested through the Gemini CLI

Stats from my test

These are the raw stats from the tests that matter:

Gemini 3 Pro

  • UI build: about 30k output tokens
  • UI build time: close to 10 minutes
  • Agent build: around 14k output tokens
  • Agent build time: around 5 minutes
  • Follow-ups needed: very few
  • Hallucinations: minimal

GPT-5.1

  • UI build: similar token use, but simpler output
  • Agent build: needed manual fixes after the first attempt
  • Agent build time: slower overall, because it did not follow the provided context well
  • Follow-ups needed: multiple
  • Hallucinations: mocked out the entire initial implementation instead of writing real integration code

TL;DR

  • Gemini 3 Pro: Nailed the UI task with almost no follow-ups, using about 30k tokens in around 10 minutes. It also handled the agent build far better, finishing a working version in roughly 5 minutes with around 14k output tokens. It barely hallucinated and overall feels like the safer pick for day-to-day coding and agent workflows.
  • GPT-5.1: The code it writes is often cleaner and more maintainable, but it fell apart on the agent test and didn't pick up enough context from what I gave it. At first it mocked out the implementation entirely; with some manual fixes, it eventually produced something usable.

Verdict

If you're building tools or agentic workflows, go with Gemini 3 Pro. For UI, Gemini 3 Pro is better as well, but GPT-5.1 is still a great model for day-to-day coding: it just works, and I've had little to no issues with it.

If you want the full breakdown with token usage, code, and timings, here's the full blog: Gemini 3.0 Pro vs GPT 5.1

What should I test next? Thinking of doing something even bigger.

Has anyone else tried Gemini 3 for real coding yet? Curious how your results look.
