Gemini 3 on SWE-bench Verified with minimal agent: New record! Full results & cost analysis

By skyforbes Nov 19, 2025

Hi, I'm from the SWE-bench team. We just finished independently evaluating Gemini 3 Pro preview on SWE-bench Verified, and it is indeed top of the board with 74% (almost 4 percentage points ahead of the next best model). The evaluation used a minimal agent (`mini-swe-agent`) with no prompt tuning at all, so this really measures model quality.

https://preview.redd.it/y6r580bah82g1.png?width=947&format=png&auto=webp&s=85f4553007ba11ec5cec0a71285555ad2b2c377a

Costs are 1.6x those of GPT-5, but still lower than those of Sonnet 4.5.

Gemini takes an exceptionally large number of steps to iterate on a task, significantly more than GPT-5, and its curve only flattens beyond 100 steps (but Sonnet 4.5 is higher still).

https://preview.redd.it/3x36h4jgg92g1.png?width=780&format=png&auto=webp&s=66f57f3babb1c3e81063064c0cb73a068c28f891

By varying the maximum number of steps you allow your agent, you can trade resolution rate against cost. Gemini 3 is more cost-efficient than Sonnet 4.5, but much less so than gpt-5 (or gpt-5-mini).

https://preview.redd.it/k2pvuuohh82g1.png?width=695&format=png&auto=webp&s=ff94990bd32b33cc7294a882f526d46fd45ec76a
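To make the step-limit trade-off concrete, here is a minimal Python sketch of how such a curve could be estimated from finished runs. It is not the code behind our plots. The `Trajectory` fields, the helper `tradeoff_at_limit`, and the sample numbers are all hypothetical. It assumes that a run needing more steps than the limit counts as unresolved, and that cost grows linearly with steps (in practice later steps tend to cost more because the context grows).

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    steps: int       # steps the agent took before finishing
    cost: float      # total API cost of the full run, in USD
    resolved: bool   # whether the final patch resolved the task


def tradeoff_at_limit(trajectories, max_steps):
    """Estimate (resolution_rate, avg_cost) if every run were capped at max_steps."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    resolved = 0
    total_cost = 0.0
    for t in trajectories:
        if t.steps <= max_steps:
            # The run fits within the limit, so nothing changes.
            resolved += int(t.resolved)
            total_cost += t.cost
        else:
            # Assumption: a truncated run fails, and cost scales linearly with steps.
            total_cost += t.cost * max_steps / t.steps
    n = len(trajectories)
    return resolved / n, total_cost / n


# Made-up illustrative data, not taken from the evaluation.
runs = [
    Trajectory(steps=20, cost=0.10, resolved=True),
    Trajectory(steps=60, cost=0.40, resolved=True),
    Trajectory(steps=120, cost=1.20, resolved=True),
    Trajectory(steps=150, cost=1.50, resolved=False),
]

for limit in (25, 50, 100, 150):
    rate, cost = tradeoff_at_limit(runs, limit)
    print(f"max_steps={limit:3d}  resolved={rate:.0%}  avg_cost=${cost:.2f}")
```

Sweeping the limit and plotting `avg_cost` against `resolved` gives one point per limit, which is the shape of the trade-off curve described above.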

You can browse all agent trajectories/logs in your web browser here: https://docent.transluce.org/dashboard/3641b17f-034e-4b36-aa66-471dfed837d6

Full leaderboard ("bash only"): https://www.swebench.com/ (about to be updated)

All comparisons were performed with mini-swe-agent, a bare-bones agent that uses only bash and the same scaffold & prompts for all models, for an apples-to-apples comparison. It comes with a claude-code style CLI, too, if you want to try it or reproduce our numbers: https://github.com/SWE-agent/mini-swe-agent/
