Running Benchmarks on new Gemini 3 Pro Preview

Google has released Gemini 3 Pro Preview.

So I have run some tests and here are the Gemini 3 Pro Preview benchmark results:

– two benchmarks you have already seen on this subreddit when we were discussing if Polish is a better language for prompting: Logical Puzzles – English and Logical Puzzles – Polish. Gemini 3 Pro Preview scores 92% on Polish puzzles, first place ex aequo with Grok 4. For English puzzles the new Gemini model secures first place ex aequo with Gemini-2.5-pro with a perfect 100% score.

– next on AIME25 Mathematical Reasoning Benchmark. Gemini 3 Pro Preview once again is in the first place together with Grok 4. Cherry on the top: latency for Gemini is significantly lower than for Grok.

– next we have a linguistic challenge: Semantic and Emotional Exceptions in Brazilian Portuguese. Here the model placed only sixth after glm-4.6, deepseek-chat, qwen3-235b-a22b-2507, llama-4-maverick and grok-4.

All results below in comments! (not super easy to read since I can't attach a screenshot so better to click on corresponding benchmark links)

Let me know if there are any specific benchmarks you want me to run Gemini 3 on and what other models to compare it to.

P.S. looking at the leaderboard for Brazilian Portuguese I wonder if there is a correlation between geopolitics and model performance 🤔 A question for next week…

Links to benchmarks:

Leave a Reply