1) I use Gemini to read dense math papers. Gemini 2.5 Pro worked really well for this if you pasted about 2-4 pages' worth of content at a time. Gemini 3 really struggles to provide the same accurate analysis of mathematical arguments. It can completely miss the structure of a proof, almost as if it were rushing to get through the text as fast as possible.
2) However, 1) doesn't mean that Gemini 3 jumps to conclusions or that it doesn't understand what it's reading. If you give it a faulty proof with a bad structure, it does an excellent job of catching the error. I have a couple of old manuscripts with flawed proofs that I use to benchmark models, and Gemini 3 identified the errors just as well as 2.5 did. But Gemini 3 sucks at verbalizing why something is an error and why a certain step fails. Its response feels unfocused, almost like it's dancing around the subject, where 2.5 would say precisely and concisely what the issue was and why it doesn't work. Here's an analogy for what I mean. Gemini 2.5 would generate an excellent two-page response with everything you want in a good review: it's concise, it's accurate, and every sentence is in its right place. Now imagine reshuffling those paragraphs so that you announce the issue first but never specify it fully or properly. Then you split everything into its own section. Then you never fully spell out the point in any of those sections, but all of them hint at the problem, so you're needlessly circling back through a maze. And when you finally have everything on the table and just need to apply pressure and drive the point home, you start yet another section instead. That's what Gemini 3 feels like. If something is wrong, it can identify that it's wrong, but then you have to spend effort deciphering its response rather than reading the beautiful response you would have gotten from 2.5. I'm sure Gemini 3 can detect many more reasoning errors than 2.5 can, but when you evaluate both of them on an error they can both identify, you'll see just how much cleaner 2.5's response is.
3) I also use Gemini to read dense philosophy papers. Sadly, Gemini 3's output often reads like the word salad of someone who's just as confused as I am. I used to ask Gemini 2.5 about the premises and logical flow of certain arguments in those papers, and we would usually end up on the same page about possible errors or hidden premises in them. Gemini 3 just can't do that kind of analysis in an intellectually honest way. It generates a big response that doesn't always even address the specifics of what you asked. Philosophy is pretty touchy-feely and you need to be very, very specific to engage with it, and the way Gemini 3 prefers to dance around subjects makes that much harder. It never seems to "start from the beginning". It throws you somewhere in the middle, and you have to work out what it actually means and whether it engaged with what you said at all. This makes it significantly less reliable as a summarizer / explainer of pages from difficult papers, which was actually my main use case.
4) Please don't instruct your models to ask follow-up questions. Hell, I wouldn't be surprised if this is why its responses feel unfocused. It's almost like the model is constantly trying to "see a bigger picture", to the point where it gets completely lost in the specifics. I noticed that ChatGPT 5+ had the exact same issue immediately after these follow-up questions were introduced, which is what made me switch to Gemini 2.5 in the first place. It feels like the model is trying so hard to produce some grand synthesis that it doesn't want to get its hands dirty and actually engage with what you instruct it to do. An instruction to "ask a follow-up question" sounds like exactly the kind of thing that pushes it toward that bigger-picture mode and reduces its depth.
5) I write creatively and use Gemini for textual analysis. I do this mostly to keep track of everything and to make sure the scenes and events are believable and make sense, the conversations feel natural and driven, etc. Gemini 2.5 Pro could easily do this for 100k words. Gemini 3 completely breaks down by around 20k words: it can't keep track of what happened and actively poisons itself. For example, it would insist that a sentence it read in chapter 7 appears in chapter 9, even though it doesn't. By chapter 17 it decided to ignore the instruction and write its own chapter. Despite trying multiple times in new chats, I never managed to get it through all 100k words without getting too frustrated with its responses. On top of that, at a certain point it randomly starts generating unrelated pictures in the middle of its responses: how to sell a telephone, how to connect to WiFi, random stuff like that.
Gemini 3 is amazing, and it will only continue to get more amazing. Let's help with that by writing about where we think it can improve.