Coming from ChatGPT 5.1 Thinking, the difference is night and day.
With Nano Banana alone, it absolutely does not understand directional instructions. And it keeps going in loops with edits, regressing constantly. In projects it often can't see reference images, and it constantly makes up random things and asserts false claims with total confidence.
With multi-step/multi-element tasks, it just randomly decides what it will and won't focus on, and gives an output that ignores a big chunk of what you asked for. Again, it keeps regressing, keeps missing things, keeps unreliably changing things for no reason.
I can literally see in its thought process that it understands my instructions, then it either ignores them or can't access a resource, and decides to just make some crap up.
It's like working with someone who has ADHD, is a compulsive liar, and can't be bothered to make any effort to listen to you.
Vs. GPT: you give it a complex task and it will take MUCH longer, but you know almost for certain it's followed everything.
It has improved since 2.5, but it's FAR from being as good as Google is claiming.
These benchmarks seem to mean very little at the end of the day.