I swear, once a task runs longer than about 5 messages it becomes the most stubborn thing in existence. It's incredible at short tasks and one-shotting, but after it does the job once, steering it away from what it's already done seems impossible.
Basically, the second it goes down the wrong path in any domain, writing or coding, it seems very difficult to get it back out at all, unless you break the task down in extreme detail and tell it exactly what not to do.
Whereas on the first prompt it seems very intuitive. It almost feels like they weighted the initial input higher in the model to improve memory? I don't like it. It's so frustrating telling it to stop doing the thing it did in the first prompt, only for it to keep doing it over and over again. Meanwhile 2.5 struggled to keep following the first prompt throughout the convo lol.