Tested GPT-5.1, Gemini 3, and Claude Opus 4.5 on real data analysis tasks. Results surprised me.

I've been working with AI models for data work lately and wanted to see which one actually delivers when you're not just asking it to "analyze this CSV" – real, messy business problems where the answer isn't obvious.

The idea was to see which one best understood the intent behind each prompt and came back with solutions you could actually act on.

What I tested:

  1. Sales anomaly – Q3 revenue dropped 4.8% then surged 21% in Q4. What happened and where should we investigate?
  2. Hidden pattern in SaaS metrics – Overall conversion is 18%, but users who complete tutorial AND invite a teammate convert at 67%. What's the real insight?
  3. Statistical trap – Site A: 10k visitors, 2% conversion. Site B: 500 visitors, 3% conversion. Boss says "B is clearly better." Is he right?
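
Quick aside for anyone who wants to reproduce task 2: the segment rates fall out of a single groupby. The file and column names here (users.csv, completed_tutorial, invited_teammate, converted) are invented for illustration:

```python
import pandas as pd

# Hypothetical schema: one row per signup, with boolean flags for the two
# behaviors and whether the user eventually converted.
users = pd.read_csv("users.csv")

# Conversion rate and segment size for each tutorial/invite combination –
# the (True, True) cell is where the 67% figure would show up.
rates = (
    users.groupby(["completed_tutorial", "invited_teammate"])["converted"]
         .agg(rate="mean", n="size")
)
print(rates)
```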
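
And for task 3, here's the standard two-proportion z-test on the numbers from the prompt – my own sanity check, not any model's output:

```python
from scipy.stats import norm

# Site A: 10,000 visitors at 2% -> 200 conversions
# Site B: 500 visitors at 3%    -> 15 conversions
n_a, conv_a = 10_000, 200
n_b, conv_b = 500, 15

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)                  # pooled rate under H0
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                             # two-sided

print(f"z = {z:.2f}, p = {p_value:.2f}")                  # z ≈ 1.54, p ≈ 0.12
```

p ≈ 0.12, so "B is clearly better" doesn't survive a basic significance test.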

How the models responded:

Claude Opus 4.5 was the most organized. Clear tables, triage frameworks ("check if Q2 was weird first – takes 10 minutes"), segmentation matrices. Best for presenting to non-technical people but didn't have those strategic "aha" moments.

Claude Opus 4.5 response screenshot: https://preview.redd.it/prajsmoz105g1.png?width=2165&format=png&auto=webp&s=914a75a548c8c20a6881a4639a8af8e8f32f3e91

GPT-5.1 went full consultant mode every time. Detailed hypotheses, multiple scenarios, product roadmaps with specific button copy and email sequences. Super thorough but honestly felt like it was padding the response. When I needed a 2-page memo, it gave me a 10-page report. The statistical analysis was rigorous though – full z-tests, confidence intervals, sample size calculations.
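
I'm not pasting GPT's full output, but its sample-size math looked like the standard power calculation – roughly this statsmodels sketch (my reconstruction, not its literal code):

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# How many visitors per site would you need to reliably detect
# 2% vs. 3% conversion (alpha = 0.05, 80% power)?
effect = proportion_effectsize(0.03, 0.02)    # Cohen's h for the two rates
n_needed = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"~{n_needed:,.0f} visitors per site")  # ≈ 1,900 – Site B only has 500
```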

Gemini 3 consistently reframed the problem in ways that changed how I thought about it. For the sales dip, it said "break Q3 into monthly data – if it's gradual it's market fatigue, if it's a cliff something broke internally." Then it dropped: "If you sell high-value contracts, one delayed deal creates this exact pattern." That's weirdly specific business intuition.

For the SaaS metrics it said: "You have an 88% single-player problem in a multiplayer product." Only 12% of users add teammates, but that's clearly where your value is. Not a conversion problem – a positioning problem.
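
Side note on Gemini's "break Q3 into monthly data" suggestion: that check is a couple of lines of pandas, assuming a daily revenue series (file, column names, and dates below are placeholders):

```python
import pandas as pd

# Assumed: a CSV of daily revenue with 'date' and 'revenue' columns.
daily = pd.read_csv("revenue.csv", parse_dates=["date"], index_col="date")
monthly = daily["revenue"].resample("MS").sum()        # month-start buckets

# Month-over-month change across the Q3 window (dates are hypothetical) –
# a gradual slide reads very differently from a one-month cliff.
print(monthly.pct_change().loc["2024-07":"2024-09"])
```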

For the stats trap: "If 5 people on Site B clicked Back instead of Buy, your advantage disappears. You're making company decisions based on 5 people." No formulas needed – you just feel how fragile it is. (And the arithmetic checks out: 3% of 500 is 15 conversions; take away 5 and you're at 10/500 = 2%, dead even with Site A.)

My actual takeaway:

Gemini keeps catching business context that feels almost human. The "delayed deal" insight, the "single-player problem" framing – that's not just pattern matching, it's understanding how companies actually operate.

GPT is your go-to when you need to defend conclusions with full statistical rigor. Just be ready for more content than you probably need.

Claude makes everything clear and actionable but plays it safer. Good for exec presentations.

If I'm being real, I'd probably run Gemini first for strategic insight, then validate with GPT's stats if needed, then use Claude's formatting for the final deck.

Full breakdown with actual response screenshots

Anyone else running these models on actual messy datasets? Curious what you're seeing on more technical stuff like time series or cohort analysis – my tests were maybe more reasoning-heavy.
