Kimi K2 Thinking vs. Detectors: ZeroGPT vs. AI or Not (Case Study Results)

I recently ran a case study on Kimi K2 Thinking to see how its output holds up against current AI-detection tools. I tested the same outputs against two popular detectors: AI or Not and ZeroGPT.
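
The harness itself was simple: send each sample to both detectors and record the verdicts. The post doesn't include the actual test code, so here is a minimal Python sketch of that loop; the endpoint URLs, payload shape, and `verdict` field are placeholder assumptions, not either service's documented API.

```python
import requests

# Placeholder endpoints -- neither detector's real API is described in
# this post, so treat these URLs and payload shapes as assumptions.
DETECTORS = {
    "AI or Not": "https://example.com/aiornot/classify",
    "ZeroGPT": "https://example.com/zerogpt/classify",
}

def classify(url: str, text: str, api_key: str) -> str:
    """Send one text sample to a detector and return its verdict string."""
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"verdict": "ai"} or {"verdict": "human"}
    return resp.json()["verdict"]

# Run every sample through both detectors for a side-by-side comparison.
samples = ["<Kimi K2 Thinking output #1>", "<Kimi K2 Thinking output #2>"]
verdicts = {
    name: [classify(url, s, api_key="YOUR_KEY") for s in samples]
    for name, url in DETECTORS.items()
}
print(verdicts)
```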

The Findings: The two tools diverged sharply in how they handled Kimi K2:

  • ✅ AI or Not: Handled Kimi’s responses well. Its classifications were generally consistent with the actual provenance of each sample.
  • ❌ ZeroGPT: Struggled badly. It produced a high volume of false positives and flipped verdicts across similar samples (see the scoring sketch after this list for one way to quantify both).
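
To make "false positives" and "inconsistent classifications" concrete, here is one way verdicts like these could be scored. This is my own scoring sketch, not the study's methodology: it assumes the test set mixed Kimi outputs with human-written controls (so a false positive means flagging human text as AI), and uses majority-verdict agreement within a class as a crude consistency proxy.

```python
from collections import Counter

def false_positive_rate(verdicts: list[str], truths: list[str]) -> float:
    """Among human-written samples, the fraction wrongly flagged as AI."""
    human = [v for v, t in zip(verdicts, truths) if t == "human"]
    return sum(v == "ai" for v in human) / len(human)

def consistency(verdicts: list[str], truths: list[str], label: str = "ai") -> float:
    """Among samples sharing one ground-truth label, the share receiving
    the majority verdict -- a crude proxy for classification stability."""
    group = [v for v, t in zip(verdicts, truths) if t == label]
    return Counter(group).most_common(1)[0][1] / len(group)

# Illustrative verdicts only, not the study's actual data. Ground truth
# assumes the test set mixed Kimi outputs ("ai") with human controls.
truths = ["ai", "ai", "ai", "human", "human"]
runs = {
    "AI or Not": ["ai", "ai", "ai", "human", "human"],
    "ZeroGPT": ["ai", "human", "ai", "ai", "ai"],
}
for name, verdicts in runs.items():
    print(f"{name}: FPR={false_positive_rate(verdicts, truths):.2f}, "
          f"consistency={consistency(verdicts, truths):.2f}")
```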

Discussion: ZeroGPT appears to generalize poorly to newer architectures and "reasoning"-style outputs. For those of us comparing models or tuning prompts, relying on legacy detectors as an evaluation signal risks skewing the data.

Has anyone else noticed ZeroGPT degrading on newer models like Kimi K2 or o1?
