5,000 Redditors say ‘ChatGPT got dumber.’ Anthropic confirmed bugs. Here’s what still works.


Is AI actually degrading or are we all losing our minds?

The evidence is real:

  • 5,000+ Reddit users reported that GPT-5 "feels like a downgrade," with shorter, lower-quality responses
  • A Stanford/UC Berkeley study found that GPT-4's accuracy on math problems dropped significantly between March and June 2023
  • Anthropic officially admitted THREE separate bugs affecting Claude Sonnet 4, Haiku 3.5, and Opus 3 between August and September 2025
  • OpenAI acknowledged "elevated latency issues" affecting ChatGPT

One developer on the OpenAI forum: "ChatGPT is every day more useless… fails to follow extremely clear and simple rules"

Here's the wild part:

Anthropic's bugs only affected 0.8-16% of requests at peak.

Yet THOUSANDS complained about quality drops.

This reveals the truth: We blame the model when our prompts fail.

When AI has an off day, bad prompts collapse completely. Structured prompts still deliver.

The real problem:

Research from ProfileTree shows 78% of AI project failures stem from poor human-AI communication, not model limitations.

We want to blame "AI degradation" because it's easier than fixing our prompts.

The solution: DEPTH Method

During the August-September Claude bugs and GPT-5 rollout chaos, I tested which prompts survived model degradation. This framework held up:

D – Define Multiple Expert Validation

Instead of: "You're a developer"

Use: "You are three experts working together: a senior developer writing the code, a QA tester identifying edge cases, and a code reviewer checking for bugs. Each expert validates the others' work."

Why it survives degradation: Creates internal error-checking even when the model is buggy.
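
If you build prompts in code, the multi-expert setup can be templated. This is only a rough sketch: `multi_expert_prompt` and its parameters are names I made up for illustration, not part of any library, and the task string is a placeholder.

```python
from __future__ import annotations


def multi_expert_prompt(experts: dict[str, str], task: str) -> str:
    """Build a prompt in which several named experts check each other's work.

    `experts` maps a role name to its responsibility. This helper is
    hypothetical; adapt the wording to your own use case.
    """
    if len(experts) < 2:
        raise ValueError("need at least two experts so they can validate each other")
    lines = [f"You are {len(experts)} experts working together:"]
    for role, duty in experts.items():
        lines.append(f"- {role}: {duty}")
    lines.append("Each expert validates the others' work.")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


prompt = multi_expert_prompt(
    {
        "Senior developer": "writes the code",
        "QA tester": "identifies edge cases",
        "Code reviewer": "checks for bugs",
    },
    task="Write a function that parses ISO 8601 dates.",  # placeholder task
)
print(prompt)
```

Keeping the roles in a dict makes it easy to swap experts per task without rewriting the whole prompt.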

E – Establish Explicit Success Metrics

Instead of: "Write good code"

Use: "Code must: pass these 5 specific test cases [list them], follow PEP 8 standards, include error handling for [scenarios], run in under 2 seconds, flag ANY assumptions as UNCERTAIN with explanation"

Why it survives degradation: Removes ambiguity that causes failures when models struggle.
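
One way to keep the criteria explicit is to store them as a list and render them into the prompt, then scan the reply for UNCERTAIN flags. A minimal sketch, assuming the model follows the flagging instruction; the function names, the example criteria, and the sample response are all made up for illustration.

```python
from __future__ import annotations


def success_criteria_prompt(criteria: list[str]) -> str:
    """Render explicit, numbered acceptance criteria for a coding request."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(criteria, start=1))
    return (
        "Code must:\n"
        f"{numbered}\n"
        "Flag ANY assumption as UNCERTAIN with an explanation."
    )


def uncertain_lines(response: str) -> list[str]:
    """Collect the lines of a model response that carry an UNCERTAIN flag."""
    return [line.strip() for line in response.splitlines() if "UNCERTAIN" in line]


# Illustrative criteria; replace with your own.
criteria = [
    "follow PEP 8",
    "include error handling for empty input",
    "run in under 2 seconds",
]
print(success_criteria_prompt(criteria))

# A made-up response, used only to show the flag check.
sample_response = "def f(x): ...\nUNCERTAIN: I assumed the input is UTF-8.\nDone."
print(uncertain_lines(sample_response))
```

The flag scan is a plain substring check, so it only helps if you read the flagged lines afterwards.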

P – Provide Complete Context

Instead of: "Fix this code"

Use: "Project context: uses Flask 2.3, Python 3.11, deployed on AWS Lambda. Previous attempts failed because [X]. Performance requirements: [Y]. Edge cases to handle: [Z]. Current error: [specific traceback]."

Why it survives degradation: Grounding in specifics reduces hallucinations even when model quality dips.
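
To avoid sending a half-filled context by accident, you could keep the fields in a small dataclass that refuses to render when something is blank. This is a sketch under my own assumptions: `ProjectContext` and its field names are hypothetical, and every value except the stack line is an invented placeholder.

```python
from dataclasses import dataclass, fields


@dataclass
class ProjectContext:
    """Hypothetical container for the context fields a prompt should carry."""

    stack: str
    previous_attempts: str
    performance: str
    edge_cases: str
    current_error: str

    def render(self) -> str:
        # Refuse to build the prompt if any field was left blank.
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError("missing context: " + ", ".join(missing))
        return (
            f"Project context: {self.stack}\n"
            f"Previous attempts failed because: {self.previous_attempts}\n"
            f"Performance requirements: {self.performance}\n"
            f"Edge cases to handle: {self.edge_cases}\n"
            f"Current error: {self.current_error}"
        )


context = ProjectContext(
    stack="Flask 2.3, Python 3.11, deployed on AWS Lambda",
    previous_attempts="the handler timed out",           # placeholder
    performance="respond within 2 seconds",              # placeholder
    edge_cases="empty body, oversized payload",          # placeholder
    current_error="KeyError: 'user_id' in handler.py",   # placeholder
)
print(context.render())
```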

T – Task Sequential Breakdown

Instead of: "Debug, refactor, and document this"

Use:

  • First: Analyze the error and identify root cause
  • Second: List all edge cases this must handle
  • Third: Write the solution with inline comments
  • Fourth: Test against all edge cases and report results

Why it survives degradation: Prevents AI from jumping to conclusions when reasoning is impaired.
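
If you call a model from code, the same breakdown can be run as separate requests, with each step seeing the earlier answers. A rough sketch: `run_steps` is a hypothetical helper, `ask` stands in for whatever client call you actually use, and `fake_model` exists only so the example runs offline.

```python
from __future__ import annotations

from typing import Callable

STEPS = [
    "Analyze the error and identify the root cause.",
    "List all edge cases this must handle.",
    "Write the solution with inline comments.",
    "Test against all edge cases and report results.",
]


def run_steps(ask: Callable[[str], str], code: str, steps: list[str] = STEPS) -> list[str]:
    """Send one step at a time, feeding earlier answers into later prompts."""
    answers: list[str] = []
    for number, step in enumerate(steps, start=1):
        parts = [f"Code:\n{code}"]
        for i, answer in enumerate(answers, start=1):
            parts.append(f"Result of step {i}:\n{answer}")
        parts.append(f"Step {number} of {len(steps)}: {step}")
        answers.append(ask("\n\n".join(parts)))
    return answers


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; echoes the last line of the prompt."""
    return "answer to: " + prompt.splitlines()[-1]


results = run_steps(fake_model, "print(1 / 0)")
for result in results:
    print(result)
```

Splitting the steps into separate calls costs more requests, but a weak answer at step one is visible before it contaminates the rest.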

H – Self-Critique Loop (CRITICAL FOR DEGRADATION)

Instead of: Accepting first output

Use: "Review your solution. Rate it 1-10 on: correctness, performance, edge case handling. Test it mentally against these scenarios: [list]. If ANY score is below 8, revise. Flag anything you're uncertain about as UNCERTAIN and explain your doubt."

Why it survives degradation: This catches errors the model makes on bad days. Self-critique forces double-checking.
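
The loop can also be automated: ask for scores, parse them, and request a revision while anything falls below 8. This is a sketch under assumptions, not a tested recipe. The helper names are mine, `ask` stands in for your model client, and it assumes the model replies with "name: score" lines. A model's self-ratings are not verification, so the round cap matters.

```python
from __future__ import annotations

import re
from typing import Callable

CRITERIA = ("correctness", "performance", "edge case handling")
CRITIQUE_PROMPT = (
    "Review your solution. Rate it 1-10 on: correctness, performance, "
    "edge case handling. Answer with one 'name: score' line per criterion."
)


def parse_scores(critique: str) -> dict[str, int]:
    """Pull 'criterion: score' pairs out of a critique."""
    pattern = r"(" + "|".join(CRITERIA) + r")\s*:\s*(\d+)"
    found = re.findall(pattern, critique, flags=re.IGNORECASE)
    return {name.lower(): int(score) for name, score in found}


def needs_revision(scores: dict[str, int], threshold: int = 8) -> bool:
    """Revise if any criterion is missing or scored below the threshold."""
    return any(scores.get(name, 0) < threshold for name in CRITERIA)


def critique_loop(ask: Callable[[str], str], solution: str, max_rounds: int = 3) -> str:
    """Alternate critique and revision until the scores pass or rounds run out."""
    for _ in range(max_rounds):
        critique = ask(f"{CRITIQUE_PROMPT}\n\nSolution:\n{solution}")
        if not needs_revision(parse_scores(critique)):
            break
        solution = ask(
            "Revise the solution to address this critique.\n\n"
            f"Critique:\n{critique}\n\nSolution:\n{solution}"
        )
    return solution
```

A missing score counts as a failing score here, so an unparseable critique triggers another round instead of passing silently.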

Real-world proof:

During the confirmed Anthropic bugs (Aug-Sept 2025), users with structured prompts reported fewer issues than those using simple requests. The self-critique step caught hallucinations before they became problems.

The uncomfortable truth:

Simple prompts worked great in 2023. In 2025, with model instability, they fail more often. DEPTH adds the structure needed for consistent quality even when models have off days.

Want prompts that survive AI's bad days?

I documented 1,000+ prompts using DEPTH that worked through:

  • The August-September Claude bugs
  • The GPT-5 rollout issues
  • Various model degradation periods

Each prompt includes:

  • Multi-expert validation structures
  • Explicit success criteria
  • Self-critique loops
  • Error-catching mechanisms

Check out my collection. These were battle-tested during confirmed AI degradation periods.

Bottom line: AI models DO have issues sometimes. But structured prompting is the difference between "AI failed me" and "I got usable results anyway."

Anyone else found prompts that work during model degradation?
