Prompt as code – A simple 3-gate system for smoke, light, and heavy tests

https://preview.redd.it/4x5vte5n5a3g1.png?width=1536&format=png&auto=webp&s=9c0c35544c51d6dbd78a3c27b7cc271cc11cacae

I keep seeing prompts treated as “magic strings” that people edit in production with no safety net. That works until you have multiple teams and hundreds of flows.

I am trying a simple “prompt as code” model:

  • Prompts are versioned in Git.
  • Every change passes three gates before it reaches users.
  • Heavy tests double as production monitoring for drift in model behavior and cost.
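
For concreteness, here is a minimal sketch of the versioning side. Everything in it is illustrative: the `prompts/<name>/<version>.txt` layout and the helper names are mine, not a standard, and it only needs the stdlib:

```python
from pathlib import Path
from string import Formatter

PROMPTS_DIR = Path("prompts")  # hypothetical layout: prompts/<name>/<version>.txt

def load_prompt(name: str, version: str) -> str:
    """Read a versioned prompt template straight out of the repo."""
    return (PROMPTS_DIR / name / f"{version}.txt").read_text(encoding="utf-8")

def template_variables(template: str) -> set[str]:
    """Extract {placeholder} names so tests can verify every variable is supplied."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

# Demo with an inline template; in the repo this string would come from load_prompt().
template = "Classify this ticket: {ticket_text}\nCustomer tier: {customer_tier}"
assert template_variables(template) == {"ticket_text", "customer_tier"}
```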

Three gates

  1. Smoke tests (DEV)
    • Validate syntax, variables, and output format.
    • A tiny set of rule-based checks only (see the first sketch after this list).
    • Fast enough to run on every PR, so people can experiment freely without breaking the system.
  2. Light tests (STAGING)
    • 20 to 50 curated examples per prompt (see the second sketch after this list).
    • Designed to check behavior and performance:
      • Do we still respect contracts other components rely on?
      • Is behavior stable for typical inputs and simple edge cases?
      • Are latency and token costs within budget?
  3. Heavy tests (PROD gate + monitoring)
    • 80 to 150 comprehensive cases that cover:
      • Happy paths.
      • Weird inputs, injection attempts, multilingual inputs, and multi-turn flows.
      • Safety and compliance scenarios.
    • Must be 100 percent green for a critical prompt to go live.
    • The same suite is re-run regularly in PROD to track drift in model behavior or cost (see the third sketch after this list).
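
Gate 1 in code: a few rule-based pytest checks that run on every PR without touching a model. `load_prompt` and `template_variables` are the hypothetical helpers from the sketch above, and the expected variable set is made up for illustration:

```python
import json

from prompt_loader import load_prompt, template_variables  # hypothetical module holding the helpers sketched above

EXPECTED_VARS = {"ticket_text", "customer_tier"}  # illustrative contract for one prompt

def test_template_declares_expected_variables():
    template = load_prompt("support_triage", "v3")
    assert template_variables(template) == EXPECTED_VARS

def test_template_renders_cleanly():
    template = load_prompt("support_triage", "v3")
    # Raises KeyError/IndexError on a typo'd or stray placeholder.
    template.format(ticket_text="example", customer_tier="free")

def test_recorded_sample_output_is_valid_json():
    # Format check against a checked-in sample; still no model call at this gate.
    sample = '{"category": "billing", "priority": 2}'
    assert set(json.loads(sample)) == {"category", "priority"}
```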
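
Gate 2 is the same idea with real model calls over the curated cases. `call_model` is a placeholder for whatever client you actually use, the case file path and schema are invented, and the budget numbers are assumptions to tune against your own SLOs:

```python
import json
import time
from pathlib import Path

from prompt_loader import load_prompt  # hypothetical helper from the first sketch

LATENCY_BUDGET_S = 2.0  # assumed budgets; set these from your own SLOs
TOKEN_BUDGET = 800

def call_model(prompt: str) -> tuple[str, int]:
    """Placeholder returning (completion_text, tokens_used); swap in your real client."""
    raise NotImplementedError

def test_light_suite():
    template = load_prompt("support_triage", "v3")
    cases = json.loads(Path("tests/light/support_triage.json").read_text())  # 20 to 50 cases
    for case in cases:
        start = time.monotonic()
        output, tokens = call_model(template.format(**case["inputs"]))
        elapsed = time.monotonic() - start

        parsed = json.loads(output)                   # contract: output must be JSON
        assert parsed["category"] in case["allowed"]  # stable behavior on typical inputs
        assert elapsed <= LATENCY_BUDGET_S            # latency within budget
        assert tokens <= TOKEN_BUDGET                 # token cost within budget
```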
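
And gate 3 doubles as monitoring: the same runner that gates the release is re-run on a schedule in PROD, with the pass rate logged so drift shows up as a trend instead of a surprise. The case format and alert hook here are placeholders:

```python
from datetime import datetime, timezone
from typing import Callable, Iterable

def heavy_suite_pass_rate(cases: Iterable[dict], run_case: Callable[[dict], bool]) -> float:
    """Run every heavy case (happy path, injection, multilingual, multi-turn, safety)."""
    results = [run_case(case) for case in cases]
    return sum(results) / len(results)

def scheduled_drift_check(cases, run_case, alert: Callable[[str], None]) -> dict:
    rate = heavy_suite_pass_rate(cases, run_case)
    record = {"ts": datetime.now(timezone.utc).isoformat(), "pass_rate": rate}
    # Gate rule from above: critical prompts must stay 100 percent green.
    if rate < 1.0:
        alert(f"Heavy suite regression: {record}")
    return record
```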

The attached infographic is what I use to explain this flow to non-engineers.

How are you all handling “prompt regression tests” today?

  • Do you have a formal pipeline at all?
  • Any lessons on keeping test sets maintainable as prompts evolve?
  • Has anyone found a nice way to auto-generate or refresh edge cases?

Would love to steal ideas from people further along.
