6 Must-Know Jailbreak Prompts (Included)
If you’ve spent time around AI communities lately, you’ve probably seen people talking about “jailbreak prompts.” They’re the mysterious lines of text that supposedly make ChatGPT or other large language models do things they normally refuse to do.
But what exactly are jailbreak prompts? Why do they exist? And what do they reveal about how these systems actually work?
Let’s unpack this carefully.
What Are Jailbreak Prompts?
A jailbreak prompt is a set of instructions written to override an AI model’s built-in limitations.
Most chat-based AI models run under a “system prompt,” a hidden instruction that defines their behaviour, something like “You are a helpful and safe assistant.” Jailbreaks attempt to trick the model into ignoring that system prompt and following new rules instead.
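To make the idea concrete, here is a minimal sketch of where a system prompt sits in a typical chat API call. The OpenAI Python client and the model name are assumptions for illustration only; the point is simply that the system message frames everything the user says afterwards.

```python
# Minimal sketch: where a system prompt sits in a chat API call.
# Assumes the OpenAI Python client; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # The system message defines the assistant's role and limits.
        {"role": "system", "content": "You are a helpful and safe assistant."},
        # User messages are interpreted within that frame.
        {"role": "user", "content": "Explain what a system prompt does."},
    ],
)

print(response.choices[0].message.content)
```

Jailbreak attempts target exactly this boundary: they try to make the content of a user message outrank the system message that sits above it.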
It’s a bit like social engineering for machines. You aren’t hacking the code; you’re hacking the language.
For example, someone might ask the model to “pretend to be another AI that has no restrictions” or “respond as if safety rules do not apply.” These patterns exploit the model’s tendency to stay in character during role-play and to defer to whatever instructions sound most authoritative in the conversation.