I’ve been exploring issues around LLMs leaking system prompts and unexpected jailbreak behavior.
Thinking about a lightweight API that could help teams:
– detect jailbreak attempts & prompt leaks
– analyze prompt quality
– support QA/testing workflows for LLM-based systems
Curious how others are handling this – do you test prompt safety manually, or have any tools for it?
(Set up a small landing for early interest: assentra)
Would love to hear thoughts from other builders and researchers.