Safety & guardrails
How ReplyFront prevents the AI from going off-topic, leaking secrets, or being prompt-injected.
By ReplyFront Team · Last updated June 11, 2026
Threat model
- Customers impersonating admins or asking the AI to “ignore instructions.”
- Attempts to extract internal pricing, refund rules, or employee names.
- Hostile instructions embedded in product reviews or PDF policies.
How we defend
- The system prompt is locked: it is never echoed back and cannot be overridden by chat messages.
- Customer messages are wrapped in delimiters (<<customer_message>>…<</customer_message>>) and the model is told to never follow commands inside.
- Retrieval results are similarly delimited and treated as data, not instructions.
- A forbidden-topic list short-circuits the request before the model is even called.
- Refusal message is configurable, friendly, and on-brand.
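The delimiter and short-circuit steps above can be sketched roughly as follows. This is a minimal illustration, not ReplyFront's actual code: the function names, the example forbidden topics, the refusal wording, and the <<retrieved_document>> tag are all assumptions made for the example. Only the <<customer_message>> delimiter comes from this page.

```python
import re
from typing import List, Optional

# Hypothetical configuration; the real list and message are set per account.
FORBIDDEN_TOPICS = ("internal pricing", "employee names")
REFUSAL_MESSAGE = "Sorry, I can't help with that one. Anything else I can do?"

OPEN_TAG = "<<customer_message>>"
CLOSE_TAG = "<</customer_message>>"


def neutralize(text: str) -> str:
    """Insert a space between doubled angle brackets so that untrusted
    text can never contain a '<<' or '>>' delimiter sequence."""
    return re.sub(r"([<>])(?=\1)", r"\1 ", text)


def hits_forbidden_topic(message: str) -> bool:
    """Cheap substring check that runs before any model call."""
    lowered = message.lower()
    return any(topic in lowered for topic in FORBIDDEN_TOPICS)


def build_model_input(message: str, retrieved: List[str]) -> Optional[str]:
    """Return the delimited model input, or None when the request is
    short-circuited and the caller should send REFUSAL_MESSAGE instead."""
    if hits_forbidden_topic(message):
        return None
    # Assumed tag name for retrieval results; the page only says they
    # are "similarly delimited".
    docs = "\n".join(
        f"<<retrieved_document>>{neutralize(doc)}<</retrieved_document>>"
        for doc in retrieved
    )
    return f"{docs}\n{OPEN_TAG}{neutralize(message)}{CLOSE_TAG}"
```

Neutralizing bracket pairs before wrapping matters because a customer could otherwise type a closing delimiter themselves and make the rest of their message look like trusted text. A substring match is the simplest possible topic filter; a production filter would likely need something more tolerant of rephrasing.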
No 100% guarantees
No LLM is fully immune to creative attacks. We log every refusal and ship guardrail upgrades regularly. Report concerning replies to [email protected].
What never leaves
Stripe keys, Shopify access tokens, OpenAI keys, and customer PII that is not part of the conversation are all encrypted at rest and never serialized into the AI context.
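One common way to enforce a rule like this is an allowlist: the AI context is built only from explicitly named fields, so a newly added secret is excluded by default. The sketch below assumes a hypothetical Conversation record and field names. It illustrates the idea and does not describe ReplyFront's internals.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record shape; every field name here is illustrative.
@dataclass
class Conversation:
    messages: List[str]
    customer_email: str        # PII the customer never typed into the chat
    shopify_access_token: str  # secret that must stay server-side

# Only fields named here may be serialized into the AI context.
CONTEXT_ALLOWLIST = ("messages",)


def build_ai_context(conv: Conversation) -> dict:
    """Copy allowlisted fields only; everything else is left behind."""
    return {name: getattr(conv, name) for name in CONTEXT_ALLOWLIST}
```

An allowlist fails closed. A denylist of known secret fields would leak any field someone forgot to add to it.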