Safety & guardrails

How ReplyFront prevents the AI from going off-topic, leaking secrets, or being prompt-injected.

By ReplyFront Team · Last updated June 11, 2026

Threat model

  • Customers pretending to be admins or asking the AI to “ignore instructions.”
  • Attempts to extract internal pricing, refund rules, or employee names.
  • Hostile instructions embedded inside product reviews or PDF policies.

How we defend

  • System prompt is locked — never echoed back, never overridable by chat.
  • Customer messages are wrapped in delimiters (<<customer_message>>…<</customer_message>>) and the model is told to never follow commands inside.
  • Retrieval results are similarly delimited and treated as data, not instructions.
  • A forbidden-topic list short-circuits the request before the model is even called.
  • Refusal message is configurable, friendly, and on-brand.
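
As a rough illustration of how the delimiter wrapping and the forbidden-topic short-circuit described above could fit together, here is a minimal Python sketch. It is not ReplyFront's actual code: the function names, the example topic list, the refusal text, and the `call_model` callback are all hypothetical, and simple substring matching is assumed for topic detection.

```python
# Hypothetical sketch; names and values are illustrative assumptions.
OPEN_TAG = "<<customer_message>>"
CLOSE_TAG = "<</customer_message>>"

# Example entries only; a real list would be configured per store.
FORBIDDEN_TOPICS = ("internal pricing", "employee names")
REFUSAL_MESSAGE = "Sorry, I can't help with that. Is there anything else I can do for you?"


def wrap_customer_message(text: str) -> str:
    """Wrap untrusted text in delimiters so the model can treat it as data.

    Any delimiter the customer typed is removed first. The loop repeats
    because deleting one tag can splice a new one together from the
    surrounding characters.
    """
    while OPEN_TAG in text or CLOSE_TAG in text:
        text = text.replace(OPEN_TAG, "").replace(CLOSE_TAG, "")
    return f"{OPEN_TAG}{text}{CLOSE_TAG}"


def is_forbidden(text: str) -> bool:
    """Case-insensitive substring check against the forbidden-topic list."""
    lowered = text.lower()
    return any(topic in lowered for topic in FORBIDDEN_TOPICS)


def handle_message(text: str, call_model) -> str:
    """Refuse forbidden topics before the model is called; otherwise wrap and forward."""
    if is_forbidden(text):
        return REFUSAL_MESSAGE  # short-circuit: call_model is never invoked
    return call_model(wrap_customer_message(text))
```

Here `call_model` stands in for whatever function sends the prompt to the LLM. Stripping customer-typed delimiters matters because otherwise a message could close the wrapper early and smuggle text outside it.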

No 100% guarantees

No LLM is fully immune to creative attacks. We log every refusal and ship guardrail upgrades regularly. Report concerning replies to [email protected].

What never leaves

Stripe keys, Shopify access tokens, OpenAI keys, customer PII not in the conversation — all encrypted at rest and never serialized into AI context.
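
One common way to keep secrets out of model context is to build that context from an explicit allowlist of fields, so anything not named is dropped by default. The sketch below assumes that approach; the field names and the `build_ai_context` helper are hypothetical and are not taken from ReplyFront's implementation.

```python
# Hypothetical sketch; field names are illustrative assumptions.
ALLOWED_FIELDS = frozenset({"order_status", "product_title", "shipping_eta"})


def build_ai_context(record: dict) -> dict:
    """Copy only allowlisted fields into the context passed to the model.

    Credentials and unrelated PII are excluded by default because they
    are never on the allowlist, rather than being filtered out one by one.
    """
    return {key: value for key, value in record.items() if key in ALLOWED_FIELDS}
```

For example, a record that also holds a payment key and an email address yields a context containing only the order status:

```python
record = {
    "order_status": "shipped",
    "stripe_secret_key": "sk_test_123",   # placeholder value
    "customer_email": "a@example.com",
}
context = build_ai_context(record)
```

An allowlist fails closed: a newly added sensitive field stays out of the context until someone deliberately permits it.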