Safety & guardrails

How ReplyFront prevents the AI from going off-topic, leaking secrets, or being prompt-injected.

By ReplyFront Team · Last updated June 11, 2026

Threat model

Customers pretending to be admins or asking the AI to “ignore instructions.”
Attempts to extract internal pricing, refund rules, or employee names.
Hostile instructions embedded inside product reviews or PDF policies.

How we defend

System prompt is locked — never echoed back, never overridable by chat.
Customer messages are wrapped in delimiters (<<customer_message>>…<</customer_message>>), and the model is told never to follow commands inside them.
Retrieval results are delimited the same way and treated as data, not instructions.
Forbidden topic list short-circuits the request before the model is even called.
Refusal message is configurable, friendly, and on-brand.
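
The delimiter and short-circuit steps above can be sketched roughly as follows. This is an illustration only, not ReplyFront's actual implementation: the names `wrap_customer_message`, `is_forbidden`, `build_model_input`, `FORBIDDEN_TOPICS`, and `REFUSAL_MESSAGE` are hypothetical, and both the substring matching and the stripping of delimiter look-alikes are assumptions about how such a check might work.

```python
import re
from typing import Optional

# Hypothetical values; a real topic list and refusal text would come from configuration.
FORBIDDEN_TOPICS = ["internal pricing", "employee names"]
REFUSAL_MESSAGE = "Sorry, I can't help with that. Is there anything else I can do?"

OPEN_TAG = "<<customer_message>>"
CLOSE_TAG = "<</customer_message>>"

# Matches the opening or closing delimiter, so a customer cannot close the wrapper early.
_DELIMITER_RE = re.compile(r"<<\s*/?\s*customer_message\s*>>", re.IGNORECASE)


def wrap_customer_message(text: str) -> str:
    """Remove delimiter look-alikes from the text, then wrap it in the delimiters."""
    previous = None
    # Repeat until stable: removing one match can splice together a new delimiter.
    while previous != text:
        previous = text
        text = _DELIMITER_RE.sub("", text)
    return f"{OPEN_TAG}{text}{CLOSE_TAG}"


def is_forbidden(text: str) -> bool:
    """Assumed check: case-insensitive substring match against the topic list."""
    lowered = text.lower()
    return any(topic in lowered for topic in FORBIDDEN_TOPICS)


def build_model_input(text: str) -> Optional[str]:
    """Return the wrapped message, or None when the request is short-circuited."""
    if is_forbidden(text):
        # The caller would reply with REFUSAL_MESSAGE; the model is never called.
        return None
    return wrap_customer_message(text)
```

In this sketch a caller would send `REFUSAL_MESSAGE` whenever `build_model_input` returns `None`, and otherwise pass the wrapped string to the model alongside the system prompt. Retrieval results could be wrapped the same way with their own delimiter pair.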

No 100% guarantees

No LLM is fully immune to creative attacks. We log every refusal and ship guardrail upgrades regularly. Report concerning replies to [email protected].

What never leaves

Stripe keys, Shopify access tokens, OpenAI keys, and customer PII that is not part of the conversation are all encrypted at rest and never serialized into the AI context.
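
One way to keep such values out of the model's context is to build that context from an explicit allowlist of fields instead of serializing whole records. The sketch below assumes that approach; `CONTEXT_ALLOWLIST`, `build_ai_context`, and the field names are hypothetical and are not taken from ReplyFront's code.

```python
import json
from typing import Any, Mapping

# Hypothetical allowlist: only these keys may ever reach the model.
CONTEXT_ALLOWLIST = frozenset({"store_name", "order_status", "conversation"})


def build_ai_context(record: Mapping[str, Any]) -> str:
    """Serialize only allowlisted fields; every other field is dropped."""
    safe = {key: value for key, value in record.items() if key in CONTEXT_ALLOWLIST}
    return json.dumps(safe, sort_keys=True)
```

With an allowlist, a newly added secret field stays out of the context by default, whereas a blocklist would need updating each time a new credential is stored.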
