Commit ba19f52

Expand RAI refusal patterns to catch more agent responses
1 parent 6908dc5 commit ba19f52

1 file changed

Lines changed: 22 additions & 1 deletion

content-gen/src/backend/orchestrator.py

@@ -62,34 +62,55 @@
 # These patterns indicate when an agent has identified a jailbreak attempt,
 # content safety violation, or out-of-scope request
 RAI_REFUSAL_PATTERNS = [
-    # Out-of-scope refusals
+    # Out-of-scope refusals - exact phrases from instructions
     "i'm a specialized marketing content generation assistant",
     "i cannot help with general questions",
     "outside of marketing",
     "i can only assist with marketing",
     "this request is outside my scope",
     "not within my capabilities as a marketing",
+    "designed exclusively for creating marketing materials",
+    "cannot help with general questions or topics outside",
     # Content safety refusals
     "i cannot generate content that",
     "i'm unable to create content involving",
     "this request violates content safety",
+    "violates content safety guidelines",
     "inappropriate content",
     "harmful content",
     "i cannot assist with this type of request",
     "violates our content guidelines",
     "against our content policy",
+    "cannot process this request",
     # Jailbreak detection
     "i cannot ignore my instructions",
     "i cannot pretend to be",
     "i cannot bypass my guidelines",
     "i cannot override my safety",
     "this appears to be an attempt to",
     "i'm designed to decline requests that",
+    "designed to decline requests",
     # General refusals indicating RAI concern
     "i'm not able to help with that",
     "i cannot fulfill this request",
     "this is not something i can assist with",
     "i must decline this request",
+    "i can't help with",
+    "i am not able to",
+    "i'm sorry, but i can",
+    "i apologize, but i can",
+    "unfortunately, i cannot",
+    "i'm afraid i can't",
+    # Common model refusal patterns
+    "as an ai assistant",
+    "as a marketing assistant, i",
+    "my purpose is to help with marketing",
+    "i specialize in marketing",
+    "that's outside my area",
+    "not within my scope",
+    "falls outside",
+    "beyond my capabilities",
+    "not something i'm able to",
 ]
 
 
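The hunk shows only the pattern list, not the code that consumes it. Because every entry is lowercase, one plausible reading is a case-insensitive substring check against the agent's response. The sketch below illustrates that reading with a few patterns copied from the diff. The helper names `find_refusal_pattern` and `is_rai_refusal` and the curly-apostrophe normalization are assumptions for illustration, not code from `orchestrator.py`.

```python
from typing import Optional

# A small subset of the patterns from the diff, for illustration only.
RAI_REFUSAL_PATTERNS = [
    "i cannot help with general questions",
    "violates content safety guidelines",
    "designed to decline requests",
    "i can't help with",
    "falls outside",
]


def find_refusal_pattern(response: str) -> Optional[str]:
    """Return the first refusal pattern contained in the response, or None.

    Hypothetical helper: assumes matching is a lowercase substring test.
    """
    # Lowercase the text and map curly apostrophes to straight ones so that
    # patterns such as "i can't help with" match typographic output
    # (this normalization is an assumption, not shown in the commit).
    normalized = response.lower().replace("\u2019", "'")
    for pattern in RAI_REFUSAL_PATTERNS:
        if pattern in normalized:
            return pattern
    return None


def is_rai_refusal(response: str) -> bool:
    """True if any refusal pattern occurs in the response."""
    return find_refusal_pattern(response) is not None


if __name__ == "__main__":
    print(find_refusal_pattern("I'm sorry, but I can't help with that."))
    print(is_rai_refusal("Here is a tagline for your spring campaign."))
    # A short pattern can also match ordinary marketing copy:
    print(is_rai_refusal("This promotion falls outside the holiday window."))
```

If matching does work this way, two consequences follow from the diff itself. Any response that matches "i'm designed to decline requests that" also matches the new, shorter "designed to decline requests", so the longer entry becomes redundant. Short generic entries such as "falls outside" or "as an ai assistant" can also match legitimate content, as the last call above shows, so the broader list may trade more caught refusals for more false positives.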
