Getting Regex to work properly for custom rules
Question asked by Andrew Lupton - 10/8/2017 at 2:19 PM
Answered
I know this has been discussed in other threads but I thought it best to start a new one to get some definitive responses for what works correctly with regard to specific needed strings. Apologies in advance for the long post. 
 
I have tested the following custom rule, with intermittent success:
Rule Source: Header
Header: Return-Path
Rule Type: Regular Expression
Weight: 999
Rule Text: 
.+\.academy>$
.+\.accountants>$
.+\.actor>$
.+\.agency>$
... and so on.
 
I created a similar rule looking for a specific string in the header for the 'sender name' such as:
((?i)cialis)$
((?i)seduce)$
((?i)nugenix)$
 
The header in the email would generally be something like:
Return-Path: <nugenix-for-health-myname=mydomain.com@regex-for-idiots.com>
The domain always changes but the sender name almost always starts with or contains the same word(s).
 
 
I know the syntax above doesn't work. It is only one variation I had tried of just about every format I've been able to find either here in the community or in numerous articles and discussions elsewhere. I was pretty convinced that standard regex strings were not being processed properly in SmarterMail. I tried:
nugenix$
/nugenix/i$
(?i)nugenix$
((?i)nugenix)$
^nugenix/i$
^nugenix/ig$
...and just about every other combination I could think of and nothing worked until this one:
 
(?i)(\W|^)(viagra|cialis|medical|seduce|nugenix)(\W|$)
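
A likely reason the earlier attempts failed: a trailing `$` only matches at the very end of the header value, and a Return-Path value ends in `>`, so `nugenix$` can never match. The `/nugenix/i` form is Perl/JavaScript delimiter notation, so the slashes and the trailing `i` are read as literal characters. The sketch below checks this with Python's `re` module. That is an assumption for illustration only: SmarterMail's regex engine may differ in details, although these anchors behave the same way in most Perl-style engines.

```python
import re

# Header value as it would appear after "Return-Path: " (example from the post).
header = "<nugenix-for-health-myname=mydomain.com@regex-for-idiots.com>"

attempts = [
    r"nugenix$",         # $ demands that the word sit at the very end
    r"/nugenix/i$",      # the slashes and the "i" are literal characters here
    r"(?i)nugenix$",     # case-insensitive, but still anchored to the end
    r"(?i)(\W|^)(viagra|cialis|medical|seduce|nugenix)(\W|$)",
]

# Map each pattern to whether it finds a match anywhere in the header value.
results = {p: bool(re.search(p, header)) for p in attempts}
for pattern, matched in results.items():
    print(f"{matched!s:5}  {pattern}")
```

Only the last pattern matches, because `(\W|^)` and `(\W|$)` accept the `<` before the word and the `-` after it instead of requiring the end of the line.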
 
I haven't tested a single word or any modifiers like v[i1l]agra yet but I believe this would work as well:
 
(?i)(\W|^)(v[i!1][a@]gr[a@])(\W|$)
 
Maybe someone a little more savvy with Regex could add some thoughts here. My concern with the above working syntax is that 'medical' would probably score john.smith@somemedicalcenter.org a 999 in this implementation. I assume that I need to modify it to include the leading < to make sure it is part of the regex such as:
 
(?i)(\W|^)(<v[i!1][a@]gr[a@]|<cialis|<medical|<seduce|<nugenix)(\W|$)
 
...but of course that will only catch the sendername if it immediately follows the "Return-Path: <" such as:
 
Return-Path: <nugenix-for-health-john=mydomain.com@dating-for-openminded.com> 
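
To see how the two patterns behave on different addresses, here is a small check using Python's `re` module (an assumption for illustration; SmarterMail's engine may differ). The second and fourth sample addresses are made up for the test. One thing it suggests: because of the `(\W|^)` and `(\W|$)` boundaries, `medical` does not match inside `somemedicalcenter`, so the false-positive risk is limited to domains where the word stands on its own.

```python
import re

# The pattern from the post that worked, and the variant with a leading "<".
anywhere = re.compile(r"(?i)(\W|^)(viagra|cialis|medical|seduce|nugenix)(\W|$)")
after_bracket = re.compile(
    r"(?i)(\W|^)(<v[i!1][a@]gr[a@]|<cialis|<medical|<seduce|<nugenix)(\W|$)"
)

samples = [
    "<nugenix-for-health-john=mydomain.com@dating-for-openminded.com>",
    "<get-some-nugenix-for-health@dating-for-openminded.com>",  # hypothetical
    "<john.smith@somemedicalcenter.org>",
    "<john.smith@medical.example.org>",                         # hypothetical
]

# For each sample: (matched by `anywhere`, matched by `after_bracket`).
hits = {
    s: (bool(anywhere.search(s)), bool(after_bracket.search(s))) for s in samples
}
for sample, (any_hit, bracket_hit) in hits.items():
    print(f"anywhere={any_hit!s:5} after_bracket={bracket_hit!s:5} {sample}")
```

The `<` variant misses the second address because the word is no longer directly after the bracket, and the unprefixed pattern flags the fourth address because `medical` is a standalone label in the domain.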
 
I was hoping to score ANY location of nugenix in the sendername (not domain) for example even if it is: 
 
Return-Path: <get-some-nugenix-for-health
 
but not:
 
<dr.johnson@nugenix.bigpharma.com> 
 
 
What I need is a regex that will catch <anycharacters+cialis+anyothercharacters+@, but being a regex imbecile I'm not sure what syntax makes that happen.
 
Any ideas?
 
Thanks!  
 
 
 

2 Replies

Employee Replied
Employee Post Marked As Answer
This is a regex that should match any word you want within the sender name. It matches whether or not the word comes right after a <, relying instead on the @ that follows it:
(?i)(?>.*?)(nugenix|cialis|somethingelse)(?>.*?@)
You can demo this here: https://regex101.com/r/ZgD8VG/5
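
For anyone who wants to test this outside SmarterMail, here is a rough Python sketch of the same idea. Two assumptions: Python's `re` only supports atomic groups `(?>...)` from version 3.11, so the sketch swaps them for `[^@]*@`, which should accept the same single-line headers; and `sender_name_hit` is a hypothetical helper name, not anything SmarterMail provides.

```python
import re

# Assumed equivalent of the pattern above without atomic groups:
# a listed word, then any non-"@" characters, then the "@" itself.
local_part_word = re.compile(r"(?i)(nugenix|cialis|somethingelse)[^@]*@")


def sender_name_hit(return_path: str) -> bool:
    """Return True when a listed word appears somewhere before an "@"."""
    return local_part_word.search(return_path) is not None


for value in (
    "<get-some-nugenix-for-health-john=mydomain.com@dating-for-openminded.com>",
    "<dr.johnson@nugenix.bigpharma.com>",
):
    print(sender_name_hit(value), value)
```

The second address is not flagged because the word only appears after the `@`, and nothing after that point can satisfy the trailing `@` in the pattern.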
 
If you want to skip the match when one of the words also appears in the domain, you can use this one:
(?i)(?>.*?)(nugenix|cialis|somethingelse)(?>.*?@)(?!.*?(nugenix|cialis|somethingelse))
You can demo this here: https://regex101.com/r/ZgD8VG/3
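
A quick way to see what the negative lookahead `(?!...)` adds, sketched in Python. As an assumption for portability, the atomic groups are replaced with `[^@]*@` (Python's `re` only accepts `(?>...)` from 3.11), and the sample addresses other than the bigpharma one are made up.

```python
import re

words = r"(nugenix|cialis|somethingelse)"

# Word in the sender name, then "@", then a lookahead that rejects the
# match if any listed word shows up again in the rest of the line.
name_only = re.compile(rf"(?i){words}[^@]*@(?!.*?{words})")

checks = {
    "<nugenix-deals@example.com>": None,            # hypothetical address
    "<nugenix-deals@nugenix.bigpharma.com>": None,  # hypothetical address
    "<dr.johnson@nugenix.bigpharma.com>": None,
}
for value in checks:
    checks[value] = name_only.search(value) is not None
    print(checks[value], value)
```

Note the consequence: a sender with the word in both the name and the domain is left alone by this version, so it trades some coverage for fewer hits on legitimate mail from a domain that carries the word.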
Andrew Lupton Replied
Excellent feedback. Thanks very much!
