Custom spam rule for blocking domains.
Question asked by Brian Phillips - February 4 at 7:37 AM
I've seen rules set up to block messages based on their domain name, but that list can get long, so I'm trying to put together a rule that flags anything that is not a common top-level domain such as .com, .net, etc. Here is the rule that's been proposed:

Rule Source: Raw Content
Rule Type: Regular Expression
Rule Text:  From:[^.\r\n]+\.(?!(com|net|gov|edu|org)[^\w.])

The rule works well as long as there is no dot in the username (user.name@mail.com) and the address has no sub-domain (username@sub.mail.com); otherwise the message is falsely flagged. I'd like it to pay attention only to what comes after the last dot after the @. Any thoughts?

Thank you
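
For anyone who wants to reproduce the false positives described above, here is a minimal Python sketch. It runs the proposed rule text unchanged against a few sample header lines. The helper name `is_flagged` and the `mail.xyz` address are made up for illustration, and Python's `re` module is only a stand-in for the .NET engine the server uses, so treat the results as indicative rather than definitive.

```python
import re

# The rule text exactly as proposed in the question.
PROPOSED = re.compile(r"From:[^.\r\n]+\.(?!(com|net|gov|edu|org)[^\w.])")

def is_flagged(header_line: str) -> bool:
    """Return True if the proposed rule matches this raw header line."""
    return PROPOSED.search(header_line) is not None

samples = [
    "From: username@mail.com\r\n",      # plain address, common TLD
    "From: username@mail.xyz\r\n",      # plain address, uncommon TLD
    "From: user.name@mail.com\r\n",     # dot in the username
    "From: username@sub.mail.com\r\n",  # sub-domain
]

for line in samples:
    print(f"{line.strip():32} flagged={is_flagged(line)}")
```

The last two samples are flagged even though they end in .com, because `[^.\r\n]+` stops at the first dot on the line, so the lookahead tests "name@mail" or "mail" instead of the TLD.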

7 Replies

Steve Norton Replied
Have you considered a whitelist of the common domains instead? As you say, the list of non-common domains can get quite long, and new ones are added from time to time, so you'd be constantly updating a blacklist approach.
You could lower your spam thresholds by 3, create rules for .com, .net, etc., and give them a weight of -3, negating the threshold decrease.
Brian Phillips Replied
I hate to do any whitelisting, as there is plenty of spam that comes from those top-level domains. I'd just like to automatically assume something is spam if it is not coming from one of these domains, which is the case pretty much all the time for us.
Steve Norton Replied
You would still run all other spam checks, so mail from .com gets checked just as it is today; it's not a bypass. For example:
Today - Spam low = 10 - Spam high (and deleted) = 20
Mail from .com - CBL fail weight +10 - score 10 - marked as Spam low
Mail from .ru - no RBL match - score 0 - delivered to the inbox
With whitelisting - Spam low = -10 - Spam high (and deleted) = 0
Mail from .com - CBL fail weight +10 - .com whitelisted source -20 - score -10 - marked as Spam low
Mail from .icu - no RBL match - score 0 - Spam high and deleted

Does that make sense?
I used .icu as that's the latest domain I've seen abused and with the whitelisted option it's deleted by default.
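
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the threshold shift, not how SmarterMail computes scores internally. The function names are hypothetical, and the thresholds are assumed to be inclusive, which is what the example implies (a score of 10 with Spam low = 10 is marked as Spam low).

```python
COMMON_TLDS = {"com", "net", "gov", "edu", "org"}

def classify(score: int, low: int, high: int) -> str:
    """Map a total spam score to an action; thresholds are treated as inclusive."""
    if score >= high:
        return "spam high (deleted)"
    if score >= low:
        return "spam low"
    return "inbox"

def total_score(tld: str, other_checks: int, common_tld_weight: int) -> int:
    """Add the common-TLD rule weight (negative here) to the other spam checks."""
    adjustment = common_tld_weight if tld in COMMON_TLDS else 0
    return other_checks + adjustment

# Today: Spam low = 10, Spam high = 20, no TLD rule.
print(classify(total_score("com", 10, 0), low=10, high=20))   # spam low
print(classify(total_score("ru", 0, 0), low=10, high=20))     # inbox

# Shifted: Spam low = -10, Spam high = 0, common TLDs weighted -20.
print(classify(total_score("com", 10, -20), low=-10, high=0))  # spam low
print(classify(total_score("icu", 0, -20), low=-10, high=0))   # spam high (deleted)
print(classify(total_score("com", 0, -20), low=-10, high=0))   # inbox
```

The last line is the case not spelled out in the example: a clean .com message scores -20 and still reaches the inbox, so common TLDs behave exactly as they do today.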
Alex Carnot Replied
Employee Post
Hi,
I'd recommend testing this thoroughly before use, but I believe this is the Regex you are looking for.
From:[^@\r\n]+@(\w{2,}\.)*+(?!(com|net|gov|edu|org)[^\w\.]?)
Alex Carnot
Software Developer
SmarterTools Inc.
(877) 357-6278
www.smartertools.com
Brian Phillips Replied
Thank you Alex, it looks like we're really close. This regex string does exactly what I was looking for when tested on regex101.com or in Notepad++. On the server, though, I'm getting the following error in the delivery logs:

System.ArgumentException: parsing "From:[^@\r\n]+@(\w{2,}\.)*+(?!(com|net|gov|edu|org)[^\w\.]?)" - Nested quantifier +.

I'm not the best at regex so any ideas are appreciated.

Thank you
Alex Carnot Replied
Employee Post
Hi,
It appears that .NET doesn't support possessive quantifiers.
Here is an updated Regex that replaces the possessive quantifier with an atomic group:
From:[^@\r\n]+@(?>(\w{2,}\.)*)(?!(com|net|gov|edu|org)[^\w\.]?)

If you'd like to know what this means: I used the possessive quantifier (*+) to force the regex to consume everything up to the last period in the email address. This prevents backtracking, so the regex cannot retry the match at an earlier period, and the TLD is checked properly instead of the regex falling back to a domain or subdomain for a match. An atomic group ((?>...)) achieves the same outcome.
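
To make the effect of the atomic group concrete, here is a small Python sketch that compares the pattern with and without it. Python only gained native `(?>...)` support in version 3.11, so the sketch emulates the atomic group with a lookahead plus backreference, `(?=(...))\1`. That emulation is an assumption of this example and not what runs on the .NET server. The sample addresses are made up.

```python
import re

# Same pattern with a plain (backtracking) group instead of an atomic one.
BACKTRACKING = re.compile(
    r"From:[^@\r\n]+@(\w{2,}\.)*(?!(com|net|gov|edu|org)[^\w\.]?)"
)

# Atomic group emulated as (?=(X))\1: the lookahead matches X once, and the
# backreference then consumes exactly that text, so none of it is given back.
ATOMIC = re.compile(
    r"From:[^@\r\n]+@(?=((\w{2,}\.)*))\1(?!(com|net|gov|edu|org)[^\w\.]?)"
)

def flagged(pattern: re.Pattern, header_line: str) -> bool:
    """Return True if the pattern matches the raw header line."""
    return pattern.search(header_line) is not None

for line in [
    "From: username@sub.mail.com\r\n",
    "From: user.name@mail.com\r\n",
    "From: username@sub.mail.xyz\r\n",
]:
    print(f"{line.strip():30} "
          f"backtracking={flagged(BACKTRACKING, line)} "
          f"atomic={flagged(ATOMIC, line)}")
```

With the plain group, the engine gives back "mail." for `sub.mail.com`, tests the lookahead against "mail.com", and flags the message. With the atomic form, the lookahead is only ever tested against the text after the last period.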
Brian Phillips Replied
I made a slight modification to the string as it was flagging domains containing a hyphen.  So far this seems to be working very well although I am still running it on my testing server to see if any other surprises pop up.

From:[^@\r\n]+@(?>(\w.{2,}\.)*)(?!(com|net|gov|edu|org)[^\w\.]?)

Thank you for all the help in getting this going!
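
For further testing, here is a hedged Python sketch of the hyphen problem and one possible variant. The `VARIANT` pattern is not from this thread. It assumes domain labels consist only of word characters and hyphens, and it avoids the atomic group by requiring that the label being tested is the last one in the domain. `ORIGINAL` is the earlier atomic-group pattern, with the atomic group emulated as `(?=(...))\1` so it runs on Python versions before 3.11. All addresses are made up.

```python
import re

# Earlier pattern from the thread, atomic group emulated for portability.
ORIGINAL = re.compile(
    r"From:[^@\r\n]+@(?=((\w{2,}\.)*))\1(?!(com|net|gov|edu|org)[^\w\.]?)"
)

# Hypothetical variant: labels may contain hyphens, the allowed TLD must be a
# whole label, and the tested label must be the last one in the domain.
VARIANT = re.compile(
    r"From:[^@\r\n]+@(?:[\w-]+\.)+"
    r"(?!(?:com|net|gov|edu|org)(?![\w-]))"
    r"[\w-]+(?![\w.-])"
)

def flagged(pattern: re.Pattern, header_line: str) -> bool:
    """Return True if the pattern matches the raw header line."""
    return pattern.search(header_line) is not None

for line in [
    "From: user@my-domain.com\r\n",     # hyphenated domain, common TLD
    "From: user@my-domain.icu\r\n",     # hyphenated domain, uncommon TLD
    "From: username@sub.mail.com\r\n",  # sub-domain, common TLD
    "From: user@mail.company\r\n",      # TLD that merely starts with "com"
]:
    print(f"{line.strip():30} "
          f"original={flagged(ORIGINAL, line)} "
          f"variant={flagged(VARIANT, line)}")
```

Two things show up in this sketch. The earlier pattern flags `my-domain.com` because `\w` does not match a hyphen. It also lets `mail.company` through, because the trailing `[^\w\.]?` is optional and so "com" alone satisfies the lookahead. Whether either matters depends on the mail you actually receive, so this is a starting point for the test server rather than a drop-in replacement.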
