Our evolving anti-spam architecture
Idea shared by Douglas Foster - 6/23/2024 at 2:23 PM
The sales pitch from the commercial spam filtering industry is, “We will make your spam problems and worries go away.”    If this is true, then why do we still have a steady stream of large organizations that are falling prey to ransomware?   Not all penetrations occur through email, but we never learn which expensive spam filter products failed to do their promised job.

Several years ago, I realized that our spam filter product lacked essential features, so I went shopping for a more sophisticated one. I had an easy time finding products that were more expensive, but a long list of products lacked features that I considered obviously necessary. As a result of these disappointments, our organization has been on an extended effort to build our own spam defenses. I keep hoping to find other people who are doing the same.

I would love to see a new section of this forum created to discuss spam defense issues.  Is there interest?

Our Experience

Building our own solution has involved two prongs:
  • Develop an architecture for what spam filtering should do, and
  • Develop an understanding of how to parse and modify email message files to implement that design.
Our current implementation has three phases:
  1. Sender authentication and evaluation.
    1. Is the SMTP Mail From address verifiable by SPF or local policy?
       After building exceptions for organizations with missing or flawed SPF policies, we began quarantining any message that could not produce SPF PASS or local policy equivalent.   This has exposed some wanted senders that need a local policy exception, some that are malicious, and many that are not malicious but nonetheless unwanted.   After running in this configuration for more than a year, messages that are quarantined for SPF failure are almost always unwanted.
    2. Is the message’s “From” address verifiable by DMARC concept or local policy?  
       We apply the DMARC philosophy (aligned DKIM PASS or aligned SPF PASS) to every message, not just the ones with DMARC policies.    The goal is to implement exceptions for wanted messages that cannot produce DMARC Pass, so that From verification is also mandatory.   At this moment, we are close, but have not yet pulled the trigger to make it mandatory.
    3. Is the message identity sufficient to disposition the message?
       Messages from known-bad senders are silently discarded.    “Known-bad” reputation comes from a mix of RBLs and from our own message history.    Conversely, high-priority good senders are whitelisted, but the whitelist rule is tied to authentication so that we will never whitelist an impersonation attack.
  2. Content filtering.
     Content filtering is performed by a commercial product.   We don’t see this changing significantly, because the commercial vendors see more messages and have more experience with text parsing.   Our commercial appliance also provides a graphical interface for reviewing the log of all messages and how they were dispositioned.  Log review is critical for tuning the sender authentication rules.
  3. User education through message modifications.
     For messages that are allowed, the unsolved question is how to maximize the user’s ability to protect himself and our network from malicious messages that slip through the source and content filters.     
    1. One necessary strategy is to protect against Friendly Name deception. Any sender can set the Friendly Name string to any value, and it is frequently used for fraud. Many user interfaces facilitate that fraud by showing the Friendly Name field as the primary message identifier, and some hide the From address completely. We considered two options: (a) purging the Friendly Name, or (b) rewriting it. To avoid information loss, we chose the rewrite option. Some domains are tagged as less trustworthy, and for these domains, we rewrite the Friendly Name to be “user@domain AS Friendly Name”. This ensures that the From address is not hidden from the recipient, even if the message is viewed on a cell phone.
    2. A series of strategies apply to the External Sender warning. The first challenge was to restore the preview feature. When we first enabled External Sender warnings, we had pushback because users noticed that the warning text replaced the preview text which would otherwise be visible in message lists. A web search revealed a workaround for messages with HTML text: extract a portion of the message body and insert it before the external sender block, inside an element styled with display:none. The preview window ignores the display style and shows the text, while the main message window suppresses the hidden text so that nothing appears duplicated.
    3. An additional goal is to use the External Sender warning to deliver variable information which is relevant to the correct interpretation of the message.  A number of initiatives are currently being explored:
      1. Known Senders:   Document whether the message is or is not from a known sender, with indication of whether the “known” status comes from seeing the address in one of our databases, seeing the address on an outbound message, or only from seeing the address on a previously allowed incoming message.
      2. True replies:   Document whether or not the message is verifiable as a reply, by matching the in-reply-to message-id to the message-id of an outbound message sent previously.
      3. Sender categories:   business partner, vendor, client, advertiser, unknown, etc.  This requires manually assigning domains to categories.
      4. Message path:    Was this message originated by the stated author domain or sent by an agent such as SendGrid.net?   Was it received directly from the originator, routed through a forwarder, or routed through a mailing list?
  4. Other futures
     I have recently been collecting data on legitimate services that generate messages roughly of the form: “Click here for important content.”    This includes secure email services like Zixmail, and file sharing services like OneDrive.    These messages have a predictable format that is often too easy to imitate.   For the spam filter, detecting lookalike content is hard and maybe impossible, and the user may be fooled just as easily.     The current line of investigation will be to use the External Sender warning to indicate when these message types have been verified against a known list, and then train users to contact our I.T. support when a message of this type is not tagged as fully vetted.
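As a rough illustration of the Friendly Name rewrite described in item 3.1 above, here is a minimal sketch using the standard-library email module. The LESS_TRUSTED set, the example domains, and the function name rewrite_friendly_name are hypothetical names chosen for illustration, and the sketch only looks at the first address in the From header.

```python
from email import message_from_string, policy
from email.headerregistry import Address

# Hypothetical list of domains tagged as less trustworthy (assumption).
LESS_TRUSTED = {"example.net"}


def rewrite_friendly_name(msg, less_trusted=LESS_TRUSTED):
    """Rewrite the From display name to 'user@domain AS Friendly Name'.

    Only applies when the From domain is in less_trusted and a
    Friendly Name is present. The message is modified in place.
    """
    from_header = msg["From"]
    if from_header is None or not from_header.addresses:
        return msg  # nothing to rewrite
    addr = from_header.addresses[0]
    if addr.domain.lower() not in less_trusted or not addr.display_name:
        return msg
    new_name = f"{addr.addr_spec} AS {addr.display_name}"
    new_addr = Address(display_name=new_name,
                       username=addr.username,
                       domain=addr.domain)
    # replace_header keeps the header in its original position.
    msg.replace_header("From", new_addr)
    return msg


raw = "From: Your Bank <alice@example.net>\nSubject: hi\n\nbody\n"
msg = message_from_string(raw, policy=policy.default)
rewrite_friendly_name(msg)
print(msg["From"])
```

Because the new display name contains "@" and ".", the email module quotes it when the header is serialized, so the address itself stays intact.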
Development Kit
On the development side, we have found that the Python language and related modules make email parsing feasible. The Python email module is part of the base install, and it provides the ability to parse headers into structured data. I found the tool difficult to learn but the payoff was worthwhile. The DKIM and SPF projects provide everything needed for evaluating DKIM, ARC, Authentication-Results, and SPF. Declude provides a wrapper for the Python code while supplying a flexible multi-attribute scoring system. SmarterMail Free, configured as an incoming gateway, provides the message flow control.
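To give a feel for what "parse headers into structured data" looks like, here is a small sketch that uses only the standard-library email module. It pulls out the From address parts and applies the "true reply" test from the list above by matching In-Reply-To against a set of previously sent Message-IDs. The function name summarize_message, the outbound_ids set, and the sample addresses are assumptions for illustration; a real deployment would look up outbound Message-IDs in its own message history.

```python
import re
from email import message_from_string, policy

# Message-IDs are written as <id-left@id-right>.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def summarize_message(raw_text, outbound_ids):
    """Return a dict of parsed identity fields for one message.

    outbound_ids is a set of Message-ID strings (with angle brackets)
    recorded from mail we sent earlier (hypothetical data source).
    """
    msg = message_from_string(raw_text, policy=policy.default)
    from_header = msg["From"]
    addr = None
    if from_header is not None and from_header.addresses:
        addr = from_header.addresses[0]
    reply_ids = MSGID_RE.findall(str(msg.get("In-Reply-To", "")))
    return {
        "from_addr": addr.addr_spec.lower() if addr else None,
        "from_domain": addr.domain.lower() if addr else None,
        "friendly_name": addr.display_name if addr else None,
        # A "true reply" references a Message-ID we actually sent.
        "is_true_reply": any(i in outbound_ids for i in reply_ids),
    }


raw = (
    "From: Pat Smith <pat@partner.example>\n"
    "In-Reply-To: <abc123@ourdomain.example>\n"
    "Subject: Re: quote\n"
    "\n"
    "Thanks\n"
)
print(summarize_message(raw, {"<abc123@ourdomain.example>"}))
```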

3 Replies

Bill Gates said "the spam problem will be gone in 2 years," in 2004.
We use different layers. On the one hand, we block known sources of spam directly via edge security on the firewall with so-called reputation-based lists / blacklists.

Those that make it to the mail server are analyzed with Cyren Premium AntiSpam / IP-Reputation, MessageSniffer, Declude and RSpamd. In general, we rate senders with missing or incorrect SPF entries negatively, and we also rate DMARC failures clearly negatively.

Overall, we have very satisfactory results. But yes, this costs us a lot of money and of course a lot of initial effort.
I guess I am an anomaly because of my attitude toward weighted scoring.  I view it as inadequate, because it leaves unresolved uncertainty.

I figure that a negative score is a warning that the message may be malicious, with a probability that is roughly proportional to the score.   So higher scores deserve quicker review to investigate the ambiguity.  But without review, I have not done my job.

If message review confirms that a message is malicious, then the entity responsible for the message is malicious, and the malicious source can be expected to use different attack strategies over time.   So if a message is confirmed malicious, my job is to determine the identifiers that represent the responsible entity, and then create block rules on those identifiers.   This ensures that all future attacks from the malicious source will be blocked, not just attacks of the same type as the first message.

Similarly, if message review confirms that the message is harmless and from a wanted sender, then a whitelist rule is needed to ensure that future messages from that source will be allowed.   That rule must ensure that an impersonator cannot benefit from the whitelist entry, so it must include at least one identifier that can be verified.  Most of my frustration with available products is the inability to create a whitelist rule with multiple factors, to accomplish this requirement.

For example, if "example.com" produces SPF NONE or SPF PERMERROR, and I verify that legitimate messages are coming from "appriver.com", then I need to create an allow rule like this:
  • If the HELO name ends with "appriver.com",
  • and the HELO name is verified using forward-confirmed DNS,
  • and the Mail From domain is "example.com"
  • Then treat the message as equivalent to SPF PASS
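
A multi-factor rule like the one above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the default parameter values, the host name "mx1.appriver.com", and the IP addresses are hypothetical, and "forward-confirmed" is taken to mean that the HELO name resolves to the connecting IP address. The resolver is passed in as a parameter so the rule can be tried without live DNS.

```python
import ipaddress
import socket


def _system_resolve(name):
    """Resolve a host name to a set of IP address strings."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, None)}
    except socket.gaierror:
        return set()


def _norm(ip):
    # Drop any IPv6 scope suffix and normalize the textual form.
    return ipaddress.ip_address(ip.split("%", 1)[0])


def forward_confirmed(helo_name, client_ip, resolve=_system_resolve):
    """True if helo_name resolves to the connecting client_ip."""
    return _norm(client_ip) in {_norm(a) for a in resolve(helo_name)}


def spf_pass_equivalent(helo_name, client_ip, mail_from,
                        helo_suffix="appriver.com",
                        mail_from_domain="example.com",
                        resolve=_system_resolve):
    """Apply the three-factor allow rule; True means treat as SPF PASS."""
    helo = helo_name.lower().rstrip(".")
    sender_domain = mail_from.rsplit("@", 1)[-1].lower()
    # Match on a label boundary so "evilappriver.com" does not qualify.
    helo_ok = helo == helo_suffix or helo.endswith("." + helo_suffix)
    return (helo_ok
            and sender_domain == mail_from_domain
            and forward_confirmed(helo, client_ip, resolve))


# Stand-in resolver with made-up data, used instead of live DNS.
def fake_resolve(name):
    return {"192.0.2.10"} if name == "mx1.appriver.com" else set()


print(spf_pass_equivalent("mx1.appriver.com", "192.0.2.10",
                          "bob@example.com", resolve=fake_resolve))
```

The point of requiring all three factors is that an impersonator who only forges the Mail From address, or only claims the HELO name, gets no benefit from the rule.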
