Using Declude to Implement the Spam Filter Reference Model
Question asked by Douglas Foster - 2/13/2020 at 8:20 PM



This is a follow-up to my earlier post, which proposed a reference model for spam filtering.

https://portal.smartertools.com/community/a92805/a-theoretical-foundation-for-spam-filtering.aspx

In short, the document suggests that inbound spam filtering involves three stages:

  1. Blocking connections from untrusted or unwanted senders.

  2. Determining whether the sender identity can be verified using SPF, DKIM, DMARC, and local policies.  Unverifiable senders could be blocked, quarantined, or allowed.  Quarantine mode is recommended, so that a policy can be configured to either consistently allow or consistently block the sender.

  3. Applying content filtering.   To avoid false positives, some senders will have relaxed filtering applied.  Relaxed filtering is safe as long as sender identity has been verified using a combination of characteristics that are acceptable to the recipient organization.

This process requires several tools:

  1. A way to visualize the incoming message traffic, so that the system manager can efficiently scan for blocked traffic that should be allowed, and allowed traffic that should be blocked.
  2. A tool to display specific messages, to evaluate whether a specific message should be allowed or blocked.   This often requires review of the received message path, the message body, and hidden message headers from the raw content of the message.
  3. A rules engine which allows the system manager to create local policies needed to filter mail correctly.

The strength of Declude is its rules engine, which meets the third requirement better than any other product that I have evaluated.   It matters very little if a product can show the system manager what problems exist, but fails to provide the tools needed to correct those problems.

Before launching Declude, we had a commercial email filter with good visualization tools but a very inadequate rules engine.    We placed Declude in front of that device, at the network perimeter, to obtain the necessary filtering features.    But we have kept the other device in service as well, sitting between Declude and the mail server.  This combination provides us with the visualization tools to address requirements 1 and 2, and it provides some filtering capabilities that are not available in Declude.   Any number of email filters can be configured in series, so there is no need to be limited by the capabilities of a single product.  

Since the downstream email filter only sees traffic that Declude does not block, we also built a tool to parse the Declude log into a SQL database.  The parsed log provides data elements that are not available in the downstream interface, and it provides summary information about messages that are deleted by Declude and never forwarded downstream.

For organizations that do not have a downstream device, I suggest configuring an external filter in Declude which simply copies the current message to an archive folder based on date received.  Old folders can be purged as needed, manually or with an automated script, consistent with the organization's retention policy and storage limits.   Once the archive structure is created, text editors can be used to review captured messages in raw form, and Microsoft Outlook can be used to review messages in normal mode.
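As a rough illustration of the archive idea, here is a minimal Python sketch of a copy-to-dated-folder step plus a purge step. The function names, the one-folder-per-day layout (YYYY-MM-DD), and the way the message path reaches the script are all assumptions made for the example; how Declude hands the message file to an external program is not shown here.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_message(eml_path, archive_root, received=None):
    """Copy one message file into archive_root/YYYY-MM-DD/ and return the copy's path."""
    received = received or date.today()
    day_folder = Path(archive_root) / received.isoformat()
    day_folder.mkdir(parents=True, exist_ok=True)
    target = day_folder / Path(eml_path).name
    shutil.copy2(eml_path, target)  # copy, never move: the original must stay in place
    return target


def purge_old_folders(archive_root, keep_days, today=None):
    """Delete day folders older than keep_days and return the folder names removed."""
    today = today or date.today()
    removed = []
    for folder in sorted(Path(archive_root).iterdir()):
        try:
            folder_date = date.fromisoformat(folder.name)
        except ValueError:
            continue  # not one of the dated folders, leave it alone
        if folder.is_dir() and (today - folder_date).days > keep_days:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed
```

The purge function could be run from a scheduled task, with keep_days set from the retention policy.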

Recommended Configuration

Use Separate Servers for Separate Functions

I recommend using separate servers for Inbound Gateway, Mail Server, and Outbound Gateway.   The threat management issues have significant differences between these three functions, so separating the functions will help to optimize the threat defenses.   When evaluating message traffic, the first step in the process will be to separate incoming from outgoing mail.   Using separate servers will make such separation automatic.

Use SmarterMail with Declude as an Incoming Gateway

This document focuses exclusively on using SmarterMail configured as an Incoming Gateway, with Declude integrated into it for inbound message filtering.

Block Non-Delivery Reports (NDRs)

When a message is forwarded from one server to the next, the receiving server has the option of accepting or rejecting the message.   When rejection occurs, the submitting server is responsible for notifying the user.    This is done by sending a Non-Delivery Report (NDR) as a new message.   For outgoing mail, this behavior is entirely appropriate, because the sender and return path are known with certainty.   However, for incoming mail, NDRs are unwise because the return path can be fraudulent.    If your server generates an NDR to a fraudulent destination, your server becomes part of the spam problem and the unhappy recipient may cause your server to become blacklisted.

In a perfect world, all delivery problems would be detected at the network perimeter, in real-time, by the incoming gateway.  This would permit the incoming gateway to reject the message, making sender notification the problem of the submitting mail server.   In practice, this is not achievable, and even rejections can be a security risk:   a spammer can attempt directory harvesting, where he guesses common variants of an executive’s name, until a submitted email address is not rejected.

SmarterMail contributes to the NDR problem.  While non-existent accounts are detected early, and rejected, disabled accounts and over-quota accounts are detected later, after the message has been accepted.    This generates an NDR at the mail server, instead of a rejection at the network perimeter.

To prevent NDRs, use the following configuration:

  • At the firewall, block outbound SMTP (port 25) from the incoming mail gateway.   NDRs generated by the gateway will be blocked until they expire.
  • At the outbound gateway, quarantine NDRs using content filters.   Subject filters have proven sufficient for our needs.    SmarterMail NDRs have subject lines starting with “Failed: ”, and Exchange NDRs have subject lines starting with “Undeliverable: ”.   Messages should be quarantined or silently deleted; it is essential to ensure that the intercepted NDR does not cause another NDR to be generated in the reverse direction.

  • Use available configuration options to disable NDR generation in any product which provides the option.

  • In Declude, use the “Delete” action instead of the “BounceOnlyIfYouMust” action.   In the time required for SmarterMail to call Declude and obtain an answer, it is unlikely that the connection will still be open and available for a reject response.    The Delete action discards the message without SmarterMail involvement, and SmarterMail will time out and close the connection with the message “accepted” as far as the submitting system can tell.
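The subject-line check described for the outbound gateway is simple enough to sketch in a few lines of Python. This is only an illustration of the matching logic; the function name is made up, and the case-insensitive comparison is my assumption rather than a documented behavior of any product.

```python
# Subject prefixes named in the text: SmarterMail uses "Failed: ", Exchange uses "Undeliverable: ".
NDR_SUBJECT_PREFIXES = ("failed: ", "undeliverable: ")


def looks_like_ndr(subject):
    """Return True when the subject line starts with one of the known NDR prefixes."""
    return subject.lstrip().lower().startswith(NDR_SUBJECT_PREFIXES)
```

A message that matches would be quarantined or silently deleted, never bounced.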

Declude Configuration Notes and Issue Workarounds

For technical reasons, Declude must be installed on SmarterMail v16.   Upgrading SmarterMail after installation is permitted.  

Because an incoming gateway forwards all messages inward to the mail server, Declude is configured with OUTBOUNDSCANNING enabled.   Since SmarterMail requires a placeholder domain to be created as part of the incoming gateway setup, a purist should also enable INBOUNDSCANNING and have a filter to delete any traffic addressed to the placeholder domain.

Declude includes a WARN action which writes a custom message header when a particular test fails.   This action is ignored for OUTBOUNDSCANNING mode, so the WARN action is useless on an incoming gateway server.    We have worked around that problem by using an XOUTHEADER which includes the variables %TESTSFAILED% or %TESTSFAILEDWITHWEIGHTS%.

Declude only writes the XOUTHEADER commands to the message if the log level is set to HIGH or DEBUG.

To ensure that the log file includes encryption details, set the log level to DEBUG.

Because of the previous three issues, we run Declude with the log level set to DEBUG.

Downstream Signalling

In some configurations, Declude will need the ability to signal a downstream system.    The preferred method for signaling is by adding a custom message header, since the header is invisible to the user and because it does not affect DKIM signature verification.    

Since Declude does not have a message log view, one use of this approach is to provide improved visibility into message disposition.   If a downstream device can provide a message log view, Declude can indicate message disposition with a custom header, then let the downstream device perform the disposition based on those headers.

However, there are some minor limitations to the use of custom headers in an incoming gateway configuration.   In outbound scanning mode, WARN actions are ignored rather than written to the outbound message.   The recommended workaround is to use XOUTHEADER statements, including one that references %TESTSFAILED% or %TESTSFAILEDWITHWEIGHTS%.     Additionally, the global.cfg file must be configured for LOGGING HIGH or LOGGING DEBUG.  At lower logging levels, the XOUTHEADER is not written to the message either.

When the intended purpose is to notify the recipient human, rather than another piece of software, this can be done using the SUBJECT, HEADER, or FOOTER actions.  Be aware that these will invalidate DKIM signatures.    As long as a message will not be auto-forwarded externally, DKIM signatures do not need to be preserved.    If external forwarding is possible, consideration must be given to the possibility that broken signatures may reduce the probability that an auto-forwarded message will be accepted at the final destination.    Another way to solve this problem is by using the Attach action, which replaces the original message and substitutes a notification message with the original attached.

Declude Basic Logic Operations

Declude has three basic filter types:

  • IPFILE – a list of IP addresses and CIDR ranges for matching to an IP Address
  • FROMFILE – a list of Email Addresses to match a Sender Address
  • FILTER – a list of tests of any type

IPFILE and FROMFILE are assumed to be a list of mutually exclusive possibilities, so the test exits on the first match and the entries in the list constitute a logical OR of the list elements.   FILTERS can be used to implement OR, AND, and multi-group selection.

Logical OR Filter – Match any one statement

Logical OR can be created by providing a list of tests, each having a nonzero weight, and setting the match rules using these keywords:

  • MINWEIGHTTOFAIL 1
  • MAXWEIGHT 1
  • STOPATFIRSTHIT

Logical AND filter – Match all statements

Logical AND can be created by providing a list of tests, and specifying that they all must be true:

MINWEIGHTTOFAIL 4

MAXWEIGHT 4

TESTA 1 MATCH VALUE

TESTB 1 MATCH VALUE

TESTC 1 MATCH VALUE

TESTD 1 MATCH VALUE

Group Selection – Match at least M of N statements

By using a value of MINWEIGHTTOFAIL which is between 1 and N, you can require at least a subset of all values to match.   MAXWEIGHT can be used to finish searching as soon as the minimum score is reached, which helps performance, or it can be omitted to identify all of the matching entries.

MINWEIGHTTOFAIL 2

MAXWEIGHT 2

TESTA 1 MATCH VALUE

TESTB 1 MATCH VALUE

TESTC 1 MATCH VALUE
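To make the three patterns easier to compare, here is a small Python model of how a weighted filter file behaves as described above. It is a simplified model built from this post's description, not Declude's actual algorithm; the function name and the (weight, matched) input shape are invented for the illustration.

```python
def evaluate_filter(matches, min_weight_to_fail, max_weight=None, stop_at_first_hit=False):
    """Model one filter file.

    matches: list of (weight, matched) pairs, one per test line, in file order.
    Returns (failed, weight), where "failed" means the filter as a whole matched.
    """
    total = 0
    for weight, matched in matches:
        if not matched:
            continue
        total += weight
        if stop_at_first_hit:
            break  # STOPATFIRSTHIT: one matching line is enough
        if max_weight is not None and total >= max_weight:
            total = max_weight  # MAXWEIGHT caps the score and ends the scan early
            break
    failed = total >= min_weight_to_fail
    return failed, (total if failed else 0)


# Logical OR: any one line matching trips the filter.
print(evaluate_filter([(1, False), (1, True), (1, True)], 1, 1, stop_at_first_hit=True))
# Logical AND: all four lines must match to reach MINWEIGHTTOFAIL 4.
print(evaluate_filter([(1, True), (1, True), (1, False), (1, True)], 4, 4))
# Group selection: at least 2 of 3.
print(evaluate_filter([(1, True), (1, False), (1, True)], 2, 2))
```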

Match with Exclusions

Nearly any filtering rule will encounter false positives, which creates the need for exceptions.    The recommended way to implement this in Declude is to use two filters:   one for the primary rule and one for the exceptions.   

Example:   Delete messages from Japan (*.jp) except Samsung (samsung.jp)

Global.cfg entries
 
SenderBlockExceptions filter <path>\SENDERBLOCKEXCEPTIONS.txt x 1 0
SenderBlockList filter <path>\SENDERBLOCKLIST.txt x 1 0
SenderBlockList DELETE

SenderBlockExceptions.txt
 
MINWEIGHTTOFAIL 1
MAXWEIGHT 1
MAILFROM 1 ENDSWITH .samsung.jp
MAILFROM 1 ENDSWITH @samsung.jp

SenderBlockList.txt
 
MINWEIGHTTOFAIL 2
MAXWEIGHT 2
TESTSFAILED 1 NOTCONTAINS SenderBlockExceptions
 MAILFROM   1 ENDSWITH .jp
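The two filter files work as a pair: the exception filter runs first and records its name in the list of failed tests, and the block filter only fires when that name is absent. The following Python sketch models that ordering for the Samsung example. It is an illustration of the logic only; the function names are invented, and the lowercase comparison is an assumption about case handling.

```python
def sender_block_exceptions(mail_from):
    """Model of SenderBlockExceptions.txt: either line matching is enough."""
    sender = mail_from.lower()
    return sender.endswith(".samsung.jp") or sender.endswith("@samsung.jp")


def sender_block_list(mail_from, tests_failed):
    """Model of SenderBlockList.txt: both lines must match (weight 2 of 2)."""
    not_excepted = "SenderBlockExceptions" not in tests_failed
    return not_excepted and mail_from.lower().endswith(".jp")


def disposition(mail_from):
    """Run the filters in global.cfg order; the exception test must come first."""
    tests_failed = []
    if sender_block_exceptions(mail_from):
        tests_failed.append("SenderBlockExceptions")
    if sender_block_list(mail_from, tests_failed):
        tests_failed.append("SenderBlockList")
    return "DELETE" if "SenderBlockList" in tests_failed else "ALLOW"
```

The exception file needs both patterns: ".samsung.jp" alone would miss user@samsung.jp, while a bare "samsung.jp" would also exempt look-alike domains such as notsamsung.jp.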

Avoiding bloated weights

Declude has two methods for dispositioning messages:   Test status and Test weight.   As discussed in the spam filtering model description, the recommended filtering process has three stages:   

  1. Eliminating blacklisted sources.
  2. Evaluating whether traffic passes sender authentication, and optionally dispositioning messages that do not pass.
  3. Content Filtering, with reduced filtering for specific authenticated senders, to avoid false positives.

For the first two stages, test weights are irrelevant because disposition will be based on test status.  After the message passes the first two stages, content filtering based on weights is preferred.  To prevent the content filtering results from being skewed by the first two phases, it is necessary to ensure that a message exits phase 2 with a weight of zero.    If a test causes a message to be deleted or bounced before phase 3, the weight does not matter.

To allow a filter test to be true without adding weight, use a filter file with these characteristics:

  • Within the file, ensure that MAXWEIGHT has the same value as MINWEIGHTTOFAIL.   This ensures that the filter file will return either zero or MAXWEIGHT.
  • In the filter test definition, set the success weight to the negative of MAXWEIGHT (and the fail weight to zero).   When the file weight is added to the weight on the test definition, the net weight will be zero whether the test is True or False.
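The arithmetic behind the zero-weight trick can be checked with a few lines of Python. This is a worked example of the sums described above, not Declude code; the function and parameter names are invented, and it assumes the filter file contributes exactly MAXWEIGHT when it matches and zero otherwise.

```python
def net_weight(filter_matched, max_weight, definition_weight_true, definition_weight_false=0):
    """Add the weight returned by the filter file to the weight on the test definition."""
    file_weight = max_weight if filter_matched else 0
    definition_weight = definition_weight_true if filter_matched else definition_weight_false
    return file_weight + definition_weight


# MAXWEIGHT 2 in the file, -2 on the test definition: the test can be true without adding weight.
print(net_weight(True, 2, -2))   # 2 + (-2)
print(net_weight(False, 2, -2))  # 0 + 0
```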

Implementing the Filtering Model

Phase 1 – Blacklisted sources

Phase 1 blacklisting checks for any single attribute of a message which is disqualifying.    These attributes are the ones most likely to be useful for blacklist filtering with static lists and RBLs:

  • Blocked IP addresses.   Declude IPFILE is the natural tool for these checks.   An exception file is not required, because exceptions can be created by splitting one IP block into several smaller ones, with gaps as desired.

  • Blocked Envelope-Sender addresses.   If no exceptions are required, the Declude FROMFILE is the easiest solution.   If exceptions might be needed, a FILTER file using MAILFROM clauses will provide flexibility.

  • Reverse DNS.    This is implemented using a FILTER file with REVDNS ENDSWITH statements.   These rules are likely to require exceptions, as explained earlier in the Match with Exclusions section.

  • HELO/EHLO name.   This is implemented using a FILTER file with HELO ENDSWITH statements.  These rules are also likely to require exceptions.

  • RBLs (IP Reputation Block Lists), DNSBLs (DNS reputation block lists).   Declude comes preconfigured for checking many of these services, which can be enabled or disabled based on the organization’s preferences and relationship with the block list developer.  Many block list operators limit or refuse queries that arrive through large public resolvers, so RBL lookups can fail if DNS is forwarded to a public DNS server such as Google (8.8.8.8).   Use an internal DNS server that has no external forwarders configured.

  • Message FROM.   Surprisingly few email filters provide full support for filtering on the Message FROM.   Declude also has this weakness, but the weakness can be overcome, because Declude supports regular expressions using the HEADER PCRE statement.   It also supports calls to external programs.   SmarterMail interacts with Declude using two files: the .EML file contains the raw message content, and the .HDR file contains a few lines of summary information prepared by SmarterMail and updated by Declude.   One of the lines in the HDR file contains the message From address, preceded by the label “from: ”.   Because the header file is shorter and simpler than the message, it provides the easiest access to the Message From value.
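An external program could pull the Message From value out of the .HDR file along these lines. This Python sketch relies only on what is stated above, namely that one line starts with the label "from: ". Whether the value is a bare address or includes a display name is an assumption, so the sketch uses the standard library's parseaddr, which handles both. The function name is hypothetical.

```python
from email.utils import parseaddr


def message_from_in_hdr(hdr_text):
    """Return (display_name, address) from the 'from: ' line of an .HDR file, or None."""
    label = "from: "
    for line in hdr_text.splitlines():
        if line.lower().startswith(label):
            display_name, address = parseaddr(line[len(label):])
            return display_name, address.lower()
    return None
```

The returned address can then be compared against block or allow lists, and the display name can be checked separately for impersonation.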

Phase 2 – Sender Authentication

At first glance, Declude appears poorly prepared to support sender authentication, because the SPF component only tests for PASS and FAIL, and it has no support for DKIM or DMARC.     Even worse, our testing indicated that the SPF component only processed the first segment of an SPF policy that was published as multiple strings.   However, Declude is extensible with external calls, and we were able to implement both SPF and DKIM in Declude.

People close to the IETF standards process have implemented proof-of-concept code, for both SPF and DKIM, written in Python and published as freeware.   We have integrated the Python libraries for DNS, SPF, and DKIM into Declude, with some customization to improve the integration.   With these changes, we have these sender authentication capabilities:

  • SPF – A bitmap test for SPF, which returns all seven possible results.   A default SPF entry is used for domains with SPF NONE.

  • DKIM – Another bitmap test checks for a DKIM signature which is domain-aligned with the message From address.  Result bits indicate whether the signature is verified, signature present but not verified, signature missing, or the header itself is not present.  Additionally, it tests for domain alignment between the DKIM signature and the Message From address.    For this purpose, domain alignment occurs if the signature domain is equal to, or parent of, the message From domain.


  • Domain alignment between Envelope-From and Message-From:   This is one of the bits included in the DKIM test.   For this test, domain alignment means that either the domains match, or a parent-child domain relationship exists, in either direction, between the two addresses.   This is useful when a DKIM signature is not available.

  • Host name validation:  Another test takes a hostname and the Source IP address as parameters, then returns a result indicating whether the host name can be DNS forward-resolved to that IP.    This permits filtering based on host names, since the successful lookup indicates that the host name has not been spoofed.

Note:   the domain alignment rules indicated above are more restrictive than the relaxed domain alignment rules in the DMARC specification, which allows any two values from the same organization.  Determining a matching organization is more complex than checking a parent-child relationship.   Others can modify the Python code to implement their preferred policy.
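The two alignment rules described above reduce to a parent-or-equal comparison on domain names. Here is a minimal Python sketch of that comparison; it is my own restatement of the rules, not the code we run, and the function names are chosen for the example.

```python
def is_same_or_parent(parent, child):
    """True when child equals parent or is a subdomain of it."""
    parent = parent.lower().rstrip(".")
    child = child.lower().rstrip(".")
    return child == parent or child.endswith("." + parent)


def dkim_aligned(signature_domain, from_domain):
    """DKIM rule: the signature domain equals, or is a parent of, the Message From domain."""
    return is_same_or_parent(signature_domain, from_domain)


def envelope_aligned(envelope_domain, from_domain):
    """Envelope rule: the domains match, or are parent and child in either direction."""
    return (is_same_or_parent(envelope_domain, from_domain)
            or is_same_or_parent(from_domain, envelope_domain))
```

Note that this sketch does not guard against a signature domain that is itself a public suffix such as "com", which would be treated as a parent of every .com address; a production version would need that check.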

Suggested Sender Authentication Policy and Filters

I recommend that a message sender be considered verified if any of the following is true:

  • The message has a DKIM verified signature with domain alignment to the Message From address.   SPF result does not matter.
  • The message has an SPF PASS result and domain alignment between Envelope From and Message From.
  • A local policy overrides the lack of DKIM signature and domain alignment by allowing the combination of SPF PASS, Envelope From domain, and Message From domain.
  • A local policy overrides the lack of SPF PASS by allowing some combination of verified host name and Envelope From domain.   This may be combined with an additional override of DKIM or domain alignment checks on the Message From domain.
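The four rules can be read as a single decision function over the authentication results. The Python sketch below is one possible reading; the flag names and bit values are hypothetical and do not reproduce the bitmaps returned by our tests, and the two override sets are simplified stand-ins for local policy files.

```python
from enum import IntFlag


class Auth(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    DKIM_VERIFIED = 1
    DKIM_ALIGNED = 2
    SPF_PASS = 4
    ENVELOPE_ALIGNED = 8
    HOST_VERIFIED = 16


def sender_verified(flags, envelope_domain, from_domain, host_name="",
                    spf_overrides=frozenset(), host_overrides=frozenset()):
    """Return True if any one of the four rules is satisfied."""
    if Auth.DKIM_VERIFIED in flags and Auth.DKIM_ALIGNED in flags:
        return True  # rule 1: aligned, verified DKIM; SPF does not matter
    if Auth.SPF_PASS in flags and Auth.ENVELOPE_ALIGNED in flags:
        return True  # rule 2: SPF PASS plus envelope/message alignment
    if Auth.SPF_PASS in flags and (envelope_domain, from_domain) in spf_overrides:
        return True  # rule 3: local policy allows this SPF PASS domain pair
    if Auth.HOST_VERIFIED in flags and (host_name, envelope_domain) in host_overrides:
        return True  # rule 4: local policy allows this verified host and envelope domain
    return False
```

Anything that returns False here is a candidate for quarantine and, if the sender is wanted, for a new override entry.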

Some DKIM signatures can be invalidated in transit, through no fault of the sender.  This means that a series of messages from the same source may arrive with a mix of verifiable and unverifiable signatures.   I can imagine implementing a caching scheme which remembers sources that have a history of verified signatures, and using that cache history to ignore an occasional signature failure.   However, I have not attempted to build such a caching scheme.

My goal has been to detect and implement exceptions for desired senders that do not pass the sender authentication test automatically.   Once implemented, all other sender authentication failures could be blocked.    After using this scheme to evaluate our incoming data flow, I have been disappointed by the volume of mail, from desired sources, which fails this sender authentication requirement.   Senders who have not implemented DMARC do not receive feedback about their SPF errors, so it falls on the receiving system manager to notify them or work around the problem.   I may have to content myself with using the process to ensure that filtering exceptions are only granted to authenticated senders.

Phase 3 – Content Filtering

Because of Declude’s sophisticated filtering, content filtering can be very granular.   Suppose you begin receiving emails with phony invoices as attachments.  You want to block invoices from unknown senders, but allow invoices from senders that are known to use email for invoicing.   Once Phase 2 has verified sender identity, conditional content filtering is straightforward.   The “InvoiceAllow” filter checks for “TESTSFAILED CONTAINS SPFDKIMOK”, and Message From (HEADERS PCRE) or Envelope From (MAILFROM) matches one of the authorized domains.    All of the other identification attributes of the message have been validated, so they do not need to be tested again.  The “InvoiceBlock” filter simply checks the SUBJECT CONTAINS INVOICE, or something similar, as well as “TESTSFAILED NOTCONTAINS InvoiceAllow”.

Future Considerations

A high volume of exceptions can lead to such a large number of filters that the process becomes unwieldy.   Some exceptions are ideally configured using HELO-verified plus HELO name, or ReverseDNS-verified plus Reverse DNS name, but this requires doubling the number of exception files.      Declude rereads every filter file for every message, and linear processing of lists will become inefficient as a list grows in length.   All of this creates performance concerns in a high-volume environment.

I can imagine creating an external call to a database stored procedure to address these concerns.   I would run the existing Python checks for SPF, DKIM, and Host-to-IP, because that logic is complex and returns its results in a bitmask.   Then the message identifiers and the bitmask could be passed to the stored procedure for a final determination of sender verified or not, and message allowed, quarantined, or blocked.   The database procedure could implement conditional logic to avoid unnecessary checks, use indexes instead of linear scans to evaluate lists, eliminate redundant host-name checks, and simplify the configuration process.   This would be expected to reduce the number of tests dramatically while improving peak capacity.  (All that is lacking is a database design and the supporting SQL code!)
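To give a feel for the indexed-lookup idea, here is a small Python sketch using the standard library's sqlite3 module in place of a real database server and stored procedure. Everything in it is hypothetical: the table, the column names, the meaning of the bits, and the default action. No such design exists yet.

```python
import sqlite3


def build_policy_db(rows):
    """Create an in-memory policy table; the primary key gives an index on the domain pair."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE sender_policy (
               envelope_domain TEXT NOT NULL,
               from_domain     TEXT NOT NULL,
               required_bits   INTEGER NOT NULL,
               action          TEXT NOT NULL,
               PRIMARY KEY (envelope_domain, from_domain))"""
    )
    db.executemany("INSERT INTO sender_policy VALUES (?, ?, ?, ?)", rows)
    return db


def decide(db, envelope_domain, from_domain, auth_bits, default="QUARANTINE"):
    """Look up the domain pair and apply its action only if every required bit is set."""
    row = db.execute(
        "SELECT required_bits, action FROM sender_policy "
        "WHERE envelope_domain = ? AND from_domain = ?",
        (envelope_domain.lower(), from_domain.lower()),
    ).fetchone()
    if row is None:
        return default
    required_bits, action = row
    return action if (auth_bits & required_bits) == required_bits else default
```

One keyed lookup replaces a scan through several filter files, and adding an exception becomes a row insert rather than a new file.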

All of this becomes possible because Declude can call an external piece of code, including VBScript, JScript, or an executable program.     The sky is the limit.

