To follow up my previous post, this is a summary of how I use SQL with Declude
Pre-Requisites:
You need authentication scripts to assess SPF, DKIM, and fcDNS. I have these already, mostly from freeware sources with a little customization. They are written in Python, and are called from a custom filter implemented as a vbscript. Authentication Results are written to SQL, and keyed on TrxID (the number used to create the name of the .HDR and .EML files) Once the initial transaction record is created, additional data can be added by matching on the TrxID.
Matching Logic
Host names and email addresses are broken at segment boundaries and loaded into a temporary table.
Example: John @ bounce.e.example.agency
Becomes these temporary table entries:
- John @ bounce.e.example.agency
- bounce.e.example.agency
- e.example.agency
- example.agency
- agency
The temporary table is joined to the policy table using inner join. This reduces flexibility slightly, but it eliminates the performance problems associated with “ends-with” matching rules. Instead, all matching is based on exact equality and can use SQL indexes and query optimization. I currently have tables with several thousand entries, but they could probably grow 100-fold with minimal performance impact.
An additional parameter to the query is whether the source item is validated or not. A flag on the data record indicates whether a validated match is required. If the validation-required bit is set on the policy record, then the source data must be validated for the record to match. This fits a security design which says that untrusted names do not need to be verified, because spoofing is not a concern. Trusted names must always be verified before they can be trusted, because I need to rule out spoofing.
Design flaw: Because of this design, two entries for the same name may be required, one to set an action for the verified name, and one to set an action for an unverified name. This has not been a huge issue because an unwanted and unverified name is often blocked based on a shorter segment of the name. Nonetheless, a better design would be to have set two flags on each record, one for verified and one for unverified. I intend to switch to that design in the future.
The SQL query sorts by length of the match string, and returns the TOP 1 record. The result data element is a simple letter starting from A. (Z indicates no match). When multiple queries are consolidated, the lowest letter wins. The design is intended to provide flexibility of interpretation, but I currently use these codes:
- A = Allow and whitelist
- B = Block (Delete silently)
- C = Quarantine
- D = Default – allow without whitelist
I have tables and supporting stored procedures to implement these types of queries:
- IP address allow/block
- DNS name allow/block
- Email address allow/block
- IP + Email address allow/block
- DNS name + SMTP MailFrom address allow/block
- SMTP MailFrom address + Message From Address allow/block
This approach provides a lot of flexibility. The single-attribute matches are mostly used to block unwanted sources. The multi-attribute matches are used for allow actions such as overriding a sender’s incorrect SPF policy, as well as block actions.
HELO name and Reverse DNS name are treated interchangeably, which fits my experience with actual data flows. In many cases, the two names match. When they are different, the HELO name is more likely to forward-confirm. But when the Reverse DNS name can forward-confirmed, it can be useful. For example, outlook.com servers will always forward-confirm on the Reverse DNS name but never forward-confirms on the HELO name, even though both names have similar suffixes.
When I find spam that uses a spammers infrastructure, I always block based on the server DNS name. Blocking on IP address and email domains are also appropriate, but host name seems to be the most powerful weapon against them. When I find an attack from a gmail account, I create a policy record to block the single address.
One mass mailer takes both wanted and criminal clients. Consequently, the incoming content varies from password reset messages for critical websites, to fraudulent messages pretending to be from a bank. I trust the mailer not to allow spoofed email addresses, so DMARC verification is not essential. I use the #6 filter to handle this source. Messages from known clients whose content is needed will be whitelisted. Messages from known clients whose content is unwanted will be blocked. Messages from uncategorized clients will be quarantined so that the sender can be categorized.
I still use Declude filters for some things, since they have more flexibility.
All of this is done on an incoming gateway. I am a firm believer that the incoming gateway and the mail store should be separate servers.
Updating the policy file could be a full-time job, but a few months of regular effort produced a significant improvement in spam control, so now it needs less attention. It is easier to block spammers by their source than by their content. As I said earlier, I still use commercial products downstream to handle content filtering, after Declude discards the known junk sources.
Sources available on request.