Maybe we all need multiple MX servers
Problem reported by Douglas Foster - Today at 4:08 AM
Submitted
Since we are all fighting a spam network with thousands of nodes, I have started worrying about Denial of Service impacts from a spam overload. What happens if my spam volume grows 10-fold this month? I may have the logic to handle it, but I will not have the capacity. Potential inbound email volume is unlimited, while actual email processing capacity is limited. If a spam spike causes a temporary processing overload, delivery of wanted email can be delayed for hours while the backlog is cleared. If the overload persists, critical business communication will not occur.
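
To put rough numbers on the "delayed for hours" point, here is a minimal back-of-the-envelope sketch. The function name and all of the figures are hypothetical, chosen only for illustration; plug in your own measured rates.

```python
def backlog_clear_hours(capacity_per_hour, normal_per_hour,
                        surge_per_hour, surge_hours):
    """Estimate hours needed to drain the queue once a surge has ended.

    surge_per_hour is the total inbound rate during the surge.
    All inputs are assumptions you would replace with measured values.
    """
    if capacity_per_hour <= normal_per_hour:
        raise ValueError("no spare capacity: the backlog would never clear")
    # Messages that arrive faster than they can be processed pile up.
    backlog = max(0, surge_per_hour - capacity_per_hour) * surge_hours
    # After the surge, only the spare capacity is available for the backlog.
    return backlog / (capacity_per_hour - normal_per_hour)


# Hypothetical example: 10,000 msgs/hour capacity, 2,000 msgs/hour normally,
# a 10-fold surge (20,000 msgs/hour) lasting 2 hours.
hours = backlog_clear_hours(10_000, 2_000, 20_000, 2)
print(f"Backlog clears about {hours:.1f} hours after the surge ends")
```

With those made-up figures, wanted mail that arrives near the end of the surge waits about two and a half hours, even though the surge itself lasted only two.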

The most effective solution is to host your email filtering on one of the cloud services that can provide DDoS protection, even against large-scale assaults that last for long periods. Akamai is one vendor that I have talked to about this service. It is very expensive.

My current spam defense is based on sending unknown senders to quarantine, so I started playing with extending the idea to separating known and unknown message sources onto different MX systems:
  1. MX #1 is only accessible to specific IP addresses that are known sources of only wanted mail.    Other IP addresses receive no response, so those senders assume that the MX is down, and they move on to a different entry in the MX list.   The IP filtering is typically done at the firewall.

  2. MX #2 is accessible to a list of server domain names that may send both wanted and unwanted mail. google.com and outlook.com servers certainly fall into this category, but the list is probably long. This design can be accomplished using an MX server running Postfix. Its milter interface has a callout after the HELO message is received. It should be possible to check HELO, reverse DNS, and IP in that callout, then silently drop the connection for anything that is not on the allowed list. Servers that receive no response will consider this MX to also be out of service and will move on to the third system.

  3. MX #3 is accessible to every other sender. 
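
The three-way split above could be sketched roughly as follows. This is only an illustration of the decision logic, not a Postfix milter or firewall rule set; the network range, the domain list, and the function names are all hypothetical placeholders. It requires both HELO and reverse DNS to match for MX #2, since HELO alone is trivially forged.

```python
import ipaddress

# Hypothetical allow lists; real entries would come from your own mail logs.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]   # MX #1 sources
MIXED_DOMAINS = ("google.com", "outlook.com")               # MX #2 sources


def host_in_domains(hostname, domains):
    """True if hostname equals, or is a subdomain of, any listed domain."""
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_sender(ip, helo, reverse_dns):
    """Return 1, 2, or 3: the MX tier this connecting server qualifies for."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in TRUSTED_NETWORKS):
        return 1
    if (reverse_dns
            and host_in_domains(helo, MIXED_DOMAINS)
            and host_in_domains(reverse_dns, MIXED_DOMAINS)):
        return 2
    return 3


print(classify_sender("192.0.2.10", "mail.example.net", "mail.example.net"))
print(classify_sender("203.0.113.5", "mail-a.google.com", "mail-a.google.com"))
print(classify_sender("198.51.100.7", "evilgoogle.com", "evilgoogle.com"))
```

A production check would presumably also confirm that the reverse DNS name resolves back to the connecting IP; that lookup is left out here to keep the sketch self-contained.
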
Implications of this design:
  • Most spam surges will primarily affect MX #3, allowing email from the most important sources to be processed normally on MX#1 and MX#2.

  • The configuration can be set up two ways:
    • Overflow mode:  Senders which qualify for MX#1 may use any of the three MX systems, and senders which qualify for MX#2 may use MX#2 or MX#3.
    • Partition mode:   Senders are only accepted on one of the three systems.

  • Surge control:   Excessive volume attacks will most likely affect only MX#3.  Messages on the other MX servers will continue to flow.    If an excessive volume attack affects MX#2, its effect on MX#3 will be determined by the configuration mode.

  • Filtering differences:   MX#1 is expected to include only high-value message sources which are either whitelisted or candidates for whitelisting, so that message flow can be given a specialized, and presumably simpler, filtering algorithm to save processing effort.   These savings in processing effort may offset the cost of running multiple MX systems.
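
The difference between the two configuration modes can be written down as a small rule. This is a minimal sketch; the function name and the mode strings are made up for illustration.

```python
def accepting_tiers(sender_tier, mode):
    """Return the MX tiers that will accept a sender of the given tier.

    sender_tier: 1, 2, or 3 (the best tier the sender qualifies for).
    mode: "overflow" or "partition" (hypothetical labels for the two modes).
    """
    if sender_tier not in (1, 2, 3):
        raise ValueError("sender_tier must be 1, 2, or 3")
    if mode == "partition":
        # Each sender is accepted on exactly one system.
        return [sender_tier]
    if mode == "overflow":
        # A sender may also fall through to any less-restricted system.
        return list(range(sender_tier, 4))
    raise ValueError("mode must be 'overflow' or 'partition'")


for tier in (1, 2, 3):
    print(tier, accepting_tiers(tier, "overflow"), accepting_tiers(tier, "partition"))
```

In overflow mode a surge that saturates MX#3 can still slow MX#1 and MX#2 senders that fall through to it, while partition mode keeps the flows isolated at the price of no fallback.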
Preparation
  • Collect data now to know the IP addresses that qualify for MX#1 and the host names that qualify for MX#2.
  • Deploy and test the configuration before the spam attack occurs.
  • Configure a load balancer in front of all three MX devices, so that capacity can be added to any of the three categories by cloning one of the configurations. 
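
For the data collection step, one possible starting point is to tally past filtering results per source IP and keep only sources that sent a meaningful volume with no unwanted mail. The record format, the function name, and the threshold of 20 messages below are assumptions for illustration; your filter's logs will look different.

```python
from collections import Counter


def mx1_candidates(records, min_messages=20):
    """Return IPs that sent at least min_messages and no unwanted mail.

    records: iterable of (source_ip, wanted) pairs, where wanted is True
    for mail the recipient wanted. This layout is a hypothetical stand-in
    for whatever your spam filter actually logs.
    """
    total = Counter()
    unwanted = Counter()
    for ip, wanted in records:
        total[ip] += 1
        if not wanted:
            unwanted[ip] += 1
    return sorted(ip for ip, count in total.items()
                  if count >= min_messages and unwanted[ip] == 0)


# Made-up sample data using documentation address ranges.
sample = (
    [("192.0.2.1", True)] * 25                              # clean, enough volume
    + [("192.0.2.2", True)] * 30 + [("192.0.2.2", False)]   # sent one unwanted message
    + [("192.0.2.3", True)] * 5                             # clean, too little volume
)
print(mx1_candidates(sample))
```

The same tally keyed on verified host name rather than IP would give a first draft of the MX#2 list.
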
Defining the IP list for MX#1 and the host list for MX#2 requires data collection to be done now.   Then the 
Nathan Replied
Rspamd is your friend. We filter many millions of emails per month with an active-active cluster of redundant Exim-based MXs running on relatively low-specification VMs. HAProxy sits in front, providing rate limiting, load balancing, etc.
