Backup MX Issues
Problem reported by Robbie Wright - 12/5/2014 at 8:00 AM
So we had a bunch of issues getting our backup MX install of SmarterMail to work correctly for us, as can be referenced in this post. We opened a support ticket and, after an unfortunate six weeks, we finally got to the bottom of the issue and found a workaround.
The long and the short of it hinges on whether you are running your backup server in a NAT'd environment. In our case, our backup MX server is at Azure. The VM only sees the internal IP address assigned to it, not the public IP. This would be the case in any NAT'd environment, not just Azure. When the backup MX server looks to see if it can deliver an email for a given domain, it looks up the MX servers for that domain via DNS. In our example, the server at Azure is mx2 and our primary server, mx1, was down for testing. Since mx1 wasn't working, SM tried mx2 at its public IP address listed in DNS. SM would answer on mx2 via its public IP address, so it ended up attempting to deliver the mail to itself. You can see this in your logs when you have many entries that look like this:
Received: from mx2.domain.net (mx2.domain.net [public ip of mx2]) by mx2.domain.net with SMTP;
This is happening because IIS is bound to the internal private IP on the NAT'd Windows machine and SM is not aware of the public IP address, so it tries to deliver the mail to itself. Additionally, mx2 will (eventually) send an NDR that contains this message:
Reason: Remote host said: 554 Maximum hop count exceeded. Possible loop.
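To make the failure mode concrete, here is a minimal Python sketch of the kind of "is this MX record me?" check a backup server might perform. This is not SmarterMail's actual code. The function name, the preference values, and the IP addresses (documentation and private ranges) are all assumptions made up for illustration.

```python
# Hypothetical MX records: (preference, host, public IP from DNS).
MX_RECORDS = [
    (10, "mx1.domain.net", "198.51.100.10"),
    (20, "mx2.domain.net", "203.0.113.20"),
]

def pick_next_hop(mx_records, local_ips, unreachable):
    """Return the host to deliver to, or None if the mail should be held."""
    for preference, host, ip in sorted(mx_records):
        if ip in local_ips:
            # This record is us: stop here and hold the mail
            # instead of trying ourselves or any less preferred host.
            return None
        if host in unreachable:
            continue
        return host
    return None

down = {"mx1.domain.net"}

# Behind NAT, mx2 only knows its private address, so it does not
# recognize its own MX record and selects itself as the next hop.
nat_choice = pick_next_hop(MX_RECORDS, {"10.0.0.4"}, down)

# If the public address were visible locally, the record would be
# recognized as "self" and the mail would be held for mx1.
bound_choice = pick_next_hop(MX_RECORDS, {"203.0.113.20"}, down)

print(nat_choice, bound_choice)
```

In the NAT'd case the server picks its own public name as the next hop, which is the loop that ends in the 554 response above.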
The solution to this problem is to create a firewall rule on mx2 that prevents port 25 traffic from flowing to itself on its public IP address. Since Windows can't see the public IP as being bound on an interface, SM can't see it either. If SM could see an IIS binding for that IP, you wouldn't have this problem.
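As a rough illustration of that workaround, the Python sketch below only assembles a Windows Firewall command for review and does not run it. It assumes an outbound block rule is what is wanted. The rule name `BlockSmtpToSelf` and the address 203.0.113.20 are placeholders, so substitute your own public IP and check the rule against your environment before applying it.

```python
import ipaddress

def build_block_rule(public_ip, rule_name="BlockSmtpToSelf"):
    """Build a netsh command that blocks outbound TCP port 25 to public_ip."""
    # Raises ValueError if public_ip is not a valid IP address.
    ipaddress.ip_address(public_ip)
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",
        "dir=out",
        "action=block",
        "protocol=TCP",
        f"remoteip={public_ip}",
        "remoteport=25",
    ]

# Placeholder address from the documentation range, not a real server.
command = build_block_rule("203.0.113.20")
print(" ".join(command))
```

The printed line can be pasted into an elevated command prompt on mx2 once the address and rule name are correct.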
I would like to see one of two things happen:
1.) SM fixes this issue in the backup MX solution and adds a mechanism by which the end user can specify the public IP address in a NAT'd environment. I can think of a number of ways to accomplish this, but would defer to SM staff to come up with the best way to do that.
2.) Update the documentation on setting up a backup MX system to account for people setting it up in a NAT'd environment. This would have saved me an enormous amount of time.
The second bug in this process is that when mx2 (the backup MX server) sends the NDR about the mail loop, it does so without specifying a return path in the header. This of course causes it to be marked as spam nearly immediately by most mail systems, which makes it even harder to troubleshoot since you may not be receiving the NDR. My semi-educated guess is that since the backup MX server has no domains configured on it, SM sends the email without any information for a domain, which breaks SPF and DMARC for every email sent from it. We need a mechanism to configure send-from information on the backup server. Since a backup MX server would (generally) only send mail when the primary MX is down, it will have to originate the mail on its own. However, you cannot set up any of your domains on the backup MX server, so I'm not 100% sure how to fix that, other than using a new domain (or subdomain with a different MX) just to send system emails when the primary MX is down.
SM believes this is a feature request and not a bug and requested that we post it here for discussion.

2 Replies

Steve Reid Replied
I feel your pain. For us, the backup says "no such user" no matter what inside a NAT'd environment.
Steve Reid Replied
My solution to this problem was to host a DNS instance that only referenced the internal IPs.
Basically I copied my external DNS structure and replaced the external IPs with the actual internal ones.
This way when SmarterMail does the MX lookup or whatever, it doesn't get confused.
I have opened a ticket with support as well and they helped me understand how it works better.
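The internal-only DNS idea above can be sketched in a few lines of Python: take a copy of the external A records and swap each public address for its internal counterpart. The hostnames, addresses, and the `internal_view` helper are hypothetical. A real setup would do this in the DNS server's zone data rather than in a script.

```python
# Hypothetical external zone: hostname -> public IP.
EXTERNAL_A_RECORDS = {
    "mx1.domain.net": "198.51.100.10",
    "mx2.domain.net": "203.0.113.20",
}

# Hypothetical NAT mapping: public IP -> internal IP.
PUBLIC_TO_INTERNAL = {
    "198.51.100.10": "10.0.0.10",
    "203.0.113.20": "10.0.0.4",
}

def internal_view(external_records, public_to_internal):
    """Copy the records, replacing each public IP with its internal one.

    Addresses with no known internal counterpart are left unchanged.
    """
    return {
        host: public_to_internal.get(ip, ip)
        for host, ip in external_records.items()
    }

internal_records = internal_view(EXTERNAL_A_RECORDS, PUBLIC_TO_INTERNAL)
for host, ip in sorted(internal_records.items()):
    print(host, ip)
```

With this view, a lookup for mx2 from inside the network returns the address the server is actually bound to, so it can recognize itself.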
