So we had a bunch of issues getting our backup MX install of SmarterMail (SM) to work correctly, as described in
this post. We opened a support ticket and, after an unfortunate 6 weeks, we finally got to the bottom of the issue and found a workaround.
The long and the short of it hinges on whether you are running your backup server in a NAT'd environment. In our case, our backup MX server is at Azure. The VM only sees the internal IP address assigned to it and not its public IP. This would be the case in any NAT'd environment, not just Azure. When the backup MX server checks whether it can deliver an email for a given domain, it looks up the MX servers for that domain via DNS. In our example, the server at Azure is mx2 and our primary server, mx1, was down for testing. Since mx1 wasn't responding, SM on mx2 tried the next MX record, which is mx2 itself at the public IP address listed in DNS. mx2 then accepted the connection on its public IP address, so it ended up delivering the mail to itself. You can see this in your logs when you have many entries that look like this:
Received: from mx2.domain.net (mx2.domain.net [public ip of mx2]) by mx2.domain.net with SMTP;
This happens because IIS is bound to the internal private IP on the NAT'd Windows machine and SM is not aware of the public IP address, so it doesn't recognize the public address as its own and tries to deliver the mail there. Additionally, mx2 will (eventually) send an NDR that contains this message:
Reason: Remote host said: 554 Maximum hop count exceeded. Possible loop.
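If you want to confirm the loop from a saved copy of a message rather than by eyeballing the logs, here is a rough Python sketch. It just counts Received headers and flags the ones where the same host appears after both "from" and "by". The function names, the sample message, and the 203.0.113.10 address (a documentation placeholder standing in for the public IP of mx2) are my own assumptions, not anything from SM.

```python
from email import message_from_string


def count_hops(raw_message: str) -> int:
    """Total number of Received headers, i.e. how many hops the message took."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received", []))


def count_self_hops(raw_message: str, hostname: str) -> int:
    """Number of Received headers where `hostname` handed the mail to itself."""
    msg = message_from_string(raw_message)
    host = hostname.lower()
    hits = 0
    for value in msg.get_all("Received", []):
        lowered = value.lower()
        # A self-delivery hop names the same host after "from" and after "by".
        if lowered.startswith(f"from {host} ") and f" by {host} " in lowered:
            hits += 1
    return hits


# Hypothetical looping message: three self-deliveries plus the original hop.
LOOP_HOP = (
    "Received: from mx2.domain.net (mx2.domain.net [203.0.113.10]) "
    "by mx2.domain.net with SMTP;\n"
)
SAMPLE = (
    LOOP_HOP * 3
    + "Received: from client.example.org by mx2.domain.net with SMTP;\n"
    + "Subject: test\n\nbody\n"
)

if __name__ == "__main__":
    print("hops:", count_hops(SAMPLE))
    print("self hops:", count_self_hops(SAMPLE, "mx2.domain.net"))
```

Any self hops at all on a backup MX are a good sign you are hitting this problem.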
The workaround is to create a firewall rule on mx2 that blocks port 25 traffic from mx2 to its own public IP address. Since Windows doesn't see the public IP as bound on any interface, SM can't see it either. If SM did see an IIS binding for that IP, you wouldn't have this problem.
I would like to see one of two things happen:
1.) SM fixes this issue in the backup MX solution and adds a mechanism that lets the end user specify the public IP address in a NAT'd environment. I can think of a number of ways to accomplish this, but would defer to SM staff to come up with the best one.
-or-
2.) SM updates the documentation on setting up a backup MX system to cover setups in a NAT'd environment. This would have saved me an enormous amount of time.
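To make option 1 a bit more concrete, here is a minimal Python sketch of the kind of check I have in mind. This is not how SM works internally (I don't know that). The declared_public_ips setting, the function names, and all of the IP addresses are hypothetical, and the addresses are documentation placeholders.

```python
def is_self(ip, bound_ips, declared_public_ips=frozenset()):
    """True if `ip` is bound locally or was declared by the admin as our public IP."""
    return ip in bound_ips or ip in declared_public_ips


def next_hop(mx_records, bound_ips, declared_public_ips=frozenset(), down=frozenset()):
    """Pick the first MX host (lowest preference first) that is up and is not us.

    mx_records: list of (preference, hostname, ip) tuples.
    Returns a hostname, or None if the mail should stay queued for a retry.
    """
    for _pref, host, ip in sorted(mx_records):
        if host in down:
            continue  # e.g. the primary is offline
        if is_self(ip, bound_ips, declared_public_ips):
            continue  # never hand mail to ourselves
        return host
    return None


# Hypothetical setup: mx2 sits behind NAT and only sees 10.0.0.4 on its interface.
MX_RECORDS = [
    (10, "mx1.domain.net", "198.51.100.5"),
    (20, "mx2.domain.net", "203.0.113.10"),
]
BOUND_ON_MX2 = {"10.0.0.4"}
PRIMARY_DOWN = {"mx1.domain.net"}

if __name__ == "__main__":
    # Without the setting, mx2 picks itself: this is the mail loop.
    print(next_hop(MX_RECORDS, BOUND_ON_MX2, down=PRIMARY_DOWN))
    # With the public IP declared, mx2 holds the mail until mx1 is back.
    print(next_hop(MX_RECORDS, BOUND_ON_MX2, {"203.0.113.10"}, PRIMARY_DOWN))
```

The point is only that one extra piece of configuration is enough for the server to recognize itself in the MX list and keep the mail queued instead of looping it.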
The second bug in this process is that when mx2 (the backup MX server) sends the NDR about the mail loop, it does so without specifying a return path in the header. This causes most mail systems to mark it as spam almost immediately, which makes the problem even harder to troubleshoot since you may never see the NDR. My semi-educated guess is that since the backup MX server has no domains configured on it, SM sends the email without any domain information, which breaks SPF and DMARC for every email sent from it. We need a mechanism to configure "send from" information on the backup server. Since a backup MX server would (generally) only send mail when the primary MX is down, it has to originate that mail on its own. However, you cannot set up any of your domains on the backup MX server, so I'm not 100% sure how to fix that, other than using a new domain (or a subdomain with a different MX) just to send system emails when the primary MX is down.
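If you do manage to fish one of these NDRs out of a spam folder, a quick Python sketch like the one below shows what sender information it actually carries. The function name and the sample NDR are made up for illustration. One caveat I'm fairly sure of: bounces are normally sent with an empty return path (`<>`) per the SMTP spec, and in that case SPF is evaluated against the HELO name instead, so the sketch reports a missing header and an empty one separately.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_info(raw_message: str) -> dict:
    """Summarize the sender-related headers that SPF/DMARC checks care about."""
    msg = message_from_string(raw_message)
    return_path = msg.get("Return-Path")  # None when the header is absent
    from_addr = parseaddr(msg.get("From", ""))[1]
    from_domain = from_addr.rpartition("@")[2] if "@" in from_addr else ""
    return {
        "has_return_path": return_path is not None,
        "null_return_path": return_path is not None and return_path.strip() == "<>",
        "from_domain": from_domain,
    }


# Hypothetical NDR with no Return-Path header at all.
SAMPLE_NDR = (
    "From: Mail Delivery <postmaster@mx2.domain.net>\n"
    "Subject: Delivery failure\n"
    "\n"
    "Reason: Remote host said: 554 Maximum hop count exceeded. Possible loop.\n"
)

if __name__ == "__main__":
    print(sender_info(SAMPLE_NDR))
    print(sender_info("Return-Path: <>\n" + SAMPLE_NDR))
```

The from_domain value is the one DMARC alignment is checked against, so if it is empty or a domain with no SPF/DKIM set up, that would fit my guess above.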
SM believes this is a feature request and not a bug and requested that we post it here for discussion.