For the past 7 days all of our Incoming Mail Gateways have been hanging at 100% CPU and not delivering any email. I have automated scripts that detect more than 500 messages in the Spool, stop the mailservice, move all SubSpool contents to temp directories, restart the mailservice, and then move the messages back to the Spool at a trickle until the server catches up on the load. This would work for a couple of messages, but within minutes the server would hang at 100% CPU again.
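For readers who want to picture that park-and-trickle workaround, here is a minimal Python sketch of the file-moving part only. It is not the actual script from this post. The function names, the `*.eml` pattern, the batch size and the pause are all assumptions made for illustration, and the holding directory is assumed to live outside the Spool.

```python
import shutil
import time
from pathlib import Path

# From the post: intervene once the Spool holds more than 500 messages.
BACKLOG_THRESHOLD = 500


def count_messages(spool_dir, pattern="*.eml"):
    """Count message files under spool_dir, including its SubSpool folders.

    The "*.eml" pattern is an assumption about how messages are stored.
    """
    return sum(1 for _ in Path(spool_dir).rglob(pattern))


def park_messages(spool_dir, holding_dir):
    """Move every file under spool_dir into holding_dir, keeping relative paths."""
    spool_dir, holding_dir = Path(spool_dir), Path(holding_dir)
    moved = 0
    # sorted() materialises the listing before anything is moved.
    for src in sorted(p for p in spool_dir.rglob("*") if p.is_file()):
        dest = holding_dir / src.relative_to(spool_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved += 1
    return moved


def trickle_back(holding_dir, spool_dir, batch_size=10, pause_seconds=30.0):
    """Return parked files to the spool in small batches, pausing between batches."""
    holding_dir, spool_dir = Path(holding_dir), Path(spool_dir)
    pending = sorted(p for p in holding_dir.rglob("*") if p.is_file())
    for start in range(0, len(pending), batch_size):
        for src in pending[start:start + batch_size]:
            dest = spool_dir / src.relative_to(holding_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
        # No need to wait after the final batch.
        if start + batch_size < len(pending):
            time.sleep(pause_seconds)
    return len(pending)
```

A caller would check `count_messages(spool) > BACKLOG_THRESHOLD`, stop the mail service, call `park_messages`, start the service again, and then call `trickle_back`. Stopping and starting the service is deliberately left out because the Windows service name is not given here.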
This has happened every day for a week on 3 different Incoming Gateways running Smartermail Free Edition versions 13.6.570314, 14.7.60841 and 15.5.6222 on Win2K3, Win2008 and Win2008R2 respectively. They all run antispam checks and then deliver email to a Smartermail Enterprise Edition 15.5.6222 server.
At wits' end, we replaced all the Server hardware for the Gateways, but even with 4 Quad Core CPUs, 96GB of memory, SSD drives and Win2012R2 they were all still hanging at 100% CPU, and mail would not flow no matter how many times the mailservice was started and messages were spoon-fed back to the spool one at a time.
I finally figured out the cause, but I don't know how to prevent it....and desperately need help!
We are receiving malformed email that contains two separate strings in the Body that are apparently causing the issue. Every time Smartermail touches one of those emails it crashes, causing a Windows Heap Dump file to be generated (writing a file the size of that Server's Physical Memory + Virtual Memory to disk) and locking the system until the dump is complete. At that point the mailservice is restarted, and the next malformed email touched by Smartermail causes the same situation, over and over.
After finding the culprit I searched the Spool for a partial segment of the BASE64 String causing the problem. (I discovered that running Windows FINDSTR for the String causes the same crash, and a .HDMP file is created, locking the computer at 100% CPU, so I had to use Cygwin GREP instead.) I removed those emails from the Spool and it processed all remaining emails without any problem within 15 minutes. I reviewed the other Incoming Mail Gateways and the same String in the email contents caused the same issue on them. Removing the offending malformed emails solved the problem...until we got more...and more...and more...faster than I can block entire /16 CIDR Ranges in Smartermail (as a last resort).
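The search-and-remove step described above could be sketched in Python roughly as follows. This is an untested illustration and not something that has been run against the actual malformed messages. Since FINDSTR crashed on them, there is no guarantee that another tool reading the same files behaves any better. The function names, the `*.eml` pattern and the quarantine folder (assumed to be outside the Spool) are hypothetical, and the marker bytes have to be supplied by whoever runs it. Base64 bodies are wrapped at a maximum of 76 characters per line, so a marker needs to fit inside a single wrapped line to be found by a plain byte search like this.

```python
import shutil
from pathlib import Path


def file_contains(path, marker, chunk_size=1 << 20):
    """Return True if the bytes `marker` occur anywhere in the file.

    The file is read in chunks, and the last len(marker) - 1 bytes of each
    window are carried over so a match straddling two chunks is not missed.
    """
    if not marker:
        raise ValueError("marker must not be empty")
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-overlap:] if overlap > 0 else b""


def quarantine_matches(spool_dir, quarantine_dir, markers, pattern="*.eml"):
    """Move every spool file containing any marker into quarantine_dir.

    Files are moved rather than deleted so they can still be inspected, and
    the SubSpool-relative path is kept to avoid name collisions.
    """
    spool_dir, quarantine_dir = Path(spool_dir), Path(quarantine_dir)
    moved = []
    for path in sorted(spool_dir.rglob(pattern)):
        if path.is_file() and any(file_contains(path, m) for m in markers):
            dest = quarantine_dir / path.relative_to(spool_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved
```

Usage would look like `quarantine_matches(spool, quarantine, [b"..."])` with the real partial strings in place of the placeholder, run while the mail service is stopped so it does not touch the files at the same time.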
Also discovered that one absolutely, positively shouldn't add Smartermail Content Filters that search for even the partial strings...
Rule Source: Body
Rule Type: Contains
With a filter like that in place, the very first email received by the Spool (even a good message) causes the Spool to hang and crash, and Windows Heap Dump then generates a .HDMP file. Setting up those strings as RegEx reproduced the issue as well.
So, imagine having to nurse your Smartermail Spool 24/7 running the same Grep cmd and deleting the offending emails in the SubSpools folders and restarting the mailservice only to have to rinse and repeat every 5-15 minutes. Haven't even had time to script it out and I desperately need sleep.
Does anyone have any suggestions on how to resolve this and prevent Smartermail from crashing and hanging the system at 100% CPU for an hour or two after every couple of emails it processes?