Spool gets stuck in SmarterMail v15
Problem reported by Ishan Talathi - 4/17/2018 at 11:52 AM

We are using the latest SmarterMail v15 and have ~5000 users handling 50K incoming and 50K outgoing emails daily.

Every day during peak hours we are seeing 1000-6000 emails stuck in the queue. Even messages with the status Delivered remain in the queue with a next attempt listed. There are hundreds of emails in Delivery Delay and Local Delivery status.

We have 2x256GB SSDs in RAID 1 for the spool and 10x4TB enterprise SATA (6 Gbps) drives in RAID 10 for the mailbox storage.

We are seeing the disk read queue averaging 0.5 and the write queue averaging 0.8 during peak hours.

We have tried reducing indexing to 1 thread as well as completely disabling it during peak hours.

We have an E5-2620 6-core/12-thread CPU, which seems to hit 100% at times. RAM usage is normal (under 10%).

What could be causing the spool to get stuck? We have disabled antivirus completely and reduced antispam rules to the bare minimum. We also tried adding another physical server as an outgoing gateway for some of the heavy users, but it did not help.

At times, we see 0 emails sent in the last 5 minutes. Also, the count in the Spool dashboard differs from the count under All Messages. We have to restart the SmarterMail service multiple times a day to get mail flowing again.
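The symptom described above (the spool keeps growing while nothing is delivered for several minutes) can be checked mechanically before deciding to restart the service. Below is a minimal sketch, assuming you already collect periodic samples of the spool count and the number of messages sent since the previous sample (for example, read off the dashboard or from your own monitoring). `SpoolSample` and `looks_stuck` are hypothetical helpers for illustration, not part of SmarterMail.

```python
from dataclasses import dataclass


@dataclass
class SpoolSample:
    """One hypothetical monitoring sample, taken every few minutes."""
    minute: int           # minutes since monitoring started
    queued: int           # messages currently in the spool
    sent_since_last: int  # messages delivered since the previous sample


def looks_stuck(samples: list[SpoolSample], window: int = 3) -> bool:
    """Return True if the last `window` samples show zero deliveries
    while the spool count never decreased and is non-empty."""
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    no_deliveries = all(s.sent_since_last == 0 for s in recent)
    not_draining = all(b.queued >= a.queued for a, b in zip(recent, recent[1:]))
    return no_deliveries and not_draining and recent[-1].queued > 0
```

For example, samples of 120, 900, 1800 and 2600 queued messages with nothing sent after the first sample would be flagged, while a spool that fluctuates but keeps delivering would not. Requiring several consecutive samples avoids alerting on a single quiet interval.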

K Soon Replied
The same thing happens on my server. Overall CPU, disk I/O, RAM, and bandwidth usage are all below 60%, but the spool will suddenly start accumulating and not move at all.
Anyone got any idea?
DJ Won Replied
Same here. We already checked the SMTP log, and it shows the message was successfully written to the HDR file in the spool folder, but nothing appears under Spool > All Messages. When we log in to the server, there are a lot of messages in the spool folder. Only restarting the SmarterMail service resolves the problem.
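One way to quantify the mismatch between the spool folder on disk and what Spool > All Messages shows is to count the files directly. A minimal sketch follows; the spool path is whatever your installation uses (pass your own), and the assumption that each queued message leaves a `.hdr` file alongside a body file is based only on the SMTP log entry mentioned above, so adjust the extensions to what you actually see on disk.

```python
from collections import Counter
from pathlib import Path


def count_spool_files(spool_dir: str) -> Counter:
    """Count files under a spool folder (including subfolders) by extension.

    Assumption: queued messages are stored as files with extensions such as
    .hdr; compare the resulting counts against the web interface total.
    """
    counts: Counter = Counter()
    for path in Path(spool_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1  # normalize .HDR vs .hdr
    return counts
```

If the `.hdr` count on disk is far above the All Messages total, that supports the report here that messages are being written to the spool but not picked up for processing.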
Tina Cline Replied
I would like to add to this. I can't be sure what is causing it, but suddenly the spool will back up. AntiSpam average times seem normal; the spool just climbs. I tried disabling URIBL and the issue appeared to resolve about 5 minutes later, so I cannot be sure it was the cause. After it resolved, I turned URIBL on again and all was still well. The emails in the spool will be in various states.
When spam is removed from the web interface, the files seem to stay in the spool folder on the server.
It also appears to resolve itself: in prior incidents we sometimes didn't notice the problem until it had already resolved, because no email would be delivered for 30 minutes and then suddenly everyone would get a ton of email.
I will need to call in a ticket the next time it happens so SmarterTools (ST) can see it. Very mind-boggling.
V 15.7.6663
Andrea Free Replied
Hi all,
Thanks for the reports about this problem. Unfortunately, this isn't a known issue, so we'll need to get you in touch with the Support Department for their review. They can provide a custom debug build with extra timing logging so we can see what's taking so long. (Keep in mind: If this issue turns out to be a bug in the software, the support ticket or ticket purchase will be refunded back to you.)
If you would like assistance purchasing and/or submitting a support ticket, please let me know. 

Andrea Free
SmarterTools Inc.


Jay Altemoos Replied
I just want to add to this thread: ever since we updated SmarterMail from version 15.7.6669 to 15.7.6754, emails accumulate in our spool with the status Spam Check. We never had this issue until we installed the latest update, and nothing has changed on our server other than that update. I know the major change in this update was the removal of the Bayesian portion of SM, but I can't see why that would cause this problem. At first I thought our SpamAssassin installation was the issue (we run a local instance on that server) because we kept seeing this in our delivery log:
Unable to run SpamAssassin spam checks on server Connection timed out
I have since reinstalled SpamAssassin on the server, and the error still crops up. If I restart the service on the server, things go back to normal for about an hour, and then the issue reappears.
The SA service is running on the server just fine and like I mentioned previously, this was not an issue until the latest patch. So it appears that there's an issue somewhere and it seems to be related to the patch. Anyone else running into this?
Should I try uninstalling the SM 15.7.6754 update and reapply it? Or do I need to open a ticket with support?
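A "Connection timed out" while the SpamAssassin service shows as running suggests checking whether spamd actually answers new connections. A minimal sketch, assuming spamd listens on its usual TCP port 783 on the local machine (adjust host and port to your setup): it sends the spamc protocol's `PING` command and reports whether a `PONG` came back within the timeout. `spamd_ping` is a hypothetical helper name.

```python
import socket


def spamd_ping(host: str = "127.0.0.1", port: int = 783, timeout: float = 5.0) -> bool:
    """Send a spamc-protocol PING to spamd; True if it answers PONG in time.

    A False result here (timeout or refused connection) corresponds to the
    "Connection timed out" entries in the delivery log: the service may be
    running, but it is not answering new requests promptly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING SPAMC/1.5\r\n\r\n")
            reply = sock.recv(1024)
    except OSError:  # covers timeouts and refused connections
        return False
    return b"PONG" in reply
```

Running this repeatedly during a backlog (for example, once a minute) would show whether spamd stops responding in step with the spool growth, which points at SpamAssassin capacity rather than SmarterMail itself.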
Jay Altemoos Replied
OK, just an update on this thread: I figured out the issue on my own. By default, SpamAssassin only spawns 5 child processes for handling spam checks, so all 5 child processes were busy and mail accumulated in the spool, because email was coming in faster than the 5 processes could handle it. As a result, the spool kept growing. I told SpamAssassin to spawn 25 child processes, and that seemed to clear up the issue, for me at least.
This is what clued me in to what was going on with my SpamAssassin installation, in the spamd.log file:
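Why 5 children can fall behind is simple capacity arithmetic: a pool of c workers that each need t seconds per message can check at most c/t messages per second, and if messages arrive faster than that, the backlog grows without bound. A minimal sketch follows; the per-message check time and peak arrival rate are assumed example numbers, not measurements from this thread, and `children_needed` is a hypothetical sizing helper.

```python
import math


def max_checks_per_min(children: int, seconds_per_check: float) -> float:
    """Upper bound on spam checks per minute when every child is busy."""
    return children * 60.0 / seconds_per_check


def children_needed(arrivals_per_min: float, seconds_per_check: float,
                    target_utilization: float = 0.7) -> int:
    """Smallest pool size that keeps average utilization at or below the target.

    Queues grow sharply as utilization approaches 100%, so leave headroom.
    """
    busy_children = arrivals_per_min * seconds_per_check / 60.0  # average busy workers
    return math.ceil(busy_children / target_utilization)
```

With an assumed 2 seconds per check, 5 children top out at 150 checks per minute, so an assumed peak of 300 messages per minute would add about 150 messages to the spool every minute. The same assumptions give 15 children at 70% utilization, and 25 children allow up to 750 checks per minute, which is consistent with raising the limit clearing the backlog.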
[2892] info: prefork: server reached --max-children setting, consider raising it
So it was just a coincidence that this started happening after the latest update (15.7.6754).
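The spamd.log warning quoted above names the relevant spamd option, `--max-children`, so it is a useful signal to watch for. A minimal sketch that counts how often spamd reported an exhausted child pool; it matches only the message text quoted in this thread, and the log file location varies by installation, so pass your own path. `scan_spamd_log` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable

# Message text spamd logged when every child process was busy, as quoted above:
#   [2892] info: prefork: server reached --max-children setting, consider raising it
WARNING = "server reached --max-children setting"


def count_max_children_warnings(lines: Iterable[str]) -> int:
    """Count log lines reporting that spamd's child pool was exhausted."""
    return sum(1 for line in lines if WARNING in line)


def scan_spamd_log(path: str) -> int:
    """Scan a spamd.log file; undecodable bytes are replaced, not fatal."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return count_max_children_warnings(fh)
```

A non-zero and rising count during peak hours suggests the child limit is the bottleneck; a zero count while the spool still backs up points elsewhere.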

