SmarterMail CPU/Network peaks
Problem reported by AS Holding - 3/18/2021 at 12:21 AM
Submitted
Hello!

After checking all the community topics related to this problem, I could only find cases where your technicians somehow solved it; none of the actual users was able to solve it themselves (or I didn't look hard enough).

CPU peaks start at around 05:00 and 15:00 and last for about 2 hours. I would have assumed this was tied to company load, but it isn't: it has started to look more like a random spike (last night it started around 00:10 and lasted for half an hour). It's not just the CPU that peaks; the network also starts behaving pretty strangely when this occurs.
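
One way to check whether the peaks follow a schedule or really are random is to pull start times and durations out of the CPU history. The following is a minimal Python sketch, assuming you can export per-minute CPU samples (for example, EC2 CPUUtilization from CloudWatch) as (timestamp, percent) pairs. The function name, the 80% threshold, and the 5-minute minimum are hypothetical choices:

```python
from datetime import datetime, timedelta

def find_spikes(samples, threshold=80.0, min_duration=timedelta(minutes=5)):
    """Return (start, end) pairs for runs where CPU stayed above threshold.

    samples: list of (datetime, cpu_percent) tuples, sorted by time.
    Runs shorter than min_duration (single-sample blips) are ignored.
    """
    spikes = []
    start = None
    last_high = None
    for ts, cpu in samples:
        if cpu > threshold:
            if start is None:
                start = ts  # a new high run begins
            last_high = ts
        elif start is not None:
            # The run ended; keep it only if it lasted long enough
            if last_high - start >= min_duration:
                spikes.append((start, last_high))
            start = None
    # Close a run that is still in progress at the end of the data
    if start is not None and last_high - start >= min_duration:
        spikes.append((start, last_high))
    return spikes
```

Listing the spikes over a few weeks makes it easier to see whether they cluster at fixed times (a scheduled task or a batch of mail) or drift.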

Our server is on AWS, and before contacting you I made sure we had addressed every network security issue that could even remotely trigger this.

I went even further and brought in AWS Security DDoS specialists so I could be 100% sure it isn't network-related.

After 3 days of talking with Windows Server specialists and network & security technicians, we all came to the conclusion that the SmarterMail service is causing this problem (though I'm not taking that for granted until I get your opinion).

When this starts, we see a high amount of DNS traffic from SmarterMail to the private network adapter:

srcAddr 8.8.8.8
srcAddr 172.31.38.202
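
The srcAddr labels suggest these addresses come from AWS VPC Flow Logs. Assuming the logs use the default (version 2) record format, a minimal Python sketch like the following could sum DNS packets per minute and per address pair, to line the DNS bursts up against the CPU peaks. The function and variable names are hypothetical:

```python
from collections import Counter

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_record(line):
    """Split one space-separated flow log line into a dict; None if malformed."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))

def summarize_dns(lines):
    """Sum UDP (protocol 17) packets to or from port 53.

    Returns (per_minute, per_pair): packets bucketed by the minute of each
    record's start time, and packets per (srcaddr, dstaddr) pair.
    """
    per_minute = Counter()
    per_pair = Counter()
    for line in lines:
        rec = parse_record(line)
        # Skips the header line and NODATA/SKIPDATA records ("-" fields)
        if rec is None or rec["protocol"] != "17":
            continue
        if "53" not in (rec["srcport"], rec["dstport"]):
            continue
        if not rec["packets"].isdigit():
            continue
        packets = int(rec["packets"])
        minute = int(rec["start"]) // 60 * 60  # Unix seconds, floored to a minute
        per_minute[minute] += packets
        per_pair[(rec["srcaddr"], rec["dstaddr"])] += packets
    return per_minute, per_pair
```

Note that flow log records are aggregated over a capture window, so the per-minute buckets are approximate.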

When we deny protocol 17 (UDP) on port 53 (DNS), the server starts behaving normally, BUT then the spool starts filling with "Waiting for delivery: 601 Failed to connect to the recipients mail server." That is expected, since delivery needs DNS resolution.
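
Why blocking DNS stalls delivery can be seen from how an SMTP client picks the servers to connect to. This is a minimal Python sketch of the host-ordering rules in RFC 5321 section 5.1, not SmarterMail's own code: it assumes the MX records have already been looked up, and the function name is hypothetical. Every hostname it returns must still be resolved to an A/AAAA address, so with port 53 blocked neither step can complete and the message stays queued:

```python
import random
from itertools import groupby

def delivery_targets(domain, mx_records, rng=random):
    """Order the hosts an SMTP client should try when delivering to `domain`.

    mx_records: list of (preference, hostname) tuples from an MX lookup
    that succeeded. A failed lookup (timeout, SERVFAIL) is a temporary
    error and should not be passed in as an empty list.
    """
    if not mx_records:
        # No MX records: RFC 5321 treats the domain itself as an implicit MX
        return [domain]
    ordered = []
    # Lower preference values are tried first
    for _, group in groupby(sorted(mx_records), key=lambda r: r[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # RFC 5321: randomize hosts with equal preference
        ordered.extend(hosts)
    return ordered
```

Passing `rng` as a parameter keeps the tie-breaking testable; any object with a `shuffle` method works.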

We have 950 active users and really need help solving this problem.

I hope someone has solved this.

Our SM version is:
SmarterMail Enterprise Edition
Version 15.7.6726 

Best regards, 

Jack. Replied
Hi

Do you use the SolidCP / MSPControl control panel?
AS Holding Replied
Hello Jack,

No, I don't. This server was a fresh installation, without any prior migrations.
Douglas Foster Replied
Our SmarterMail service has occasionally gone into a race condition, on both build 7619 and build 7719. Our fix has been to reboot the entire server. The last occurrence was immediately after configuring a new IMAP connection for a user. We have not found a root cause.
Kyle Kerst Replied
Employee Post
Doug, the traffic looks like DNS lookup traffic. Are there a large number of incoming/outgoing deliveries when this happens? Both incoming/outgoing deliveries and the associated spam checks generate DNS lookups and server load while they are taking place, and blocking DNS lookups would cause the delivery process to break down.
Kyle Kerst IT Coordinator SmarterTools Inc. www.smartertools.com
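
To get a feel for how spam checks multiply DNS traffic: each DNS blocklist (DNSBL) check is itself a DNS query, built by reversing the IPv4 octets of the connecting address and appending the list's zone (the convention described in RFC 5782). This is a minimal Python sketch; the zone names and the per-message lookup count are hypothetical, and the estimate ignores the resolver cache, which in practice absorbs repeat lookups:

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name queried to check an IPv4 address against a DNSBL.

    Checking 192.0.2.10 against zone "dnsbl.example" queries
    10.2.0.192.dnsbl.example.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def estimated_lookups(messages, dnsbl_zones, other_lookups_per_message):
    """Rough upper bound on DNS queries for a batch of messages.

    Counts one query per DNSBL zone per message (strictly, per connecting
    IP) plus other per-message lookups, e.g. MX/A records for delivery or
    SPF/DKIM TXT records. Caching is ignored.
    """
    return messages * (len(dnsbl_zones) + other_lookups_per_message)
```

For example, 1,000 messages checked against three lists with four other lookups each would be about 7,000 queries before caching, which is why a burst of deliveries can show up as a burst of port 53 traffic.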
Douglas Foster Replied
It is hard to know why the service is consuming so much CPU, but it does not settle down on its own.   

On one of the first occurrences, restarting the service seemed insufficient, so now I reboot the entire server. This clears the race condition, and the machine comes back up with normal load.

In our configuration, the same workload should be present before and after the reboot: all incoming traffic is queued at the incoming gateway, and all outgoing traffic is queued in SmarterMail. So the evidence seems to point away from workload as an explanation.

The problem has not happened often enough to develop a clear pattern, and we have not had sufficient instrumentation in place to find the cause when it does happen. When it does occur, the priority is to get the system running again rather than collecting data. But having it happen at all creates fear.

If it occurs on the next release, I will open a ticket and we can try to configure instrumentation to catch the problem.
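
Since the stuck state "does not settle down on its own", one simple trigger for that instrumentation is to raise an alert only when CPU stays high for several consecutive samples, so someone can attach a profiler before rebooting. This is a minimal Python sketch; the class name and the default threshold and window are hypothetical, and feeding it samples (for example, a per-minute reading of MailService.exe from Windows performance counters) is left out:

```python
from collections import deque

class SustainedLoadDetector:
    """Flags when CPU stays above a threshold for N consecutive samples.

    Short bursts that settle down never fill the window with high values,
    so they do not trigger; a stuck process does.
    """

    def __init__(self, threshold=90.0, consecutive=10):
        self.threshold = threshold
        self.window = deque(maxlen=consecutive)  # keeps only the last N samples

    def add_sample(self, cpu_percent):
        """Record one sample; return True once the full window is all high."""
        self.window.append(cpu_percent)
        return (len(self.window) == self.window.maxlen
                and all(s > self.threshold for s in self.window))
```

With one sample per minute and `consecutive=10`, the alert fires after ten straight minutes above the threshold.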
Matt Petty Replied
Employee Post
Yeah, we'll try to get a dotTrace profile from you, or run it ourselves; that can tell us internally what is consuming the CPU. It involves running the dotTrace profiler for 30 seconds to a minute while the service is in the 'broken' state. Once we have this profile, the server can be restarted and we can start diagnosing the area that may be causing it. I'd suggest getting this set up now, or getting a ticket going, so that it's ready to be profiled (i.e., you already have dotTrace installed on the server) right when it happens.

The profiler is fairly easy to use:
1. Run it and go to "Attach to Process".
2. Give it admin rights so it can see MailService.exe.
3. Attach it with the "Timeline" profiler set to start immediately.
4. Wait 30 seconds and hit 'Stop'.
5. Wait until it has completely flushed the profiler data; then you may restart MailService.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
AS Holding Replied
Hello,

Are we still talking about the problem I posted here, or something else?

We've come to the conclusion that a high amount of traffic is occurring over ports 110 and 587, whether or not the SmarterMail service is stopped...
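
If the traffic on ports 110 (POP3) and 587 (SMTP submission) continues even with SmarterMail stopped, it is arriving from outside, and ranking the remote sources shows whether a handful of addresses account for most of it (possibly login-guessing clients, which could then be blocked in the security group). This is a minimal Python sketch, assuming VPC Flow Logs in the default (version 2) format; the names and the choice of packets as the metric are hypothetical:

```python
from collections import Counter

# Positions in the default (version 2) VPC Flow Logs record format.
SRCADDR, DSTPORT, PROTOCOL, PACKETS = 3, 6, 7, 8
WATCHED_PORTS = {"110", "587"}  # POP3 and SMTP submission

def top_sources(lines, n=10):
    """Rank remote source addresses by TCP packets sent to the watched ports."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        # Skip malformed lines, the header, and non-TCP records (6 = TCP)
        if len(parts) != 14 or parts[PROTOCOL] != "6":
            continue
        if parts[DSTPORT] in WATCHED_PORTS and parts[PACKETS].isdigit():
            totals[parts[SRCADDR]] += int(parts[PACKETS])
    return totals.most_common(n)
```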
