'SMTP In' overload = slow deliveries & busy CPU
Problem reported by kevind - 7/10/2023 at 9:48 AM
Using Build 8587 (Jul 6, 2023) on our incoming gateway. It works for a few hours, then the "SMTP In" connections go up to 1000 (our max), CPU goes above 90%, and the server can hardly process messages. So we reboot, CPU drops to 20%, it works for a few more hours, and the cycle repeats.

Note: when SMTP In is maxed out, clicking Settings -> Antispam and then the RBL or URIBL button doesn't show anything. The other buttons (IP Bypass, etc.) work.

Already disabled all URIBLs. Problem is similar to what's described in these threads:
I started a new thread, though, as those are tough to follow with so many replies and such a variety of issues.

TIA,
Kevin

17 Replies

Employee Replied
Employee Post
Hi Kevin, 

Thanks for submitting this, and I'm sorry to hear that you're seeing this issue. I created a support ticket with you so we can evaluate this issue directly. 

Thank you,
Alessandro Pereira Replied
We have a serious problem with excessive CPU usage in Build 8587 (July 6); we didn't have this problem in previous versions.
We disabled the URIBLs and also several RBLs to be able to work again.
We are facing the same problem here: SMTP connections went up to 1200. We are going back to Build 8580 (Jun 29, 2023).
Douglas Foster Replied
The original post sounds like the previously described pattern of SMTP Auth brute-force break-in attacks from 20,000 or so servers in China.

The solution is to implement Country blocking. 
kevind Replied
Douglas, that's a reasonable suggestion, but it's strange that this only started when we upgraded SmarterMail.

We were running the Feb-2023 Build for 4 months with no issues. Something must have changed between the Feb-2023 Build and the June-2023 Build.
Alessandro Pereira Replied
It's not a DoS attack; we have an IDS and we also use pfBlockerNG to block IP ranges, including by GeoIP.

The problem is that we use SMTP blocking via RBLs at SmarterMail's entry point, and the system is taking too long on those checks, so new connections keep piling up. We went back to Build 8580 (Jun 29, 2023) and have had no more problems.
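For readers less familiar with what RBL-based SMTP blocking involves: for each connecting IP, the server reverses the IPv4 octets, appends the RBL's zone, and does a DNS lookup, where any A-record answer means "listed". If those lookups are not bounded, a slow RBL keeps the SMTP session (and its connection slot) open, which is consistent with the pile-up described above. Below is a minimal Python sketch of a bounded lookup; the zone name `bl.example.org`, the 2-second timeout, and the function names are hypothetical, and this is not SmarterMail's implementation.

```python
import concurrent.futures
import ipaddress
import socket
from typing import Optional

# Shared worker pool: lets the caller stop waiting on a slow DNS lookup.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str, timeout_s: float = 2.0) -> Optional[bool]:
    """Return True if listed, False if not, None if the lookup timed out."""
    future = _POOL.submit(socket.gethostbyname, dnsbl_query_name(ip, zone))
    try:
        future.result(timeout=timeout_s)
        return True  # any A record means the IP is listed in this zone
    except concurrent.futures.TimeoutError:
        # Caught first: on Python 3.11+ this is the builtin TimeoutError, an OSError subclass.
        return None  # "unknown": stop holding the SMTP session open
    except OSError:
        return False  # NXDOMAIN (socket.gaierror) or other resolver error: treat as not listed
```

Note that `future.result(timeout=...)` only stops the caller from waiting; the worker thread running `gethostbyname` keeps going until the resolver gives up, so a real server would also need resolver-level timeouts.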

kevind Replied
Yes, seems like it has something to do with RBLs. When this problem occurs, we go to Settings -> Antispam -> RBL button - and none of the RBLs are displayed (empty screen).  Restart server and you can see them again.
Kyle Kerst Replied
Employee Post
I'm aware we're diagnosing these potential RBL/URIBL issues further in support at this time, but I did want to offer some feedback on Douglas' comments regarding DoS and brute-force attacks from China and elsewhere. We have also seen these come and go, intermittently resulting in large numbers of IDS blocks and then going quiet for months at a time. I just wanted to note that these kinds of attacks can have very curious timing on occasion!
Kyle Kerst IT Coordinator SmarterTools Inc. www.smartertools.com
kevind Replied
We're not seeing a large number of IDS blocks (actually, less than 10 currently). Could this be related to the problem?  There have been many changes to IDS Rules over the last few months, like this one in April:

  • IMPORTANT: Some IDS Rules have been consolidated and will revert to default configurations upon upgrade. Therefore, review existing IDS Rules post-upgrade so modifications can be made, if desired. (Webmail Brute Force by IP and Webmail Brute Force by Email are now Password Brute Force by IP and Password Brute Force by Email. In addition, Password Brute Force by IP/Email and Denial of Service (DoS) will now use one rule for all services/protocols.)
Just trying to help troubleshoot this issue.
Kyle Kerst Replied
Employee Post
If you're noticing IDS activity is light, I definitely recommend reviewing those settings to make sure they're fine-tuned for the types of attacks you see regularly. I set the threshold at 5 or more failed attempts over a longer time window, because most email clients will stop trying a bad (old) password after 2-4 attempts (and therefore avoid the IDS block), whereas the rule will still catch the more persistent brute-force attackers.
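A threshold of failures inside a time window can be modeled as a sliding-window counter. A minimal Python sketch under that assumption (the class name, the 5-attempt threshold, and the 30-minute window are illustrative, not SmarterMail's IDS settings or code) shows why a client retrying a stale password 2-4 times stays under the limit while a persistent attacker trips it:

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BruteForceRule:
    """Flag an IP after `max_failures` failed logins within `window_s` seconds.

    Hypothetical illustration of a sliding-window rule, not SmarterMail's IDS.
    """

    def __init__(self, max_failures: int = 5, window_s: float = 1800.0) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, ip: str, now: float) -> bool:
        """Record a failed attempt at time `now`; return True if the IP should be blocked."""
        times = self._failures[ip]
        times.append(now)
        # Drop attempts that have fallen out of the time window.
        while times and now - times[0] > self.window_s:
            times.popleft()
        return len(times) >= self.max_failures
```

For example, `rule.record_failure("203.0.113.5", time.time())` returns `False` for an email client's first few retries and `True` once the fifth failure lands inside the window.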

I couple this with long block times, which keep offenders on the list long enough to observe patterns in their IP addresses and in the order they show up. When you start seeing a whole CIDR block popping up one address at a time, you can use our blacklist functionality (from the IDS Blocks page) to block the range before they try the next one.
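Spotting a CIDR block "popping up one at a time" can be partly automated by grouping blocked addresses by network prefix. A small Python sketch, assuming you can export the blocked IPv4 addresses as strings (the function name, the /24 prefix, and the 3-hit threshold are hypothetical choices):

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple


def busy_subnets(blocked_ips: Iterable[str], prefix: int = 24,
                 min_hits: int = 3) -> List[Tuple[str, int]]:
    """Count blocked IPv4 addresses per /prefix network, busiest first."""
    counts = Counter(
        # strict=False masks off host bits, e.g. 203.0.113.5/24 -> 203.0.113.0/24
        str(ipaddress.IPv4Network(f"{ip}/{prefix}", strict=False))
        for ip in blocked_ips
    )
    return [(net, n) for net, n in counts.most_common() if n >= min_hits]
```

Ranges it reports are candidates to review, not automatic blacklist entries; a /24 may also contain legitimate senders.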
Alain Néris Replied
I'm still on 8451, waiting for things to stabilize... And I've been having the same issues with multiple attacks from China for several days.
Default SMTP DoS strict rule
Default SMTP Password Brute Force strict rule
No slow deliveries and no excessive CPU load on this SM 8451.
In case that helps.
Alessandro Pereira Replied
The problem occurs when you enable RBLs to block incoming SMTP. This option rejects messages at the SMTP stage, so they never enter the server.

Run a test: anyone with a server hosting more than 3,500 accounts, enable the incoming SMTP blocking option on all your RBLs that support it and see the result.
kevind Replied
@Alessandro, good find!!! We'll explore this more.

At first I thought it might be related to attacks and intrusion detection, but your explanation makes more sense. We have multiple servers on the same domain, all receiving the same attacks, but the servers running older versions do not have this issue.

It also explains why, when we go to Settings -> Antispam -> RBL, SmarterMail shows a blank screen (no RBLs are listed). This problem is definitely related to RBLs.
Jay Dubb Replied
@Alessandro - nice catch. We are now wondering if this has been the root cause of OUR problems too. We're still in a holding pattern on the old build 8451 (Feb 20, 2023) while the new build gets some things sorted out, but we have been plagued with sudden max-memory, max-CPU spikes that crippled the server.

Since turning off all SMTP blocking for RBLs, our CPU usage is noticeably lower (averaging below 50% now) and we've had no incidents at all of runaway CPU/memory.

We're wondering if the problem existed all along, but something in the new build brought it out more dramatically into view.
 
kevind Replied
So you can't just turn off SMTP Blocking on the Options screen? You have to turn it off for each individual RBL. That's strange...

@Jay, FWIW, our issues also started when we updated from the Feb-20 build...
Webio Replied
I'm wondering whether anyone experiencing issues with RBLs and URIBLs is seeing them during a SmarterMail update?

I experienced a similar issue today when I upgraded my main server (which went without any issues) and my gateways. The issue occurred on the incoming gateways: when the SmarterMail service is restarted, all messages that were previously in the Waiting to Deliver section go back into the Spool and are spam-checked again. Because of that, plus new incoming messages and RBLs taking a little longer than I expected, the number of messages in the Spool rose to high levels. Disabling the SORBS RBLs and the URIBLs allowed messages to move from the Spool to Waiting to Deliver. This gave me the idea that messages already present in the Spool at service startup perhaps should not be scanned for spam again, since they should already have been scanned, right? Maybe there could be a switch somewhere like "Scan for SPAM in Spool during startup" or something similar?


Webio Replied
It would be good to have the troubleshooting log show how long each individual spam check took to finish. I checked my gateways after business hours, re-enabled the RBLs and URIBLs, and started to observe more messages getting stuck in the Spool with a Spam check status. Another interesting fact: the MailService process was using about 95% of the CPU, even though this server is only an incoming gateway (no users, etc.) and there were only about 480 messages in the Spool.
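The per-check timing requested here is straightforward to sketch: wrap each check with a monotonic timer and report the elapsed milliseconds next to its name. The Python below uses stand-in callables; the function name and check names are hypothetical, and it is not SmarterMail's spam-check pipeline or log format.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_timed_checks(checks: Dict[str, Callable[[], bool]]) -> List[Tuple[str, bool, float]]:
    """Run each named check in order and return (name, result, elapsed_ms) per check."""
    results: List[Tuple[str, bool, float]] = []
    for name, check in checks.items():
        start = time.perf_counter()  # monotonic, high-resolution timer
        result = check()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append((name, result, elapsed_ms))
    return results
```

A log line per entry, for example `print(f"{name}: {elapsed_ms:.0f} ms")`, would make it obvious which single RBL accounts for a 100+ second total.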

Example from Spam Checks log:

2023.07.14 18:48:20.253 [77372409] SpamCheck Processing Thread Started
2023.07.14 18:48:20.316 [77372409] Filetype Checks started.
2023.07.14 18:48:20.316 [77372409] Filetype Checks completed.
2023.07.14 18:48:20.316 [77372409] Spam checks to run: Reverse Dns Lookup, Null Sender, _SPF, _DK, _DKIM, Custom Rules, Barracuda, GBUdb, HostKarma - Blacklist, MAILSPIKE Z, SORBS - Abuse, SORBS - SMTP, Spamhaus - CBL, UCEProtect Level 1, UCEProtect Level 2, UCEProtect Level 3, VIRUS RBL - MSRBL, Backscatter, Anonmails, SpamRATS - Spam, SpamRATS - Dyna, SURRIEL, SORBS - SPAM, Received, SPAM - Return-Path
2023.07.14 18:48:20.316 [77372409] Found 25 spam checks to run: Reverse Dns Lookup, Null Sender, _SPF, _DK, _DKIM, Custom Rules, Barracuda, GBUdb, HostKarma - Blacklist, MAILSPIKE Z, SORBS - Abuse, SORBS - SMTP, Spamhaus - CBL, UCEProtect Level 1, UCEProtect Level 2, UCEProtect Level 3, VIRUS RBL - MSRBL, Backscatter, Anonmails, SpamRATS - Spam, SpamRATS - Dyna, SURRIEL, SORBS - SPAM, Received, SPAM - Return-Path
2023.07.14 18:48:20.316 [77372409] Spam check args: from: ......@gmail.com; messageID: 77372409; messagePath: D:\Poczta\Spool\SubSpool0\-1530777372409.eml; sender: ......@gmail.com; sendersDomain: gmail.com; sendersIp: 209.85.221.194; returnPath: .......@gmail.com; sendersEhlo: .........google.com
2023.07.14 18:48:20.316 [77372409] [209.85.221.194] Valid reverse DNS entry found: mail-vk1-f194.google.com
2023.07.14 18:48:20.316 [77372409] Running SPF check
2023.07.14 18:48:20.316 [77372409] Finished SPF check; result = Pass
2023.07.14 18:48:20.316 [77372409] [DKIM] Performing DKIM check...
2023.07.14 18:48:20.316 [77372409] [DKIM] Result: Good.
2023.07.14 18:50:08.252 [77372409] Spam Checks took 107942 ms
2023.07.14 18:50:08.252 [77372409] Spam Checks completed.
2023.07.14 18:50:08.252 [77372409] SpamCheck Processing Thread Completed

2023.07.14 18:49:13.387 [77372392] Spam Checks took 128398 ms

2023.07.14 18:49:17.162 [77372411] Spam Checks took 104985 ms

2023.07.14 18:49:20.657 [77372401] Spam Checks took 126090 ms

2023.07.14 18:49:20.719 [77372390] Spam Checks took 118808 ms

2023.07.14 18:49:23.948 [77372335] Spam Checks took 127264 ms

2023.07.14 18:49:25.539 [77372429] Spam Checks took 90861 ms

2023.07.14 18:49:26.163 [77372414] Spam Checks took 93317 ms

2023.07.14 18:49:29.159 [77372412] Spam Checks took 117648 ms

2023.07.14 18:49:29.283 [77372413] Spam Checks took 107531 ms

2023.07.14 18:49:32.481 [77372415] Spam Checks took 122098 ms

2023.07.14 18:49:40.032 [77372435] Spam Checks took 87104 ms

2023.07.14 18:49:54.337 [77372394] Spam Checks took 147562 ms

2023.07.14 18:49:59.376 [77372423] Spam Checks took 113966 ms

2023.07.14 18:50:08.252 [77372409] Spam Checks took 107942 ms
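To summarize a troubleshooting log like the one above, a short Python script can pull out the "Spam Checks took N ms" lines. It assumes only the line format shown in the excerpt (message ID in brackets, then the total in ms); the function names are hypothetical.

```python
import re
from statistics import mean
from typing import Dict, Iterable, Tuple

# Matches e.g. "2023.07.14 18:49:13.387 [77372392] Spam Checks took 128398 ms"
_TOOK = re.compile(r"\[(\d+)\] Spam Checks took (\d+) ms")


def spam_check_durations(lines: Iterable[str]) -> Dict[str, int]:
    """Map message ID -> total spam-check time in ms (last occurrence wins)."""
    durations: Dict[str, int] = {}
    for line in lines:
        match = _TOOK.search(line)
        if match:
            durations[match.group(1)] = int(match.group(2))
    return durations


def summarize(durations: Dict[str, int]) -> Tuple[int, float, int, int]:
    """Return (message count, mean ms, min ms, max ms); needs at least one entry."""
    values = list(durations.values())
    return len(values), mean(values), min(values), max(values)
```

Applied to the 14 distinct message IDs listed above, every message spent roughly 87 to 148 seconds in spam checks.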

Disabling some of the RBLs and URIBLs and restarting the SmarterMail service triggered a rescan of the messages in the Spool and Waiting to Deliver, which emptied the Spool section quite fast (before the restart, the Spool still had stuck messages even after the RBLs and URIBLs were disabled). The MailService process ended up at about 30-40% CPU usage, which IMHO is still way too high.

Maybe it would be good to have an optional failsafe mechanism that, when enabled, disables for X minutes any RBL whose check is taking too long?
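The failsafe suggested here is essentially a circuit breaker. A minimal Python sketch, assuming a per-check time limit and a cooldown period (the class name, the 5-second limit, and the 10-minute cooldown are hypothetical; SmarterMail may not offer such a setting): a check that runs longer than the limit is skipped until the cooldown expires.

```python
import time
from typing import Callable, Dict


class SlowCheckBreaker:
    """Temporarily skip a check whose last run exceeded `max_ms`.

    Hypothetical sketch; names and thresholds are assumptions.
    """

    def __init__(self, max_ms: float = 5000.0, cooldown_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_ms = max_ms
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the behavior can be tested
        self._disabled_until: Dict[str, float] = {}

    def is_enabled(self, name: str) -> bool:
        """True unless the check is still inside its cooldown period."""
        until = self._disabled_until.get(name)
        return until is None or self.clock() >= until

    def report(self, name: str, elapsed_ms: float) -> None:
        """Record how long a check took; trip the breaker if it was too slow."""
        if elapsed_ms > self.max_ms:
            self._disabled_until[name] = self.clock() + self.cooldown_s
```

A spam-check loop would call `is_enabled(name)` before running an RBL and `report(name, elapsed_ms)` afterwards. Skipping an RBL means some spam may get through during the cooldown; that is the trade-off for keeping the Spool moving.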
kevind Replied
@Webio, well written!!! This is pretty much the same issue we're seeing after disabling SMTP Blocking for all the RBLs.

The recent builds definitely have some issues with the RBL checks that increase CPU and cause messages to get stuck in either SMTP or the Spool. Interestingly, a reboot or restart of the service clears things out right away, so it's not that DNS or the RBLs are slow. It seems like a code loop or memory-leak type of problem.

Have a nice weekend, everyone!  Hopefully no server issues. 😁
