7
Issue with 8552, 8559, 8566: after updating to these releases, URIBLs cause big delays in mail delivery, with mail staying in the spool for hours...
Idea shared by Gabriele Maoret - SERSIS - 6/9/2023 at 12:53 AM
Completed
Problem with 8552, 8559, 8566: after upgrading to these builds I noticed that sometimes (2-4 times a day so far...) there is a big delay in mail delivery, with messages remaining in the spool for hours...

This causes the spool queue to keep growing until the server crashes and no longer responds to commands...
Sometimes you can't even do anything from the Windows interface, and a restart can take several tens of minutes to complete...


I'm thinking of going back to 8545 to see if it works better...

Update: Disabled all URIBLs and the server returned to normal operation.
This issue is related to the URIBLs filtering.
Gabriele Maoret - Head of SysAdmins at SERSIS
Currently manages 6 SmarterMail installations (1 in the cloud for SERSIS which provides services to a few hundred third-party email domains + 5 on-premise for customers who prefer to have their mail server in-house)

36 Replies

0
I haven't upgraded to 8559 yet, but I am on 8552 and don't see the issues in the spool folder.
0
Hi Brian! ...maybe it's my servers that have problems...

To investigate further, this is my server setup:

- 2 x Windows Server 2022 Datacenter hosts with the Hyper-V role, AMD EPYC 7313 CPU, 512GB RAM, 4 x 3.8TB Samsung enterprise NVMe SSDs in RAID 5
- every VM runs on one of the two servers and is replicated (via Hyper-V Replica) to the other server every 5 minutes for disaster recovery
- Veeam Backup to 2 external datastores
- SmarterMail is installed in a Hyper-V VM with these resources: 32 vCores + 64GB dynamic RAM (minimum of 8GB), Windows Server 2022 OS

I had never seen this issue before upgrading (directly...) from 8451 to 8552.
Yesterday I updated to 8559 to see if it resolves this issue, but nothing changed...
Gabriele Maoret - Head of SysAdmins at SERSIS
0
I tried making these 2 changes to see if it's a VM configuration issue:
- set 16 vCores instead of 32 (to match real physical cores and not HyperThreading...)
- set 32 GB of static vRAM (to avoid dynamic vRAM problems)

We'll see if that helps...
Gabriele Maoret - Head of SysAdmins at SERSIS
0
Nice setup :) Try extending the Hyper-V replication interval to 15 minutes and see if it relieves the spool issue.

Normally the replication hogs a lot of resources on the host, and that could be the issue you're seeing.
0
Mmmh, it seems strange to me that the problem is caused by Hyper V replication.

I've monitored this and the replication jobs always last a few seconds, so I don't think it's those few seconds every 5 minutes that cause the problem...

Also, up to 8541 it had never caused any problems...


In the meantime, let's see if the changes I made to the vCPU and vRAM have improved the situation; if not, I'll try this test too.

For now, thanks for the advice, Brian!
I'll let you know...
Gabriele Maoret - Head of SysAdmins at SERSIS
0
UPDATE: Unfortunately the problem still occurs...
Gabriele Maoret - Head of SysAdmins at SERSIS
4
Zach Sylvester Replied
Employee Post
Hello, 

Thanks for reaching out regarding this issue. Could you please try the following? 

  1. Go to Settings->Antispam
  2. Disable all URIBLs
  3. Restart the Service

Please let me know if the issue comes back after doing this. We had some problems with URIBLs last week and I would like to rule them out. 

Thanks, 
Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
0
Hi Zach

I'll try it
Gabriele Maoret - Head of SysAdmins at SERSIS
1
Zach Sylvester Replied
Employee Post
Hey Gabriele, 

Any updates on the issue? Did my suggestion fix this for you?

Thanks, 
Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
3
We're seeing similar results after upgrading from a March build to the June-15 build. The spool has thousands of messages and can't keep up.  The server is running very slow, so we will try Zach's suggestion.

Update: Disabled all URIBLs and the server returned to normal operation. Messages in spool dropped to <50 and CPU no longer at 100%.
2
Hi Kevin.

I see a similar situation: after disabling all URIBLs, this issue has not appeared again in the last 6 days.
Gabriele Maoret - Head of SysAdmins at SERSIS
2
Hi Gabriele,

Good to know. Let's hope this gets fixed in the next build.
1
Zach Sylvester Replied
Employee Post
Hey Guys, 

This should be fixed in 8566. In the release notes we put "Efficiency: Various improvements to DNSClient."
Are you guys running 8566 and having this issue?

Thanks, 
Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
4
Zach,

Yes, running Build 8566 (Jun 15, 2023) and having this issue where we needed to disable URIBLs.

FWIW, why not just drop all these Build #'s, which are confusing and hard to track? Just use the Build Date to keep track of versions. It's simpler and easier to relate to. 😉
3
Same here, this issue is still present in 8566.

I've updated the initial post.
Gabriele Maoret - Head of SysAdmins at SERSIS
2
Zach, was this fixed in the June-29 build?

Thanks and Happy Independence Day!
2
No. This error is also in the June-29 build.
1
Last Monday I upgraded to 8580 and tried re-enabling the URIBL filters.

So far I haven't had any problems, but I'll wait a while before saying that everything is OK...
Gabriele Maoret - Head of SysAdmins at SERSIS
1
Tim Uzzanti Replied
Employee Post
We have an adjustment to URIBLs coming this week. Some of our DNS improvements, which worked VERY well, did impact and delay URIBLs.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
1
Updated to 8587 --> So far I haven't had any problems; I think this issue is resolved...
Gabriele Maoret - Head of SysAdmins at SERSIS
3
Also on 8587 (Jul 6) with no URIBLs --> but still seeing daily occurrences where 900+ messages are stuck in spool. This is on an inbound gateway, so it's just spooling to primary server.

Looking at the messages in the Spool:
  • Status = Spam Check
  • Next Attempt = 4:04pm  (but that was 25 minutes ago?)
Is there somewhere in the UI or logs where we can find out why these messages are still in the spool and/or what is keeping them from being delivered? TIA!
4
You can see this in the Delivery logs:

running spam checks. Time (non-rbls): 7481ms, Time (URIBL/RBLS): 140025ms
[2023.07.18] 03:04:21.069 [16079369] Spam Check results: [_DMARC: 0,none], [REVERSE DNS LOOKUP: 0,Passed], [NULL SENDER: 0,passed], [_INTERNALSPAMASSASSIN: 2,5:4], [_SPF: 0,Pass], [_DK: 0,None], [_DKIM: 20,None], [_CUSTOMRULES: SPAM: 30;], [MAILSPIKE L4, MAILSPIKE L3: 0], [UCEPROTECT LEVEL 2: 0], [UCEPROTECT LEVEL 3: 0], [SPAMHAUS.ORG - XBL: 0], [MCAFEE: 0], [SORBS - ABUSE, SORBS - SOCKS, SORBS - PROXY, SORBS - DYNAMIC IP: 0], [SPAMRATS DYNA: 0], [BARRACUDA: 0], [MAILSPIKE L5: 0], [SPAMHAUS - PBL, SPAMHAUS - SBL, SPAMHAUS - CSS: 0], [BACKSCATTER: 0], [UCEPROTECT LEVEL 1: 0], [MAILSPIKE RBL: 0], [SPAMCOP: 0], [SPAMRATS AUTH: 0], [SPAMRATS SPAM: 0], [SPAMHAUS HBL: 0], [SPAMHAUS DBL: 0], [URIBL BLACK, URIBL GREY, URIBL RED: 0], [DNSBL: 0], [SURBL: 0], [SEM-URI: 0]
[2023.07.18] 03:04:21.069 [16079369] Spam Checks completed.

There you can see how long the spam checks take, and especially the URIBL/RBL checks (about 140 seconds in the example above).
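
For anyone who wants to scan a whole Delivery log for slow checks rather than reading it by eye, here is a minimal Python sketch. It only assumes the "Time (non-rbls): ...ms, Time (URIBL/RBLS): ...ms" wording shown in the log excerpt above; the function name `slow_spam_checks` and the 30-second threshold are made up for illustration and are not SmarterMail settings.

```python
import re

# Matches the timing part of a Delivery log line, as it appears in the excerpt above:
#   "... running spam checks. Time (non-rbls): 7481ms, Time (URIBL/RBLS): 140025ms"
TIMING_RE = re.compile(
    r"Time \(non-rbls\): (?P<non_rbl>\d+)ms, Time \(URIBL/RBLS\): (?P<rbl>\d+)ms"
)


def slow_spam_checks(lines, threshold_ms=30_000):
    """Return (non_rbl_ms, rbl_ms, line) for lines whose URIBL/RBL time exceeds threshold_ms.

    threshold_ms is an arbitrary illustrative cut-off, not a SmarterMail value.
    """
    hits = []
    for line in lines:
        match = TIMING_RE.search(line)
        if match is None:
            continue  # not a timing line
        non_rbl = int(match.group("non_rbl"))
        rbl = int(match.group("rbl"))
        if rbl > threshold_ms:
            hits.append((non_rbl, rbl, line.strip()))
    return hits


# Sample lines copied from the log excerpt in this thread.
sample = [
    "running spam checks. Time (non-rbls): 7481ms, Time (URIBL/RBLS): 140025ms",
    "[2023.07.18] 03:04:21.069 [16079369] Spam Checks completed.",
]

for non_rbl, rbl, _ in slow_spam_checks(sample):
    print(f"non-RBL: {non_rbl / 1000:.1f}s, URIBL/RBL: {rbl / 1000:.1f}s")
```

To run it against a real log, pass an open file object instead of `sample`; the path to the Delivery log depends on your installation.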
3
Tim Uzzanti Replied
Employee Post
We provided a custom build to a few customers yesterday and the issue is resolved. 
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
2
Great!!!  Was the issue something to do with RBL and URIBL checks? This issue first started many builds ago, like back in May, so good to have a resolution.
0
Latest release

[2023.07.19] 15:13:05.354 [67110242] Finished running spam checks. Time (non-rbls): 75885ms, Time (URIBL/RBLS): 112733ms
[2023.07.19] 15:13:05.354 [67110242] Spam Check results: [_DMARC: 0,none], [REVERSE DNS LOOKUP: 0,Passed], [NULL SENDER: 0,passed], [_INTERNALSPAMASSASSIN: 1:2], [_SPF: 20,None], [_DK: 0,None], [_DKIM: 0,Pass], [MAILSPIKE L5: 0], [SPAMHAUS.ORG - XBL: 0], [BACKSCATTER: 0], [UCEPROTECT LEVEL 3: 0], [MAILSPIKE RBL: 0], [SPAMRATS SPAM: 0], [SORBS - ABUSE, SORBS - SOCKS, SORBS - PROXY, SORBS - DYNAMIC IP: 0], [MAILSPIKE L4, MAILSPIKE L3: 0], [SPAMRATS DYNA: 0], [UCEPROTECT LEVEL 2: 0], [BARRACUDA: 0], [SPAMRATS AUTH: 0], [UCEPROTECT LEVEL 1: 0], [SPAMHAUS - PBL, SPAMHAUS - SBL, SPAMHAUS - CSS: 0], [SPAMCOP: 0], [MCAFEE: 0], [SPAMHAUS HBL: 0], [SPAMHAUS DBL: 0], [URIBL BLACK, URIBL GREY, URIBL RED: 0], [DNSBL: 0], [SURBL: 0], [SEM-URI: 0]
[2023.07.19] 15:13:05.354 [67110242] Spam Checks completed.
0
Sorry to share, but the custom build did not solve the problem for me. I have been using it since yesterday.
0
Tim Uzzanti Replied
Employee Post
Roger,

I just looked at your server and noticed a few things. When pinging your DNS servers we are seeing timeouts to 9.9.9.9 but not 1.1.1.1. Also, SORBS is known to be slow and problematic, which is why we removed it from our defaults. Which lists you pick and use dictates how your server will perform.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Hi Tim

I see no timeouts with 9.9.9.9
Thank you, I will disable SORBS.

0
We don't see very long response times on SORBS alone...

0
I typically see no timeouts from devices in Northern NJ or NYC on 9.9.9.9 (quad9.com), although the response time is a couple of milliseconds longer than for, say, Google DNS. Worth it IMO for the filtering (security/privacy) they provide for free.

Your mileage may vary.
1
Kyle Kerst Replied
Employee Post
A ping is unfortunately not a great test for timeouts and delays (I did some testing in and around this recently). An RBL call is essentially a DNS lookup, and each request we send might be for a different IP/sender, so the response times are going to vary quite a bit from lookup to lookup. I'm working on setting up something in PowerShell that we can use to test this from different environments, but I don't have anything solid just yet. On the SORBS side, I think what we were seeing in-house was delays across the board, but only when SORBS was in use, and so the removal of SORBS (and the fixes noted above) were recommended.
Kyle Kerst IT Coordinator SmarterTools Inc. www.smartertools.com
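
As a rough illustration of timing an RBL-style DNS lookup instead of a ping, here is a small Python sketch. It is not the PowerShell tool mentioned above and not how SmarterMail performs its checks. It uses the operating system's resolver through `socket.gethostbyname`, so it measures whatever DNS servers the OS is configured with, and it covers IP-based RBLs only (URIBLs query domain names instead). The helper names are hypothetical.

```python
import socket
import time


def dnsbl_query_name(ip, zone):
    """Build an IP-based DNSBL query name: reversed IPv4 octets followed by the list's zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected an IPv4 address, got {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def time_lookup(ip, zone, resolve=socket.gethostbyname):
    """Time a single DNSBL lookup and return (elapsed_ms, listed).

    `resolve` is injectable so the timing logic can be exercised without network access.
    """
    name = dnsbl_query_name(ip, zone)
    start = time.perf_counter()
    try:
        resolve(name)
        listed = True  # an address came back, so the IP is listed
    except OSError:
        # socket.gaierror is a subclass of OSError: NXDOMAIN (not listed)
        # or a resolver failure/timeout, which this sketch does not tell apart.
        listed = False
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, listed
```

For example, `time_lookup("127.0.0.2", "zen.spamhaus.org")` times one query for the conventional DNSBL test address; the zone here is only an example, so substitute the zones of the lists enabled on your own server. Because response times vary from lookup to lookup, repeat the call with several different IPs and compare the spread per list rather than trusting a single number.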
1
Tim Uzzanti Replied
Employee Post
Roger,

In addition to timeouts, we saw things not loading from the web at times. When we disabled Glasswire, everything resolved itself. When we re-enabled Glasswire, it worked for a while and then things started to slow down. We think that is your culprit.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
1
We had some issues today with mail not being delivered on the July 13 build.

Restarting the SM service fixed it.

I'm assuming this issue still exists?
0
Tim Uzzanti Replied
Employee Post
Chris,

The DNS issue that existed would not have been resolved with a restart.  Not sure what you experienced.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Thanks Tim, if it happens again we'll take a memory dump of the process and send it in to support.
0
Tim - understood on the Ping thing. I have entire LANs using Quad9 as their primary DNS service with zero complaints, but have not got it enabled on any SM server.
