6
Some messages remain in the spool for hours without even a delivery attempt being made
Problem reported by Gabriele Maoret - SERSIS - 6/1/2020 at 3:26 AM
Resolved
Some messages remain in the spool for hours without even a delivery attempt being made, with the recipient's status stuck at PENDING.

After a while they simply fail.

This is a big issue for our customers.


Can you figure out why this is happening?


EG:






Gabriele Maoret - Head of SysAdmins at SERSIS
Currently manages 6 SmarterMail installations (1 in the cloud for SERSIS which provides services to a few hundred third-party email domains + 5 on-premise for customers who prefer to have their mail server in-house)

105 Replies

0
Gabriele Maoret - SERSIS Replied
Restarting the SmarterMail service and/or rebooting the server doesn't solve the issue.
1
Thomas Lange Replied
Hi Gabriele,

we are not on build 7454 yet - we are still on 7451.

If I remember right, there were issues some months ago with messages in the spool failing. This was already fixed, and in addition support suggested more frequent retries for the spool:

Settings / General / Spool - Retry Intervals (Minutes, separated by comma)
1, 1, 5, 5, 15, 30, 30, 30, 30, 60, 90, 120, 240, 480, 960, 1440, 2880
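
To get a feel for what this interval list means in practice, here is a minimal Python sketch that turns it into cumulative times since the first failed attempt. It assumes each value is the wait in minutes before the next retry, which is how the setting label reads; the function name is made up for illustration and this is not SmarterMail code.

```python
from itertools import accumulate

# Retry intervals in minutes, copied from the setting quoted above.
RETRY_INTERVALS = [1, 1, 5, 5, 15, 30, 30, 30, 30, 60, 90, 120, 240, 480, 960, 1440, 2880]


def cumulative_retry_minutes(intervals):
    """Return the minutes elapsed since the first failure at each retry.

    Assumption: every interval is the wait before the next attempt.
    """
    return list(accumulate(intervals))


if __name__ == "__main__":
    schedule = cumulative_retry_minutes(RETRY_INTERVALS)
    for attempt, minutes in enumerate(schedule, start=1):
        print(f"retry {attempt:2d}: +{minutes} min ({minutes / 60:.1f} h)")
```

Under that assumption, the last retry in this list falls 6417 minutes (roughly four and a half days) after the first failure.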

Perhaps this helps for your SmarterMail installation. Otherwise SmarterTools should have a closer look.
0
Gabriele Maoret - SERSIS Replied
Checking the emails that remain "blocked" in the spool more closely, I noticed one thing: some of these emails have the NEXT ATTEMPT ("PROSSIMO TENTATIVO" in Italian) set to a time in the past (it is 12:22 here now).

Could this be the problem?

0
Gabriele Maoret - SERSIS Replied
Am I the only one with this issue?
I'm getting more and more messages that sit for hours in the REMOTE DELIVERY state, with 0 ATTEMPTS and a NEXT ATTEMPT in the past!

Example (current time 16:51, 24-hour format):



0
Gabriele Maoret - SERSIS Replied
I think I've figured out what's happening:

All the connections that are in that state are to an Aruba SMTP server with IP address 62.149.157.166.

If I try to connect to this IP on port 25 via telnet, this is the response:

>>>>>>>>>
421 mxcm01-pc.ad.aruba.it bizsmtp mfA22200r3Uk8nK01 Too many connections, try later.
Connection lost
>>>>>>>>>

It seems that SmarterMail doesn't disconnect the SMTP session after that message and never retries, so the messages remain in the queue forever (or so).
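
For context, SMTP reply codes are grouped by their first digit: 2xx/3xx mean the command was accepted, 4xx is a transient failure the sender is expected to retry later, and 5xx is a permanent failure. The sketch below shows that classification applied to the reply quoted above. It is only an illustration of the convention, with made-up function and label names, and says nothing about how SmarterMail handles the reply internally.

```python
def classify_smtp_reply(line):
    """Classify an SMTP reply line by its three-digit status code."""
    code = int(line[:3])  # raises ValueError if the line has no numeric code
    if 200 <= code < 400:
        return "ok"
    if 400 <= code < 500:
        return "retry_later"  # transient failure: close the session, requeue
    if 500 <= code < 600:
        return "bounce"  # permanent failure: do not retry
    raise ValueError(f"unexpected SMTP status code: {code}")


if __name__ == "__main__":
    reply = "421 mxcm01-pc.ad.aruba.it bizsmtp mfA22200r3Uk8nK01 Too many connections, try later."
    print(classify_smtp_reply(reply))
```

A 421 at connect time would therefore normally lead to the message being requeued for the next retry interval rather than left hanging.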


EDIT: another remote SMTP server that causes the same issue:

62.149.157.151
0
Gabriele Maoret - SERSIS Replied
Similar issue receiving messages...

1
Tim Uzzanti Replied
Employee Post
Please open a ticket and include your delivery logs so we can evaluate.  We don't think there is an issue based on the number of servers we have been on over the last week fine tuning in preparation for release.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Gabriele Maoret - SERSIS Replied
I think you are right, Tim... In the next few days I will investigate further and open a ticket.
0
Kyle Kerst Replied
Employee Post
Gabriele I took a quick look at your screenshots and noticed all of these pending deliveries are to Yahoo/Hotmail/etc and this could be a clue as to the root cause. Frequently when we see stalled messages to these providers it is indicative of one of the following:

1. Rate-limiting has been applied to your server IP due to the amount of email coming in from your server. 
2. Mail from your server is being rejected due to failed SPF, RDNS, DKIM, etc from the sending domain.
3. Sending IP address is listed on a blacklist or other spam list such as their internal lists. 

If you search your Delivery logs for these recipients what do you see there? If you could check on these items before submitting a ticket this will help us get to the bottom of it much quicker. Thanks, and have a great day!
Kyle Kerst IT Coordinator SmarterTools Inc. www.smartertools.com
1
Gabriele Maoret - SERSIS Replied
Hi Kyle, can I send the details to you via PM or do you prefer that I open a ticket first?

P.S.: The destination SMTP servers aren't Yahoo/Hotmail/etc... Those are the originating SMTP servers from my latest post, which is about the same issue with INCOMING e-mails...

With OUTGOING emails the destinations seem to be some ARUBA S.p.A. SMTP servers, like 62.149.157.166
3
Grady Werner Replied
Employee Post
Not Kyle, but since he hasn't answered yet, I figured I'd chime in.  We've had instances when using PM for troubleshooting issues that conversations get lost, and that hurts everyone.  We really prefer tickets because there's good oversight to ensure stuff doesn't fall through the cracks.  We realize that sometimes the back and forth of PMs are useful, but tickets ensure accountability and have a significantly higher chance of getting your issue resolved.
Grady Werner SmarterTools Inc. www.smartertools.com
1
Gabriele Maoret - SERSIS Replied
OK Grady, I'll open a ticket for this


2
Gabriele Maoret - SERSIS Replied
Before opening a ticket, I investigated thoroughly and perhaps I found the trick...

I found TLS authentication errors in the delivery logs, so I tried disabling the related option in SETTINGS --> PROTOCOLS --> SMTP OUT:

This seems to have solved the issue, but now I think there is something wrong with my SM certificate settings...

I will now investigate my configuration thoroughly, but if anyone has suggestions on where to look, so that I don't waste time on unnecessary checks, I'd appreciate them...


Thanks in advance to all!
0
Sébastien Riccio Replied
Do you have any TLS errors in delivery.log for these delivery attempts, or is no attempt logged at all?

Is there anything else relevant in the delivery log flow for one of the attempts, if they exist?

Sébastien Riccio System & Network Admin https://swisscenter.com
0
Gabriele Maoret - SERSIS Replied
Hi Sébastien, this is an example:  LOG.txt

As you can see in the file, there are some TLS errors like this:

[2020.06.03] 13:42:22.432 [63750] CMD: STARTTLS
[2020.06.03] 13:42:24.776 [63750] RSP: 220 2.0.0 Ready to start T
[2020.06.03] 13:42:25.510 [63750] Certificate name mismatch. 
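
For anyone wanting to find every affected delivery rather than a single example, a small script can group delivery log lines by the bracketed number and list the ones that contain the mismatch message. This is a rough Python sketch: it assumes the bracketed number identifies one delivery session and that all lines follow the layout shown above; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as: [2020.06.03] 13:42:22.432 [63750] CMD: STARTTLS
LOG_LINE = re.compile(
    r"^\[(\d{4}\.\d{2}\.\d{2})\] (\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] (.*)$"
)


def group_by_session(lines):
    """Group log messages by the bracketed ID (assumed to be a session ID)."""
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            _date, _time, session_id, message = match.groups()
            sessions[session_id].append(message)
    return dict(sessions)


def sessions_with_tls_mismatch(lines):
    """Return the IDs of sessions that logged a certificate name mismatch."""
    return [
        session_id
        for session_id, messages in group_by_session(lines).items()
        if any("Certificate name mismatch" in message for message in messages)
    ]


if __name__ == "__main__":
    sample = [
        "[2020.06.03] 13:42:22.432 [63750] CMD: STARTTLS",
        "[2020.06.03] 13:42:24.776 [63750] RSP: 220 2.0.0 Ready to start T",
        "[2020.06.03] 13:42:25.510 [63750] Certificate name mismatch. ",
    ]
    print(sessions_with_tls_mismatch(sample))
```

To run it against a real log, read the file with `open(path).read().splitlines()` and pass the resulting list in place of `sample`.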


The strange thing is that it delays the delivery, but after a while it works...

And it seems to happen only when SM delivers messages to certain SMTP servers, while other servers are OK...

Disabling TLS authentication on outbound SMTP solves the issue, but I think it's better to keep it enabled if I can do so without issues...

I need to understand whether it's an error in my config, a bug in SM, or the destination SMTP servers that have issues...

0
Robert G. Replied
I'm having the same results as Gabriele. I'll open a ticket about this as well. It's even happening with users that are on my mail server. user1@domain.com to user2@domain.com getting delayed 20+ minutes Status: "Spam Check". 
GearHost.com
0
Scarab Replied
I was having the same problem in Build 7242 but it was maybe 3 or 4 messages a week, so I never really paid it much attention. After upgrading to Build 7459 I was getting 2000 messages an hour that weren't even attempting delivery to local users!

Gabriele, I could kiss you because turning off "Enable TLS if supported by the remote server" fixed it for us immediately (still took a while for the Spool to catch up on a couple hours of messages that accumulated since 2am when we upgraded)!

We have a commercial certificate for our primary domain and a LetsEncrypt certificate for all our other domains. Never had a problem with our certs in SM before, but sure enough we have tons of the "Certificate name mismatch" in the logs.
0
Sébastien Riccio Replied
Hello, the "Certificate name mismatch" could mean that the remote certificate does not match the contacted remote hostname and SM aborts sending the mail using TLS.
If I'm correct, in 7242 it was then retrying without using TLS.

Looks like in mapi-BUILDS it doesn't retry without TLS.

It's maybe a side effect introduced with code changes around this fix:
Fixed: Gateways are using TLS, if available, even though they are configured to use no encryption.

That's only my supposition.

edit: We don't have this issue, but we also use a gateway for relay, so the TLS certificate always matches.
0
Gabriele Maoret - SERSIS Replied
Hi Sebastien, so you think it's a BUG in SmarterMail that if it finds out that the REMOTE certificate has an issue, SmarterMail itself doesn't retry without TLS, am I right?
1
Sébastien Riccio Replied
Hello Gabriele, that's a supposition. I remember that previously with 7242 I had every mail delayed for 5 minutes (if your spool retry settings begin with 5 minutes) because our mail gateway's certificate had expired.
It was trying TLS and failing because of the certificate, then retrying after 5 minutes without TLS.

The fact that you have mails stuck forever in the queue if you enable TLS, and also get certificate mismatches in the logs, makes me think that the new builds no longer retry without TLS after a failed TLS session. But this would need to be confirmed.

Can you reproduce the issue, check the logs for the stuck messages, and see if there is a certificate mismatch error again? If so, can you give me the MX it tries to reach when this error appears, so I can check its certificate with an openssl command to confirm that the certificate really mismatches?

Also, some thought should be given to "should SM retry without TLS if the TLS attempt fails". Some customers or companies you host may require that all mails be transferred using TLS, or not transferred at all.

So a per-domain configuration for this should be added, something like "Require TLS for outgoing mails", so that no mails from these customers can be transmitted outside without a layer of security...

Well, that's not the point of this thread... Can you check the remote hostname that triggers the certificate issue?




0
Sébastien Riccio Replied
Gabriele,

I've read the latest log you posted here. It doesn't seem related to the certificate mismatch, as the log shows that it still sends the mail over the TLS connection.

However, for an unknown reason the session with the server seems to time out after the DATA command (when the mail content is sent).


[2020.06.03] 13:42:33.214 [63750] CMD: DATA  
[2020.06.03] 13:42:36.026 [63750] RSP: 354 enter mail, end with "." on a line by itself
[2020.06.03] 13:43:36.050 [63750] The smtp session has timed out.
[2020.06.03] 13:43:36.050 [63750] Attempt to ip, '62.149.157.166' success: 'False'
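
The gap between the 354 response and the timeout line can be measured directly from the timestamps. Here is a small Python sketch that does so, assuming the timestamp layout shown in these log lines; the helper names are made up for illustration.

```python
from datetime import datetime

STAMP_FORMAT = "[%Y.%m.%d] %H:%M:%S.%f"
STAMP_LENGTH = 25  # length of a stamp such as "[2020.06.03] 13:42:36.026"


def parse_stamp(line):
    """Parse the leading timestamp of a delivery log line."""
    return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)


def seconds_between(earlier_line, later_line):
    """Return the number of seconds elapsed between two log lines."""
    return (parse_stamp(later_line) - parse_stamp(earlier_line)).total_seconds()


if __name__ == "__main__":
    response = '[2020.06.03] 13:42:36.026 [63750] RSP: 354 enter mail, end with "." on a line by itself'
    timeout = "[2020.06.03] 13:43:36.050 [63750] The smtp session has timed out."
    print(f"{seconds_between(response, timeout):.3f} s")
```

For the two lines above the gap is 60.024 seconds, which would be consistent with a fixed 60-second timeout firing while the message body was being sent, although the log does not state what the configured timeout is.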

Is it the only destination server having this issue, or do you have the same with other remote MXs?
(this one is in.9netweb.it)

Kind regards.

0
Gabriele Maoret - SERSIS Replied
No, it's not the only one. This is only an example, there are quite a few others...

The fact is that if I disable TLS for OUTBOUND SMTP the issue suddenly disappears...
0
Robert G. Replied
Our issue was due to URIBLs... The average response time was very high. Oddly enough these were just fine on 7459. It only became an issue after upgrading a few days back. 

0
Jade D Replied
I want to add to this thread.

We have a ticket open about this same issue and there is for the most part a definite issue with smartermail when using tls.

We've seen issues where smartermail connects to the remote server using tls and then reports a mismatch despite there being no mismatch with the certificate and hostname.

Another instance, when smartermail connects via tls and then simply hangs.

Emails sit in the spool for days with no delivery attempts, and the next attempt date is in the past.

Turning off "Enable TLS if supported by remote server" and restarting the spool service does not resolve the issue.

The only workaround as of now is to restart the mail service which on a busy shared mail server causes a loss in email transmission for active mails which is not ideal.

Keeping "Enable TLS if supported by remote server" turned off results in clients complaining when they send email to Gmail as the email is not encrypted.

There appears to be no other way to restart the spool other than restarting the mail service on the server.
Jade https://absolutehosting.co.za
0
Jade D Replied
Here is some data to show how the number of mails in the spool continuously grows until we restart the SmarterMail service.

Red arrows show when the mail service was restarted on the server.
The same is true for all 9 or more mail servers that we manage.

0
Sébastien Riccio Replied
Not a fix, but a workaround in the meantime. As we don't have this problem, I was asking myself why we don't:

We use an outgoing gateway, so SmarterMail doesn't communicate with the remote peers directly but only with our outgoing gateway (which is not SM), and there is no TLS issue with it.

Well, almost everything transits through the outgoing gateway. Some NDR/Delivery Success messages don't seem to go through the gateway, but that is another topic.
0
Gabriele Maoret - SERSIS Replied
To me this issue is resolved... Do you see it again?
0
Jade D Replied
@Gabriele

What did you change to get the issue resolved?
0
Gabriele Maoret - SERSIS Replied
I disabled TLS for OUTBOUND SMTP
0
Jade D Replied
The problem with disabling TLS is that emails sent to Gmail are flagged and the recipient is shown a warning.
It makes no sense that we have to compromise security to have a functioning mail server that's capable of delivering email.
0
Sébastien Riccio Replied
Yes, disabling TLS for Outbound SMTP is a bad idea nowadays. Many recipient servers now take this into account when scoring e-mails and also display a security warning in some cases.
0
Jade D Replied
You're spot on Sébastien - we've received complaints from clients where mail has been filtered as spam because it was not delivered via TLS.

We have around 10 servers that are all suffering from the same symptoms. Mails queue for days if TLS is enabled and the only work around is to disable tls support, restart the mail server and wait for the spool to clear.

I've had a ticket open with Smartertools since last year and there has been no movement on it.

Smartermail tries to establish a connection with a remote host via tls and then hangs. No reason, no error, the connection simply sits there.

I've sent SmarterTools logs showing how on some days the mail arrives at the remote host, and on others not.
0
Sébastien Riccio Replied
Jade, that is a really painful situation you have there. We're fortunate to use an outgoing gateway that isn't a SmarterMail instance and that handles outbound TLS flawlessly.

0
Sébastien Riccio Replied
Also, the current global status of TLS on incoming mail servers is a bit chaotic. A lot of service providers' mail servers use deprecated TLS versions, and some don't even handle the newer ones that are the current standard.

So if the sending server only accepts TLS 1.2 and 1.3 and the remote server only offers TLS 1.0 and 1.1 (or even older), they can't negotiate and the transaction fails.
If I remember correctly, in SmarterMail, when sending with TLS fails, it falls back, or at least used to fall back, to non-TLS for the next retry. Have you checked your logs to see whether, when the mail is finally sent, it is in a TLS-enabled session?

It can also be that a provider has multiple MXs, some updated with the latest TLS libs and some not, so it depends on which one you connect to in the round-robin lottery of MXs. That could be the reason why sometimes it works and sometimes not.
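
The version mismatch described above boils down to the two sides having no protocol version in common. Here is a deliberately simplified Python sketch of that idea. A real TLS handshake is more involved than a set intersection, and the function name and version labels are just illustrative.

```python
# Protocol versions ordered from oldest to newest.
TLS_ORDER = ["SSLv3", "TLS 1.0", "TLS 1.1", "TLS 1.2", "TLS 1.3"]


def negotiate_version(sender_versions, receiver_versions):
    """Return the newest version both sides enable, or None if there is none."""
    common = set(sender_versions) & set(receiver_versions)
    if not common:
        return None  # no shared version: the handshake cannot succeed
    return max(common, key=TLS_ORDER.index)


if __name__ == "__main__":
    # Sender accepts only modern versions, receiver offers only old ones.
    print(negotiate_version(["TLS 1.2", "TLS 1.3"], ["TLS 1.0", "TLS 1.1"]))
    # Sender enables 1.0 to 1.2, receiver offers 1.1 and 1.2.
    print(negotiate_version(["TLS 1.0", "TLS 1.1", "TLS 1.2"], ["TLS 1.1", "TLS 1.2"]))
```

With several MX hosts behind one domain, the second argument can differ from one connection to the next, which matches the "works sometimes" pattern described above.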

All this brings another problem: if it retries without TLS after a TLS failure, you can't guarantee to your customers that the mail will be sent over a secure connection, and we have some customers who prefer the mail not to be delivered, and to receive a bounce instead, rather than transmitting anything in clear text on the wire.

So with all of this in mind, on our outgoing gateway we can force, per source domain or destination domain, what we want for TLS (Try to use, Do not use, Force use and reject if it's not possible).
That way we have complete control over our outgoing SMTP stuff.
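
To make the three policy options concrete, here is a minimal Python sketch of how a per-domain decision could look. It is purely illustrative: the class, function names, and example domains are hypothetical, and it is not the configuration or code of any particular gateway.

```python
from enum import Enum


class TlsPolicy(Enum):
    TRY = "try to use"
    NEVER = "do not use"
    REQUIRE = "force use and reject if it's not possible"


def policy_for(sender_domain, overrides, default=TlsPolicy.TRY):
    """Look up the policy for a sender domain in a hypothetical override table."""
    return overrides.get(sender_domain.lower(), default)


def delivery_action(policy, tls_possible):
    """Decide what to do with a message, given whether a TLS session can be set up."""
    if policy is TlsPolicy.NEVER:
        return "send_plain"
    if tls_possible:
        return "send_tls"
    if policy is TlsPolicy.REQUIRE:
        return "reject"  # bounce rather than send in clear text
    return "send_plain"  # opportunistic TLS: fall back to clear text


if __name__ == "__main__":
    overrides = {"secure-customer.example": TlsPolicy.REQUIRE}
    for domain in ("secure-customer.example", "other.example"):
        policy = policy_for(domain, overrides)
        print(domain, "->", delivery_action(policy, tls_possible=False))
```

The same lookup could be keyed on the destination domain instead of the sender domain.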

Kind regards.
0
Jade D Replied
"
So with all of this, our outgoing gateway we can force per source domain or destination domain, what we want for TLS (Try to use, Do not use, Force use and reject if it's not possible).
That way we have a complete control about our outgoing SMTP stuff."

What are you using as a gateway that allows you to force TLS based on recipient domain?
0
Steve Norton Replied
Jade,

Can you do a WireShark capture on TCP port 25 for one of the failed attempts so we can have a look at the TLS client/server hellos and the cipher suite negotiation.
1
Gabriele Maoret - SERSIS Replied
I can confirm the issue is still here.

Yesterday I tested it by enabling "Enable TLS if supported by the remote server" and suddenly got tons of e-mails blocked in the outgoing queue.

Disabling "Enable TLS if supported by the remote server" solved the issue, but it's not a good solution for mail security...

Please SmarterTools, take care of this issue!!!!

It has been here for more than six months and there is still no FIX!!!!
0
Jade D Replied
Hey Steve

I would love to, but running that on a mail server with 1000+ domains may cause issues, and I can't spend an hour or so watching it, hoping that one of our users sends an email to a mail server which previously worked and now doesn't.
0
Sébastien Riccio Replied
Hello Jade D,

We're forcing SSL based on *sender* domain (if our customer doesn't want their e-mail to be delivered without encryption), but with some work you could also base it on destination domain.
For this we use Haraka with some homemade plugins (the Haraka plugin system lets you hook into any event and add your own piece of code to alter the processing).


We also considered (and are still considering) using zone-mta, which is a bit like Haraka with some nice features.

Kind regards.
0
Steve Norton Replied
Jade (or anyone actually),
Do you have a destination domain name or IP address of a server that is causing this issue today and I'll see what captures I can do myself. 
Also need to know the OS versions people are running?
0
Jade D Replied
Thank you for that info Sébastien - I'll look into that.

Hey Steve,

You can test TLS on www95.cpt1.host-h.net  - 196.40.97.42 
I don't have a mailbox on this remote ISP to provide you, but it is one of the servers that we sent through to SmarterTools via support response.

Within our email we explained that on the 21st January mails to this server were being delivered, and on the 22nd not.

I then disabled tls on smartermail, restarted the mail service and the mails that were queued for this destination IP were delivered.


0
Steve Norton Replied
Jade,
I've tested ports 25 and 465, I get TLS 1.2 using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384.
This is pretty strong stuff, do you have that cipher suite in the registry at;
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Cryptography\Configuration\Local\SSL\00010002\
Or if you have it configured by policy, is it listed?
I've run a capture against SM and it uses this combination.
0
Jade D Replied
Hi Steve

Well done bud, you've achieved more in a few hours than what ST support have been able to find.

According to MS, the cipher suite TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 is available from Windows Server 2016 and up


Time to upgrade our version of Windows!
0
Steve Norton Replied
Okay Jade, well that's progress. So you don't have a common TLS 1.2 suite between you. I'll check to see if they accept 1.1.
I trust you have the default 2012 R2 suites.
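
"No common suite" can be pictured as an ordered search through two lists. The Python sketch below shows the idea. The suite lists are illustrative only (the single server entry is the suite named earlier in the thread, and the client list is a hypothetical stand-in, not the real Windows Server 2012 R2 defaults), and real servers may apply their own preference order rather than the client's.

```python
def first_common_suite(client_suites, server_suites):
    """Return the first client-preferred suite the server also offers, else None."""
    offered = set(server_suites)
    for suite in client_suites:
        if suite in offered:
            return suite
    return None


if __name__ == "__main__":
    # Illustrative lists only; not the actual configuration of either machine.
    server_tls12 = ["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"]
    client_tls12 = [
        "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
        "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
    ]
    print(first_common_suite(client_tls12, server_tls12))
```

If the search comes back empty for every protocol version both sides enable, the handshake fails before any mail is exchanged.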
0
Steve Norton Replied
Jade,
They accept TLS 1.1 with TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P256 which your server supports I believe.
Maybe you need to disable 1.2 support via SM '/Protocols/Security Protocols' and re-enable 'Use TLS if supported'.
0
Jade D Replied
Hi Steve,

Here's a screenshot taken from the mail server running IIS Crypto


0
Jade D Replied
Hi Steve

I missed the response below :

Jade,
They accept TLS 1.1 with TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA_P256 which your server supports I believe.
Maybe you need to disable 1.2 support via SM '/Protocols/Security Protocols' and re-enable 'Use TLS if supported'.

I'm going to get a 2019 Server up and running and install SM on there and will report back. It makes sense to upgrade and future-proof rather than disable one set of cipher suites, which may cause issues with other providers.

I'll report back on this thread as soon as possible.
0
Tim Uzzanti Replied
Employee Post
Jade, what version of Windows Server are you using?

Looks like we need some KB's on this.  With some of the larger companies and cloud companies flipping the switch on old TLS, we are going to start seeing different kinds of results. 

0
Steve Norton Replied
Jade,
From what I found you have a customer that sends mail to potchagricollege.co.za, that might help you track them down for testing.
I can give you the commands in PowerShell to see what you can connect with, if you'd like a before and after upgrade test.

0
Gabriele Maoret - SERSIS Replied
Hi Tim! I'm on Windows Server 2016 (all updates applied) and these are my SmarterMail settings:


I think that my server can support all the protocols from TLS 1.0 to TLS 1.2.

Could it be that some SMTP servers out there use TLS 1.3 and this causes the issue?
0
Jade D Replied
Hi Steve

I have no issues running the commands on the live server if they merely perform a test?

Thanks for all your assistance with this mate.
0
Steve Norton Replied
Jade,
It is a basic TCP connect and disconnect, there's a commented out TLS 1.1 test too.

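$RemoteHost = "www95.cpt1.host-h.net" #196.40.97.42
$Port = 465

# Open a plain TCP connection to the remote host.
$Socket = New-Object System.Net.Sockets.TcpClient($RemoteHost, $Port)
if ($Socket)
{
    # Accept-any-certificate callback (this also sets it process-wide on ServicePointManager);
    # it is passed only to the commented-out SslStream line below.
    $IgnoreCertificateValidationErrors = [System.Net.ServicePointManager]::ServerCertificateValidationCallback = {$true}
    $Stream = $Socket.GetStream()
    #$SslStream = New-Object System.Net.Security.SslStream $Stream,$false,$IgnoreCertificateValidationErrors
    $SslStream = New-Object System.Net.Security.SslStream $Stream,$false 
    # Handshake offering TLS 1.1 and 1.2; swap in the commented line to test TLS 1.1 only.
    $SslStream.AuthenticateAsClient($RemoteHost ,$null,"tls11,tls12",$false)
    #$SslStream.AuthenticateAsClient($RemoteHost ,$null,"tls11",$false)
    # Print the negotiated protocol, cipher, and hash details.
    $SslStream
}
$SslStream.Close()
$Socket.Close()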

Edit: script corrected exchanging $Name for $RemoteHost
0
Steve Norton Replied
Gabriele,
Do you have a hostname you can use in the PowerShell test script I've posted?
0
Steve Norton Replied
Jade,
I've done some analysis of the destination server and there is a combination on HTTPS that your server could work with.
Does the following PowerShell command find both options on your server?
if ((Get-TlsCipherSuite -Name "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384") -and (Get-TlsEccCurve -Name "NistP256")) {Write-Host "Both found."}
0
Jade D Replied
Hi Steve,

Apologies for the delay in responding. 

See the output below from your first PowerShell script 
TransportContext          : System.Net.SslStreamContext
IsAuthenticated           : True
IsMutuallyAuthenticated   : False
IsEncrypted               : True
IsSigned                  : True
IsServer                  : False
SslProtocol               : Tls12
CheckCertRevocationStatus : False
LocalCertificate          :
RemoteCertificate         : System.Security.Cryptography.X509Certificates.X509Certificate
CipherAlgorithm           : Aes256
CipherStrength            : 256
HashAlgorithm             : Sha1
HashStrength              : 160
KeyExchangeAlgorithm      : 44550
KeyExchangeStrength       : 256
CanSeek                   : False
CanRead                   : True
CanTimeout                : True
CanWrite                  : True
ReadTimeout               : -1
WriteTimeout              : -1
Length                    :
Position                  :
LeaveInnerStreamOpen      : False
0
Jade D Replied
Jade,
I've done some analysis of the destination server and there is a combination on HTTPS that your server could work with.
Does the following PowerShell command find both options on your server?
if ((Get-TlsCipherSuite -Name "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384") -and (Get-TlsEccCurve -Name "NistP256")) {Write-Host "Both found."}
This test failed with the output below

Get-TlsCipherSuite : The term 'Get-TlsCipherSuite' is not recognized as the name of a cmdlet, function, s
or operable program. Check the spelling of the name, or if a path was included, verify that the path is c
try again.
At C:\Users\Administrator\Desktop\stevetest2.ps1:1 char:6
+ if ((Get-TlsCipherSuite -Name "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384") -and (Get ...
+      ~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (Get-TlsCipherSuite:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
1
Sébastien Riccio Replied
The Get-TlsCipherSuite command is only available in the PowerShell shipped with Windows Server 2016 and later.

Steve's script will unfortunately always fail on your system.
0
Tim Uzzanti Replied
Employee Post
Jade, what version of Windows Server are you using?
0
Steve Norton Replied
Jade,
The results from the first script show a successful connection at the OS level, an upgrade would change the suite options but a connection is a connection. We can see that it's TLS 1.2 ECDHE AES 256 SHA P256.
You could try the portable PowerShell 7 (pwsh) to run these, you get better output.
0
Jade D Replied
I've just finished getting a 2019 Server up for testing and out of the box it supports the cipher suite that we're missing.

Tim mentioned that they need to get knowledge base articles up, but the way forward is rather to have SmarterMail try the TLS method that's supported and then downgrade accordingly. These actions can all be logged in the delivery log, which will help with troubleshooting.

0
Steve Norton Replied
Jade,
The certificate they use on port 25 is from Let's Encrypt and was changed in line with the dates you quoted.

SSLVersion in use: TLSv1_2
Cipher in use: ECDHE-RSA-AES256-GCM-SHA384
Perfect Forward Secrecy: yes
Certificate #1 of 3 (sent by MX):
Cert VALIDATED: ok
Cert Hostname VERIFIED (mail.potchagricollege.co.za = potchagricollege.co.za | DNS:mail.potchagricollege.co.za | DNS:pop.potchagricollege.co.za | DNS:potchagricollege.co.za | DNS:smtp.potchagricollege.co.za | DNS:www.potchagricollege.co.za)
Not Valid Before: Jan 22 13:24:18 2021 GMT
Not Valid After: Apr 22 13:24:18 2021 GMT
subject= /CN=potchagricollege.co.za
issuer= /C=US/O=Let's Encrypt/CN=R3
0
Tim Uzzanti Replied
Employee Post
Jade,

Have I missed your response to what windows server version you were on?
0
Jade D Replied
Hi Tim

Windows Server 2012 R2 
Jade https://absolutehosting.co.za
0
Jade D Replied
Hey Steve

I don't see any delivery attempts today for mail to potchagricollege.co.za

I've now completed the setup of a gateway server on Windows Server 2019 and haven't had one email delayed due to TLS issues.

All mails to Hetzner's servers *.host-h.net are being delivered without issues
Jade https://absolutehosting.co.za
0
Steve Norton Replied
Jade,
Interesting stuff, let's hope it stays issue free. What are the chances of installing Wireshark on the gateway server to troubleshoot future issues? I can help you set up an efficient and specific IP/port capture that you can leave running for extended periods.
0
Jade D Replied
Morning Steve

Seems the latest version of SM & Windows Server 2019 is also having issues :(
Would you mind emailing me on jade @ absolutehosting.co.za and we can run through the setup of Wireshark on the gateway server?


[2021.01.27] 16:16:54.339 [73752258] Delivery started for <redacted2>@echo.co.za (via bypass - SMTP auth bypass) at 4:16:54 PM
[2021.01.27] 16:16:57.339 [73752258] Added to SpamCheckQueue (1 queued; 0/150 processing)
[2021.01.27] 16:16:57.339 [73752258] [SpamCheckQueue] Begin Processing.
[2021.01.27] 16:16:57.339 [73752258] Blocked Sender Checks started.
[2021.01.27] 16:16:57.855 [73752258] Spam Checks started.
[2021.01.27] 16:16:57.855 [73752258] Spam Checks skipped: User authenticated
[2021.01.27] 16:16:57.855 [73752258] Spam Checks completed.
[2021.01.27] 16:16:57.855 [73752258] Removed from SpamCheckQueue (0 queued or processing)
[2021.01.27] 16:17:00.355 [73752258] Added to RemoteDeliveryQueue (1 queued; 17/200 processing)
[2021.01.27] 16:17:00.355 [73752258] [RemoteDeliveryQueue] Begin Processing.
[2021.01.27] 16:17:00.355 [73752258] Sending remote mail for <redacted2>@echo.co.za
[2021.01.27] 16:17:00.355 [73752258] MxRecord count: '1' for domain 'golfkemp.co.za'
[2021.01.27] 16:17:00.683 [73752258] Attempting MxRecord Host Name: 'mail.golfkemp.co.za', preference '10', Ip Count: '1'
[2021.01.27] 16:17:00.683 [73752258] Attempting to send to MxRecord 'mail.golfkemp.co.za' ip: '41.204.200.133'
[2021.01.27] 16:17:00.683 [73752258] Sending remote mail to: <redacted1>@golfkemp.co.za
[2021.01.27] 16:17:00.683 [73752258] Initiating connection to 41.204.200.133
[2021.01.27] 16:17:00.683 [73752258] Connecting to 41.204.200.133:25 (Id: 1)
[2021.01.27] 16:17:00.683 [73752258] Binding to local IP 197.81.192.9 (Id: 1)
[2021.01.27] 16:17:00.714 [73752258] Connection to 41.204.200.133:25 from 197.81.192.9:53270 succeeded (Id: 1)
[2021.01.27] 16:17:04.011 [73752258] RSP: 220 dedi133.cpt2.host-h.net ESMTP XNEELO_MTA 1.00 Wed, 27 Jan 2021 16:17:04 +0200
[2021.01.27] 16:17:04.011 [73752258] CMD: EHLO smtp1-bl4n1.zadns.co.za
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-dedi133.cpt2.host-h.net Hello smtp1-bl4n1.zadns.co.za [197.81.192.9]
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-SIZE 31457280
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-8BITMIME
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-ETRN
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-PIPELINING
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-AUTH LOGIN PLAIN
[2021.01.27] 16:17:04.042 [73752258] RSP: 250-STARTTLS
[2021.01.27] 16:17:04.042 [73752258] RSP: 250 HELP
[2021.01.27] 16:17:04.042 [73752258] CMD: STARTTLS
[2021.01.27] 16:17:04.105 [73752258] RSP: 220 TLS go ahead
Jade https://absolutehosting.co.za
1
Gabriele Maoret - SERSIS Replied
Hi all! It seems that this issue has vanished in my SM installation...

I've re-enabled TLS on outgoing SMTP and it's now been 2 days without issues... I don't know why, as there were no other changes in my settings...

If I see new issues I will let you know
Gabriele Maoret - Head of SysAdmins at SERSIS Currently manages 6 SmarterMail installations (1 in the cloud for SERSIS which provides services to a few hundred third-party email domains + 5 on-premise for customers who prefer to have their mail server in-house)
1
Steve Norton Replied
I've had fun and games today with my Let's Encrypt certificate renewal; it turns out Windows Server 2019 doesn't have this intermediate certificate.
From the test I did for Jade I could see that the destination server sent the end-entity certificate along with the intermediate, which is the correct way to do it. Is it possible SM has some issues with certificates that do not directly chain up to the computer certificate store?

Update:
I didn't import the intermediate certificate into the Windows machine store; I created a certificate with the intermediate in the PFX too. When tested externally, IIS passed the end-entity certificate and the intermediate correctly, but SM only sent the end-entity certificate, resulting in TLS issues.
I then added the intermediate to the machine store and rebooted; the next SM test was successful, showing the intermediate certificate.
So SM is building its own chain when it should be using the chain in the PFX for client and server. If chain building fails, SM should try to obtain the intermediate from the AIA record of the end-entity certificate.
0
Jade D Replied
Still issues on our side. Apologies for the radio silence on my part but other than diagnosing SM issues my days are spent resolving other issues.

Have you managed to get any further Steve?


Jade https://absolutehosting.co.za
0
Keith Dovale Replied
I have had a ticket open since 7 December 2020 regarding exactly this issue. Not only is it seen on the latest SM 100 version, it is also seen in the last 15.7 update. The mails go into the queues and the server tries to deliver them: it connects, tells the remote server to negotiate the ciphers, and that is where it stops. SmarterMail never comes back with an error or anything, and if you try to force it, it still does nothing. We have checked all the ciphers and tested with SSL Labs. If you test a website on the server, it tests out 100% fine on all the ciphers, but if you test the SmarterMail server connection, it only uses the weak ciphers. We have provided Wireshark traces etc. regarding this, and we still have no way of resolving this issue.

We have gone through the process of enabling, disabling, saving, removing, rebooting, etc. and NO FIX is in sight. The only workaround was to restart the SmarterMail service, but I now see that even that sometimes doesn't deliver the mails. We picked this up delivering to Xneelo's servers in Cape Town and thought it was their issue, but we have since seen it happen with 2 other ISPs. Some days we have hundreds of mails in our queues due to this issue. My personal opinion is that it is to do with SM, as SM 15.7 is also showing this issue after we upgraded to the last update.

We are running on Windows Server 2016 with best practices applied, and all these ciphers are enabled.
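
The stall described above leaves a recognisable trace in the delivery log: the session's last entry is the 220 reply to STARTTLS and nothing follows. Below is a minimal Python sketch for spotting such sessions in a saved log. It assumes the line layout quoted in this thread (`[date] time [id] message`) and that the bracketed number identifies one delivery; the function name `find_stalled_starttls` is made up for illustration and is not part of SmarterMail.

```python
import re

# Matches delivery-log lines in the layout quoted in this thread, e.g.
# [2021.01.27] 16:17:04.105 [73752258] RSP: 220 TLS go ahead
LOG_LINE = re.compile(
    r"^\[(\d{4}\.\d{2}\.\d{2})\] (\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] (.*)$"
)


def find_stalled_starttls(lines):
    """Return the IDs whose last logged entry is the 220 reply to STARTTLS."""
    last_two = {}  # id -> (previous message, most recent message)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # stack-trace and blank lines carry no id
        session_id, message = match.group(3), match.group(4)
        previous = last_two.get(session_id, (None, None))
        last_two[session_id] = (previous[1], message)
    return [
        session_id
        for session_id, (before, last) in last_two.items()
        if before == "CMD: STARTTLS"
        and last is not None
        and last.startswith("RSP: 220")
    ]
```

A log that is still being written will briefly show healthy sessions in this state too, so a hit only suggests a stall if the timestamp is already well in the past.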
0
Jade D Replied
My ticket has also been open since last year.

There's been more progress within this thread than has been made via the ticket system.

I wonder what support / development are actually doing.

In comparison I logged an issue with Paessler and provided them sufficient info to reproduce the issue along with logs and within 2 days they had a debug build available.

1.5 weeks later a patch was released to resolve the issue.

Log an issue with WHMCS and within 2-3 days they have a patch available.
Log an issue with ModulesGarden and within a few days the issue is resolved.
Jade https://absolutehosting.co.za
0
Keith Dovale Replied
Yes, same feeling here. I am seriously looking into moving away from SM because of this. We have lost hundreds of customers due to this issue, and we can't even revert to the old version of the software now, as these were MAJOR UPGRADES. This is also why we don't upgrade to every version when it comes out: in our experience new versions have issues for a long time before they settle. We were working 100% fine on versions 12 and 15, and only upgraded because of the deprecation of older TLS versions in favour of 1.2 and 1.3, to our own peril.

If this is not resolved by the end of the month we are going to look at alternatives as we will lose our clients if we carry on this way.

0
Steve Norton Replied
Hi Keith,
Do you have a domain name that the Xneelo servers host, and can you take a look at the intermediate certificate that I mentioned in my last post?
Steve
0
Jade D Replied
Hi Steve

I've managed to get a mailbox created on hetzner / xneelo's mail servers and will send you a mail with the details so that we can test.

Will also update my ticket with ST with this info
Jade https://absolutehosting.co.za
0
Jade D Replied
I'm with you Keith and share your thoughts and frustration regarding the lack of support and low levels of service experienced - no other vendor that we work with operates like this.

Quite pissed that I spent thousands of dollars renewing support in December of 2020 and this is the level of support that's received. 

Let me know what you find as a replacement, we as web hosting providers cannot continue to provide a solution or product to our clients that is so badly supported by the vendor.

I've set up an xneelo test mailbox and Steve is kindly assisting us with this matter. Just waiting on SmarterMail to belly flop after a day or so and stop delivering emails again.
Jade https://absolutehosting.co.za
0
Jade D Replied
@Keith

Interestingly enough, mails to Hetzner are being delivered with TLS enabled; however, we are now seeing the same issue for mails being sent to datakeepers

Can you take a look at your mail spools and see if you see the same?
Jade https://absolutehosting.co.za
0
Steve Norton Replied
Okay all, here's a workaround for this problem until ST can get this resolved. I'll add lower level detail at the end of this reply.
Run the following PowerShell script interactively as an administrator.
#Begin
$TypeDefinition = @"
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Net.Sockets;
    using System.Runtime.InteropServices;
    using System.Text;

    namespace ConnectionKiller
    {
        public class Program
        {
            // Taken from https://github.com/yromen/repository/tree/master/DNProcessKiller
            // It is part of the Disconnecter class.
            // For a nested class, use "+" like this: [ConnectionKiller.Program+Disconnecter]::Connections()

            /// <summary>
            /// Enumeration of the states
            /// </summary>
            public enum State
            {
                /// <summary> All </summary>
                All = 0,
                /// <summary> Closed </summary>
                Closed = 1,
                /// <summary> Listen </summary>
                Listen = 2,
                /// <summary> Syn_Sent </summary>
                Syn_Sent = 3,
                /// <summary> Syn_Rcvd </summary>
                Syn_Rcvd = 4,
                /// <summary> Established </summary>
                Established = 5,
                /// <summary> Fin_Wait1 </summary>
                Fin_Wait1 = 6,
                /// <summary> Fin_Wait2 </summary>
                Fin_Wait2 = 7,
                /// <summary> Close_Wait </summary>
                Close_Wait = 8,
                /// <summary> Closing </summary>
                Closing = 9,
                /// <summary> Last_Ack </summary>
                Last_Ack = 10,
                /// <summary> Time_Wait </summary>
                Time_Wait = 11,
                /// <summary> Delete_TCB </summary>
                Delete_TCB = 12
            }

            /// <summary>
            /// Connection info
            /// </summary>
            private struct MIB_TCPROW
            {
                public int dwState;
                public int dwLocalAddr;
                public int dwLocalPort;
                public int dwRemoteAddr;
                public int dwRemotePort;
            }

            //API to change status of connection
            [DllImport("iphlpapi.dll")]
            //private static extern int SetTcpEntry(MIB_TCPROW tcprow);
            private static extern int SetTcpEntry(IntPtr pTcprow);

            //Convert 16-bit value from network to host byte order
            [DllImport("wsock32.dll")]
            private static extern int ntohs(int netshort);

            //Convert 16-bit value back again
            [DllImport("wsock32.dll")]
            private static extern int htons(int netshort);

            /// <summary>
            /// Close a TCP connection identified by its local and remote IPv4 endpoints
            /// </summary>
            public static void CloseConnection(string localAddress, int localPort, string remoteAddress, int remotePort)
            {
                try
                {
                    //if (parts.Length != 4) throw new Exception("Invalid connectionstring - use the one provided by Connections.");
                    string[] locaddr = localAddress.Split('.');
                    string[] remaddr = remoteAddress.Split('.');

                    //Fill structure with data
                    MIB_TCPROW row = new MIB_TCPROW();
                    row.dwState = 12;
                    byte[] bLocAddr = new byte[] { byte.Parse(locaddr[0]), byte.Parse(locaddr[1]), byte.Parse(locaddr[2]), byte.Parse(locaddr[3]) };
                    byte[] bRemAddr = new byte[] { byte.Parse(remaddr[0]), byte.Parse(remaddr[1]), byte.Parse(remaddr[2]), byte.Parse(remaddr[3]) };
                    row.dwLocalAddr = BitConverter.ToInt32(bLocAddr, 0);
                    row.dwRemoteAddr = BitConverter.ToInt32(bRemAddr, 0);
                    row.dwLocalPort = htons(localPort);
                    row.dwRemotePort = htons(remotePort);

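                    //Make copy of the structure into memory and use the pointer to call SetTcpEntry
                    IntPtr ptr = GetPtrToNewObject(row);
                    int ret;
                    try
                    {
                        ret = SetTcpEntry(ptr);
                    }
                    finally
                    {
                        //Release the unmanaged copy so repeated calls do not leak memory
                        Marshal.FreeCoTaskMem(ptr);
                    }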

                    if (ret == -1) throw new Exception("Unsuccessful");
                    if (ret == 65) throw new Exception("User has no sufficient privilege to execute this API successfully");
                    if (ret == 87) throw new Exception("Specified port is not in state to be closed down");
                    if (ret == 317) throw new Exception("The function is unable to set the TCP entry since the application is running non-elevated");
                    if (ret != 0) throw new Exception("Unknown error (" + ret + ")");

                }
                catch (Exception ex)
                {
                    throw new Exception("CloseConnection failed (" + localAddress + ":" + localPort + "->" +  remoteAddress + ":" + remotePort + ")! [" + ex.GetType().ToString() + "," + ex.Message + "]");
                }
            }

            private static IntPtr GetPtrToNewObject(object obj)
            {
                IntPtr ptr = Marshal.AllocCoTaskMem(Marshal.SizeOf(obj));
                Marshal.StructureToPtr(obj, ptr, false);
                return ptr;
            }
        }
    }

"@

Add-Type -TypeDefinition $TypeDefinition -PassThru | Out-Null

while ($true)
{
    $SmtpConnections = Get-NetTCPConnection -RemotePort 25 -State Established -ErrorAction SilentlyContinue| Where-Object -Property "CreationTime" -LT $((Get-Date).AddMinutes(-30))
    if ($SmtpConnections)
    {
        foreach ($Connection in $SmtpConnections)
        {
            Write-Host "Issue found at $(Get-Date) with connection`nRemote address`t" $Connection.RemoteAddress "`nLocal port`t`t" $Connection.LocalPort
            [ConnectionKiller.Program]::CloseConnection($connection.LocalAddress, $connection.LocalPort, $connection.RemoteAddress, $connection.RemotePort)
        }
    }
    else
    {
        Write-Host "No issues found at $(Get-Date)"
    }
    Start-Sleep -Seconds 900
}
#End

Low level;
The System.Net.Security.SslStream remains open when there are errors in the stream; the connection can be seen at the network level and remains in the 'established' state. The spooler holds the email in the processing state, so the mail is never retried. A restart of the service clears all established connections, which is why that has been the workaround thus far.
The application needs to retry the SMTP command after 60 seconds if a response is not received (or a 'TCP Dup ACK' is seen). The application should also retry using STARTTLS if the message returns to spool processing, instead of reverting to clear text. If the STARTTLS option is there it should be used.
My script looks for stale SMTP connections that have been open for more than 30 minutes and ends them.


0
Jade D Replied
Thank you Steve - SmarterTools owe you for the time and effort spent on identifying the issue with their code, which has been around for ages.

Jade https://absolutehosting.co.za
1
Matt Petty Replied
Employee Post
Steve, thank you for pointing this out. I will be doing some tests this week that involve sending large amounts of email into connections that will generate problems. I'll keep my eye on the number of streams remaining open and how we handle errors. Hopefully we have an answer for this soon.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
0
Steve Norton Replied
Jade,
The screenshot I sent you of the Wireshark capture of the failed connection might help.
Matt,
What is the best email address to send the invoice to?
0
Matt Petty Replied
Employee Post
If you don't already have a ticket open, you can open the ticket and mention that it's in reference to the spool delivery issue from the community. DM or mention the ticket ID to me and I can add it to my task as a reference. I'm currently setting up an environment that I can test against that generates random SSL errors.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
3
Tim Uzzanti Replied
Employee Post
Jade,

You always need to throw shade at SmarterTools and it's really disappointing because it doesn't help anyone.

On Feb 3rd you told our team you wouldn't provide us any more information. On Feb 4th the Manager of Customer Support replied that we needed your cooperation if you wanted us to help resolve the problem. The team continued to reply to you asking if you wanted assistance and not until Feb 18th did you fill out the RSAA. We couldn't connect to your server, and I believe on Feb 24th we finally did get access and saw that SSL was disabled for outbound. We then asked for your approval to re-enable it so we could verify what connections were doing.

Not only do you make the process of helping you absolute torture, but you then criticize us in the community. This is how you have always acted with us and it's been very disappointing. I'm posting this timeline because it's important that our customers get both sides of the story, not simply an impression based on your numerous comments and complaints.

Maybe in the future you can and should work with us a bit differently?
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
2
Tim Uzzanti Replied
Employee Post
Steve,

Fantastic FIND!
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Matt Petty Replied
Employee Post
Steve,
Can you run this in a command prompt?
netstat -nao | find ":25"
(or whatever port is the primary port being used for the problematic outbound connections)
I've only done a couple of tests, but I am seeing connections left open but marked with "TIME_WAIT"; I wanna make sure I see what you're seeing. Also, for these connections, are you connecting directly to SSL or is it upgrading to SSL with StartTLS? I'm going to be testing both, but it'll help me out.
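
One caveat with a plain text match on ":25" is that it also catches ports such as 2525 and local ports that merely start with 25. A rough Python sketch for tallying connection states by exact remote port is shown here. It assumes the usual five-column `netstat -nao` layout on Windows (Proto, Local Address, Foreign Address, State, PID); the helper name `count_states_for_remote_port` is invented for this example.

```python
from collections import Counter


def count_states_for_remote_port(netstat_output, port=25):
    """Tally TCP states for connections whose remote port equals `port`."""
    states = Counter()
    suffix = ":" + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto, Local Address, Foreign Address, State, PID
        if len(fields) < 5 or fields[0] != "TCP":
            continue  # header lines, UDP rows and blank lines
        if fields[2].endswith(suffix):
            states[fields[3]] += 1
    return states
```

Fed the saved output of the command, a count of ESTABLISHED entries that keeps growing would point at the stuck sessions, whereas TIME_WAIT entries are connections that have already been closed.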
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
0
Jade D Replied
Hi Matt.

You have access to my relay server - details sent through to you guys via ticket


Jade https://absolutehosting.co.za
0
Steve Norton Replied
Matt,
My script looks for '-State Established ' so it's not what you're seeing.
All issues relate to STARTTLS.

"Low level;
The System.Net.Security.SslStream remains open when there are errors in the stream; the connection can be seen at the network level and remains in the 'established' state. The spooler holds the email in the processing state, so the mail is never retried. A restart of the service clears all established connections, which is why that has been the workaround thus far.
The application needs to retry the SMTP command after 60 seconds if a response is not received (or a 'TCP Dup ACK' is seen). The application should also retry using STARTTLS if the message returns to spool processing, instead of reverting to clear text. If the STARTTLS option is there it should be used.
My script looks for stale SMTP connections that have been open for more than 30 minutes and ends them."
0
Steve Norton Replied
Matt,
Also note Jade's server has the script running that kills these connections.

0
Matt Petty Replied
Employee Post
Can you send me addresses/servers that you've been having issues with? I've got a utility that can simulate an SMTP outbound session and report what's negotiated. I've tried smtp.aruba.it in my testing. You can DM this information to me if you do not wish to put it in the community.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
0
Steve Norton Replied
Matt,
It's not addresses/servers that are the problem, it's SM error handling. You can create your own listening application that you can SMTP to, and have it drop the connection or send timeout messages, to correct your handling. This is something you can recreate in-house.
1
Matt Petty Replied
Employee Post
I have recreated these issues in-house, but it requires the server (a simple SMTP server I threw together) to not respond at all during the StartTLS negotiation. At the moment we do not have a timeout on the StartTLS process; the corrective action for this issue would be to add one. I was just hoping for a more concrete example of this issue rather than a slapped-together SMTP server that I'm forcing to work incorrectly.

When fixing an issue I usually recreate the scenario in my own way (the test SMTP server). Once I believe I have recreated the issue and fixed it, I take your scenario and run it with my code, to basically guarantee that the scenario you are facing will be fixed. That's why I'd like to have some of the target servers: that way I can match the recreated issue with yours, and when I fix it, I can run the exact same test again and hopefully see some good runs.

TL;DR: I can create it in-house, I'd like to verify what I'm creating is the exact issue, we don't have a timeout on StartTLS, we should add one, I'll do that and report back if it fixed MY scenario.
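
For readers who want to see the idea of a bounded handshake in isolation, here is a small sketch in Python rather than the .NET SslStream code SmarterMail actually uses. It only illustrates the general technique of putting a timeout on the TLS upgrade; the function name `upgrade_to_tls` and the 30 second default are assumptions for the example, not the real implementation.

```python
import socket
import ssl


def upgrade_to_tls(sock, hostname, timeout_seconds=30.0, context=None):
    """Try the TLS handshake on an already-connected socket.

    Returns the TLS-wrapped socket, or None if the handshake fails or the
    peer stays silent for longer than timeout_seconds.
    """
    context = context or ssl.create_default_context()
    # Bound how long the handshake may block, so a peer that never answers
    # cannot hold the connection (and the queued message) indefinitely.
    sock.settimeout(timeout_seconds)
    try:
        return context.wrap_socket(sock, server_hostname=hostname)
    except OSError:
        # socket.timeout and ssl.SSLError are both OSError subclasses.
        # The connection is no longer usable and should be discarded.
        return None
```

A caller would treat a `None` result as a failed attempt and hand the message back to the spool for a retry, instead of leaving it in the processing state.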
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
1
Matt Petty Replied
Employee Post
I've got a timeout put into the code, this is what a full delivery looks like. First attempt StartTls, times out, second attempt without SSL, delivered.


[2021.03.01] 14:35:34.887 [33985000] Delivery started for test@smartermail.io at 2:35:34 PM
[2021.03.01] 14:35:37.920 [33985000] Added to SpamCheckQueue (1 queued; 0/30 processing)
[2021.03.01] 14:35:37.927 [33985000] [SpamCheckQueue] Begin Processing.
[2021.03.01] 14:35:37.933 [33985000] Blocked Sender Checks started.
[2021.03.01] 14:35:38.004 [33985000] Spam Checks started.
[2021.03.01] 14:35:38.005 [33985000] Spam Checks skipped: User authenticated
[2021.03.01] 14:35:38.005 [33985000] Spam Checks completed.
[2021.03.01] 14:35:38.006 [33985000] Removed from SpamCheckQueue (0 queued or processing)
[2021.03.01] 14:35:40.979 [33985000] Added to RemoteDeliveryQueue (1 queued; 0/50 processing)
[2021.03.01] 14:35:40.979 [33985000] [RemoteDeliveryQueue] Begin Processing.
[2021.03.01] 14:35:41.006 [33985000] Sending remote mail for test@smartermail.io
[2021.03.01] 14:35:41.258 [33985000] MxRecord count: '1' for domain 'test.com'
[2021.03.01] 14:35:41.258 [33985000] Attempting MxRecord Host Name: 'test.com', preference '1', Ip Count: '1'
[2021.03.01] 14:35:41.258 [33985000] The mx record ip '127.0.0.1' is a local IP.  All IPs of a lower preference have been tried.
[2021.03.01] 14:35:41.258 [33985000] MxRecord 'test.com' is a localIp, but skip local check is true so continuing anyways
[2021.03.01] 14:35:41.258 [33985000] Attempting to send to MxRecord 'test.com' ip: '127.0.0.1'
[2021.03.01] 14:35:41.260 [33985000] Sending remote mail to: test@test.com
[2021.03.01] 14:35:41.261 [33985000] Initiating connection to 127.0.0.1
[2021.03.01] 14:35:41.263 [33985000] Connecting to 127.0.0.1:25 (Id: 1)
[2021.03.01] 14:35:41.263 [33985000] Connection to 127.0.0.1:25 from 127.0.0.1:62948 succeeded (Id: 1)
[2021.03.01] 14:35:44.335 [33985000] RSP: 220 localhost v8.0.0.0 ESMTP ready
[2021.03.01] 14:35:44.341 [33985000] CMD: EHLO GE06.st.local
[2021.03.01] 14:35:44.421 [33985000] RSP: 250-localhost Hello GE06.st.local, haven't we met before?
[2021.03.01] 14:35:44.421 [33985000] RSP: 250-PIPELINING
[2021.03.01] 14:35:44.421 [33985000] RSP: 250-8BITMIME
[2021.03.01] 14:35:44.421 [33985000] RSP: 250-SMTPUTF8
[2021.03.01] 14:35:44.421 [33985000] RSP: 250 STARTTLS
[2021.03.01] 14:35:45.964 [33985000] CMD: STARTTLS
[2021.03.01] 14:35:45.995 [33985000] RSP: 220 ready when you are
[2021.03.01] 14:36:17.690 [33985000] Exception: SSLStream Authentication Timeout.
[2021.03.01] Stack:    at MailService.RelayServer.Clients.SMTP.ClientConnectionSync.InitiateSsl(Boolean validateAllCerts) in C:\Code\smartermail\src\MailService\RelayServer\Clients\SMTP\ClientConnectionSync.cs:line 170
[2021.03.01]    at MailService.RelayServer.Clients.SMTP.SmtpClientSession.GiveStartTls(Boolean validateAllCerts) in C:\Code\smartermail\src\MailService\RelayServer\Clients\SMTP\SmtpClientSession.cs:line 400
[2021.03.01] 14:36:18.364 [33985000] Attempt to ip, '127.0.0.1' success: 'False'
[2021.03.01] 14:36:18.372 [33985000] Removed from RemoteDeliveryQueue (0 queued or processing)
[2021.03.01] 14:37:20.860 [33985000] Added to RemoteDeliveryQueue (1 queued; 0/50 processing)
[2021.03.01] 14:37:20.860 [33985000] [RemoteDeliveryQueue] Begin Processing.
[2021.03.01] 14:37:20.862 [33985000] Sending remote mail for test@smartermail.io
[2021.03.01] 14:37:20.926 [33985000] MxRecord count: '1' for domain 'test.com'
[2021.03.01] 14:37:20.926 [33985000] Attempting MxRecord Host Name: 'test.com', preference '1', Ip Count: '1'
[2021.03.01] 14:37:20.926 [33985000] The mx record ip '127.0.0.1' is a local IP.  All IPs of a lower preference have been tried.
[2021.03.01] 14:37:20.926 [33985000] MxRecord 'test.com' is a localIp, but skip local check is true so continuing anyways
[2021.03.01] 14:37:20.926 [33985000] Attempting to send to MxRecord 'test.com' ip: '127.0.0.1'
[2021.03.01] 14:37:20.927 [33985000] Sending remote mail to: test@test.com
[2021.03.01] 14:37:20.927 [33985000] Initiating connection to 127.0.0.1
[2021.03.01] 14:37:20.927 [33985000] Connecting to 127.0.0.1:25 (Id: 1)
[2021.03.01] 14:37:20.927 [33985000] Connection to 127.0.0.1:25 from 127.0.0.1:63084 succeeded (Id: 1)
[2021.03.01] 14:37:42.302 [33985000] RSP: 220 localhost v8.0.0.0 ESMTP ready
[2021.03.01] 14:37:42.303 [33985000] CMD: EHLO GE06.st.local
[2021.03.01] 14:37:42.370 [33985000] RSP: 250-localhost Hello GE06.st.local, haven't we met before?
[2021.03.01] 14:37:42.370 [33985000] RSP: 250-PIPELINING
[2021.03.01] 14:37:42.370 [33985000] RSP: 250-8BITMIME
[2021.03.01] 14:37:42.370 [33985000] RSP: 250-SMTPUTF8
[2021.03.01] 14:37:42.370 [33985000] RSP: 250 STARTTLS
[2021.03.01] 14:37:44.683 [33985000] CMD: MAIL FROM:<test@smartermail.io>
[2021.03.01] 14:37:53.952 [33985000] RSP: 250 Ok
[2021.03.01] 14:37:53.952 [33985000] CMD: RCPT TO:<test@test.com>
[2021.03.01] 14:37:53.996 [33985000] RSP: 250 Ok
[2021.03.01] 14:37:53.996 [33985000] CMD: DATA
[2021.03.01] 14:37:54.043 [33985000] RSP: 354 end with <CRLF>.<CRLF>
[2021.03.01] 14:37:55.270 [33985000] RSP: 250 Ok
[2021.03.01] 14:37:55.270 [33985000] CMD: QUIT
[2021.03.01] 14:37:55.315 [33985000] RSP: 221 bye
[2021.03.01] 14:37:55.977 [33985000] Attempt to ip, '127.0.0.1' success: 'True'
[2021.03.01] 14:37:55.991 [33985000] Delivery for test@smartermail.io to test@test.com has completed (Delivered)
[2021.03.01] 14:37:55.991 [33985000] Removed from RemoteDeliveryQueue (0 queued or processing)
[2021.03.01] 14:37:57.344 [33985000] Removing Spool message: Killed: False, Failed: False, Finished: True
[2021.03.01] 14:37:57.345 [33985000] Delivery finished for test@smartermail.io at 2:37:57 PM	[id:397433985000]


This would likely work, but I'm gonna wait before I slap my hands together and call this done. I'd really want to see this working in the real world and not just in a simulated example. The root-cause analyst in me would really like to know why this StartTLS is failing in the first place, and why we aren't getting the usual indicators that it's bad.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
0
Jade D Replied
Hi Matt

IPs were provided within my ticket 28A-276327AE-0B78 on the following dates
Dec 9, 2020 8:14:06 AM
Jan 4, 2021 10:25:47 AM

The last response includes a download link for all the log files that were requested, so you'll be able to grab the destination IPs and recipients' email addresses - refer to the screenshots in the ticket for details.
Jade https://absolutehosting.co.za
0
Steve Norton Replied
Hi Matt,
It looks like you're checking the SslStream.IsAuthenticated property; I would agree that's the correct approach. There are many servers that occasionally have a problem creating the TLS stream, and the SslStream.IsAuthenticated property would cover all scenarios by telling us whether the underlying negotiation was successful. I say "occasionally" because it doesn't happen all the time to the same server. I've seen a network capture where the 'client hello' was sent (not sure if it got there) but no 'server hello' was received (not sure if it was sent). So rather than using clear text on the next try, SM should try STARTTLS again; maybe you could retry the AuthenticateAsClient method in the initial attempt.
To enhance this for all customers, there should be a per-domain option to fail a message that cannot use STARTTLS when the server has advertised it with a 250-STARTTLS reply. Many customers would rather have mail fail to send than have it sent in clear text; the cause of these failures can then be investigated.

For servers that advertise 250-STARTTLS, that would give us:

'Allow mail to be sent clear text on TLS failures' - Enabled
Attempt 1 - AuthenticateAsClient called - 60 seconds pass - IsAuthenticated fails - sending fails and gets re-queued
Attempt 2 - AuthenticateAsClient called - 60 seconds pass - IsAuthenticated fails - sending fails and gets re-queued
Attempt 3 - mail is sent clear text

'Allow mail to be sent clear text on TLS failures' - Disabled
Attempt 1 - AuthenticateAsClient called - 60 seconds pass - IsAuthenticated fails - sending fails and gets re-queued
Attempt 2 - AuthenticateAsClient called - 60 seconds pass - IsAuthenticated fails - sending fails and gets re-queued
Attempt 3 - mail is returned to user

We're making progress :)
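
The two tables above boil down to one small decision rule. The following Python sketch writes it out; the names `Action` and `next_action` are made up for illustration, and the limit of two TLS attempts simply mirrors the attempts listed above rather than any confirmed SmarterMail setting.

```python
from enum import Enum


class Action(Enum):
    RETRY_STARTTLS = "re-queue and retry STARTTLS"
    SEND_CLEARTEXT = "send in clear text"
    RETURN_TO_SENDER = "return mail to user"


def next_action(failed_tls_attempts, allow_cleartext, max_tls_attempts=2):
    """Decide what to do after `failed_tls_attempts` consecutive TLS failures.

    `allow_cleartext` stands for the proposed per-domain option
    'Allow mail to be sent clear text on TLS failures'.
    """
    if failed_tls_attempts < max_tls_attempts:
        return Action.RETRY_STARTTLS
    if allow_cleartext:
        return Action.SEND_CLEARTEXT
    return Action.RETURN_TO_SENDER
```

With the option enabled, `next_action(2, True)` gives the clear-text fallback of attempt 3; with it disabled, the same call with `False` returns the message to the user.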
2
Employee Replied
Employee Post
Hello all, 

I wanted to let you know that development found a resolution for outbound emails using StartTLS getting stuck until restart. There was a spot where we waited for the other server's response without a timeout, and if the other server never responded, it could cause a lockup. If you would like the custom build with this resolution, please reach out via a support ticket.

Kind regards,
0
Jade D Replied
Thanks for the email Andrea,

I've installed the custom build and will report back on any issues.
Note that we typically only see the issue after a day or so of SmarterMail running, so if you don't hear back from me after a day or more then it is not due to a lack of cooperation.
Jade https://absolutehosting.co.za
1
Tim Uzzanti Replied
Employee Post
We have implemented a 30-second timeout on connections. We are also attempting to negotiate TLS twice on the connection before performing a non-TLS conversation. The examples that were provided are not reproducing the problem, so it looks like certain customers (in this thread) have a higher chance of random bad connections. Why this is happening (Windows, host issues, network adapter, firewall, networking equipment or upstream providers), we do not know. This issue isn't widespread but is prominent for certain customers, as you can see in this thread.

Although the 30-second timeout will resolve these bad connections, and we should have had a timeout to begin with, we are not satisfied with waiting for the timeout. We want to know why these bad connections exist so we can terminate them sooner and not burden Windows and SmarterMail with connections just waiting around to time out. We are going to continue looking at this, but the timeout will essentially solve the problem.
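
For readers who want to see the general idea of a read timeout as a catch-all, here is a minimal Python sketch. It is not SmarterMail's code (which is .NET); the function name `read_reply` is hypothetical, and the short timeout is used only so the demo finishes quickly where the real setting described above is 30 seconds.

```python
import socket
from typing import Optional


def read_reply(sock: socket.socket, timeout_seconds: float) -> Optional[bytes]:
    """Wait for the peer's reply, but give up after timeout_seconds.

    Returns the bytes received, or None if the peer stayed silent.
    (An empty bytes object would mean the peer closed the connection.)
    """
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:
        # Without a timeout, recv() would block here indefinitely.
        return None


if __name__ == "__main__":
    # socketpair() gives two connected sockets; the "server" end never answers.
    client, silent_server = socket.socketpair()
    reply = read_reply(client, 0.2)
    print("timed out" if reply is None else reply)
    client.close()
    silent_server.close()
```

A caller that gets `None` back can close the connection and re-queue the message instead of holding the session open forever.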
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Keith Dovale Replied
Tim, the issue here is that they are not erroring out. They start the session, the session gets an error on the TLS command, and then it stops there. No further actions take place, the connection attempts stay at 0, forcing the connection does nothing, and you have to physically restart the SM service to get the emails delivered.

This all started in November / Dec when we upgraded our SM Versions. 
Still to this day we experience up to a hundred mails a day with this issue. 
I found this PowerShell fix further up the thread and ran it on our servers. It frees the locked mails in the queues when it's run on both the 100x version and the 15.7x version. I will load the latest 100x version now.

Has this issue been fixed in the 15.7x version as well? If not, will it be? This seems to be a bug and needs to be fixed; I am not going to upgrade a perfectly working server to fix a bug.

1
Tim Uzzanti Replied
Employee Post
Keith, the reason for the timeout is to act as a catch-all. A deeper understanding of the packets and what the last packet is would possibly allow us to terminate the session prior to the timeout and optimize things more. Some people seem to see this more than others, and it could be the destination servers/environments or the customer servers/environments that cause it to be more frequent.

Staying up to date with maintenance and support or leasing our products is how you get the latest and greatest no matter if it is a bug, security, or a result of the constantly changing world of the Internet/Protocols/Operating Systems/Applications.  Any additional discussions related to how updates are provided in this thread will be removed as it is off topic.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
3
Jade D Replied
Some feedback

We're seeing good results with the custom build that was supplied.
I am going to roll this version out to some of our other gateway servers and will monitor further.

This will provide a wider range of base OSes to test with.
Jade https://absolutehosting.co.za
0
Jade D Replied
Morning,

I installed the custom build onto two of our bulk mail gateway servers and it has resolved the issues that we were experiencing.

Mails which were stuck in the queues for days have now cleared.

Well done guys, finally managed to get this one nailed.
Jade https://absolutehosting.co.za
5
Sébastien Riccio Replied
Hello,

Great to hear this has been fixed.

While we're on the outgoing SSL topic, I would find it interesting to allow domain admins to set a "Force outgoing TLS" option for their domain.

While for some people it's important that their mail reaches the recipients no matter what (using TLS or not), others want the opposite:
only transmit the mail content if the connection is encrypted. We have had a bunch of requests about this, especially from customers handling sensitive information (in some cases we point them to PGP), but some would like at least to be sure that mail leaving our servers for the recipient's server is either transmitted encrypted or not transmitted at all.

Because we can't do this with SmarterMail (and for other reasons) we relay all outgoing mails through a gateway where we can configure this on a sender domain basis.

I guess it wouldn't hurt to have this built into SmarterMail.

Scenarios:

Global TLS for outgoing is disabled and force outgoing TLS for the domain disabled:
- Send the mail clear text

Global TLS for outgoing is enabled and force outgoing TLS for the domain disabled:
- One or two attempts are made with TLS
- If TLS failed, fallback to non-TLS

Global TLS for outgoing is disabled and force outgoing TLS for the domain enabled:
- One or two attempts are made with TLS
- If TLS failed, bounce back to the sender with something like "Force outgoing SSL enabled, but cannot negotiate a secure transaction"

Global TLS for outgoing is enabled and force outgoing TLS for the domain enabled:
- One or two attempts are made with TLS
- If TLS failed, bounce back to the sender with something like "Force outgoing SSL enabled, but cannot negotiate a secure transaction"
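
The four scenarios can also be written as a small decision function. This Python sketch is only a restatement of the list above, not an existing SmarterMail setting or API; the name `outgoing_policy` and the returned keys are hypothetical.

```python
from typing import Optional, Tuple


def outgoing_policy(global_tls: bool,
                    force_tls_for_domain: bool) -> Tuple[bool, Optional[str]]:
    """Map the two switches to (try_tls, action_if_tls_fails).

    action_if_tls_fails is "bounce", "send_clear_text", or None when
    TLS is never attempted in the first place.
    """
    if force_tls_for_domain:
        # The per-domain switch wins regardless of the global setting.
        return True, "bounce"
    if global_tls:
        return True, "send_clear_text"
    return False, None


if __name__ == "__main__":
    for global_tls in (False, True):
        for force in (False, True):
            print(global_tls, force, outgoing_policy(global_tls, force))
```

Note that in this reading the domain-level switch overrides the global one, which is why scenarios three and four behave identically.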

Do you think this makes sense?

Kind regards.
Sébastien Riccio System & Network Admin https://swisscenter.com
0
Keith Dovale Replied
Hi,

OK, we ran the script on the 15.7x version and the mails all get delivered and none sit in the spool. However, we updated to the new version and I have still seen a few mails sit in the spool at 137 hours awaiting delivery. Running the PowerShell script sorts this out, but obviously I don't want to run scripts on these servers. Will the 15.7x version also get a patch?
0
echoDreamz Replied
@Keith, I doubt it... It's not a supported version anymore and hasn't received updates in 2 years.
0
Keith Dovale Replied
Yeah, so it seems. I will run the script on that server as a temporary fix. However, the latest SM 100 version that was supposedly fixed is NOT fixed: we have loaded the new version and we are still seeing mails left in the spool undelivered. There is a definite reduction in the number of these mails, but they still exist. If I run the script on this server, the mails left in the queues are then delivered, so the script seems to fix the issue but the fixed SM version does not. The real reason we were still using SM 15.7 was that it was stable and we had no issues until December 2020. We then upgraded one of our servers to the SM 100 version, and now we have endless issues with it: no fix in sight for an issue where an IPv4 server does an IPv6 reverse DNS lookup, and then this issue with mails being left in the spool.

This is very frustrating for me. The other cause that was suggested is RBL lookups taking a long time, but with RBLs on or off this issue still happens. It's like working in the dark.
0
Steve Gaston Replied
I've just disabled this setting across all my domains:

"enable tls if supported by remote server"

The usual culprits are mail servers in my country of origin and Italy.
