Explaining the "Client Disconnected" bug with Office 365

Question asked by Douglas Foster - 5/20/2022 at 9:05 AM


This is a generalized version of the email that I am using to explain the Office365 disconnect bug to our correspondents. I am posting it here in hopes that others will use something similar to work the issue with their correspondents. It also explains why this is much more than a SmarterMail problem.

We discovered a problem with a message sent from your organization to ours. The problem has been traced to Microsoft, and it may be affecting any of your outbound messages to any organization.

By the SMTP standard, a sending system is supposed to indicate end-of-data and then wait for an SMTP response code to confirm whether the message has been accepted or not. This leaves a design question of how the receiving system should act if the session is disconnected after end-of-data but before the SMTP response can be sent. The receiving system can reasonably assume that the disconnect was accidental, and therefore the sender will reattempt delivery. Based on this expectation, the submitted but unconfirmed message should be discarded so that the recipient does not receive two copies when the message is resubmitted.

A problem occurs if the sending system disconnects intentionally, without waiting for a status code, and has no intention of resubmitting the message. The users are left in a vacuum, because the sending system has declared delivery success, while the receiving system has declared delivery failure.
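To make the two failure modes concrete, here is a minimal sketch of the reasoning above. It is only an illustrative model, not any mail server's actual code; the function name `copies_delivered` and both of its parameters are hypothetical.

```python
def copies_delivered(receiver_keeps_message: bool, sender_retries: bool) -> int:
    """Count the copies a recipient sees when the session drops after
    end-of-data but before the SMTP reply reaches the sender.

    Hypothetical model: both parameters are illustrative assumptions.
    """
    copies = 0
    if receiver_keeps_message:
        copies += 1  # receiver delivers the unconfirmed message anyway
    if sender_retries:
        copies += 1  # sender never saw a reply, so it resubmits
    return copies


if __name__ == "__main__":
    for keeps in (False, True):
        for retries in (False, True):
            print(f"receiver keeps={keeps!s:5} sender retries={retries!s:5} "
                  f"-> {copies_delivered(keeps, retries)} copies")
```

The case described above is the first row: the receiver discards, the sender never retries, and the recipient gets zero copies. Keeping the message fixes that case but produces two copies whenever a well-behaved sender does retry.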

What's worse, the deliberate-disconnect behavior is most often observed from spamming sources, which may be more concerned about maximizing attacks than about confirming whether an attack is accepted. However, the behavior has also been observed from outbound Office365 servers. After becoming aware of this issue yesterday, I reviewed our logs back to <date>. As expected, 52 of the incidents were spam, and the other 2 were from Office365. The problem is also very intermittent: it occurred on only 2 of the 2,421 messages we received from Office365 servers during that same time period. Our two problems were observed on traffic from these two servers:

40.107.89.105 (mail-bl0gcc02on2105.outbound.protection.outlook.com)

40.107.89.89 (mail-bl0gcc02on2089.outbound.protection.outlook.com)

Our mail system vendor has provided an option to continue delivery when a remote disconnect occurs after end-of-data, and I have activated that option for our environment. This ensures that future messages from you will not be affected by this Microsoft bug. However, it may increase our exposure to both unwanted messages and wanted-but-duplicate messages. Consequently, I would like to disable the feature as soon as Microsoft has found and fixed their problem. I am hoping that you can open a ticket with them to get this issue investigated.

9 Replies


Douglas Foster Replied

5/25/2022 at 8:22 AM

My correspondent organization has opened a ticket with Microsoft using the information that I provided them. For this particular type of problem, the SmarterMail logs and log viewer are exceptionally good: (1) search for a phrase, (2) capture all related traffic on that transaction, then (3) copy it to the clipboard for further analysis. The viewer searches big log files very quickly, whether zipped or unzipped. Kudos to SmarterTools.

Sébastien Riccio Replied

5/25/2022 at 10:22 PM

Hello Douglas,

Thank you for bringing this problem to the attention of people who could forward it to the right persons @MS.

I thought about trying to discuss the issue on the mailop mailing list. It might get some attention there too?

https://www.mailop.org/

Kind regards,

Sébastien

Sébastien Riccio System & Network Admin https://swisscenter.com

Douglas Foster Replied

5/26/2022 at 8:58 AM

Anything that gets more people involved is fine with me, but ultimately, Microsoft has to be convinced to investigate and fix their problem. That requires tickets to Microsoft. Naturally, my correspondent's first attempt was rebuffed by a low-level tech who gave a boilerplate answer about ciphersuite compatibility, which indicated that they did not understand the technology or the problem. I have asked them to escalate.

Douglas Foster Replied

6/12/2022 at 4:37 AM

One correspondent opened a ticket with Microsoft, and received a boilerplate reply about TLS encryption suite compatibility problems with SMTP and SQL. (In my experience with Microsoft support, the first reply comes from someone who does not understand the question.) I asked them to escalate, but have not had an update. This week, I plan to query that correspondent and start the process with another one. I am disappointed that no one else has results to report.

Matt Petty Replied

6/13/2022 at 7:27 AM

Employee Post

I'm very interested in following where this goes. This caused us a huge delivery issue for a while, and ultimately we just hammered in a setting to override the proper behavior to accommodate Office365.

*Everyone say it with me at the same time so Microsoft hears it*
👏 "Servers should wait for the DATA "OK" receiving server response BEFORE disconnecting" 👏
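For anyone who wants to see what "waiting" looks like on the sending side, here is a rough sketch using a raw socket. It assumes the session has already reached the point where the server answered DATA with 354; the function name `send_body_and_wait` is hypothetical, and the 10-minute default follows the timeout RFC 5321 recommends for this reply.

```python
import socket
from typing import Tuple


def send_body_and_wait(sock: socket.socket, body: bytes,
                       timeout: float = 600.0) -> Tuple[int, str]:
    """Send the message body plus the end-of-data marker, then block until
    the server's final reply arrives. Assumes `body` uses CRLF line endings
    and that the server has already replied 354 to DATA.
    """
    # Dot-stuffing: a body line starting with "." gets an extra "." so it
    # cannot be mistaken for the end-of-data marker.
    lines = body.split(b"\r\n")
    stuffed = b"\r\n".join(b"." + ln if ln.startswith(b".") else ln
                           for ln in lines)
    if not stuffed.endswith(b"\r\n"):
        stuffed += b"\r\n"
    sock.sendall(stuffed + b".\r\n")

    # The important part: do NOT close the socket here. Read the reply first.
    sock.settimeout(timeout)
    with sock.makefile("rb") as reader:
        while True:
            line = reader.readline()
            if not line:
                raise ConnectionError("server closed before replying")
            text = line.decode("ascii", errors="replace").rstrip("\r\n")
            # A "-" after the code marks a continuation line of a
            # multi-line reply; anything else is the final line.
            if len(text) < 4 or text[3] != "-":
                return int(text[:3]), text[4:]
```

Only a 2xx code returned here means the receiver has accepted responsibility for the message. A timeout or `ConnectionError` means the outcome is unknown and the message should stay queued for retry.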

Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com

Douglas Foster Replied

6/13/2022 at 8:06 AM

I just checked the SMTP logs on my incoming gateway. I was surprised to see no new events since 5/24/2022, despite receipt of 2400 messages from Outlook.com since then.

Anyone else have data? Search rule:

SMTP log contains "Client socket is disconnected!"
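If you would rather run the same search from a script than from the log viewer, something along these lines may work. It is a sketch under assumptions: the `.log` and `.zip` extensions, the UTF-8 encoding, and the function name `find_disconnects` are guesses of mine, not documented SmarterMail behavior, so adjust them to your installation.

```python
import io
import zipfile
from pathlib import Path
from typing import Iterator, Tuple

PHRASE = "Client socket is disconnected!"


def _lines(path: Path) -> Iterator[str]:
    """Yield text lines from a plain log file or from every file in a zip."""
    if path.suffix.lower() == ".zip":
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                with archive.open(name) as member:
                    yield from io.TextIOWrapper(
                        member, encoding="utf-8", errors="replace")
    else:
        with path.open(encoding="utf-8", errors="replace") as handle:
            yield from handle


def find_disconnects(log_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (file name, log line) for every line containing PHRASE.

    Assumes logs sit directly in `log_dir` with .log or .zip extensions.
    """
    for path in sorted(Path(log_dir).iterdir()):
        if path.suffix.lower() not in (".log", ".zip"):
            continue
        for line in _lines(path):
            if PHRASE in line:
                yield path.name, line.rstrip("\r\n")
```

A call such as `for name, line in find_disconnects(some_log_dir): print(name, line)` lists each hit with its source file, where `some_log_dir` is whatever folder holds your SMTP logs.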

Douglas Foster Replied

6/15/2022 at 2:14 PM

The problem is back. Server 40.107.89.128 was the source today. Last time, the client's ticket was closed by Microsoft without comment. I will try again with a different client.

Alessandro Pereira Replied

7/7/2022 at 6:00 PM

We are facing the same problem.

[2022.07.05] 08:49:26.782 [40.92.97.66][21489292] Client socket is disconnected! Disconnect exception encountered: False, IsDisconnected: True, This message will be rejected.

[2022.07.05] 08:49:26.782 [40.92.97.66][21489292] Received message size: 577917 bytes

[2022.07.05] 08:49:26.782 [40.92.97.66][21489292] Successfully wrote to the HDR file. (I:\SmarterMail\Spool\SubSpool4\101801208.hdr)

[2022.07.05] 08:49:26.782 [40.92.97.66][21489292] Data transfer succeeded, writing mail to 101801208.eml (MessageID: <ROAP284MB1214E76A0A19554DD952339D95819@ROAP284MB1214.BRAP284.PROD.OUTLOOK.COM>)

[2022.07.05] 08:49:26.782 [40.92.97.66][21489292] Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.

[2022.07.05] 08:55:27.541 [40.92.97.66][21489292] rsp: 421 Command timeout, closing transmission channel
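For anyone comparing notes, lines like these can be split into fields with a short script, which makes it easier to group events by session or by sending IP. This is a sketch that assumes the layout seen in the excerpt above (date, time, `[ip][session]`, then free text); the names `LogEntry` and `parse_line` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout: [YYYY.MM.DD] HH:MM:SS.mmm [ip][session] text
LINE_RE = re.compile(
    r"^\[(?P<date>\d{4}\.\d{2}\.\d{2})\] "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<ip>[^\]]+)\]\[(?P<session>\d+)\] "
    r"(?P<text>.*)$"
)


class LogEntry(NamedTuple):
    when: datetime
    ip: str
    session: str
    text: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return the parsed fields, or None if the line has another layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    when = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y.%m.%d %H:%M:%S.%f")
    return LogEntry(when, match["ip"], match["session"], match["text"])


if __name__ == "__main__":
    # Two lines copied from the excerpt above.
    first = parse_line(
        "[2022.07.05] 08:49:26.782 [40.92.97.66][21489292] Unable to read "
        "data from the transport connection: An existing connection was "
        "forcibly closed by the remote host.")
    last = parse_line(
        "[2022.07.05] 08:55:27.541 [40.92.97.66][21489292] rsp: 421 "
        "Command timeout, closing transmission channel")
    gap = (last.when - first.when).total_seconds()
    print(f"session {first.session} from {first.ip}: "
          f"{gap:.3f} s between disconnect and 421 timeout")
```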

Zach Sylvester Replied

7/8/2022 at 12:29 PM

Employee Post

Hey Alessandro,

Thanks for reaching out. This issue is caused by the sending server not waiting for the OK response to DATA. As a workaround, you can go to Setting->protocols and enable the option to continue the delivery session if the client disconnects.

Kind Regards,

Zach Sylvester Software Developer SmarterTools Inc. www.smartertools.com
