Explaining the "Client Disconnected" bug with Office 365
Question asked by Douglas Foster - 5/20/2022 at 9:05 AM
This is a generalized version of the email that I am using to explain the Office365 disconnect bug to our correspondents.   I am posting it here in hopes that others will use something similar to work the issue with their correspondents.   It also explains why this is much more than a SmarterMail problem.

<After introducing myself:>

We discovered a problem with a message sent from your organization to ours.    The problem has been traced to Microsoft, and it may be affecting any of your outbound messages to any organization.   
By the SMTP standard, a sending system is supposed to indicate end-of-data and then wait for an SMTP response code to confirm whether the message has been accepted or not.    This leaves a design question of how the receiving system should act if the session is disconnected after end-of-data but before the SMTP response can be sent.   The receiving system can reasonably assume that the disconnect was accidental, and therefore the sender will reattempt delivery.   Based on this expectation, the submitted but unconfirmed message should be discarded so that the recipient does not receive two copies when the message is resubmitted.
A problem occurs if the sending system disconnects intentionally, without waiting for a status code, and has no intention of resubmitting the message.   The users are left in a vacuum, because the sending system has declared delivery success, while the receiving system has declared delivery failure.
What's worse, the deliberate-disconnect behavior is most often observed from spamming sources, which may be more concerned about maximizing attacks than about confirming whether an attack is accepted.   However, the behavior has also been observed from outbound Office365 servers.   After becoming aware of this issue yesterday, I reviewed our logs back to <date>.   Of the 54 incidents found, 52 were spam, as expected, and the other 2 were from Office365.  The problem is also very intermittent - it occurred on only 2 of the 2,421 messages we received from Office365 servers during that same time period.  Our two problems were observed on traffic from these two servers: (mail-bl0gcc02on2105.outbound.protection.outlook.com)  (mail-bl0gcc02on2089.outbound.protection.outlook.com)
Our mail system vendor has provided an option to continue delivery when a remote disconnect occurs after end-of-data, and I have activated that option for our environment.   This ensures that future messages from you will not be affected by this Microsoft bug.   However, it may increase our exposure to both unwanted messages and wanted-but-duplicate messages.   Consequently, I would like to disable the feature as soon as Microsoft has found and fixed their problem.  I am hoping that you can open a ticket with them to get this issue investigated.
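
For readers who want the receiving-side decision spelled out, here is a minimal Python sketch of the logic described in the email above. It is only an illustration, not SmarterMail's actual implementation: the function name, the `Outcome` enum, and the `deliver_on_disconnect` option are hypothetical names invented for this example.

```python
from enum import Enum
from typing import Optional, Tuple


class Outcome(Enum):
    DELIVERED = "delivered"
    DISCARDED = "discarded"


def handle_end_of_data(client_still_connected: bool,
                       deliver_on_disconnect: bool = False
                       ) -> Tuple[Outcome, Optional[str]]:
    """Decide what a receiver does once end-of-data has arrived.

    Returns the outcome plus the SMTP reply to send, or None when the
    client has already gone and no reply can be sent.
    (Hypothetical helper; names are assumptions for illustration.)
    """
    if client_still_connected:
        # Normal case: accept the message and confirm it with a 250 reply.
        return Outcome.DELIVERED, "250 OK"
    if deliver_on_disconnect:
        # Workaround option: the sender left early and may never retry,
        # so deliver anyway and accept the risk of duplicates or spam.
        return Outcome.DELIVERED, None
    # Strict behavior: treat the disconnect as accidental, expect a retry,
    # and discard the unconfirmed message to avoid a duplicate.
    return Outcome.DISCARDED, None
```

With the option off, a sender that disconnects early and never retries loses the message, which is the failure described above. With the option on, the message is delivered, but a sender that does retry produces a duplicate.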

7 Replies

Douglas Foster Replied
My correspondent organization has opened a ticket with Microsoft using the information that I provided them.   For this particular type of problem, the SmarterMail logs and log viewer are exceptionally good:  (1) search for a phrase, (2) capture all related traffic on that transaction, then (3) copy it to clipboard for further analysis.    It searches big log files, whether zipped or unzipped, very quickly.  Kudos to SmarterTools.
Sébastien Riccio Replied
Hello Douglas,

Thank you for bringing this problem to the attention of people who could forward it to the right persons at Microsoft.

I thought about trying to discuss the issue on the mailop mailing list. It might get some attention there too?

Kind regards,
Sébastien Riccio
System & Network Admin

Douglas Foster Replied
Anything that gets more people involved is fine with me, but ultimately, Microsoft has to be convinced to investigate and fix their problem.   That requires tickets to Microsoft.   Naturally, my correspondent's first attempt was rebuffed by a low-level tech who gave a boilerplate answer about cipher suite compatibility, which indicated that they did not understand the technology or the problem.  I have asked them to escalate.
Douglas Foster Replied
One correspondent opened a ticket with Microsoft, and received a boilerplate reply about TLS encryption suite compatibility problems with SMTP and SQL.   (In my experience with Microsoft support, the first reply comes from someone who does not understand the question.)   I asked them to escalate, but have not had an update.   This week, I will plan to query that correspondent and start the process with another one.   I am disappointed that no one else has results to report.
Matt Petty Replied
Employee Post
I'm very interested in following where this goes. This caused us a huge delivery issue for a while, and ultimately we just hammered in a setting to override the proper behavior to accommodate Office365.
*Everyone say it with me at the same time so Microsoft hears it*
👏 "Servers should wait for the DATA "OK" receiving server response BEFORE disconnecting" 👏
Matt Petty
Software Developer
SmarterTools Inc.
(877) 357-6278
Douglas Foster Replied
I just checked the SMTP logs on my incoming gateway.   I was surprised to see no new events since 5/24/2022, despite receipt of 2400 messages from Outlook.com since then.
Anyone else have data?  Search rule:
SMTP log contains "Client socket is disconnected!"
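
For anyone who wants to run the same search outside the log viewer, here is a rough Python sketch. It assumes the SMTP logs are plain text files, optionally packed in .zip archives, with one log entry per line; that layout and the helper names are my assumptions, so adjust for your own installation.

```python
import io
import zipfile
from pathlib import Path
from typing import Iterator, List

PHRASE = "Client socket is disconnected!"


def iter_log_lines(path: Path) -> Iterator[str]:
    """Yield text lines from a plain log file or from every member of a .zip."""
    if path.suffix.lower() == ".zip":
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                with archive.open(name) as member:
                    text = io.TextIOWrapper(member, encoding="utf-8",
                                            errors="replace")
                    for line in text:
                        yield line.rstrip("\r\n")
    else:
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\r\n")


def find_disconnects(paths: List[Path], phrase: str = PHRASE) -> List[str]:
    """Return every log line containing the phrase, prefixed by its file name."""
    hits = []
    for path in paths:
        for line in iter_log_lines(path):
            if phrase in line:
                hits.append(f"{path.name}: {line}")
    return hits
```

To use it, pass a list of log paths, for example `find_disconnects(sorted(Path("your-log-folder").glob("*")))`, where the folder name is a placeholder for wherever your SMTP logs are kept.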
Douglas Foster Replied
The problem is back.   Server was the source today.   Last time, the client's ticket was closed by Microsoft without comment.   Will try again with a different client.
