Build 7523 Issues (Some with Work-Around Resolutions)
Problem reported by Scarab - 8/20/2020 at 12:14 PM
We recently upgraded from SmarterMail Enterprise Build 7503 to Build 7523 using the .EXE on Windows Server 2019 on Hyper-V. We started getting about 6 crashes per day that reference .NET clr.dll with the error:
The process was terminated due to an internal error in the .NET Runtime at IP 00007FFF715A59C3 (00007FFF71550000) with exit code 80131506.
We seemingly resolved these by discovering and eliminating the following issues:

  1. Using Dynamic Memory on the VM in Hyper-V. This caused an apparent memory leak that would use up 100% of memory, resulting in mailservice.exe crashing more than 6 times per day with the error above. Disabling this setting on the Hyper-V host eliminated the issue and vastly decreased memory usage. This probably should be isolated and fixed in SmarterMail at some point, but I'm willing to accept that it's more an environmental issue than a SmarterMail issue. Build 7503 would periodically crash (less than once per day) with the error message above, but it did not have the memory leak.

  2. We had multiple IMAP users with more than 500 simultaneous IMAP sessions each. All of them were macOS Mail users. They would only utilize about 10-16 simultaneous IMAP sessions on Build 7503. It turns out each of these users had a folder at the root level (on the same level as their normal Inbox, Deleted Items, Drafts, Sent Items) that was named with their email address. This folder then contained additional Archive, Inbox, Deleted Items, Drafts, Sent Items sub-folders (so \user@domain.tld\Archive, \user@domain.tld\Inbox, \user@domain.tld\Deleted Items, etc.). This was causing macOS Mail to open an excessive number of IMAP connections. These folders existed for these users in Build 7503 and did not cause this problem. Removing the folders named after their email address, and all their sub-folders, immediately resolved the high number of simultaneous IMAP sessions for those users in Build 7523.

  3. We had numerous users with sub-folders containing messages located in the root Deleted Items folder. Again, all of these users were macOS Mail users, and Build 7503 didn't have an issue with these sub-folders in the Deleted Items folder. With Build 7523, however, it suddenly became a problem, leading to indexing issues and high memory usage. Deleting these sub-folders out of the Deleted Items folder immediately resolved the problem on Build 7523: indexing was able to complete for those users and memory usage was dramatically reduced. It also dramatically reduced the number of simultaneous IMAP sessions for each user, just as with the issue in #2.

  4. We had a closed Mailing List that was a recipient on another closed Mailing List in SmarterMail. This is a fringe case that never should happen but it did (and the user that added them has been scolded). The two of them kept bouncing "You do not have permissions to post to this list" messages back and forth until we had 100,000+ messages backed up in the Spool. Again, Build 7503 handled this fine (it stopped bouncing after a half dozen bounces back and forth) but on Build 7523 it will go on forever until manually resolved by an admin.

  5. A user with an autoresponder enabled in their Outlook CC'd themselves on an outgoing message. They had themselves set as a Trusted Sender in SmarterMail. This resulted in an endless loop of emails from themselves to themselves until we had 30,000+ messages backed up in the Spool. This confirmed that Message Throttling is not working in Build 7523, which I suspect may also have been the issue with #4.

Once these 5 issues were remedied, Build 7523 has been stable for 2 days and counting without a single crash referencing .NET clr.dll. Even though these are all either environmental or data induced, items 2 & 3 are definitely due to a bug in SmarterMail and 4 & 5 are due to a recent change.

We also made the following discoveries of issues with Build 7523:

  1. Message Throttling for messages, bandwidth and bounces is no longer working (can confirm that it was as of Build 7503).

  2. Autoresponders no longer function (can confirm that they did as of Build 7503), although I just noticed that this is already a known issue.

6 Replies

Scarab Replied
This .NET exit code usually pertains to garbage collection. Our 100% memory usage was a clue as to what was going on. When SmarterMail tried to do garbage collection in .NET to clean up memory resources, there was seemingly no free memory available to swap, as the OS layer was reporting 100% memory use to the app layer, resulting in the exit error. (.NET should have requested more memory, as Dynamic Memory should have been able to grow the amount of memory available to the VM, but it wasn't.) Turning off Dynamic Memory on the VM running SmarterMail was easy enough to do and test, and it did resolve our issue that had been driving us nuts for a very long time.
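
For anyone wanting to test the same change, the following is a minimal sketch of checking and disabling Dynamic Memory from the Hyper-V host using the Hyper-V PowerShell module. The VM name `SmarterMail01` and the 16GB allocation are placeholders, not values from this thread, and as far as I know the VM has to be shut down before the setting can be changed.

```powershell
# Run on the Hyper-V host in an elevated PowerShell session.
# 'SmarterMail01' and 16GB are placeholder values; substitute your own.
$vmName = 'SmarterMail01'

# Show the current memory configuration of the VM
Get-VMMemory -VMName $vmName |
    Select-Object VMName, DynamicMemoryEnabled, Startup, Minimum, Maximum

# Shut the guest down, switch to a fixed allocation, and start it again
Stop-VM -Name $vmName
Set-VMMemory -VMName $vmName -DynamicMemoryEnabled $false -StartupBytes 16GB
Start-VM -Name $vmName
```

Running `Get-VMMemory` again afterwards should show `DynamicMemoryEnabled` as `False`; the change is easy to revert if it makes no difference.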

(We eventually went back to running SmarterMail on bare metal anyway, as we were having other I/O issues with SmarterMail running in a Hyper-V environment, mostly with excessively long Write Queues. We know that was specific to our environment, as we have a fairly heavy load and at the time were still using a Hyper-V host with RAID arrays of spinning HDDs instead of SSDs. When that Hyper-V host reached its EOL we replaced it with a server that had two RAID-50 arrays of SSDs and went bare metal just to be sure that we wouldn't ever experience a Write Queue problem again.)
George To Replied
We encountered the same on KVM virtualization.
We simply rebooted the guest VM.

From its Windows Event Logs:

The process was terminated due to an internal error in the .NET Runtime at IP 00007FF9A8136B95 (00007FF9A80F0000) with exit code 80131506.

Faulting application name: MailService.exe, version: 100.0.8451.15021, time stamp: 0x63f38fe2
Faulting module name: clr.dll, version: 4.8.4644.0, time stamp: 0x64531a0f

May I have your advice/suggestions?

Thank you very much.
George To Replied
We temporarily used a PowerShell script and Task Scheduler to monitor the Windows Event Log and reboot the server if the error is found.

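# Timestamp for the log line written on each run
$dateStr = (Get-Date).ToString('yyyy-MM-dd HH:mm:ss')

# .NET Runtime errors (instance ID 1023) from the last 5 minutes that mention MailService.
# Get-EventLog reports an error when nothing matches, so suppress it and force an array result.
$Events = @(Get-EventLog -LogName Application -InstanceId 1023 -After (Get-Date).AddMinutes(-5) -EntryType Error -Source ".NET Runtime" -ErrorAction SilentlyContinue | Where-Object { $_.Message.Contains("MailService") })

if ($Events.Count -lt 1)
{
    Write-Output "$dateStr - OK"
}
else
{
    Write-Output "$dateStr - Need Reboot"
    # Stop the mail service before forcing the reboot
    Stop-Service -Name MailService
    Restart-Computer -Force
}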
Kyle Kerst Replied
Employee Post
From past experience, these errors usually mean there is an underlying problem in the .NET environment on the affected server. I usually recommend starting with Windows Updates and a .NET repair, followed by a reinstall of the .NET components (and re-registering them in Windows) if all else fails.

If you run other sites or services on the server that also rely on .NET, you can try moving SmarterMail to its own isolated environment to narrow down whether those problems are being introduced by SmarterMail or by the other services you have running there.

Since the comments above mention virtualized environments: we have also seen some environments get better results with memory usage when not using dynamic memory, so I recommend setting a hard limit temporarily to see if this helps. If you need a hand tracking this down please don't hesitate to submit a ticket with us and we'll do our best to get you pointed in the right direction!
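
Before repairing or reinstalling .NET, it can help to record which .NET Framework build the server is actually running, so it can be compared before and after. A minimal sketch, assuming the default Windows install paths and a 64-bit .NET Framework 4.x runtime:

```powershell
# Installed .NET Framework 4.x version and release number
# (a Release value of 528040 or higher indicates 4.8 or later)
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' |
    Select-Object Version, Release

# Version of the 64-bit clr.dll on disk, for comparison with the
# "Faulting module name: clr.dll" line in the Application event log
(Get-Item "$env:windir\Microsoft.NET\Framework64\v4.0.30319\clr.dll").VersionInfo.ProductVersion
```

If the clr.dll version changes after Windows Updates or a repair, that gives a concrete data point for whether the crashes track a particular runtime build.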
Kyle Kerst IT Coordinator SmarterTools Inc. www.smartertools.com
Kyle Kerst Replied
Employee Post
That is great to hear Ron, thanks for keeping us posted! 
George To Replied
F.Y.I.:
We upgraded SmarterMail (Enterprise) from 8451 to 8664 on 23 Sept 2023.
It seems this .NET problem has disappeared (it has not happened for a month).
