loss of SAN connection, loss of mail service
Problem reported by Eric Swanzey - 3/13/2019 at 7:13 AM
Resolved
I run an updated SmarterMail server (build 7008) that stores all data on a SAN drive. When the SAN is unreachable, the mail server can't validate any user of any domain, so mail service is effectively down, even though the mail server itself is still running. I now have a service monitor that checks the mail server every 5 minutes, and it tripped earlier this morning at 2:51 AM.

When I log into SmarterMail, it fails to bring up any domain or user information, because all of that is stored on the SAN. Its error log contains repeating entries stating "Application Stopping.  Reason: The hosting environment shut down the application."

In the Windows Server logs, I see an Schannel error (event ID 36888) at 2:47:17 AM: "A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 10. The Windows SChannel error state is 1203." I also see that the Windows Update service was busy installing updates.

When this happens, the quickest fix is to restart the SmarterMail service. I'd like to prevent it from happening in the first place, and I'm hoping someone has ideas on the cause. I thought about moving the config files off the SAN and onto the server, but I don't think that would do much good, because the message store would still be on the SAN. Any suggestions would be appreciated.

5 Replies

Eric Swanzey Replied
What I really should have pointed out is that SmarterMail doesn't reconnect to the SAN after it comes back online. The SAN is set up as drive D on the server. SmarterMail should be smart enough to notice when a drive comes back online, shouldn't it?
echoDreamz Replied
The bigger question here is: why is your SAN (which stores all the mail data) randomly disappearing? SmarterMail itself isn't reading and writing the data; the .NET runtime is. Honestly, I'm surprised SmarterMail keeps running when the drives disappear: there are no longer any grp files to read, and no domain settings etc. to read from or write to. I think the answer here is to keep the SAN from going offline, or, if the SAN needs to go offline (for maintenance, say), to time that with SM updates, server updates, etc. so they all go down together.
Eric Swanzey Replied
I don't think the SAN is randomly disappearing; I think it's being caused by Windows Update. Even so, that isn't the issue, and I don't agree that it's the bigger question.

I think the bigger question is: why doesn't SmarterMail keep re-checking whether the SAN on drive D is available until it comes back online? That sounds like a bug to me, wouldn't you agree?
Matt Petty Replied
Employee Post
    We don't have any specific code that deals with SANs. It's all handled transparently by .NET when we use file paths that point to SAN storage. It's possibly something we could add, and I agree SM probably should have recovered automatically once the files became available again. Are your system files also stored on the SAN? If so, I'd advise against putting your system files on a SAN. Either way, we don't really have code that detects that it suddenly can't access a file it was previously accessing, so we have no real way to recover, and I'm not sure what kind of behavior this could cause inside SM.
    Even if we did keep running without access to those files, many parts of the application would throw exceptions when they tried to save files, etc. Maybe you could set up your service monitor to automatically restart the mail service?
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
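The restart suggestion above could be automated with a small watchdog. This is a minimal Python sketch, not anything SmarterTools provides: it assumes the SAN data lives under a hypothetical `D:\SmarterMail` folder and that the Windows service is named `MailService` (check the real service name in the Services console). It polls the folder and restarts the service with `sc.exe` each time the folder becomes readable again after an outage, which is the recovery step the reply above says SmarterMail has no code for.

```python
import os
import subprocess
import time
from typing import Callable, Optional

# Hypothetical values: substitute your own data folder and service name.
DATA_PATH = r"D:\SmarterMail"
SERVICE_NAME = "MailService"


def path_available(path: str) -> bool:
    """Return True if the directory exists and its listing can be read."""
    try:
        os.listdir(path)
        return True
    except OSError:
        return False


def restart_service(name: str) -> None:
    """Stop, then start, a Windows service via sc.exe (Windows only)."""
    subprocess.run(["sc", "stop", name], check=False)
    time.sleep(10)  # sc stop returns before the service has fully stopped
    subprocess.run(["sc", "start", name], check=False)


def watch(
    is_up: Callable[[], bool],
    on_recovered: Callable[[], None],
    interval: float = 60.0,
    max_checks: Optional[int] = None,
) -> int:
    """Poll is_up(); call on_recovered() each time it flips from down to up.

    Runs forever unless max_checks is given; returns the number of recoveries.
    """
    was_down = False
    recoveries = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        up = is_up()
        if up and was_down:
            on_recovered()
            recoveries += 1
        was_down = not up
        checks += 1
        time.sleep(interval)
    return recoveries


# Only start the watchdog when run directly on Windows, where sc.exe exists.
if __name__ == "__main__" and os.name == "nt":
    watch(lambda: path_available(DATA_PATH), lambda: restart_service(SERVICE_NAME))
```

The fixed 10-second pause after `sc stop` is crude; a sturdier version would poll `sc query` until the service reports that it has stopped before starting it again.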
Eric Swanzey Replied
Marked As Resolution
For whatever reason, the SAN apparently wasn't coming online in time for SmarterMail to access it after a reboot. The ultimate solution was to delay the SmarterMail service startup, and so far there haven't been any issues.
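Windows' built-in way to delay a service is the "Automatic (Delayed Start)" startup type (for example `sc config <service> start= delayed-auto`). A fixed delay is still a guess, though. A more explicit alternative is to set the service to Manual and start it from a boot-time script only once the SAN drive is actually reachable. Here is a minimal Python sketch of that approach, again assuming the hypothetical service name `MailService` and that the SAN is mounted as `D:\`:

```python
import os
import subprocess
import time

# Hypothetical values: the SAN drive root and the SmarterMail service name.
SAN_ROOT = "D:\\"
SERVICE_NAME = "MailService"


def wait_for_path(path: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll until path is an existing directory; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isdir(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Intended to run as a startup task on Windows, with the service set to Manual.
if __name__ == "__main__" and os.name == "nt":
    if wait_for_path(SAN_ROOT):
        subprocess.run(["sc", "start", SERVICE_NAME], check=False)
    else:
        raise SystemExit(f"{SAN_ROOT} never came online; {SERVICE_NAME} not started")
```

Compared with a fixed delay, this starts the service as soon as the drive is reachable and exits with an error, rather than starting a half-working mail server, if the drive never appears.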
