Build 7803 - Services don't start
Problem reported by Sébastien Riccio - 5/14/2021 at 12:07 PM
Resolved
Hello,

We updated our production server to build 7803 but had to roll back quickly to the previous build.
After the update, the service starts but none of its services respond (it looks like it's stuck in the startup procedure).

The webmail also displays the usual dark gray message: "The server is not responding or cannot be reached. Please try again later. If the problem continues, please check your internet connection or contact your system administrator for assistance."

I tried two restarts and finally gave up, uninstalled the build, and installed the previous one.

After this, the service started correctly within 30 seconds to 1 minute.

With the production server back to life, I gave this build another try on our testing server.
The same thing happens. After the update, the service is stuck starting up and nothing comes alive.

The startup log shows this:

[20:23:44.709] [6356] MailService process id(s): 6356
[20:23:44.709] [6356] OnStart: Called
[20:23:44.709] [6356] OnStart: Failover not enabled.
[20:23:44.709] [6356] OnStart: Starting ServiceLifetimeFunction thread...
[20:23:44.709] [6356] OnStart: ServiceLifetimeFunction thread started successfully.
[20:23:44.709] [6356] OnStart: Completed
[20:23:44.709] [6356] ServiceLifetimeFunction: Called
[20:23:44.709] [6356] ServiceLifetimeStart: Called
[20:23:44.961] [6356] ServiceLifetimeStart: Completed
[20:23:44.961] [6356] ServiceLifetimeFunction: Entering service wait loop <-- last line and nothing after this
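For anyone comparing their own startup log against the one above, here is a minimal Python sketch that parses lines in this `[time] [pid] message` layout and reports the last entry written. The function names are made up for illustration, and the sketch only assumes the line format visible in the excerpt above; it is not a SmarterMail tool.

```python
import re
from datetime import datetime

# Matches lines such as: [20:23:44.709] [6356] OnStart: Called
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2}\.\d{3})\] \[(\d+)\] (.*)$")


def parse_startup_log(lines):
    """Return (timestamp, pid, message) tuples for lines in the format above."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            # The log has no date, so strptime fills in a default one.
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            entries.append((stamp, int(match.group(2)), match.group(3)))
    return entries


def last_entry(lines):
    """Return the last parsed entry, or None if nothing matched."""
    entries = parse_startup_log(lines)
    return entries[-1] if entries else None


def logged_span_seconds(lines):
    """Seconds between the first and last parsed entries (0.0 if fewer than two)."""
    entries = parse_startup_log(lines)
    if len(entries) < 2:
        return 0.0
    return (entries[-1][0] - entries[0][0]).total_seconds()


sample = [
    "[20:23:44.709] [6356] MailService process id(s): 6356",
    "[20:23:44.961] [6356] ServiceLifetimeStart: Completed",
    "[20:23:44.961] [6356] ServiceLifetimeFunction: Entering service wait loop",
]

stamp, pid, message = last_entry(sample)
print(pid, message, logged_span_seconds(sample))
```

If the last message stays the same for minutes while the process keeps consuming CPU, the startup is likely hung in the way described here.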

Any idea what could be wrong here?

Kind regards.

Sébastien Riccio
System & Network Admin

18 Replies

Sébastien Riccio Replied
Update:

I let the SM service try to start for a few minutes on the test server, without success.
At some point the process was eating 95% of the server RAM and CPU started to load like crazy.
I had to kill the process.

Update 2:
I suspect this entry in the changelog:
Added: Windows Defender is now available as an anti-virus option and is enabled by default.

As we do all scanning work on frontend servers, and to avoid unnecessary resource usage, we remove Windows Defender from the servers running SmarterMail.
Since it is enabled by default, as stated in the changelog, could it be that SM tries to communicate with it at startup and loops on this?

Update 3:
As suspected, re-adding Windows Defender on the system resolved the startup issue on the test server.
However, we don't want to re-add Windows Defender on the prod system. Is there a way to disable this new "enabled by default" Windows Defender option before updating, so the service can start?

Final update:
We had to re-install Windows Defender on the prod server, do the SmarterMail update, go to the Antivirus settings and disable everything for Windows Defender, then remove Windows Defender again.
Now the new build starts correctly.
It would be great to check whether Windows Defender is installed before activating it by default...

Kind regards
Sébastien Riccio System & Network Admin https://swisscenter.com
Tim Uzzanti Replied
Employee Post
We do check to see if Defender is installed or not.  What server versions are you using? 
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
Sébastien Riccio Replied
Hello Tim,


I'm 99% sure the issue is around Windows Defender not being present, as it was impossible to start the new build without re-installing Defender on the system.
It took me some time to figure it out on our testing environment (which is a clone of the prod system).

Also something interesting: while the system was up with Windows Defender installed so we could go disable it in the antivirus settings, the server CPU usage (24 cores) was around 80% when it is usually around 10-15%.

After uninstalling Windows Defender and restarting, the CPU usage dropped back to normal.
It is as if it was still being used to scan each and every access to users/domain JSON and cfg files, even though those files seem to be excluded in the Windows Defender config.
This is mainly why we completely remove Windows Defender instead of just disabling it (and also to avoid nag screens about the system being vulnerable).

Everything back to normal now.

ps: We remove defender like this:
Sébastien Riccio System & Network Admin https://swisscenter.com
Sébastien Riccio Replied
I did an additional test on the test env:

Re-installed Windows Defender, enabled the Windows Defender scan in SM settings, removed Windows Defender, then restarted everything.

This time SM started and it shows that it checked if Windows defender is available:


So the detection seems good. However when doing the update initially the service wouldn't finish the startup without Defender installed, so I'm a bit confused about the reason here.

Is there some specific routine at the first start, after the update to the latest build, that could differ from a "normal" startup?

Really, we waited around 20-25 minutes on the first start and then tried a few restarts without success until we re-added Defender, and from that point it started in 1 minute...

Both reproduced on prod and test env.

Everything is pointing at defender not being installed on our systems. This could be a coincidence but then I don't understand what could be the issue. We rarely had issues at startup, post-update, with other builds.

Kind regards.
Sébastien Riccio System & Network Admin https://swisscenter.com
Webio Replied
Why didn't I see this topic before I tried to update to the latest version today? After about 30 minutes of struggle I returned to the previous build, which started within minutes. In my environment I also don't have Defender installed, for the reasons mentioned in previous posts.

In the end I returned to the previous build and sent a support ticket describing the problem, with screenshots and some weird log entries.
Sébastien Riccio Replied
Webio,

the fact you had the same issue and you also have Defender removed on your system confirms that the issue is really related to Defender not being present.

It took me quite some time to figure it out while trying to get it running on our test system.

Kind regards.
Sébastien Riccio System & Network Admin https://swisscenter.com
Sébastien Riccio Replied
Hmmm I have only two -load.log files in our log directory. 
They are from two weeks ago and contain only one line about an activesync xml loading failure for a domain.
Nothing similar to what you have here.
Sébastien Riccio System & Network Admin https://swisscenter.com
Employee Replied
Employee Post
Hi Sébastien, Webio, 

I'm sorry to hear that you both had trouble with the upgrade to Build 7803. We have separate support tickets to continue investigating these issues. In the meantime, I wanted to let you know that it does not appear that the System.Exception error Webio submitted is related to the Windows Defender upgrade issue Sébastien reported. As such, I am going to move the following reply to its own thread to allow for continued discussion: 

Webio
Today (5/17/2021) at 3:31 AM
Can you also check whether you have an error log from the time when you were updating SM and it failed, with the filename:

2021.05.17-load.log

I started the update at 6 AM and reverted to the previous build at 6:30 AM. This log is 2 GB in size, starts at 6:13 and ends at 6:18, and contains a lot of errors like the one below (DOMAINAME is a real domain name). This might be from when I restarted the service to check whether a restart would fix the problem. I also saw in perfmon a lot of attempts to read settings.json files from various domains.

06:13:37.323 Failed to load domain DOMAINAME: System.Exception: Error get domain settings: A read lock may not be acquired with the write lock held in this mode.
Stack Trace:    at System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker timeout)
   at System.Threading.ReaderWriterLockSlim.TryEnterReadLock(TimeoutTracker timeout)
   at SmarterTools.Common.ExtensionMethods.RWLSExtension.ReadLockHelper..ctor(ReaderWriterLockSlim readerWriterLock, String memberName, String filePath, Int32 lineNumber)
   at MailService.Repositories.DomainRepository.LoadAllDomainSettings()
   at MailService.Repositories.DomainDataCache.CacheGet[T](String item, Func`1 getter)
   at MailService.Core.Domain.get_DomainSettings()
   at MailService.Core.Domain..ctor(String name, String rootDirectory, DomainRepository repo)
 ---> System.Threading.LockRecursionException: A read lock may not be acquired with the write lock held in this mode.
   at System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker timeout)
   at System.Threading.ReaderWriterLockSlim.TryEnterReadLock(TimeoutTracker timeout)
   at SmarterTools.Common.ExtensionMethods.RWLSExtension.ReadLockHelper..ctor(ReaderWriterLockSlim readerWriterLock, String memberName, String filePath, Int32 lineNumber)
   at MailService.Repositories.DomainRepository.LoadAllDomainSettings()
   at MailService.Repositories.DomainDataCache.CacheGet[T](String item, Func`1 getter)
   at MailService.Core.Domain.get_DomainSettings()
   at MailService.Core.Domain..ctor(String name, String rootDirectory, DomainRepository repo)
   --- End of inner exception stack trace ---
   at MailService.Core.Domain..ctor(String name, String rootDirectory, DomainRepository repo)
   at MailService.Core.Mailman.LoadDomain(DomainListDomainItem domainItem)
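The inner exception in this trace is the one .NET's ReaderWriterLockSlim raises under its default no-recursion policy when a thread that already holds the write lock asks for a read lock. The Python sketch below is a toy model of that single rule, to show why the domain load fails immediately instead of waiting. The class, the `load_all_domain_settings` stand-in, and the call order are all assumptions for illustration; nothing here reflects SmarterMail's actual code.

```python
import threading


class LockRecursionError(RuntimeError):
    """Stand-in for .NET's LockRecursionException."""


class NoRecursionRWLock:
    """Toy reader-writer lock that models only the write-then-read rule."""

    def __init__(self):
        self._cond = threading.Condition()
        self._writer = None  # ident of the thread holding the write lock
        self._readers = 0

    def enter_write(self):
        me = threading.get_ident()
        with self._cond:
            if self._writer == me:
                raise LockRecursionError("write lock already held by this thread")
            while self._writer is not None or self._readers:
                self._cond.wait()
            self._writer = me

    def exit_write(self):
        with self._cond:
            self._writer = None
            self._cond.notify_all()

    def enter_read(self):
        me = threading.get_ident()
        with self._cond:
            if self._writer == me:
                # Same thread holds the write lock: fail fast rather than deadlock.
                raise LockRecursionError(
                    "A read lock may not be acquired with the write lock held in this mode."
                )
            while self._writer is not None:
                self._cond.wait()
            self._readers += 1

    def exit_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()


lock = NoRecursionRWLock()


def load_all_domain_settings():
    """Hypothetical reader: takes the read lock, returns an empty settings dict."""
    lock.enter_read()
    try:
        return {}
    finally:
        lock.exit_read()


# A writer that calls back into the reader on the same thread reproduces the error.
lock.enter_write()
try:
    try:
        load_all_domain_settings()
        outcome = "loaded"
    except LockRecursionError as exc:
        outcome = "failed: " + str(exc)
finally:
    lock.exit_write()

print(outcome)
```

If something similar happens for every domain on each retry, it would be consistent with a load log that grows to gigabytes within minutes, though that is a guess from the outside.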
Webio Replied
@Sébastien - have you tried, after the upgrade, stopping the service that is stuck starting, editing settings.json to disable Windows Defender there, and then starting the mail service again?

Params to change:

        "defender_scan_messages": true,
        "defender_scan_uploaded_files": true,
EDIT: Bottom line: you upgraded to the latest build by adding Windows Defender, then removed it, and everything works fine even when you restart the mail service? Can you check the settings.json file (Service dir) for entries containing "defender" and their values?

I have also upgraded my incoming and outgoing gateways to the latest version. They don't have Defender installed either, and the update went fine there. SmarterTools also told me in the ticket that they removed Defender in their test environment and their SmarterMail upgraded correctly. This starts to point to a specific scenario related to configuration.

When I was checking taskmgr and perfmon for disk activity during the initial startup after the upgrade, I saw a lot of read activity on the domains' settings.json files. Since I haven't upgraded yet, it makes me wonder whether the domain-specific settings.json files also contain some kind of Windows Defender entry.
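A minimal Python sketch of the manual edit suggested above, for anyone who wants to script it. Only the two key names come from this thread. Where they sit inside settings.json is not known here, so the sketch searches the whole document for them; the file encoding, the `.bak` backup name, and the `system_settings` wrapper in the demo are assumptions. The service should be stopped before touching the file, and re-serializing may change the file's whitespace.

```python
import json
from pathlib import Path

# Key names as quoted in this thread.
DEFENDER_KEYS = ("defender_scan_messages", "defender_scan_uploaded_files")


def disable_defender_flags(node):
    """Set the Defender flags to False wherever they appear; return the number changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in DEFENDER_KEYS and value is True:
                node[key] = False  # overwriting an existing key is safe while iterating
                changed += 1
            else:
                changed += disable_defender_flags(value)
    elif isinstance(node, list):
        for item in node:
            changed += disable_defender_flags(item)
    return changed


def patch_settings_file(path):
    """Rewrite the file only if a flag changed, keeping a .bak copy of the original."""
    path = Path(path)
    original = path.read_text(encoding="utf-8-sig")  # assumption: UTF-8, maybe with BOM
    data = json.loads(original)
    changed = disable_defender_flags(data)
    if changed:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return changed


# In-memory demo; "system_settings" is a made-up wrapper, not the real layout.
demo = {
    "system_settings": {
        "defender_scan_messages": True,
        "defender_scan_uploaded_files": True,
    }
}
print(disable_defender_flags(demo), demo)
```

Calling `patch_settings_file` with the path to the service's settings.json would apply the same change on disk; try it on a copy of the file first.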
Sébastien Riccio Replied
Hello Webio,

For our production server I managed to install the build correctly by re-installing Defender first, doing the upgrade, disabling all Defender options in SM admin (Antivirus section), and then removing Defender. From there the service can restart without any issue with Defender uninstalled.

It looks like a specific scenario, under circumstances that we apparently share, that is linked to Defender not being installed and to the update procedure.

We have a test server (a clone of the production server) on which I reproduced the issue, and which I rolled back to a snapshot taken a day before the update.

I gave the ST team access to it so they could reproduce the issue themselves, and they were able to reproduce it.
They think they have identified what is going on and have a custom build they want to retry the upgrade with.

So I rolled it back again, so they can check with their new build if the problem persists.

Kind regards.


Sébastien Riccio System & Network Admin https://swisscenter.com
Shaun Peet Replied
This error happened this morning and killed the application pool...


Seems related to Windows Defender as well.  How do we turn this off?

Ionel Aurelian Rau Replied
Thanks Sébastien Riccio and the rest for reporting this.
Luckily, we looked at the threads here in the community before deciding to install the new build this evening, otherwise we would have had the same issue (not sure why the decision was made to enable this by default). 
Matt Petty Replied
Employee Post
Shaun, errors that occur during FullStart actually appear in the Event log by design; however, this does not mean it stopped the entire process. I'd wager MailService.exe was still running after that message was printed. We've identified the issue. It may not actually have anything to do with Defender, but we're going to be testing our build and will release more info once we've had a chance to confirm. Webio's hint that it might be related to domain loading/conversion was on the money, but I'm still trying to wrap my head around how this is related to Defender. We got confirmation that our fix resolved the issue in the test environment.
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
Employee Replied
Employee Post
Ionel Aurelian Rau, 

Do you happen to have a test server cloned from production? If so, we can certainly test out the fix in that environment. Alternatively, can we get a config-only copy of your production environment for local tests? As stated, it doesn't seem this issue is directly related to Windows Defender... it appears to be a combination of locking on the settings.json file(s), which stemmed from the Max Message Size adjustments, and Windows Defender not being present.

If you're alright with participating in our local testing, I'll go ahead and start an internal ticket with you for further discussion. 

Kind regards,
Ionel Aurelian Rau Replied
Sorry Andrea, no test server available at this time to be cloned from production.

We'll hold off on upgrading for a few more days to see if other reports come up.

Just to be sure, these parameters:
        "defender_scan_messages": true,
        "defender_scan_uploaded_files": true,
Are from the C:\Program Files (x86)\SmarterTools\SmarterMail\Service\Settings\settings.json file? Just in case we have to manually disable this.
Employee Replied
Employee Post
Hello Ionel,

That's correct. Note that the SmarterMail Service must be stopped before modifying any system files. That said, I don't believe this should be necessary. We were able to identify the issue that was causing certain upgrades to hang, and testing the fix in its custom build has proved successful. We'll be releasing a new build soon to address this. 

Kind regards,
Webio Replied
Hello,

I just wanted to let you know that build 7810 was able to start in my environment without any issues.

Thanks
Kyle Kerst Replied
Employee Post
Thanks for your follow-up on this, we're glad to hear it resolved your startup issues post-upgrade. Have a good one! As Sébastien's issue is also resolved (I believe - he can confirm), I'll go ahead and mark this Resolved.
Kyle Kerst IT Coordinator SmarterTools Inc. www.smartertools.com
