4
JSON Corruption
Problem reported by Netmate - 1/2/2022 at 1:01 AM
Not A Problem
Totally agree with David. We lost one big customer due to the JSON file corruption issue. ST needs to fix this ASAP.

23 Replies

2
Shaun Peet Replied
We upgraded to 8025 two days ago, and since then our mail server has had two "hung" states where it became unresponsive while running. This happened with a build a few months ago, and we had to open an emergency ticket due to all the file corruption issues, but it hadn't happened again for quite some time. It would be hard to believe that this is just a coincidence. I'd write more details here, but I am having to manually restore user settings.json files as we find out which ones are corrupted (they are completely random - it's SUCH great customer support to wait for the email to come in letting us know there's a problem before we can start fixing it).

2
echoDreamz Replied
Could possibly throw together a PS tool or even a small C# console app to find corrupt user settings.json files.
0
Shaun Peet Replied
We have a PS file from the last time this was happening - it seems to not work out all that well and I have to go into the folders, copy some of the json files from the Archived Data folder back to the root (either for the Domain (less often) or for the User (more often)), then reload the domain from the web interface.  I know technically we're supposed to restart the entire mail service however that makes the entire mail server not available for thousands of unaffected users, and occasionally it also causes more corruption.  So we've found reloading the domain in the UI to be more practical and it seems to work.  It's just horribly manual, and there's no way that I can see to proactively fix corrupted files.  We have to wait until a user complains.
12
echoDreamz Replied
Agreed! While we've never experienced corruption (as far as I know), it would be great if SM had some flag or something that indicated whether it was cleanly shut down. If the flag is set, you know it was a clean shutdown; if not, run a check of the needed JSON files and recover automatically where possible.
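The clean-shutdown flag idea above could look roughly like the following Python sketch. This is not SmarterMail code; the marker file name `clean_shutdown.flag` and all function names are hypothetical, chosen only for illustration. The marker is written on graceful exit and consumed at startup, so a crash (or the very first run) leaves no marker and triggers the JSON check.

```python
import json
from pathlib import Path

MARKER_NAME = "clean_shutdown.flag"  # hypothetical marker file name


def consume_clean_marker(state_dir: Path) -> bool:
    """Call at startup: report whether the last exit was clean, then clear the marker."""
    marker = state_dir / MARKER_NAME
    was_clean = marker.exists()
    marker.unlink(missing_ok=True)  # a later crash will therefore leave no marker
    return was_clean


def mark_clean_shutdown(state_dir: Path) -> None:
    """Call as the last step of a graceful shutdown."""
    (state_dir / MARKER_NAME).write_text("ok", encoding="utf-8")


def is_valid_json(path: Path) -> bool:
    """True if the file exists and parses as JSON (a UTF-8 BOM is tolerated)."""
    try:
        json.loads(path.read_text(encoding="utf-8-sig"))
        return True
    except (OSError, ValueError):
        return False


def startup_check(state_dir: Path, json_files) -> list:
    """Return the files needing attention; skip the scan after a clean shutdown."""
    if consume_clean_marker(state_dir):
        return []
    return [p for p in json_files if not is_valid_json(p)]
```

Recovery from backups is deliberately left out: the sketch only reports which files look damaged, which is the part everyone in this thread agrees is missing.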
3
JerseyConnect Team Replied
Shaun, I know I'm a few days behind, but here's the PS one-liner I used to find corrupted user accounts when our server crashed:
select-string -path D:\SmarterMail\Domains\*\Users\*\settings.json -pattern "{`"settings`":{`"" -simplematch -notmatch | format-table path
I also agree that it would be great if SM did these types of corruption checks and then automatically fixed the file from the archived backups.
4
David Sovereen Replied
We have reported this settings.json problem for YEARS (see ticket 3A7-278F93DF-0B2E, which references earlier tickets that were closed without resolution).  Ever since settings migrated from XML to JSON (v16, I think?), we've experienced problems.  These same problems never occurred when the settings were stored in XML.  What a low-level file trace shows is that SmarterMail MOVES settings.json to the Archived Data directory leaving NO settings.json where one is expected for a brief moment.  Then it COPIES settings.json back to the regular directory.  If a power outage, disk failure, software crash, or similar event happens after settings.json has been MOVED and before a copy has been put back in its expected place, that mailbox is left in a broken state when the service or server comes back online.

It makes no sense to MOVE this necessary file just to copy it back.  Why not COPY the file to the Archived Data area instead so that these critical JSON files never leave their needed place?

I am GLEEFUL to see these posts, as we have gone in circles with SmarterMail Support about this for YEARS and have always been told it must be our hardware or we must have a virus scanner locking the file or some other software is to blame because we are the only ones having this problem.  These posts confirm it is not just us and that alone is a really good feeling.

BTW, the problem happens with other .json files as well.  A cursory search of emails to the SmarterMail Tickets shows we've had accounts.json and folder.json experience the same problem.  My guess is that there is a software routine that performs this "Move to Archived Data and Copy back to the normal file location" that is used every time a JSON file is changed, and that any JSON file can become lost this way if the timing of the power outage/disk failure/software crash is just right, but that JSON files written more frequently, like settings.json, just have a greater probability of becoming lost this way.

Please change the routine to always leave the JSON files in place and only put copies of them into the Archived Data folder... never move them!

Dave
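For illustration, the "copy to Archived Data, never move" behavior requested above could be sketched as follows. This is not SmarterMail's actual save routine (the thread does not show it); the function name and parameters are hypothetical. The sketch swaps in a common technique: copy the current file to the archive, write the new content to a temporary file in the same directory, flush it to disk, then replace the original in one step, so the target is never absent. `os.replace` is atomic on POSIX; on Windows it overwrites the destination, but treat the atomicity guarantee there as an assumption.

```python
import os
import shutil
import tempfile
from pathlib import Path


def save_json_atomically(target: Path, text: str, archive_dir: Path) -> None:
    """Write `text` to `target` without ever leaving `target` missing."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    if target.exists():
        # COPY the current version to the archive; the original stays in place.
        shutil.copy2(target, archive_dir / target.name)

    # Write the new content to a temp file in the same directory (same volume).
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the swap
        os.replace(tmp_name, target)  # swap the new file into place
    except BaseException:
        # On any failure, remove the temp file; the old target is untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With this ordering, a crash at any point leaves either the old settings.json or the new one in place, plus at most a stray `.tmp` file that can be ignored or cleaned up.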
0
Sébastien Riccio Replied
I have also suggested multiple times, without success or even a reaction from ST, that there should be at least a check and a warning for corrupted JSON files.

We have so often been confronted with unusable user accounts after a service crash because of corrupted JSON, and we only find out about it when customers open a ticket. That's not professional...

Tired of the situation, I started writing my own tool to run various integrity checks for SmarterMail, but these should be built in!

Sébastien Riccio System & Network Admin https://swisscenter.com
2
Tim Uzzanti Replied
Employee Post
Moved the JSON posts into a separate topic to discuss.  Will provide more information shortly because I think some clarity is needed on how things work.  
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Sébastien Riccio Replied
Hello Tim. Thanks for the future clarification.
Sébastien Riccio System & Network Admin https://swisscenter.com
1
David Sovereen Replied
Any updates on this?

Dave
1
David Sovereen Replied
Bump
0
Netmate Replied
Bumping this up in case ST forgot it.
0
Sébastien Riccio Replied
+1
Sébastien Riccio System & Network Admin https://swisscenter.com
1
Tim Uzzanti Replied
Employee Post
Communications department is making a KB about settings files and how they work.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
2
Tim Uzzanti Replied
Employee Post
Also, we are introducing a setting in the Settings JSON that you can enable, like a "Chkdsk" for your computer. When using this setting, we will look at a variety of things on the server on startup. This is a very intensive process and will delay startup possibly for minutes but be an opportunity for customers to repair some things. That's why it will only be set in the Settings.json for the server and not as an option within the management interface. 

The problem is, corruption in SmarterMail, 9 out of 10 times, is due to external factors: hardware, software, disk issues, resource limitations, etc. Corruption due to SmarterMail is actually very, VERY uncommon. This makes things tricky because we could do more harm than good by creating tools that scan/manage/modify server settings. Our support team helps fix some common issues using one or more internal tools, or we have provided tools to customers on a ticket-by-ticket basis so we can be sure that the fix or tool does actually work as intended and doesn't cause any further issues.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Sébastien Riccio Replied
Yeah right thanks a lot for the clarification.
Sébastien Riccio System & Network Admin https://swisscenter.com
1
David Sovereen Replied
A low-level file trace shows that SmarterMail MOVES json files to Archived Data directories leaving NO json file where one is expected for a brief moment.  Then it COPIES the json back to the regular directory.  If a power outage, disk failure, SmarterMail crash, or similar event happens after the json file has been MOVED and before a copy has been put back in its expected place, that related mailbox or domain is left in a broken state when the service or server comes back online.

It makes no sense to MOVE this necessary file just to copy it back.  Please change the behavior to COPY the file to the Archived Data area instead so that critical JSON files never leave their needed place!
1
Tim Uzzanti Replied
Employee Post
It makes no sense to move, and it's a good thing we are not doing that :). Moving messages guarantees file loss or corruption if there is a service interruption.
What kind of storage solution are you using?
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
2
Sébastien Riccio Replied
Having some kind of optional integrity check at startup would already be a good thing. Even without an "autofix", some warnings in a log would be nice.

Most of the time, the corrupted JSON we had after a service crash was a missing user settings.json. Sometimes the file was incomplete, as if it had not been entirely written, rendering the service unusable for that user.
Sometimes the settings.json is named settings.json.tmp and settings.json is not present.

It was never a hardware issue, but the result of mailservice.exe crashing or of us having to kill the process because it went into an unstoppable 100% CPU usage loop (gracefully stopping the service at this point fails and the only resort is to kill the process).
However, the 100% CPU issue seems to be resolved now, which is a good point :)

The main problem is that the resulting corruption of user JSON files is silent. There is no way to know that the issue exists until the customer contacts us.

The only way I found to avoid further issues with this was to write a script that checks every user's and domain's JSON files to verify that they are present (for the mandatory ones) and parse as valid JSON.
This helped us a lot to detect corruption issues after a service crash before we got complaints.

I would imagine that when SmarterMail boots and loads the domain/user configs, it could perform the same checks and log a warning somewhere, so that admins are aware of potential issues after a crash without needing an external tool.

Sébastien Riccio System & Network Admin https://swisscenter.com
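A minimal Python sketch of the kind of check described in the post above: for each user folder, confirm that the mandatory file is present and parses as valid JSON, and flag a leftover settings.json.tmp. This is not the poster's actual tool. The `Domains\*\Users\*` layout is taken from the PowerShell one-liner earlier in this thread, and treating only settings.json as mandatory is an assumption; the function names are hypothetical.

```python
import json
from pathlib import Path

MANDATORY = ("settings.json",)  # assumption: the only file treated as mandatory here


def check_user_dir(user_dir: Path, mandatory=MANDATORY) -> list:
    """Return a list of problem descriptions for one user folder."""
    problems = []
    for name in mandatory:
        path = user_dir / name
        if not path.is_file():
            # A leftover .tmp suggests a write that never completed.
            leftover = (user_dir / (name + ".tmp")).is_file()
            hint = " (leftover .tmp found)" if leftover else ""
            problems.append(f"{path}: missing{hint}")
            continue
        try:
            # utf-8-sig tolerates a byte-order mark at the start of the file.
            json.loads(path.read_text(encoding="utf-8-sig"))
        except (OSError, ValueError) as exc:
            problems.append(f"{path}: invalid JSON ({exc})")
    return problems


def check_domains(root: Path) -> list:
    """Scan <root>/<domain>/Users/<user>/ folders and collect all problems."""
    problems = []
    for user_dir in sorted(root.glob("*/Users/*")):
        if user_dir.is_dir():
            problems.extend(check_user_dir(user_dir))
    return problems
```

It could be run after an unclean stop with something like `for line in check_domains(Path(r"D:\SmarterMail\Domains")): print(line)`, adjusting the path to the local install. It only reports; restoring from Archived Data is left to the admin.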
0
Tim Uzzanti Replied
Employee Post
Sébastien, yours was due to SmarterMail. I made sure to mention that in the other thread when you posted. I have never denied that, until recently, we couldn't find why your server would crash from time to time. I even alluded to a handful of other customers we reached out to at the end of the year who would have an issue from time to time (every month or quarter, etc.). That is why you have dealt with JSON issues, but not why most others have.

The code around our loading and saving of files is sacred and some of the most reviewed code in our projects. The amount of disk I/O a mail server is responsible for is ridiculous, and how it is done needs to be absolutely perfect. I'm just glad we found your solution. I'm also glad we tried to give you tools to see which ones would be corrupt. At the time, we couldn't revert them because we didn't know the cause of the corruption.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
5
Sébastien Riccio Replied
Yes Tim, it is true that we have recently had far fewer issues with these JSON corruptions, as the service seems less likely to crash or go into a 100% CPU loop, and congrats for this.

We used to have to check all JSON files whenever the process did not exit gracefully.
This is why I had to write a tool when dealing with these issues. I'm sharing it in case it can be useful to someone.

"smart.py check" will check all domains/users settings.json files and report missing/corrupt ones.
The other options are work in progress and shouldn't be used.
Sébastien Riccio System & Network Admin https://swisscenter.com
1
Tim Uzzanti Replied
Employee Post
That is awesome of you to share.
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Sérgio Rocha Replied
Hi Tim,

The only thing we need is a report page that lists accounts with JSON corruption. SmarterMail knows about and flags accounts with problems, but we have to click into every domain to see them. All we need is a single report.

Regards,

SR
