I'd like to get some feedback from other admins and hopefully from the SmarterMail team as well.
We run SmarterMail in production and currently rely on two layers of protection:
- SAN-level CDP/snapshots for fast recovery
- Daily backups for disaster recovery
Overall, this works well, but one recurring issue we've encountered during restores is JSON file corruption.
Whether restoring from a SAN snapshot or from a backup, we'll occasionally end up with user JSON files that are empty, truncated, or contain invalid data.
Not surprisingly, the users most affected tend to be the busiest ones on the system, since their data is being modified more frequently.
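For anyone who wants to check a restore before putting it back into service, below is a rough sketch of the kind of scan we mean. It walks a restored directory tree and reports JSON files that are empty or fail to parse (a truncated file fails the parse). The function names and the idea of pointing it at a restored copy of the mail data are my own assumptions, not anything provided by SmarterMail.

```python
import json
from pathlib import Path


def classify_json_file(path):
    """Return 'ok', 'empty', or 'invalid' for a single file."""
    data = Path(path).read_bytes()
    if not data.strip():
        # Zero-length or whitespace-only file.
        return "empty"
    try:
        # utf-8-sig tolerates an optional byte order mark.
        json.loads(data.decode("utf-8-sig"))
    except ValueError:
        # Covers both bad encoding and malformed or truncated JSON.
        return "invalid"
    return "ok"


def find_damaged_json(root):
    """Map each damaged *.json file under root to its classification."""
    damaged = {}
    for path in sorted(Path(root).rglob("*.json")):
        status = classify_json_file(path)
        if status != "ok":
            damaged[str(path)] = status
    return damaged
```

Running `find_damaged_json()` against the restored tree gives a list of files to look at first. It only tells you a file is syntactically broken; it cannot tell you whether a file that parses is missing recent changes.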
Suspend writes API endpoint
One thing I'd really like to see is some form of backup/snapshot coordination with SmarterMail itself.
As far as I know, there is currently no API endpoint or mechanism that would allow a backup job to tell SmarterMail:
- Finish pending writes
- Temporarily stop writing data
- Confirm when it is safe to take a snapshot or backup
- Resume normal operations afterward
Having this capability would allow storage platforms and backup software to create application-consistent snapshots rather than relying entirely on crash-consistent copies. I suspect this alone would eliminate many of the JSON corruption issues seen after restores.
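To illustrate the workflow I have in mind, here is a minimal sketch from the backup job's point of view. As far as I know SmarterMail exposes no such calls today, so `suspend`, `take_snapshot`, and `resume` are purely hypothetical callables that a backup script would supply; the sketch shows only the ordering and the guarantee that writes are resumed even if the snapshot fails.

```python
from contextlib import contextmanager


@contextmanager
def writes_suspended(suspend, resume):
    """Keep the application quiesced for the duration of the with-block.

    suspend() is assumed to return only once pending writes are flushed
    and it is safe to snapshot. resume() always runs once suspend() has
    succeeded, even if the body of the block raises.
    """
    suspend()
    try:
        yield
    finally:
        resume()


def consistent_snapshot(suspend, take_snapshot, resume):
    """Suspend writes, take the snapshot, then resume; return the snapshot result."""
    with writes_suspended(suspend, resume):
        return take_snapshot()
```

The window between suspend and resume only needs to cover the moment the SAN snapshot is created, which is typically very short, so the impact on users should be minimal.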
Archiving the latest known-good JSON files
The second point is related to SmarterMail's JSON recovery mechanism.
From what I've observed, if SmarterMail encounters a missing or corrupted JSON file, it attempts to recover it from the Archived Data folder. This is actually a very useful feature and has saved us more than once.
The limitation is that the archived file appears to represent a previous version of the JSON rather than the latest valid version.
As a result, the recovery process succeeds, but recent changes can still be lost.
We've experienced situations where customers lost recently added content filters or other configuration changes after a restore. SmarterMail detected the corrupted JSON and successfully rebuilt it from the archive, but the archived copy did not contain the latest modifications that existed before the backup or snapshot was taken.
From an administrator's perspective, this is much better than having a completely unusable account configuration, but it can still result in difficult conversations with users who discover that recent changes have disappeared.
It would be great if SmarterMail could also maintain a copy of the latest known-good version of these files, not only the previous version. That way, when the self-healing mechanism is triggered, it would have a better chance of restoring the most recent valid state rather than rolling back to an older configuration.
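Here is a rough sketch of what I mean, written as a standalone script rather than as a description of how SmarterMail works internally (I don't know that). The file locations and function names are hypothetical. The idea is that a file is promoted to the known-good copy only after it parses successfully, and that recovery falls back to that copy when the live file is missing or corrupt.

```python
import json
import os
import shutil
from pathlib import Path


def is_valid_json(path):
    """True if the file exists, is readable, and parses as JSON."""
    try:
        json.loads(Path(path).read_bytes().decode("utf-8-sig"))
        return True
    except (OSError, ValueError):
        return False


def promote_known_good(live_path, known_good_path):
    """Refresh the known-good copy, but only from a live file that parses."""
    try:
        data = Path(live_path).read_bytes()
        json.loads(data.decode("utf-8-sig"))
    except (OSError, ValueError):
        return False  # keep the existing known-good copy untouched
    # Write the validated bytes to a temp file, then swap it in atomically
    # so a crash never leaves a half-written known-good copy.
    tmp = Path(str(known_good_path) + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, known_good_path)
    return True


def recover(live_path, known_good_path):
    """Return which copy ended up in use: 'live', 'known-good', or 'unrecoverable'."""
    if is_valid_json(live_path):
        return "live"
    if is_valid_json(known_good_path):
        shutil.copyfile(known_good_path, live_path)
        return "known-good"
    return "unrecoverable"
```

If promotion runs after every successful write, the known-good copy trails the live file by at most one change, instead of holding an older version.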
Of the two suggestions, I realize the backup coordination API is probably the larger request. It may require significant architectural work and would likely touch several areas of the product.
The JSON recovery enhancement, however, feels like a more realistic improvement since the underlying mechanisms already appear to exist. SmarterMail is already archiving these files and already knows how to recover from corruption. Extending that process to preserve the latest known-good version as well could significantly improve recovery outcomes while building on functionality that is already there today.
I may be misunderstanding part of how the archive and recovery process works internally, so if anyone from SmarterMail can clarify that, I'd appreciate the correction.
Has anyone else run into similar issues with backups, snapshots, or JSON recovery? I'd be interested to hear how other administrators are handling this.
Kind regards