3
TrueNAS ZFS network share for data directory
Question asked by Gerardo Altman - 11/4/2020 at 2:55 AM
Unanswered
Hi All

I'm looking into the idea of running SM in an HA configuration on two hosts, with an SMB share on a ZFS-backed TrueNAS system for the data store and archival storage tiers.

With ZFS we can create storage-level snapshots and expose them over the SMB share as shadow copies to the Windows operating system.

This allows in-OS, file-level recovery of emails from ZFS snapshots.
A secondary benefit is that ransomware can't delete or remove the shadow copies: they are read-only and can only be removed from the TrueNAS system, which is locked down.
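
For anyone wanting to picture the snapshot side, here is a minimal Python sketch of how a timestamped snapshot name and the matching `zfs snapshot` command could be built. It assumes the SMB share is configured to recognise snapshot names in the `GMT-%Y.%m.%d-%H.%M.%S` pattern; the dataset name `tank/mail` and the helper names are hypothetical, and on TrueNAS the built-in periodic snapshot tasks would normally do this for you.

```python
from datetime import datetime, timezone
from typing import List

# Assumed snapshot-name pattern; it must match whatever the SMB share
# is configured to recognise as a shadow copy timestamp.
DEFAULT_FORMAT = "GMT-%Y.%m.%d-%H.%M.%S"


def snapshot_name(dataset: str, when: datetime, fmt: str = DEFAULT_FORMAT) -> str:
    """Return 'dataset@timestamp' with the timestamp rendered in UTC."""
    utc = when.astimezone(timezone.utc)
    return f"{dataset}@{utc.strftime(fmt)}"


def snapshot_command(dataset: str, when: datetime, fmt: str = DEFAULT_FORMAT) -> List[str]:
    """Build (but do not run) the argument list for 'zfs snapshot'."""
    return ["zfs", "snapshot", snapshot_name(dataset, when, fmt)]
```

The command is only built, not executed, so it can be checked before being handed to something like `subprocess.run` on the storage host.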

Has anyone tried this setup before with ZFS?
Has anyone tried running in HA with SM?

Just looking for feedback and experience.

Cheers
G

8 Replies

2
Sébastien Riccio Replied
Hi,
As far as I know, Windows can mount NFS shares with the Client for NFS feature, but I'm really not convinced that SmarterMail will support this kind of (shared) storage.

SM probably requires some specific Windows filesystem features to operate successfully, especially around file locking.
My guess is that accessing the files simultaneously from two different servers would cause problems and corrupt data.

Maybe the ST team can shed some light on the topic.

Personally, I would rather export a ZVOL as an iSCSI target and create a Windows filesystem on it.
You should still be able to take snapshots on the storage and, if needed, mount an older snapshot as another iSCSI volume to recover some files quickly.

For a near-perfect solution, it would also be nice to flush writes and freeze the filesystem on the Windows machine before taking the snapshots, to ensure they are not taken in the middle of a file write.

But even without this, it's useful to have quickly mountable snapshots at hand.

EDIT: after some research (I was kind of thinking about the same idea), I found that echoDreamz uses the API to stop the services before the backup/snapshot and then starts them again.
With ZFS snapshots the downtime shouldn't be long, as the snapshots are almost instant.
After stopping the services, it could also be handy to flush the filesystem writes, for example with the Sync utility from Sysinternals.
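
The stop / flush / snapshot / start sequence described above could be sketched roughly like this in Python. The four callables are placeholders (assumptions, not real SmarterMail API calls or Sysinternals options): you would plug in whatever stops and starts the SM services, flushes the filesystem, and triggers the ZFS snapshot in your environment. The point of the sketch is only the ordering and the guarantee that the services are started again even if the snapshot step fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

Action = Callable[[], None]


@contextmanager
def quiesced(stop_services: Action, start_services: Action, flush: Action) -> Iterator[None]:
    """Stop the mail services, flush pending writes, and always restart."""
    stop_services()
    try:
        flush()          # e.g. a wrapper around a filesystem flush tool
        yield            # the snapshot is taken while services are stopped
    finally:
        start_services() # runs even if flush or the snapshot raises


def snapshot_with_quiesce(stop_services: Action, start_services: Action,
                          flush: Action, take_snapshot: Action) -> None:
    """Run take_snapshot inside the quiesced window."""
    with quiesced(stop_services, start_services, flush):
        take_snapshot()
```

Keeping the steps as injected callables means the ordering can be tested with dummies before it is wired to a production mail server.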


Kind regards.



Sébastien Riccio System & Network Admin https://swisscenter.com
0
echoDreamz Replied
This is correct: we stop and start the services within SM (not the SM Windows service itself), but I don't really know how well that works compared to fully shutting down the SM service each night (which isn't an option). SM still has no way to take consistent backups while running. If you have a smaller mail server, you could probably get away with stopping and starting the SM Windows service, but we cannot.
0
Gerardo Altman Replied
I don't think there is a perfect system. Even with Exchange, if an email arrives during a power outage or host failure, it may still be lost, even with a multiple-DAG setup.

Having ZFS take a non-intrusive storage snapshot at regular intervals will provide similar results to using "Shadow Copies", but the snapshots will be more resilient than SC, which can and do get corrupted; we've even seen them disappear completely after a host crash.

Taking the ZFS snapshot will be more consistent and more robust than SC, with the same outcome: being able to restore files from the Windows UI when emails need to be recovered.

If SM used a database instead of individual files for emails, then I suspect VSS would be needed, but because the only DB entries are for indexing, these can be rebuilt from the UI if there is an issue.

ZFS replication can be used to send the data to another ZFS system as a backup, with the snapshot schedule setting the replication interval, e.g. every 5 minutes, every 15 minutes, every hour, etc.
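
As a rough illustration of snapshot-based replication, this Python sketch builds an incremental `zfs send -i` piped into `zfs receive` over SSH. The dataset names, snapshot names and host (`tank/mail`, `backup/mail`, `backup-host`) are made up for the example, and on TrueNAS you would normally configure a replication task in the UI rather than script it by hand.

```python
import shlex
from typing import List, Optional


def send_command(dataset: str, new_snap: str, prev_snap: Optional[str] = None) -> List[str]:
    """Build 'zfs send', incremental from prev_snap when one is given."""
    cmd = ["zfs", "send"]
    if prev_snap is not None:
        cmd += ["-i", f"{dataset}@{prev_snap}"]  # send only changes since prev_snap
    cmd.append(f"{dataset}@{new_snap}")
    return cmd


def replication_pipeline(dataset: str, new_snap: str, prev_snap: Optional[str],
                         ssh_host: str, target_dataset: str) -> str:
    """Return the shell pipeline as a string; nothing is executed here."""
    send = shlex.join(send_command(dataset, new_snap, prev_snap))
    recv = shlex.join(["ssh", ssh_host, "zfs", "receive", target_dataset])
    return f"{send} | {recv}"
```

With a 5-minute snapshot schedule, each run would pass the previous snapshot as `prev_snap`, so only the blocks changed in that interval cross the wire.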

In the event of ransomware, rolling back with ZFS can be much faster than restoring from a backup: minutes vs. hours.

Anyhow, we are still playing with the concept and will do some testing shortly; I'm really just feeling out others' experience and suggestions in this area.

Thanks for the replies and input.

Cheers
G
0
echoDreamz Replied
Even with ZFS snapshots, you could still run into inconsistent snapshots, unless you are a fairly low-traffic install. We have one of the largest SM servers around, and we'd love a good way to alert SM that it needs to commit to disk and queue changes. This is where Exchange does well: it has a VSS writer that tells Exchange to flush to disk and queue actions, so you get a fully consistent snapshot. SQL Server has a similar writer for this purpose.

SM also does in-memory caching, and it has numerous json, cfg, stat and grp files per user/domain. There are quite a few moving parts to make sure are safely committed to disk in a consistent manner.
0
Gerardo Altman Replied
Are you running bare metal or virtual?

If you are looking for snapshot consistency, then I would virtualise and use VM snapshots that utilise VSS to take in-memory or quiesced, consistent snaps.

The only issue is that you'll run into pauses while the snap is being taken and then released. This shouldn't be much of an issue for users unless they are in Webmail at the time, and even then it shouldn't be too bad. It will of course be governed by the amount of memory you are looking to snap, if any at all.

That said, it will really come down to what's acceptable for your production environment; many service providers want to deliver unrealistic data resiliency models to clients.

Just curious if you don't mind sharing how many users are you accommodating on a single SM server?

Cheers
G
0
Sébastien Riccio Replied
"Just curious if you don't mind sharing how many users are you accommodating on a single SM server?"
In our case we have 25k mailboxes on our single server (a VM with plenty of RAM and data stored on a big all-SSD NetApp filer).
But we offloaded antispam/antivirus to non-SM front-end filters. SM has all antispam/antivirus features disabled, except for custom header checks that assign a spam weight to mails according to those headers. All outgoing mails are also relayed through a pool of gateways that handle retries and so on (unless the mail is rejected by the gateway itself as obvious outgoing spam).

So it helps a lot to reduce the load on the SM server and focus its resources on the essentials.
Sébastien Riccio System & Network Admin https://swisscenter.com
0
echoDreamz Replied
We are about the same as Sebastien, though we run on bare metal, no virtualization. Also running 2 rSpamd gateways that handle antispam and anti-virus. We use a few postfix servers that rotate IPs to handle our outgoing mail.

For us, ANY amount of time where the service stalls out is bad. Even if it is for 5 seconds, if that happens enough, we will get loaded with calls, chats and tickets. Mail is SUPER sensitive, even a tiny hiccup (like an iisreset) is felt and reported.
0
Gerardo Altman Replied
@Sébastien Riccio, do you use Veeam or another backup solution to take VSS snapshots on a regular basis for DB consistency and rollback?

I can see with so many mailboxes on a single instance that any disruption would cause some pauses to services.

I wonder if splitting up into smaller instances would be a good idea, to distribute the load across multiple SM servers: fewer mailboxes per server would be easier to snapshot.

@echoDreamz, that's insane that customers are that sensitive to a small pause for their own data safety. I wonder if the users on Webmail are the ones logging the complaints, as I would think that something like Outlook or Thunderbird, which only polls or waits for a push, wouldn't be so sensitive to a pause of that nature.

Interesting feedback.

Cheers
G
