Making email-attachments storage efficient on SmarterMail server
Idea shared by Joseph Edelstein - 1/30/2015 at 12:35 PM
Currently, when SmarterMail receives an email with an attachment, it copies that email and attachment to each and every recipient’s folder. The same is true when SmarterMail receives an email that was sent to an alias or distribution group: each message is fully duplicated for each member of that group.
Here's one concept I think could greatly improve the inefficient storage use of attachments:
We should separate the attachment from the email and use a public folder where all the attachments will be saved. In addition to the attachment file created in the public folder, SmarterMail will create a corresponding "link file" which will contain a list of all SmarterMail users who are linked to that attachment. When a user opens the email, the attachment is fetched through a link added to the email message, which refers to the public folder's corresponding "link file" and thus to the attachment.
Deleting messages:
Unless you're the last user to delete a message from your mailbox which contains a link to the attachment, it will simply remove the link from the "link file" and leave the attachment itself intact.
The only time an attachment can be deleted is when you're the last name listed in the corresponding "link file" in the public folder. Only then can the "link file" AND the attachment file be permanently deleted.
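To make the idea concrete, here is a minimal sketch of the "link file" behaving as a reference list. This is not how SmarterMail works today; the class and method names are hypothetical, the store is kept in memory instead of in a public folder, and identifying attachments by a SHA-256 content hash is an assumption on top of the proposal above.

```python
import hashlib


class AttachmentStore:
    """In-memory stand-in for the shared public folder (hypothetical)."""

    def __init__(self):
        self.blobs = {}  # attachment_id -> attachment bytes (one copy each)
        self.links = {}  # attachment_id -> set of users (the "link file")

    def deliver(self, data, recipients):
        """Store a single copy of the attachment and link every recipient."""
        # Assumption: identical attachments are recognized by content hash.
        attachment_id = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(attachment_id, data)
        self.links.setdefault(attachment_id, set()).update(recipients)
        return attachment_id

    def open(self, attachment_id, user):
        """Return the attachment if the user is listed in the link file."""
        if user not in self.links.get(attachment_id, set()):
            raise KeyError(f"{user} has no link to {attachment_id}")
        return self.blobs[attachment_id]

    def delete(self, attachment_id, user):
        """Remove the user's link; delete the file only with the last link.

        Returns True if the attachment itself was deleted.
        """
        users = self.links.get(attachment_id)
        if users is None:
            return False
        users.discard(user)
        if users:
            return False  # other users still reference the attachment
        del self.links[attachment_id]
        del self.blobs[attachment_id]
        return True


store = AttachmentStore()
aid = store.deliver(b"quarterly report", ["alice", "bob"])
store.delete(aid, "alice")  # only alice's link is removed
store.delete(aid, "bob")    # last link gone, so the attachment is deleted
```

Forwarding the same file internally would simply call `deliver` again with the new recipients, which adds names to the existing link list instead of writing a second copy.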
This method can be used not only for a single email transaction, but also when someone forwards the same message internally or externally. SmarterMail should still make use of the link concept instead of creating another copy of the file in the Sent Items folder.
Possibly, the outgoing spool may need to create a temporary single file copy of the entire message for spooling purposes only and not for storing.
I really think this can be developed, and that it would greatly decrease unnecessary duplication of files.

2 Replies

Tim Uzzanti Replied
Employee Post
That is a HUGE architecture change. In some installations that have common attachments, that would be a benefit. In other installations it would be a detriment because of the time it would take to do lookups, clean up orphaned attachments, etc.
When looking towards the future and full encryption, the attachment is part of the message and couldn't be separated.
If you have a situation where you're concerned about duplicate data, then I would suggest you look at a centralized storage solution that has de-duplication at the hardware level. This is really the only way to get the performance necessary to handle high disk I/O loads!
SmarterMail is about as efficient as possible. We built SmarterMail because we owned one of the largest Windows hosting companies in the industry. We were hosting millions of users' mailboxes 15 years ago and we couldn't find a stable mail server or a mail server that would fully utilize the hardware we were purchasing.
Every decision we have made is a balancing act to get the most out of server hardware and handle hundreds of thousands of transactions per second.
We even think about NAS solutions in relation to SSD caches. For example, the way we organize our data in GRP files by day allows active data to be on more expensive disks (SSD) while old data gets moved to legacy drives. This way you can purchase cost-effective centralized storage solutions with high-volume disk I/O and large storage capacities.
I appreciate you thinking this way and throwing out suggestions. What might look like a simple change normally has hundreds of counterarguments from a programming and implementation perspective. It's kind of ridiculous how full our whiteboards get discussing a single checkbox setting, let alone a huge architecture change!
Hope this helps,
Tim Uzzanti
SmarterTools Inc.
(877) 357-6278
Thank you. Very well articulated.
