Message Archive Reindex Function Improvement
Idea shared by Rafael Grecco - 4/11/2023 at 11:24 AM

Up to the last version of SmarterMail, there was an issue with the Message Archive function. On some high-volume SmarterMail installations, some messages would not get correctly indexed.

While all messages did reach the archive folder, some of them were missing from the index, so they were not searchable. This issue was fixed with the new "Reindex" function.

The problem is that the new "Reindex" function deletes the current index file and recreates it by reading all files that are in the archive folder.

That becomes a problem on servers that have a huge amount of archive storage. In my case, I have more than 8 TB of archive data which cannot be lost and must remain searchable (it must be present in the index).

This amount of data can get very expensive, especially in cloud environments (which is my case), so a solution is to move old archive data to cold storage.

So when the "Reindex" function runs, all messages that are not physically in the archive folder get erased from the index, so they basically disappear from the search. This also causes another issue: it breaks some auditing standards.

A solution is to improve the Reindex function: instead of erasing the index file and recreating it from scratch, it should only add messages that are not already in the index, without deleting any old entries.
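
To illustrate the idea, here is a minimal sketch in Python. It is not SmarterMail's actual code or index format: the dict-based index, the `.eml` extension, the file name used as the message key, and the function name `incremental_reindex` are all assumptions made for illustration.

```python
from pathlib import Path


def incremental_reindex(index: dict[str, str], archive_dir: Path) -> list[str]:
    """Add archive files that are missing from the index.

    Existing entries are never removed, so messages that were moved to
    cold storage stay searchable. Returns the keys that were added.
    """
    added = []
    # Hypothetical layout: one .eml file per archived message.
    for path in sorted(archive_dir.rglob("*.eml")):
        key = path.name  # assumed message key: the file name
        if key not in index:
            index[key] = str(path)
            added.append(key)
    return added
```

With this approach, an entry that points to a file no longer present in the archive folder is simply left alone, instead of being dropped as happens with a full rebuild.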

This solution would solve the problem described above and would also make the archive search a trusted tool for auditing purposes (like it was before).

Please vote if you agree.

I was discussing this message archive issue with a customer who runs a medical company and I thought of another issue.

This customer of mine is actually the one who found out that some messages were not being correctly archived (he searched for a message that he had received and couldn't find it in the archive). I ran some tests myself and found several messages that were not being correctly archived. He wanted to know if the issue was resolved, to which I replied "almost"...

I opened a ticket with SmarterTools and they were able to replicate this issue. I believe this is where they came up with the idea of the Reindex feature, because they discovered that the missing messages were actually in the archive folder, but they weren't correctly indexed (which means those messages are invisible).

The other issue I thought of is that we don't know when messages are not being correctly indexed. There could be dozens or hundreds of messages every day that are not being indexed on the archive. But we only know about them if we specifically search for a message that we know we sent/received and the message is not found.

My point is: the message archive is not 100% reliable, as it should be. When a search finds nothing, we can't be sure whether the message really doesn't exist or is in the archive but was never indexed.

This brings me to another idea: The Reindex feature should optionally run on a schedule. It should run every night and look for unindexed messages. This would make the archive search a reliable tool.
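
As a rough sketch of the scheduling part (in Python, with assumed names; this is not an existing SmarterMail setting), the next nightly run could be computed like this. The 2:00 AM default and the function name `next_run` are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, run_at: time = time(2, 0)) -> datetime:
    """Return the next occurrence of the nightly run time after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    # If today's run time has already passed, schedule for tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, called at 11:24 AM on 4/11/2023, this returns 2:00 AM on 4/12/2023. A scheduler would wait until that time, run the add-only reindex, and then compute the next run again.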

Of course, this would only work if the issue I described in my original post is also solved. Just to reiterate my point: the Reindex feature should not delete old index entries. It should only add new (missing) messages.

Why should we keep several terabytes of old archive data on the server drives (which grows every day), paying hundreds of dollars for storage, if we could move those files to cold storage and pay hundreds of times less?


+1 to the EXTERNAL STORAGE archiving function!!!!
Gabriele Maoret - Head of SysAdmins at SERSIS
Currently manages 6 SmarterMail installations (1 in the cloud for SERSIS which provides services to a few hundred third-party email domains + 5 on-premise for customers who prefer to have their mail server in-house)

