2
Tool for inspecting/debugging GRP files (urgent issue)
Problem reported by Sébastien Riccio - 8/2/2024 at 11:31 AM
Submitted
Hello,

We really need a tool to debug a user's GRP files so that we can troubleshoot when things go wrong.

We currently have a customer with multiple mailboxes affected. Some of the mails they receive are either missing all headers (no From/To/Date/Subject), or the headers are there but the body of the message is missing.

I already tried plenty of things: rebuilding the folder, detaching/reattaching the user, detaching/reattaching the folder. Nothing helps; some of the mails they receive are still corrupted in their mailboxes.

I have a ticket open with Tony for this, but so far we have found no source for the issue.
At first I thought it might be an issue with mail delivery from our incoming gateway, but it's not.
As suggested by Tony, I've enabled message Archiving for the domain.
Since I've activated message archiving, new mails arrived, some are corrupt again, but they are in a good shape in the message archive.

So at this point I suspect these users have an issue with their GRP files. Maybe a message that they all received in the past is causing the GRP files to go bozo.

So well, at this point I really would like to have a tool for troubleshooting GRP files.

Our customer calls daily to ask when the issue will be resolved and we can't answer ...

Please HELP!

BTW: These issues started after we upgraded to build 8965. We're now running the latest build. I'm wondering whether I should downgrade to 8930, but if one of the new builds was responsible for the GRP corruption, I'm not sure downgrading will help resolve it.
Sébastien Riccio
System & Network Admin

21 Replies

0
Charalampos Michael Replied
Is there any GRP repair tool ?
If not you must make one.
1
James North Replied
Oh good, at least we're not alone in getting this issue. We disconnected Sanebox, stopped using Restic for backups, and even deleted the account and re-created from scratch and copied the emails over again, but we're still getting corrupted emails.

As suggested by Tony, I've enabled message Archiving for the domain.
Since I've activated message archiving, new mails arrived, some are corrupt again, but they are in a good shape in the message archive.
This is my exact experience. Turned message archiving on, emails are fine beforehand, some go corrupt sometime afterward. Today I moved 200 emails from one folder to another, and the folder completely disappeared along with my emails, but that might be another issue.

We're on 8979 currently but this has been happening since at least 8972. Given we're a Linux deployment, we don't have the option of downgrading to 8930 as it doesn't have support for enterprise licenses with support for EAS/EWS/MAPI as far as I know. We did trial 8930 for a month during the BETA and it was fine the entire time, though, so that suggests this problem is new since 8965/8972.
0
James North Replied
Sebastien, was your Smartermail service forcibly killed by the OS because it was using up all the resources on the machine at any point? My service was killed by the OOM killer over a dozen times. I thought that might have been what caused the corruption for me, but if you haven't experienced any crashes, maybe it's something in the Smartermail build itself causing the corruption...
0
Sébastien Riccio Replied
Hello James,

No, the service wasn't forcibly killed, at least not that I'm aware of. I know that if SmarterMail crashes or is killed without a clean shutdown, there is a high probability of seeing corrupted files.
They are usually the JSON files in the domains/users folders, but I'm sure a GRP file could also get corrupted if SM is killed in the middle of a write to it.

In the end, I coded a small Python script that uses SmarterMail's DLLs to inspect the customer's GRP files and found something interesting:

There were some entries in these users' GRP files for which the end offset was before the start offset, resulting in a negative message size.

I was able to remove this bogus entry and will see whether newly received mails stop being corrupted.


2024-08-03 08:15:49.443 | DEBUG    | Verifying GRP file d:\SmarterMail\Domains\somedomain.ch\Users\joe.doe\Mail\Inbox\2024_7_28.grp
2024-08-03 08:15:49.443 | DEBUG    | GRP file d:\SmarterMail\Domains\somedomain.ch\Users\joe.doe\Mail\Inbox\2024_7_28.grp contains 4 entries.
2024-08-03 08:15:49.443 | DEBUG    | Entry UID: 2536 | start: 1208 | end: 6897159 | size: 6895951
2024-08-03 08:15:49.677 | DEBUG    | From: <eric.smith@foobar.ch> | To: <manu.guex@foobar.ch>

2024-08-03 08:15:49.677 | DEBUG    | Entry UID: 2537 | start: 6897159 | end: 13988440 | size: 7091281
2024-08-03 08:15:49.802 | DEBUG    | From: <eric.smith@foobar.ch> | To: "'joe.doe'" <joe.doe@foobar.ch>| Subject: =?iso-8859-1?Q?AOR_-_Cat=E9gories_VHC_et_VHRS_pour_2025?=

2024-08-03 08:15:49.802 | DEBUG    | Entry UID: 2538 | start: 13988440 | end: 13890775 | size: -97665

Traceback (most recent call last):
  File "C:\EvenSmarterTools\smart.py", line 837, in <module>
    main()
  File "C:\EvenSmarterTools\smart.py", line 795, in main
    check_accounts_integrity()
  File "C:\EvenSmarterTools\smart.py", line 649, in check_accounts_integrity
    grp_file_status = smarterlib.check_grp_file(grp_file)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\EvenSmarterTools\smarterlib.py", line 284, in check_grp_file
    grp_file_content = f.read(grp_file_entry.Size)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: read length must be non-negative or -1
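
For anyone who wants to run the same kind of offset sanity check without the SmarterMail DLLs, here is a minimal sketch that works on plain (UID, start, end) tuples such as the ones in the log above. The Entry type, the function name, and the overlap and end-of-file checks are assumptions added for illustration; they are not part of the original script or of SmarterMail's API.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    # Hypothetical stand-in for one GRP index entry.
    uid: int
    start: int
    end: int


def find_offset_problems(entries, file_size: Optional[int] = None):
    """Return (uid, reason) pairs for entries whose offsets look wrong."""
    problems = []
    previous_end = None
    for entry in entries:
        if entry.end < entry.start:
            problems.append((entry.uid, "negative size"))
        if previous_end is not None and entry.start < previous_end:
            problems.append((entry.uid, "overlaps previous entry"))
        if file_size is not None and max(entry.start, entry.end) > file_size:
            problems.append((entry.uid, "points past end of file"))
        previous_end = max(entry.start, entry.end)
    return problems


# Offsets copied from the debug log above.
entries = [
    Entry(2536, 1208, 6897159),
    Entry(2537, 6897159, 13988440),
    Entry(2538, 13988440, 13890775),
]
print(find_offset_problems(entries))
```

With the three entries from the log, only UID 2538 is reported, for its negative size.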
Sébastien Riccio System & Network Admin https://swisscenter.com
0
Tim Uzzanti Replied
Employee Post
Sebastien, is the email corrupt in webmail or just over one of the protocols? Last I heard, it was just one client. We haven’t had a legitimate corruption issue caused by our code in a very long time.

Didn’t you have some utf8 / symbol issues? Did you try doing anything unusual to your accounts when attempting to resolve?  
Tim Uzzanti CEO SmarterTools Inc. www.smartertools.com
0
Sébastien Riccio Replied
Hello Tim, 

It's also corrupt in webmail. We always check the content in the customer's webmail first when they have an issue with corrupt or missing mails, before doing any other troubleshooting.

Also, judging by the dates of the oldest corrupted mails, it all started when we first upgraded to build 8965.

We're now running 8979, but the affected mailboxes still show issues with newly received mails. Not every mail is affected, though.

Some of the affected mails are missing the headers, others have empty bodies, and some have offset issues, as I found by inspecting the index entries of the affected mailboxes' GRP files. These seem to generate ghost mails in the webmail. When you click on the mail, on the list, nothing happens, it just doesn't load that particular mail.

In a last attempt to fix things for this customer, I deleted the corrupt mails from the customer webmail for each affected mailbox, but it seems it was not enough to fix the problem.

I then created a temp. folder in the customer webmail, again for each affected mailbox, moved all the remaining mails from his Inbox to this temp folder, cleared the Inbox folder completely, and then moved back the mails from the temp folder to his mailbox.

I think this rebuilt the GRP files from scratch with only valid entries.
I now have to wait a bit to see whether the issue appears again in these mailboxes.

I suspect it has something to do with changes introduced in build 8965: either SMTPUTF8 and related encoding issues, or IMAP issues with UTF8=ACCEPT, or something else I haven't thought of.
But it really all started after the upgrade to the doomed build 8965.

Sébastien Riccio System & Network Admin https://swisscenter.com
0
James North Replied
If it helps any, I renamed the domain to one of the domain aliases I had and deleted the domain alias, then re-indexed/resynced. That's the only thing I did. This immediately stopped issues with timeouts with Thunderbird, broken notifications in webmail, and I haven't seen a corrupt email in three days despite getting them daily before. I noticed my Domain's settings.json originally had an empty array for the "domain_aliases" parameter, but after renaming now has all of the aliases I added, so my issue seemed related to Domain Aliases.

Not sure if your issue is the same problem, Sebastien, but thought I'd let you know.

When you click on the mail, on the list, nothing happens, it just doesn't load that particular mail.
In my case, I was able to open the emails. I'd get anything between "[No HTML Content]" to lines and lines of HTML code instead of a rendered email.
0
Sébastien Riccio Replied
Hello James,

Thanks for your feedback. That's interesting.

In my case, I was able to open the emails. I'd get anything between "[No HTML Content]" to lines and lines of HTML code instead of a rendered email.
The affected accounts on our side also showed this exact kind of behavior, in addition to the ghost mails behavior (the one with negative size).

I noticed my Domain's settings.json originally had an empty array for the "domain_aliases" parameter, but after renaming now has all of the aliases I added, so my issue seemed related to Domain Aliases.
The particular domain I'm checking also has a domain alias, but after checking settings.json, the domain_aliases array has the alias listed correctly.

        "domain_aliases": [
            "somealias.ch"
        ],
It is interesting though that you were able to fix the issue by doing a rename of the domain. I could maybe give it a try if nothing else helps.

Side question: How are the affected users on your side accessing their mailbox. From IMAP ? From mobile phone/desktop ?

Kind regards.
Sébastien Riccio System & Network Admin https://swisscenter.com
0
James North Replied
Hi Sebastien,

It is interesting though that you were able to fix the issue by doing a rename of the domain.
To explain why we thought a rename might work, this is the situation we used to have:

The Domain was named "somedomain.tld", but we didn't have any actual mailboxes like james@somedomain.tld - all the mailboxes were hosted on a Domain Alias.

For example, we had the "otherdom.tld" Domain Alias. The afflicted mailbox was james@otherdom.tld.

Since all of our mailboxes were actually at otherdom.tld, we thought we might as well use that for the Domain and see if it fixes anything. So now the Domain is otherdom.tld.

It's also possible renaming it caused the entire settings.json file to be re-generated and fixed some misconfiguration issue somewhere else... I'm not sure how it works exactly.

Side question: How are the affected users on your side accessing their mailbox. From IMAP ? From mobile phone/desktop ?
Fortunately it was only my mailbox that got a lot of these corrupted emails. Another user got them once but never again, and they were connected via Outlook and MAPI/EWS/IMAP. I was connected to my mailbox via IMAP with Thunderbird and via IMAP from my phone with K-9 Mail. It also happened in webmail. I never accessed it any other way. But I integrated Sanebox at one point, which accessed the mailbox through EAS/EWS, I believe.

That being said, I have another account which is very rarely used (it gets 1 email a week). I sent a test email from it after I stopped getting corrupted emails in my main mailbox, and the copy in the outbox got corrupted. The header is blank but the body is readable. This mailbox is on an entirely different domain, and it is indeed hosted on a Domain Alias. Having a look at the settings.json for this domain, I can see its domain_aliases array is also empty despite the Domain having three Domain Aliases set up for it...

Regards,
James
0
James North Replied
Just clicking "Reload Domain" regenerated the settings.json file and now the domain_aliases array is as expected. So that's an option too.
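
For anyone who wants to check this across many domains, here is a minimal sketch that parses a settings.json and reports whether its domain_aliases array is empty. Only the key name "domain_aliases" comes from this thread. Where that key sits inside the file is not known here, so the sketch searches the whole structure recursively. The helper names and the "domain_settings" wrapper key in the sample are hypothetical.

```python
import json


def find_domain_aliases(node):
    """Return the first 'domain_aliases' value found anywhere in parsed JSON, or None."""
    if isinstance(node, dict):
        if "domain_aliases" in node:
            return node["domain_aliases"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_domain_aliases(child)
        if found is not None:
            return found
    return None


def aliases_status(settings_text):
    """Classify a settings.json payload as 'missing', 'empty' or 'ok'."""
    aliases = find_domain_aliases(json.loads(settings_text))
    if aliases is None:
        return "missing"
    return "ok" if aliases else "empty"


# Sample payloads; "domain_settings" is a made-up wrapper key for the demo.
broken = '{"domain_settings": {"domain_aliases": []}}'
healthy = '{"domain_settings": {"domain_aliases": ["somealias.ch"]}}'
print(aliases_status(broken), aliases_status(healthy))
```

In practice the text would come from reading each domain's settings.json from disk. A domain that has aliases configured in the admin UI but reports "empty" here would be a candidate for "Reload Domain".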
0
Sébastien Riccio Replied
Hello James,

That's good to know too. So to be safe, a "Reload domain" of all domains should be something to consider ?

Still, I'm not sure yet I understand the link between an issue with domain_aliases array in the domain settings.json and mail corruption.

Unless at some point it needs to lookup the domain_aliases to save the mail headers in the grp files or something like this... hmmm... scratching my head
Sébastien Riccio System & Network Admin https://swisscenter.com
0
James North Replied
So to be safe, a "Reload domain" of all domains should be something to consider ?
It probably can't hurt. It seems to regenerate the settings.json file while "deleting all memory and cache": https://help.smartertools.com/smartermail/current/topics/SystemAdmin/Manage/AddDomain

Still, I'm not sure yet I understand the link between an issue with domain_aliases array in the domain settings.json and mail corruption.
I'm not sure either! I only noticed it because SM Support noticed when I sent them my ConfigOnly Domain copy for testing that domain aliases didn't copy over; i.e. the domain_aliases array was empty.

I don't actually know whether this will fix the corruption issue with the other mailbox because it gets so little email; it'll be a while before I can see whether fixing the settings.json did anything.
0
Matt Petty Replied
Employee Post
Just a heads up: if you make any manual changes to GRP files, either adding/removing data or files, you'll probably want to delete the mailbox.cfg for that folder. This will cause you to lose status (read, replied, etc.) for those messages, so I recommend moving the broken messages to a new folder and then doing the manual fixups in just that folder (or vice versa). If you're manually changing GRPs using our DLLs, then you could also try removing specific (the broken) entries from the CFG. Otherwise just delete the cfg, reload the user, and visit the folder again; that will trigger a "rebuild" of the cfg file, and those broken entries in the mail list should go away.
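
A minimal Python sketch of the "delete the cfg" step, assuming the file is named mailbox.cfg and sits directly inside the mail folder (the exact location is an assumption here). It renames the file to a timestamped backup instead of deleting it, which is a deliberate substitution so the file can be put back. The helper name is hypothetical, and the sketch does not reload the user or touch SmarterMail itself.

```python
import tempfile
import time
from pathlib import Path


def set_aside_mailbox_cfg(folder):
    """Rename <folder>/mailbox.cfg to a timestamped backup.

    Returns the backup path, or None if there was no mailbox.cfg.
    """
    cfg = Path(folder) / "mailbox.cfg"
    if not cfg.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = cfg.with_name("mailbox.cfg.%s.bak" % stamp)
    cfg.rename(backup)
    return backup


# Demonstration against a throwaway directory, not a real mailbox.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mailbox.cfg").write_text("placeholder")
    moved = set_aside_mailbox_cfg(tmp)
    print(moved.name, (Path(tmp) / "mailbox.cfg").exists())
```

After the rename, the remaining steps from the post above still apply: reload the user and visit the folder so the cfg gets rebuilt.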
Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
0
Sébastien Riccio Replied
Hello Matt,

Thank you for the hint.

Actually my "tool" is read-only. It does not manipulate anything, it only reads stuff.
It could be used to try to fix things, but SmarterMail would first need to be shut down, because I'm quite sure it wouldn't be wise to have:
1) two processes writing to the same file at the same time
2) changes made to files that are probably partially cached in a running SM process, without telling SM that the files should be reloaded, which would cause more trouble than anything else.

The part of the script that inspects GRP files for strange issues:

# Load SmarterMail DLLs
import email
import sys

from pythonnet import load
load("coreclr")
import clr

assembly_path = r"C:\Program Files (x86)\SmarterTools\SmarterMail\Service"
sys.path.append(assembly_path)
clr.AddReference("MailService")
from SmarterMail.Standard.Files.Grp import MailboxGrpFile

# Note: `logger` is set up elsewhere in the script.


def check_grp_file(grp_file):
    # Check a GRP file for possible known issues
    # logger.debug('Verifying GRP file %s' % grp_file)
    out = None
    result, loaded_grp_file = MailboxGrpFile.TryLoad(grp_file, out)

    if not result:
        logger.warning('Cannot load GRP file %s. It seems invalid.' % grp_file)
        return False

    logger.debug('GRP file %s contains %s entries.' % (grp_file, len(loaded_grp_file.Entries)))

    for grp_file_entry in loaded_grp_file.Entries:
        logger.debug('Entry UID: %d | start: %d | end: %d | size: %d' % (grp_file_entry.UID, grp_file_entry.StartOffset, grp_file_entry.EndOffset, grp_file_entry.Size))

        # An end offset before the start offset gives a negative size
        if grp_file_entry.Size < 0:
            logger.warning('%s: Entry UID: %d has a negative size. start: %d | end: %d | size: %d' % (grp_file, grp_file_entry.UID, grp_file_entry.StartOffset, grp_file_entry.EndOffset, grp_file_entry.Size))
            return False

        # Read group file entry content from file (opened read-only)
        with open(grp_file, 'rb') as f:
            # TODO: Read only header part (detect empty line?)
            f.seek(grp_file_entry.StartOffset, 0)
            grp_file_content = f.read(grp_file_entry.Size)

        # Process content and check for missing mandatory headers
        msg = email.message_from_bytes(grp_file_content)

        if 'From' not in msg and 'Subject' not in msg:
            logger.warning('%s: No "From" and "Subject" headers found in (at least) one message. '
                           'Could be an issue with this GRP file content!' % grp_file)
            return False

        logger.debug('From: %s | To: %s | Subject: %s' % (msg['From'], msg['To'], msg['Subject']))

    return True
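
The TODO above (read only the header part) could be handled roughly like this: read the entry in chunks from its start offset and stop at the first blank line, never reading past the entry's size. This is a sketch under assumptions. The function name and chunk size are made up for illustration, and it treats the entry as a plain RFC 5322 message starting exactly at the start offset.

```python
import email
import io


def read_header_block(f, start, size, chunk_size=8192):
    """Return the bytes from `start` up to the first blank line.

    Never reads more than `size` bytes. If no blank line is found,
    everything that was read is returned.
    """
    f.seek(start)
    remaining = size
    data = b""
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:
            break
        data += chunk
        remaining -= len(chunk)
        # Look for the earliest header/body separator (CRLF or bare LF).
        hits = [p for p in (data.find(b"\r\n\r\n"), data.find(b"\n\n")) if p != -1]
        if hits:
            return data[:min(hits)]
    return data


# Demonstration on an in-memory blob: 10 filler bytes, then one message.
blob = b"X" * 10 + b"From: a@example.com\r\nSubject: hi\r\n\r\nbody body"
headers = read_header_block(io.BytesIO(blob), 10, len(blob) - 10)
msg = email.message_from_bytes(headers)
print(msg["From"], "|", msg["Subject"])
```

For multi-megabyte entries like the ones in the log, this avoids loading the whole message just to check for From and Subject.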
Sébastien Riccio System & Network Admin https://swisscenter.com
0
Sébastien Riccio Replied
Unfortunately, even after totally rebuilding the user's Inbox, the issue continues.

Some mails display [This message has no HTML content].

"Fun" fact, when I do a "Download EML" of the message, there is html content in it, but even MORE, there is the raw mail source of TWO mails, inside the downloaded .eml file.

Sébastien Riccio System & Network Admin https://swisscenter.com
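
A rough way to spot a downloaded .eml that contains more than one message, as described in the post above, is to count Message-ID header lines in the raw bytes. This is only a heuristic sketch. The function name is hypothetical, and a legitimate mail with an attached or forwarded message can also contain more than one such line, so a count above one is a hint to look closer, not proof of corruption.

```python
import re

# Matches a Message-ID header at the start of any line, case-insensitively.
MESSAGE_ID_LINE = re.compile(rb"^message-id:", re.IGNORECASE | re.MULTILINE)


def count_message_id_headers(raw):
    """Count lines in the raw message bytes that start with 'Message-ID:'."""
    return len(MESSAGE_ID_LINE.findall(raw))


single = b"From: a@example.com\r\nMessage-ID: <1@example.com>\r\n\r\nhello\r\n"
second = b"From: b@example.com\r\nMessage-Id: <2@example.com>\r\n\r\nworld\r\n"
print(count_message_id_headers(single), count_message_id_headers(single + second))
```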
0
Sébastien Riccio Replied
Well, I'm out of solutions. As a last resort, I'm downgrading to 8930 to see whether the problem remains with that older stable build.
Sébastien Riccio System & Network Admin https://swisscenter.com
0
Larry Duran Replied
Employee Post
Hey Sebastien, we did reach out on some of those tickets to get more information.  We want to work with you on getting this resolved.  We haven't been able to replicate but we have some ideas on where to look.  
Larry Duran Software Developer SmarterTools Inc. www.smartertools.com
0
Douglas Foster Replied
A while back, I had an account with problems caused by lost boundaries between two messages. Deleting and purging the damaged messages resolved the problem.
5
Sébastien Riccio Replied
Hey Larry,

I replied to the corresponding ticket. With the latest answers from support, I'm not sure we're on the same issue though, but maybe it's related somehow.

Anyway, because it was getting catastrophic, and before our customers sent their resignation letters, I had to downgrade our production server to 8930.

Since the downgrade, I have seen no new mail corruption in our affected customers' mailboxes. It was hitting them daily, but I think I need a few more days running on 8930 to confirm that the issue was introduced in the latest build(s).

Unfortunately we can't keep our production server running a buggy version just to work with you on resolving an issue. Our customers' confidence in our mail service is getting quite low with the recent issues we had.

This server is not for beta tests. But unfortunately, when we upgrade to major new production versions (sometimes even minor ones), we feel a bit like we're beta testing, and we can't jeopardize our customers' mail experience like this.

If we had a free secondary license, we could run a secondary server in parallel with the main one, with a handful of selected customers who are okay with beta testing "our" mail service, meaning they would get the updates before the main servers and would know that things can go wrong sometimes.

We really must stop having trouble with our main server every time we HAVE to upgrade. I say "have to" because the only reason we upgraded in the first place was to finally get the long-awaited bug fixes for tickets we submitted. They never made it to the previous stable "branch"; support's answer was to wait for the beta to become the production build.
Well, the results are not really what we expected.

Kind regards


Sébastien Riccio System & Network Admin https://swisscenter.com
1
Gabriele Maoret - SERSIS Replied
Hi Sebastien.

We are in the same situation.
I think I'll stick with the 8930 for a loooong time before I try to do a new upgrade...
Gabriele Maoret - Head of SysAdmins at SERSIS Currently manages 6 SmarterMail installations (1 in the cloud for SERSIS which provides services to a few hundred third-party email domains + 5 on-premise for customers who prefer to have their mail server in-house)
2
Sébastien Riccio Replied
Hi Gabriele,

Unfortunately I guess it will be the same for us, staying on 8930. Since we downgraded, the corrupt mailbox issue is gone. We're not satisfied, because it's again a step backward for the other bugfixes :/ But it's more important for us that at least our customers don't get their Inbox mails corrupted.
Sébastien Riccio System & Network Admin https://swisscenter.com
