Limitations of Spam Training folder
Idea shared by Douglas Foster - 4/17/2025 at 7:48 AM
Under Consideration
It appears that messages arrive in the Training folder as possible spam in one of three ways:
(a) manually placed in the user's Junk/Spam folder
(b) automatically placed in the user's Junk/Spam folder by the Outlook program's undocumented spam-filtering heuristic
(c) placed there by manually created rules configured in webmail or Outlook

When processing these files, the evaluator needs to consider:
(a) Is this message objectionable to this user or to all users?
(b) Is this message a legitimate news feed to which the user previously subscribed, and is he now using the spam folder instead of the unsubscribe link?
(c) Is this an automated heuristic that fired incorrectly?
(d) Or is this actually spam that needs to be blocked?

When the message is dropped in the spam Training folder, all I see is the EML file with a random file name. It would be useful to know which user put it there and whether it arrived from a manual decision or an automated decision.

Any hope of creating a metadata file to help interpret the significance of the spam complaint?
Hi,

I was about to build a script to handle these, in order to trigger learning on our incoming mail gateway which is doing all the antispam stuff.

It is true that we are missing important information needed to process these files, at least:
- Which user triggered the learning
- Was it triggered from webmail "Mark as spam", or by a move to the spam folder from webmail.
- Was it triggered from a move from an external client (IMAP, EWS, etc...)

+10 for the original poster :)
Sébastien Riccio System & Network Admin https://swisscenter.com
Tony Scholz Replied
Employee Post
Hello, 

I have submitted this as a feature request for consideration for future development. 

Thank you
~Tony
Tony Scholz System/Network Administrator SmarterTools Inc. www.smartertools.com
Hi,

Having had no news about this, I wanted to try again to see whether I can code something that detects fairly reliably who triggered the "mark as spam" or "mark as not-spam" action, by parsing the .eml files and maybe a log entry written when the "mark" is done (if such an entry exists).

So I have re-enabled this:


However, I'm not able to find where the training folder is located. I must have missed something.

Any help is welcome :)

We really need a way for the users to report false negatives or false positives other than opening a support ticket with us.

btw: having a side JSON file with some information, with the same ID as the .eml, would be awesome.
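
To make the sidecar idea concrete, here is a minimal Python sketch of what such a file could look like. Nothing in this thread suggests SmarterMail produces anything like it today: the field names (reporter, action, source) and the helper names are invented purely for illustration.

```python
import json
from pathlib import Path


def write_sidecar(eml_path, reporter, action, source):
    """Write <name>.json next to <name>.eml. All field names are hypothetical."""
    sidecar = Path(eml_path).with_suffix(".json")
    payload = {
        "eml": Path(eml_path).name,  # ties the metadata to the message file
        "reporter": reporter,        # account that triggered the report
        "action": action,            # "spam" or "ham"
        "source": source,            # e.g. "webmail-button", "imap-move"
    }
    sidecar.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sidecar


def read_sidecar(eml_path):
    """Return the metadata for an .eml file, or None if no sidecar exists."""
    sidecar = Path(eml_path).with_suffix(".json")
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text(encoding="utf-8"))
```

A pickup script could then call read_sidecar() for each .eml it finds and fall back to "reporter unknown" when it returns None.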
Kind regards.
Sébastien Riccio System & Network Admin https://swisscenter.com
It occurred to me that Declude has a variable %allrecips% that can be written to a custom header.   Since most messages have a single recipient, this solves the problem for most messages.

The training folder is created when first used.  Working from memory, I think it is Smartermail\spool\training.
Hello,

Thanks for the reply. I don't know anything about Declude, we're not using it. 

We would need at least the envelope "To:" to identify as precisely as possible who triggered the action, but if the mail was sent to an alias, I'm not even sure it would contain the alias's final recipient.

But anyway, the .eml files contain only the header To: field, not the envelope, so it is not reliable for identifying the SM user.
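
For illustration, this is roughly how the header recipients can be read with Python's standard email package. It is only a sketch: the function name is made up, and it shows the limitation rather than solving it, since a message delivered through an alias or Bcc yields addresses that are not the mailbox that reported it.

```python
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses


def header_recipients(raw_bytes):
    """Return the lower-cased addresses found in the To: and Cc: headers.

    These are header values only, not the SMTP envelope recipients.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    pairs = getaddresses([str(f) for f in fields])
    return [addr.lower() for _name, addr in pairs if addr]
```

If a user reports a message that was addressed to a distribution alias, this returns only the alias address, which is exactly why the header is not enough.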

As for the folder, you're right, I found it in a subdir of the Spool. Thanks!


Sébastien Riccio System & Network Admin https://swisscenter.com
Oh, it seems the .eml files are kept for 1 hour in the training\(ham|spam) folders. Can someone from SM confirm this?

Well, I don't know if it's an age limit or a count limit, but when I refresh the folder, new .eml files appear and older ones are removed.

I do not see a configuration option or toggle for this retention time limit.

In normal operation this shouldn't be a problem if the "pickup" script processes the folder multiple times per hour, but if for some reason the script is not running for a few hours, it will miss items.

Sébastien Riccio System & Network Admin https://swisscenter.com
I would like to add why identifying the account that triggered the mark as ham/spam is the most important thing to know. A non-exhaustive list:

- For maintaining per user spam learning on the incoming spam gateway
- Tracking users abusing this feature, using it more like a "delete button" (what if a user selects all mails in his mailbox and marks them as spam?). Especially if you're not doing per user spam learning, this can f**k up your whole spamfilter easily.
- Disabling learning for some users, if they are mostly doing it wrong.
- Probably more reasons :)
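
As a rough idea of what the abuse tracking could look like once the reporter is known, here is a small Python sketch of a sliding-window counter. The class name and the thresholds are arbitrary assumptions, and it presupposes metadata that the training folder does not currently provide.

```python
from collections import defaultdict, deque


class ReportRateTracker:
    """Flag reporters who submit too many reports within a time window."""

    def __init__(self, max_reports=20, window_seconds=600):
        self.max_reports = max_reports        # assumed threshold, tune to taste
        self.window_seconds = window_seconds  # assumed window length
        self._events = defaultdict(deque)     # reporter -> recent timestamps

    def record(self, reporter, timestamp):
        """Store one report; return True if the reporter exceeds the limit."""
        events = self._events[reporter]
        events.append(timestamp)
        # Drop reports that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_reports
```

A flagged reporter could have their reports held for manual review instead of being fed to the learner automatically.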

Also, if "Send user spam feedback to training folder" is enabled on the system, it would be nice to have a modal, the first time a user clicks on mark as ham or spam, educating them a bit about it.

Something like:

Dear User,

To help keep your mailbox clean and improve spam protection for everyone, our webmail system provides two buttons: “Mark as Spam” and “Mark as Ham.”

Please use them with care:

Mark as Spam should only be used when you receive unwanted junk messages (such as fraudulent emails, phishing attempts, or advertisements you never signed up for). When you click it, the system learns to block similar messages in the future.

Mark as Ham should be used if an email was incorrectly classified as spam. This teaches the system that the message is legitimate and should be delivered normally.

⚠️ Important: Do not use “Mark as Spam” just to stop receiving newsletters or promotional emails you once subscribed to. Instead, please use the “Unsubscribe” link usually provided at the bottom of those messages.

Using the buttons correctly makes the filtering smarter and ensures that real spam is blocked while valid emails continue to reach your inbox.

Thank you for helping us improve your email experience!

(but shorter of course)
EDIT: Added shorter version of reminder

🔎 Reminder: Use Mark as Spam only for true junk or fraudulent emails.
Want to stop newsletters or promotions? Please click the Unsubscribe link instead.
Use Mark as Ham to rescue legitimate emails from the spam folder.
Sébastien Riccio System & Network Admin https://swisscenter.com
The file-purge observation is correct: the documentation says that messages are only kept for an hour. The system assumes that the folder is monitored by a service component of rSpamD or ClamAV, which will process the files to update Bayesian rules but then leave files in place after processing.

Since I did not have a system service to process files instantly, I wrote a scheduled task that wakes up periodically and uses RoboCopy to move the files to a different folder on a different disk.   
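
For anyone who prefers a script to RoboCopy, a rough Python equivalent of that scheduled move might look like the following. It is a sketch under assumptions: the function name is made up, the spam and ham subfolder names are taken from the training\(ham|spam) layout mentioned earlier in this thread, and both paths have to be supplied by the caller.

```python
import shutil
from pathlib import Path


def drain_training_folder(training_dir, archive_dir):
    """Move *.eml files out of training/spam and training/ham before the purge.

    Returns the list of destination paths that were created.
    """
    training_dir, archive_dir = Path(training_dir), Path(archive_dir)
    moved = []
    for label in ("spam", "ham"):
        src = training_dir / label
        if not src.is_dir():
            continue  # the folder is only created when first used
        dest = archive_dir / label
        dest.mkdir(parents=True, exist_ok=True)
        for eml in sorted(src.glob("*.eml")):
            target = dest / eml.name
            if target.exists():
                continue  # never overwrite a file archived on an earlier run
            shutil.move(str(eml), str(target))  # copies and deletes across disks
            moved.append(target)
    return moved
```

Run from a scheduler at an interval well under one hour, this keeps a copy of every report even if the reviewer or learner is offline for a while.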

When reviewing messages, there are two steps, which may run together: determine whether the message is Spam (action required) or notSpam (no action required). Then the Spam category needs to be separated into "Spam awaiting Filtering Rule updates" and "Spam fully processed to include filtering changes".


Oh okay, I did not see that part in the documentation.

Yeah, the script I'm writing will be a Python script that runs every X minutes to look for files to process and does what has to be done with them, depending on some conditions.

Well, it could probably also run as a service and use filesystem change notifications as a trigger.

For now I will probably populate a database with them on our homemade incoming filter gateway, where they can be reviewed to confirm "learn as spam" or "learn as ham", and also to review which rules can be updated to detect them better.

But I would still need at least the reporting user to be specified in some metadata :)
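
A minimal sketch of such a review database, using SQLite from the Python standard library. The table layout, column names, and function names are assumptions made up for this example, and the reporter column stays empty until that information is actually available.

```python
import hashlib
import sqlite3


def open_review_db(path=":memory:"):
    """Open (or create) the review database. Schema is illustrative only."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS reports (
               sha256      TEXT PRIMARY KEY,
               filename    TEXT NOT NULL,
               reported_as TEXT NOT NULL CHECK (reported_as IN ('spam', 'ham')),
               reporter    TEXT,
               decision    TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return db


def add_report(db, filename, raw_bytes, reported_as, reporter=None):
    """Insert one reported message; return False if it was already stored."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    seen = db.execute("SELECT 1 FROM reports WHERE sha256 = ?", (digest,)).fetchone()
    if seen is not None:
        return False
    db.execute(
        "INSERT INTO reports (sha256, filename, reported_as, reporter) VALUES (?, ?, ?, ?)",
        (digest, filename, reported_as, reporter),
    )
    db.commit()
    return True


def pending_reports(db):
    """Return (filename, reported_as, reporter) rows still awaiting review."""
    return db.execute(
        "SELECT filename, reported_as, reporter FROM reports "
        "WHERE decision = 'pending' ORDER BY filename"
    ).fetchall()
```

Keying on the content hash means a rerun of the pickup script does not insert the same file twice. The trade-off is that two users reporting an identical message collapse into one row, so per-user tracking would need a different key.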
Sébastien Riccio System & Network Admin https://swisscenter.com
Some products create a Received header that contains a "for <user@domain>" term. Unfortunately, SmarterMail is not one of them. If you have an incoming gateway from another vendor which reliably provides that term using a fixed format, you should be able to parse the recipient pretty easily. A problem will occur if there are multiple recipients: products which provide this information usually report the first address in the recipient list, which may not be the person who submitted the report.
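
As a sketch of that parsing, the following Python pulls the recipient out of the `for <address>` clause that RFC 5321 allows in Received headers. The regular expression is an assumption about a typical format, not any particular gateway's documented output, so it would need checking against real headers.

```python
import re
from email import policy
from email.parser import BytesParser

# Matches "for <user@domain>" or "for user@domain" inside a Received header.
FOR_CLAUSE = re.compile(r"\bfor\s+<?([^\s<>;]+@[^\s<>;]+)>?", re.IGNORECASE)


def envelope_recipient_hints(raw_bytes):
    """Return addresses named in Received 'for' clauses, newest hop first."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    hints = []
    for received in msg.get_all("Received", []):
        text = " ".join(str(received).split())  # collapse folded whitespace
        match = FOR_CLAUSE.search(text)
        if match:
            addr = match.group(1).lower()
            if addr not in hints:
                hints.append(addr)
    return hints
```

The result is only a hint: with multiple recipients, the address recorded by the gateway may not be the person who filed the report.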

On the other hand, if the message is not acceptable for any recipient, then the reporting recipient becomes unimportant.   Most spam falls into this category.   

Ambiguity occurs when the message appears spammy because the interests of the recipient are unrelated to the message topic. During admin review, the message may appear legitimate if it targets appropriate users, but illegitimate otherwise. If the recipient is unknown, or the recipient's role is unknown, the admin may not be able to resolve the ambiguity.
