We are doing something very similar, but using the Training folders as part of a larger spam analysis pipeline.
Our setup consists of a dedicated Rspamd server that receives spam training data from SmarterMail.
In addition to users reporting spam by manually moving messages to Junk, we maintain a number of spam-trap accounts and actively monitor inboxes for messages that evade filtering.
When a spam message is identified, it is moved to Junk, which causes SmarterMail to place a copy of the message into the Spam Training folder.
NOTE: junk reporting works from both webmail and the IMAP Junk folder. If a user on MacMail moves a message to Junk, SmarterMail seems to detect this and also places a copy in the Training folder (which I think is quite nice).
On the SmarterMail server (Linux), we have a monitoring process watching the Training folders. When new .eml files appear, the process performs two actions:
- The message is submitted to our remote Rspamd server for Bayesian and statistical learning.
- The full message content and metadata are extracted and stored in a custom MySQL database for later manual analysis.
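A minimal sketch of such a watcher in Python, assuming a simple polling loop (rather than inotify) and the Rspamd controller's `/learnspam` HTTP endpoint on its default port 11334, with the controller password sent in the `Password` header. The URL, password, and folder path are placeholders, and all function names are hypothetical. This is not SmarterMail or Rspamd code, and not necessarily how our own process is written.

```python
import email
import email.policy
import time
import urllib.request
from pathlib import Path

# Hypothetical placeholders: point these at your own Rspamd controller.
RSPAMD_LEARN_URL = "http://rspamd.example.net:11334/learnspam"
RSPAMD_PASSWORD = "controller-password"


def find_new_eml_files(folder: Path, seen: set[str], min_age: float = 5.0) -> list[Path]:
    """Return unprocessed .eml files, oldest first.

    Files modified less than `min_age` seconds ago are skipped so we do not
    read a message SmarterMail may still be writing.
    """
    now = time.time()
    candidates = [
        p for p in folder.glob("*.eml")
        if str(p) not in seen and now - p.stat().st_mtime >= min_age
    ]
    return sorted(candidates, key=lambda p: p.stat().st_mtime)


def extract_metadata(raw: bytes) -> dict:
    """Pull a few headers worth archiving out of the raw message."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    return {
        "message_id": str(msg.get("Message-ID", "")),
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "date": str(msg.get("Date", "")),
    }


def build_learn_request(raw: bytes) -> urllib.request.Request:
    """Build a POST of the raw message to the controller's learnspam endpoint."""
    return urllib.request.Request(
        RSPAMD_LEARN_URL,
        data=raw,
        headers={"Password": RSPAMD_PASSWORD},
        method="POST",
    )


def process_folder(folder: Path, seen: set[str]) -> list[dict]:
    """Submit each new file to Rspamd and return its metadata for archiving."""
    processed = []
    for path in find_new_eml_files(folder, seen):
        raw = path.read_bytes()
        # urlopen raises on HTTP errors, so a failed submission is not marked
        # as seen and will be retried on the next poll.
        with urllib.request.urlopen(build_learn_request(raw), timeout=30) as resp:
            resp.read()
        processed.append(extract_metadata(raw))
        seen.add(str(path))
    return processed


def watch(folder: Path, interval: float = 10.0) -> None:
    """Poll the Training folder forever (not called on import)."""
    seen: set[str] = set()
    while True:
        for meta in process_folder(folder, seen):
            print(meta["subject"])  # hand each record to the archive step here
        time.sleep(interval)
```

Calling `watch(Path("/path/to/Training"))` (a placeholder path) starts the loop. The `seen` set lives only in memory, so a restart would resubmit whatever is still in the folder. A persistent marker, such as the archive database itself, avoids that.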
The database gives us a permanent archive of training samples, as SmarterMail only retains the files in the Training folder for a short period before purging them. It also allows us to review trends, investigate false negatives, identify common attack patterns, and potentially perform AI-assisted analysis in the future.
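The archive side could look roughly like this. To keep the example self-contained, it assumes Python's built-in `sqlite3` in place of MySQL. The SQL is close to what a MySQL driver would run, though MySQL spells `INSERT OR IGNORE` as `INSERT IGNORE` and most MySQL drivers use `%s` placeholders. The table layout, column names, and SHA-256 de-duplication are illustrative choices, not our actual schema. The de-duplication matters because the same message can reach the Training folder more than once.

```python
import email
import email.policy
import hashlib
import sqlite3
from datetime import datetime, timezone
from email.utils import parseaddr

# Hypothetical table layout for the training-sample archive.
SCHEMA = """
CREATE TABLE IF NOT EXISTS training_samples (
    id            INTEGER PRIMARY KEY,
    sha256        TEXT UNIQUE NOT NULL,
    archived_at   TEXT NOT NULL,
    message_id    TEXT,
    sender        TEXT,
    sender_domain TEXT,
    subject       TEXT,
    raw           BLOB NOT NULL
)
"""


def open_archive(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def archive_message(conn: sqlite3.Connection, raw: bytes) -> bool:
    """Store one sample; return False if identical bytes were already archived."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    _, addr = parseaddr(str(msg.get("From", "")))
    domain = addr.rsplit("@", 1)[1].lower() if "@" in addr else ""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO training_samples "
            "(sha256, archived_at, message_id, sender, sender_domain, subject, raw) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                hashlib.sha256(raw).hexdigest(),
                datetime.now(timezone.utc).isoformat(),
                str(msg.get("Message-ID", "")),
                addr,
                domain,
                str(msg.get("Subject", "")),
                raw,
            ),
        )
    # rowcount is 0 when the UNIQUE sha256 constraint caused the row to be ignored.
    return cur.rowcount == 1


def top_sender_domains(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Example trend query: which sender domains appear most in training samples."""
    return conn.execute(
        "SELECT sender_domain, COUNT(*) FROM training_samples "
        "WHERE sender_domain <> '' "
        "GROUP BY sender_domain ORDER BY COUNT(*) DESC, sender_domain LIMIT ?",
        (limit,),
    ).fetchall()
```

Storing the raw bytes alongside the extracted headers means later analysis, whether manual review or something AI-assisted, can re-parse the original message instead of relying only on the fields chosen at ingest time.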
If you are already using the Training folders, they make an excellent integration point: users keep the standard “Move to Junk” workflow, while the backend automation handles collection, learning, and analysis.