2
Deeper Integration with AI/OpenAI/ChatGPT
Idea shared by James North - 6/10/2025 at 11:17 PM
Proposed
Hi,

I just tried out the AI integration with SmarterMail from the Marketplace. I've found it... underwhelming. Email drafting is all well and good, but it would be really interesting to have much deeper integration with AI.

For example, a chatbot you can ask about your emails (see the sketch below):
  • Finding emails: "I need to find that email where I spoke to Brian about [whatever] over a year ago"
  • Asking for insights about your emails
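
For illustration, here's a rough sketch of how such a chatbot could work using OpenAI-style tool calling: the model asks for a search it wants to run, the mail server executes it locally, and only the matching summaries go back to the model. The `search_emails` helper, its schema, and the model name are all hypothetical placeholders, not a SmarterMail API:

```python
# Rough sketch of an email chatbot via OpenAI-style tool calling.
# search_emails() is a HYPOTHETICAL stand-in for a mailbox search --
# it is not a SmarterMail API. Only matched summaries reach the model.
import json
from openai import OpenAI

client = OpenAI()

def search_emails(sender: str = "", query: str = "", before: str = "") -> list[dict]:
    """Hypothetical local mailbox search; returns message summaries."""
    return [{"from": "brian@example.com", "date": "2024-03-02",
             "subject": "Re: the project", "snippet": "..."}]

tools = [{
    "type": "function",
    "function": {
        "name": "search_emails",
        "description": "Search the user's mailbox for matching messages.",
        "parameters": {
            "type": "object",
            "properties": {
                "sender": {"type": "string"},
                "query": {"type": "string"},
                "before": {"type": "string", "description": "ISO date"},
            },
        },
    },
}]

messages = [{"role": "user", "content":
             "Find that email where I spoke to Brian about the project over a year ago"}]
first = client.chat.completions.create(model="gpt-4o-mini",
                                       messages=messages, tools=tools)

# The model responds with a tool call; run the search locally and
# feed the results back so it can answer in natural language.
call = first.choices[0].message.tool_calls[0]
results = search_emails(**json.loads(call.function.arguments))
messages += [first.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": json.dumps(results)}]
answer = client.chat.completions.create(model="gpt-4o-mini",
                                        messages=messages, tools=tools)
print(answer.choices[0].message.content)
```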

I was wondering if that was on the roadmap.

8 Replies

2
Really?? And using your own data to train an LLM? One that you haven't any control over?

If you want to use AI, then please do so with a private AI that only uses your own data... and doesn't leak it to everyone else.

It's not only your own info that's getting shared... everyone you ever wrote to or had any correspondence with is included as well.
1
According to OpenAI's privacy policy, they don't train their AI on your data when you use their API: https://platform.openai.com/docs/guides/your-data

Your data is your data. As of March 1, 2023, data sent to the OpenAI API is not used to train or improve OpenAI models (unless you explicitly opt in to share data with us).

Additionally, it can be implemented in a way that not all of your data needs to be sent. See Windsurf's Security page (which is actually about privacy):

Within each of these requests, the client machine sends a combination of context, such as relevant snippets of code, recent actions taken within the editor, the conversation history (if relevant), and user-specified signals (ex. rules, memories, context pinning, etc). No single request contains entire codebases or large contiguous pieces of code data. Even for ahead-of-time personalization, any codebase parsing happens on the client machine and individual code snippets are sent to compute the embeddings so that the server is not receiving a single request with the entire codebase.
Of course, ideally you would have integration with a local LLM running on your own server infrastructure. But the SmarterMail integration is already provider-agnostic.
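
For what it's worth, several local runtimes (Ollama, for example) expose an OpenAI-compatible HTTP API, so pointing a provider-agnostic integration at your own hardware can be as small as changing the base URL. A minimal sketch, assuming Ollama is serving a locally pulled Mistral model on its default port:

```python
# Minimal sketch: the same OpenAI-style client, pointed at a local model.
# Assumes Ollama is running and serving its OpenAI-compatible endpoint
# on the default port; no email content leaves the server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama endpoint
    api_key="unused",                      # required by the client, ignored locally
)

reply = client.chat.completions.create(
    model="mistral",  # e.g. after `ollama pull mistral`
    messages=[{"role": "user", "content": "Draft a short follow-up email."}],
)
print(reply.choices[0].message.content)
```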

The Thunderbird Assist service is a good example of how deep integration can be much more interesting than just email drafting.
0
Unfortunately, if you want an AI to be able to search your emails, then you have to pass ALL of your email content (ALL of your emails ever...) to that AI so it can analyze the content and then do the search you ask for...

How do you plan to get them to do that without them being able to keep your data?

Assuming we pretend to trust them, do you give them your entire email database EVERY TIME you ask for a search and then have them delete it right after? How many gigabytes of upload do you have to give them each time?


Gabriele Maoret - Head of SysAdmins and CISO at SERSIS Currently manages 6 SmarterMail installations (1 in the cloud for SERSIS which provides services to a few hundred third-party email domains + 5 on-premise for customers who prefer to have their mail server in-house)
0
None, because they have it already. Data is the new gold.

ERP... they know everybody you invoice. They know your customers, your prices, and what you sell the most. Your vendors and your contacts.

What more do they need to know to compete or sell your data?
1
I was playing Devil's Advocate with trusting these companies :) But a lot of people do actually trust them. And I don't think there have been any privacy-related scandals about them as yet. Good to know people around here care a lot about privacy.

Personally, it unnerves me that I need to give consent for Scribe to be used at most of my doctor visits nowadays.

I'm not personally interested in using AI for my emails. But I've got customers who are. And it seems like using a local LLM on a server you actually control would be the privacy-preserving way to do it; as Brian suggested, Thunderbird Assist uses local models for privacy reasons. I don't see why you couldn't set up a Mistral model and integrate it with SmarterMail. Certainly, it seems like Microsoft and Google are doing deep integration with their own models.

As to uploading the entire database of emails - I don't think that's how it works. As I understand it, the data is processed into embeddings that are stored in a vector database, which is far smaller than the raw mail store, and then periodically updated.
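
Here's a minimal sketch of that embed-and-search flow, done entirely on your own machine with sentence-transformers (the model name and toy messages are just illustrative):

```python
# Minimal sketch: index email bodies as embedding vectors locally, then
# answer a natural-language search by cosine similarity. Nothing leaves
# the machine; the model name and toy corpus are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

emails = [
    "Brian: here are the contract revisions we discussed last spring",
    "IT: scheduled maintenance window this weekend",
    "Anna: lunch on Thursday?",
]
# Unit-length vectors, so a dot product is cosine similarity.
index = model.encode(emails, normalize_embeddings=True)

query = "that email where I spoke to Brian about the contract"
q = model.encode([query], normalize_embeddings=True)[0]

scores = index @ q
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:.2f}  {emails[i]}")
```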

As an aside:
There seems to be a lot of talk about vector databases, embeddings, and such. This is a bit beyond me. But I did find a research article about embeddings and how private they are: https://arxiv.org/html/2411.05034v1

(Which is to say, by default, not very, though this paper shows it can be mitigated, apparently)
1

1. Clearview AI – Facial Recognition Database Scraping

  • What happened: Clearview AI scraped over 3 billion images from social media and public websites without consent to build a facial recognition tool used by law enforcement.
  • Privacy concern: Individuals’ biometric data was collected and used without knowledge or permission.
  • Impact: Lawsuits and bans in several countries; regulators in Canada, Australia, and Europe ruled the company violated privacy laws.

2. Facebook & Cambridge Analytica (with AI-driven profiling)

  • What happened: Data from 87 million Facebook users was harvested via a personality quiz app and used to create AI-driven psychological profiles for targeted political advertising.
  • Privacy concern: Massive unauthorized use of personal data for behavioral prediction and manipulation.
  • Impact: $5 billion fine for Facebook from the FTC; widespread loss of trust in data privacy and social media platforms.

3. Amazon Alexa – Voice Recordings and Human Review

  • What happened: Amazon employees listened to users’ Alexa recordings to improve AI voice recognition accuracy—without clearly informing users.
  • Privacy concern: Private conversations were recorded and reviewed without explicit consent.
  • Impact: Backlash and increased scrutiny of smart home devices; Amazon introduced clearer opt-outs and improved privacy policies.

4. Zoom AI Features – Using User Data Without Consent

  • What happened: In 2023, Zoom faced backlash for updating its terms to allow training AI on customer data (including video, audio, and chat), which many users saw as a breach of trust.
  • Privacy concern: AI training on sensitive communications without clear user consent.
  • Impact: Zoom was forced to clarify and change its terms; damaged trust among enterprise users.

5. Google Bard (Gemini) – Data Misuse and Internal Access

  • What happened: Internal whistleblowers raised concerns that Google engineers accessed users' chat data in AI tools for training and debugging.
  • Privacy concern: Sensitive user data potentially accessed by employees or used without proper safeguards.
  • Impact: Internal investigations and increased calls for transparency in AI model training data practices.

Asked Google... this is what turned up after a millisecond of research...

0
If we're talking scandals, I'd be interested in one relating to OpenAI, which states in no uncertain terms, right at the top of the agreement, that they do not train their models on your data. If they broke that agreement, I would be very interested in seeing reports of it, because OpenAI has staked their reputation on saying they do not do this. I couldn't find any articles saying they have.

As for Google Bard:

It really isn't surprising that Google is training their AI tools on users' data... in fact, I would be surprised if they ever said they wouldn't. Amazon is even less surprising. The Zoom scandal is about the fact that they updated their terms to say they would train their AI on users' data, not about breaching a contract saying they wouldn't (Adobe would be a more recent example of the same thing). Clearview AI isn't offering an API as far as I'm aware, and the Facebook one is a real reach...

But I suppose that's about what you can expect from Google Bard's output 🤷

Now, I had seen recently that in the NYT lawsuit against OpenAI, the plaintiffs asked the judge for an injunction requiring OpenAI to retain all user data so they could peruse it. I figured this had little chance of going through and would be appealed.


To comply with the order, OpenAI must "retain all user content indefinitely going forward, based on speculation" that the news plaintiffs "might find something that supports their case," OpenAI's statement alleged.
The order impacts users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI specified in a court filing this week. But "this does not impact ChatGPT Enterprise or ChatGPT Edu customers," OpenAI emphasized in its more recent statement. It also doesn't impact any user with a Zero Data Retention agreement.
To which my first thought was:

"Uh, but they didn't update the privacy policy..?"

And then I realised: "oh, of course. The privacy policy says they don't train on your data. Not that they don't retain it. And there is that section in there about sharing it with third parties for legal reasons."

Their policy:

By default, abuse monitoring logs are generated for all API feature usage and retained for up to 30 days, unless we are legally required to retain the logs for longer.
Anyway.

Back to my feature request... limiting it to a local LLM where the data never leaves the server/user's device. I don't see any privacy issues with that.
2
Matt Petty Replied
Employee Post

Here's something I've been experimenting with in my own time :)

Matt Petty Senior Software Developer SmarterTools Inc. www.smartertools.com
