If we're talking scandals, I'd be interested in one relating to OpenAI, which states in no uncertain terms, right at the top of the agreement, that they do not train their models on your data. If they broke that agreement, I would be very interested in seeing reports of it, because OpenAI has staked their reputation on saying they do not do this. I couldn't find any articles saying they have.
As for Google Bard:
It really isn't surprising that Google is training their AI tools on users' data... in fact, I would be surprised if they ever said they wouldn't. Amazon is even less surprising. The Zoom scandal is about the fact that they updated their terms to say they would train their AI on users' data, not about breaching a contract saying they wouldn't (Adobe would be a more recent example of the same thing). Clearview AI isn't offering an API as far as I'm aware, and the Facebook one is a real reach...
But I suppose that's about what you can expect from Google Bard's output 🤷
Now, I had seen recently that in the NYT lawsuit against OpenAI, the plaintiffs asked the judge for an injunction requiring OpenAI to retain all users' data so they could peruse it. I figured this had little chance of going through and would be appealed.
To comply with the order, OpenAI must "retain all user content indefinitely going forward, based on speculation" that the news plaintiffs "might find something that supports their case," OpenAI's statement alleged.
The order impacts users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI specified in a court filing this week. But "this does not impact ChatGPT Enterprise or ChatGPT Edu customers," OpenAI emphasized in its more recent statement. It also doesn't impact any user with a Zero Data Retention agreement.
To which my first thought was:
"Uh, but they didn't update the privacy policy..?"
And then I realised: "oh, of course. The privacy policy says they don't train on your data. Not that they don't retain it. And there is that section in there about sharing it with third parties for legal reasons."
Their policy:
By default, abuse monitoring logs are generated for all API feature usage and retained for up to 30 days, unless we are legally required to retain the logs for longer.
Anyway.
Back to my feature request... limiting it to a local LLM where the data never leaves the server/user's device. I don't see any privacy issues with that.
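For what it's worth, the local-only setup doesn't have to be complicated. Here's a minimal sketch assuming a locally hosted model server such as an Ollama instance on localhost:11434 (the model name "llama3" is just an example); every byte of the prompt and response stays on the machine:

```python
import json
import urllib.request

# Assumes an Ollama server running locally with a model pulled.
# Nothing here talks to any third-party service.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON payload for a non-streaming generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the completion.

    The request never leaves localhost, so there is no external
    party that could retain or train on the data.
    """
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swap in whatever local inference server the feature ends up using; the point is just that the endpoint resolves to the user's own hardware.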