Which top large language models (LLMs) provide the best data protection terms? One publication conducted a detailed analysis of several top LLMs.
The analysis was published on Legal Evolution (Fine print face-off: which top large language models provide the best data protection terms?, written by Evan Harris & David W. Tollen, who also conducted the analysis; available here). Hat tip to Lucian Pera for the heads-up on the article.
As the authors discuss, when it comes to the question of whether commercial AI providers can use your input data to train their models, the big worry is that intellectual property, trade secrets, or other sensitive or private information will be unknowingly leaked to the public via an LLM’s future output for a different customer. They describe a “nightmare scenario” as follows:
- An HR platform uses a commercial LLM to power its “chat with my performance review” feature. This feature advises end-users, helping them set goals and role-play tough conversations. With one click, enterprise customers can turn on this feature for all employees.
- The HR platform agrees via its contract to let the commercial LLM provider use input and output to “improve the service.”
- The commercial LLM provider then includes end-user inputs in the training set for the LLM’s next training run.
- Future users of the LLM might have access to this sensitive data, regardless of their authorization or what company they work for.
- With a simple query, anyone could prompt the LLM to reveal intimate details about a coworker, friend, politician, or industry influencer’s job performance.
- Chaos ensues.
Yeah, that would do it.
While that’s unlikely, it’s technically possible. As the authors note: “Fortunately, the major players in the commercial LLM space have done a good job of offering contract terms that protect your inputs and outputs. Many updated their terms during the last couple of months.”
The analysis was done by a product called Screens (created by Evan’s team at TermScout), which uses GenAI to apply contract review playbooks authored by expert lawyers to the commercial terms of the leading LLMs. They analyzed (in alphabetical order, with links to the terms and conditions for each) Anthropic, AWS, Cohere, Google Gemini, Mistral, and OpenAI. They also looked at Screens’ own terms and conditions.
It’s a lengthy article, but the authors do include a “TLDR” (too long, didn’t read) section near the top for anyone who wants to skip the detailed analysis. According to the authors, two LLMs “got perfect scores based on the criteria we’re looking at,” so they appear to offer the best data protection terms. Regardless, it’s a terrific, detailed analysis of the terms and conditions of the most popular LLMs. Subject to change at any moment, but nicely done!
So, what do you think? Are you familiar with the terms and conditions of the LLMs that you use? Please share any comments you might have or if you’d like to know more about a particular topic.
Image Copyright © Legal Evolution
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

