Does the LLMperor Have New Clothes? Or No Clothes At All?

Doug Austin

2 years ago

Does the LLMperor have new clothes? Or no clothes at all? A new paper discusses what we don’t know about LLMs in eDiscovery yet.

The paper (Does the LLMperor Have New Clothes? Some Thoughts on the Use of LLMs in eDiscovery, written by Maura R. Grossman, Gordon V. Cormack, and Jason R. Baron and available here) starts off with Hans Christian Andersen’s famous parable about the emperor’s new clothes, where the emperor and his officials were tricked by swindlers who convinced them that non-existent clothes they were “making” were the finest clothes ever and that the inability to see them was a sign of stupidity. But when he went out in his “new clothes”, a little child pointed out that he didn’t have anything on.*

How does that relate to large language models (LLMs) and eDiscovery? Because, as the authors state: “on almost a daily basis, claims are being made by lawyers and commercial solution providers that LLMs either can or will soon replace not only traditional methods of identifying responsive electronically stored information (‘ESI’) using keyword searches, but also newer methods using technology-assisted review (‘TAR’). As part of these assertions, suggestions have been made that LLMs eliminate the need to follow sophisticated protocols that have come to be associated with search methods and the complex statistical efforts aimed at validating the results of particular TAR efforts.”

After providing some definitions and background (including what LLMs are and how machine learning and technology-assisted review evolved in the legal field, with a reference to Judge Andrew Peck’s seminal Da Silva Moore decision), the authors state this:

“LLM tools and protocols have not yet been demonstrated to be as effective as currently recognized methods for legal research, nor for TAR. The first step towards such recognition should be empirical studies akin to those cited in Da Silva Moore, demonstrating the effectiveness of TAR for eDiscovery tasks on a meaningful number of varied and representative RFPs and datasets. The second step should be to demonstrate, through the use of a statistically sound and well-accepted validation protocol, that each particular eDiscovery effort using a recognized LLM tool and protocol is reasonably effective.”

As a reference point for validation protocols, the authors cite the one from the In re Broiler Chicken Antitrust Litigation case, used by Grossman as Special Master in that case and notably designed to be applicable regardless of the review method employed, whether TAR or manual review. As for the lack of empirical studies, they discuss some of the considerations there, including the potential for heavy reliance on the skill of the “prompt engineer”, in much the same way that keyword search relies on the skill of the searcher, noting: “Slightly different prompt formats can lead to wildly varying responses” (which we saw in this recent study).
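To make the idea of a method-agnostic validation check a bit more concrete, here is a minimal sketch in Python. It is not the Broiler Chicken protocol or anything taken from the paper; it is a generic illustration that assumes you have drawn a random sample of the collection, had a human reviewer code each sampled document, and recorded whether the review process (LLM, TAR or manual) also marked it responsive. The function names and the choice of a Wilson score interval are my own assumptions for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if trials == 0:
        raise ValueError("sample contains no responsive documents")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def estimate_recall(sample):
    """Estimate recall from (human_says_responsive, tool_says_responsive) pairs.

    Hypothetical helper: `sample` is assumed to be a simple random sample of
    the collection, coded by a human reviewer and by the review method under test.
    """
    # Keep only documents the human reviewer coded as responsive.
    tool_calls = [tool for human, tool in sample if human]
    found = sum(1 for tool in tool_calls if tool)
    low, high = wilson_interval(found, len(tool_calls))
    return found / len(tool_calls), low, high


if __name__ == "__main__":
    # Made-up numbers purely for illustration: 100 responsive documents in the
    # sample, of which the review method found 80, plus 900 non-responsive ones.
    sample = [(True, True)] * 80 + [(True, False)] * 20 + [(False, False)] * 900
    recall, low, high = estimate_recall(sample)
    print(f"Estimated recall: {recall:.0%} (interval {low:.1%} to {high:.1%})")
```

Nothing in the calculation depends on how the documents were identified, which is the point: the same check could be run against keyword search, TAR or an LLM-based review. A real protocol would also need to address sample design, reviewer quality and what level of recall is reasonable for the matter.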

Their recommendation: “Until valid testing demonstrates that LLMs are at least as effective as established practice for concrete eDiscovery tasks, they should be treated with caution.”

Great advice. The 7-page paper is an easy read that lays out the current landscape regarding the use of LLMs in eDiscovery and the considerations that should concern you. Check it out here.

In his ruling in Da Silva Moore, Judge Peck cautioned that TAR was not a “Staples-easy button”, and over the years the use of TAR has followed much more formal protocols than keyword search and manual review ever did (including greater scrutiny in litigation from opposing parties). Why haven’t keyword search and manual review been subject to the same level of scrutiny? Because lawyers think they understand keyword search (even though many of them don’t).

Now, many lawyers are embracing LLMs and generative AI, even without empirical studies demonstrating their effectiveness, and without the use of a validation protocol. Why? Because they think they understand the technology, since they have all played around with genAI chatbots like ChatGPT. My concern is that, despite the important concerns raised in this excellent paper, many of them want that “Staples-easy button”.

Does the LLMperor have new clothes? Or no clothes at all? Good question.

So, what do you think? Are you concerned there aren’t any empirical studies yet for the use of LLMs in eDiscovery? Please share any comments you might have or if you’d like to know more about a particular topic.

* – I would say “spoiler alert”, but he wrote it in 1837. 😉

Disclaimer: The views represented herein are exclusively the views of the authors and speakers themselves, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
