Tom O’Connor Weighs in on the NIST Study and Trustworthiness of AI: eDiscovery Trends

I love it when Tom O’Connor gets into one of our topics and provides his own analysis!  So, when he told me he was going to write about the blog post that Jim Gill wrote on the NIST study and the trustworthiness of AI, I expected he would have a lot of interesting takes on it.

Tom’s post on his Techno Gumbo blog (Is the New NIST Standard for AI Looking at the Wrong End of the Horse?) discusses the trustworthiness of AI. After switching analogies from the one referenced in the title (and today’s graphic) to one where he notes “that this is a case where we DO want to see the sausage being made”*, Tom observes that “NIST talks about trust as a key element in getting lawyer buy in on AI.  True, but first it would be nice if vendors explained to us what they actually mean by AI.”

Stating that “explaining how ‘revolutionary’ or ‘groundbreaking’ your AI is only helps me if I know specifically how it works in my particular use case”, Tom provides “some examples of what vendors say about AI right off their web sites”, as follows:

AI is the next frontier.

AI is the future of eDiscovery

Our AI uses cutting edge artificial intelligence and machine learning.

Our AI gives extraordinary results.

AI … Believe the Hype

Embrace the groundbreaking magic of artificial intelligence with (name deleted)

Precise predictions in a fraction of the time required for traditional review

Infusing AI across the entire E-Discovery process

(name deleted) artificial intelligence capabilities are built on top of the latest innovations in Deep Learning, Natural Language Processing, compute intensive hardware processing and other related architecture approaches (editor’s note: “compute” is not a typo)

And his personal favorite:

yeah … this is what the future feels like

When it comes to the trustworthiness of AI, Tom nails it when he says, “part of the problem is that people aren’t really sure how these programs work”, which illustrates what a difficult subject it is.  He also provides a link to the slide deck published by Tess Blair of Morgan Lewis (who co-presented the Ethics of AI presentation I covered last week; many of the slides in the linked deck were in that presentation), offers an informative discussion of the comparison of AI to the music streaming services Pandora and Spotify (a comparison which he “hates”), links to an article by Bob Ambrogi about a study by law librarians showing that different legal research platforms deliver surprisingly different results, goes into depth on The Art of “Thinking Like a Data Scientist” (which, it turns out, is really hard) and more.

Tom’s penultimate point is that “We have an ethical duty to truly understand this technology in order to be able to explain it to our clients and the Court, when required.”  The American Bar Association actually “urges courts and lawyers to address the emerging ethical and legal issues related to the usage of” AI, and has done so for nearly two years through Resolution 112, which was adopted in August 2019 (and covered by me here).

When it comes to the trustworthiness of AI, how do we get lawyers to trust it?  I think (as Tom discusses) the responsibility lies both with the providers of AI technology, to make it more transparent and understandable, and with the legal profession, to try to understand it better.  As Canned Heat sang in their 1970 hit (which I suspect is in Tom’s playlists somewhere), Let’s Work Together!

So, what do you think?  Can we improve the trustworthiness of AI (or at least get lawyers to trust it more)?  Please share any comments you might have, or let me know if you’d like to know more about a particular topic.

* I could have used that analogy as today’s graphic. You’re welcome. 😉

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

7 comments

  1. I strenuously object to your use of a horse’s ass as accompanying imagery to a story about my good friend, Tom O’Connor. What did that horse ever do to you?

  2. Haha! No inference about the author, I assure you Craig. Just “horsing around” with his analogy — it’s the “glue” that keeps the theme together!

  3. Heya Doug — Thought provoking post! Happy to weigh in. Let me start, though, by saying that all opinions below are mine and all mine, not my employer’s. I try to push my employer as much as I can in the “right” direction, but some forces are beyond any individual’s control.

    That said, let me say that I absolutely loathe the term “AI”. Like with a real, utter, deep-seated passion. It says nothing, and pretends to be everything. And so much of what is being pitched as AI is never backed up with any sort of evidence from the vendor. One particular term that rubs me the complete wrong way is “deep learning”. Here is something I wrote for Bloomberg back in 2018 that tries to talk sense into AI nomenclature: https://news.bloomberglaw.com/e-discovery-and-legal-tech/insight-deep-learning-and-e-discovery-fact-or-fiction

    Hell, let me even call out and give credit to a competitor here. Bill Dimm from Hot Neuron expressed it beautifully in private conversation about a year ago, when he said that all these vendor claims around AI are like taking your Tinder profile pic in front of someone else’s Ferrari. Enough of that.

    Personally, I don’t care what something is called. I care what it does. And very little is ever said about what something does. And if it is said, it’s given in absolute terms (e.g. “we saved 86%!”), which says almost nothing about what is happening. It may be good. It may be bad. It’s impossible to know from an “absolute” statement — e.g. “saved X”.

    What is needed is a relative statement. What is a relative statement? It’s a comparison of some approach relative to a very similar, reasonable way to have done things. Let me go old school in a quick example: latent semantic indexing, aka concepts. When I first joined this industry in 2010, “concepts” were all the rage. Because of course, who wouldn’t want a computer to understand the MEANING of a word and pump that into the machine learning algorithm, rather than use the humble, literal keyword, right? It was taken as self-evident that concepts were better than keywords. But no one would ever show a TAR result with concepts side by side with a TAR result with only keywords, to show exactly how much better they were, if at all. In other words, “Using TAR with concepts, you save 86%!”

    But that 86% is relative to linear review. What folks should want to see is what you would save, relative to TAR with keywords. Maybe TAR with keywords lets you save 84%. In which case concepts only gives you a result that is 2 percentage points better. Or maybe TAR with keywords even lets you save 89%... in which case TAR with concepts is actually worse! It’s still true that TAR with concepts saves you 86%, in absolute terms. But in relative terms, it may be WORSE than TAR with keywords. No concept vendor ever showed this comparison, side by side, though.

    I did my own internal testing in 2011 and reached my own internal conclusions and shaped my research accordingly (history for another day). But let me cite someone else’s work, a professor from Georgetown. In 2017, he published this paper at ICAIL (Int’l Conf on AI and Law), and what it essentially shows (Table 2) is that concepts hurt more often than they help. http://ir.cs.georgetown.edu/downloads/yang-icail-2017.pdf In other words, concepts are worse... even if they still save you 86%. The lowly keyword saves you even more.
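
    To make that concrete, here is a bare-bones sketch of what such a side-by-side test could look like. To be clear, this is my own toy illustration, not any vendor’s actual workflow: the documents, labels and parameters are all made up, and I’m just reaching for off-the-shelf scikit-learn pieces (plain TF-IDF for the lowly keywords, TruncatedSVD layered on top of it for the LSI “concepts”). What matters is the shape of the experiment: same classifier, same training judgments, same scoring, one component swapped.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        # Placeholder collection and judgments (1 = responsive, 0 = not responsive).
        docs = [
            "merger agreement draft attached for review",
            "please sign the amended merger agreement",
            "board approved the acquisition terms today",
            "acquisition due diligence checklist enclosed",
            "lunch menu for the office party on friday",
            "reminder to submit your parking pass form",
            "fantasy football league standings update",
            "the office printer is out of toner again",
        ]
        labels = [1, 1, 1, 1, 0, 0, 0, 0]

        # Pipeline A: the lowly keyword baseline (TF-IDF straight into the classifier).
        keywords = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

        # Pipeline B: identical, except LSI ("concepts") is inserted in the middle.
        concepts = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2),
                                 LogisticRegression(max_iter=1000))

        # Same documents, same judgments, same scoring -- so any difference in the
        # numbers is attributable to the single swapped component.
        for name, model in [("keywords", keywords), ("concepts (LSI)", concepts)]:
            scores = cross_val_score(model, docs, labels, cv=2, scoring="recall")
            print(f"{name}: mean recall = {scores.mean():.2f}")

    Run something like that on a real collection and you finally have a relative statement: concepts versus keywords, everything else held constant.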

    This brings me to my next point: It’s one thing for the vendor to not show such side by side, relative results. But why do lawyers/customers not ask to see them? Principle 6 says that producing parties are best situated to evaluate the techniques they’re using, but who actually does this evaluation? Since 2010 I have made probably hundreds of offers to do “counterfactual experimentation” with lawyers, so that they can get a good sense (“develop trust” as per the theme of your post) of what happens when different aspects of an AI (cough) system are changed and how it affects the overall outcome. For competitive reasons I will not disclose the number of takers that I’ve had, but let’s just say that the track record of folks willing to engage in counterfactual thinking ain’t great. If I were being snarky, which I can neither confirm nor deny that I am, I would say that the number of users willing to engage in counterfactual thinking is about the same percentage as the number of vendors claiming that their systems are “AI”.

    So what is counterfactual thinking? It’s basically the ability to see something that happened, and imagine some aspect (whether large or small) of that thing having happened differently. Gone in a different way. Like with the LSI example... if LSI saves you 86%, then counterfactual thinking leads you to ask the question “how much would I have saved if everything else about the process (core algorithm, training judgments, etc.) were the same, but I just used lowly keywords instead of LSI?” But it can be about anything, not just LSI. It could be about using contract reviewer judgments for training rather than senior attorney judgments. It could be about what a “portable model” (the industry’s latest “LSI”-ish buzzword) does or does not give you. It could be about hundreds of other things.

    But lawyers need to become more “counterfactual savvy” and demand the answers to these sorts of questions in the first place. It’s really not hard at all, despite what Ambrogi says. You just take some claim that someone has pitched to you, and you ask “ok, you say that component ‘x’ is responsible for your really good results... how would the same thing have gone if you used component ‘y’ instead?” Not “how well did the whole thing work relative to linear review”. But “what effect did component ‘x’ really have?” Easy peasy. Lawyers are good at cross-examinations, right? Counterfactual thinking is cross-examination of vendor claims.

    Funny thing is, the NIST report that you linked to essentially points in this same counterfactual thinking direction. When talking about how users build trust in an AI system, the NIST authors write:

    [For instance, this has been demonstrated in the Wason Rule Discovery Task, where participants complete the following two steps after being shown the number sequence “2, 4, 6”: 1) generate a hypothesized rule characterizing the number sequence and 2) generate several number sequences to test their hypothesized rule. In general, most individuals hypothesize the rule “+2” and generate only sequences that follow their rule for the second step (positive hypothesis tests). This underscores our tendency toward congruent processing, which, in this case, often leads to a failure to discover the true rule (i.e., “any series of increasing numbers”). Experiments showed that individuals low in dispositional trust and those primed with distrust were found to be significantly more likely to generate sequences that did not follow their rule (negative hypothesis tests) [9]. Distrust improved performance on the task by invoking a consideration of alternatives.]

    In other words, distrust = counterfactual thinking. Instead of latching on to some primary hypothesis (or vendor claim), imagine a slightly different version of that hypothesis, and see if it also holds. Distrust the claim by counterfactually asking what would happen if some piece of it were tweaked. Those individuals who are able to not only distrust, but act on it by putting the distrust into action (or requiring that the vendor put the distrust into action) by testing the tweak and showing the relative results will (quoting NIST) “improve[] performance on the task by invoking a consideration of alternatives”.
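
    If you want to see how cheap that kind of negative hypothesis testing is, here is a tiny toy sketch of the Wason task itself (my own illustration, not NIST’s). Sequences that fit the “+2” guess can never expose it; it takes a sequence that violates the guess, and still gets accepted, to show that the guess is wrong.

        def true_rule(seq):
            # The experimenter's actual rule: any strictly increasing sequence.
            return all(a < b for a, b in zip(seq, seq[1:]))

        def hypothesized_rule(seq):
            # The rule most participants guess from "2, 4, 6": each number goes up by 2.
            return all(b - a == 2 for a, b in zip(seq, seq[1:]))

        # Positive hypothesis tests: sequences chosen to FIT the guessed rule.
        # Negative hypothesis tests: sequences chosen to VIOLATE it.
        positive_tests = [(8, 10, 12), (1, 3, 5), (100, 102, 104)]
        negative_tests = [(1, 2, 3), (2, 4, 100), (5, 4, 3)]

        for seq in positive_tests + negative_tests:
            # The guess is exposed whenever its prediction disagrees with the
            # experimenter's answer -- which, for the "+2" guess, can only ever
            # happen on a negative test.
            disagrees = hypothesized_rule(seq) != true_rule(seq)
            print(f"{seq}: fits '+2' guess={hypothesized_rule(seq)}, "
                  f"accepted={true_rule(seq)}, exposes guess={disagrees}")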

    Until lawyers are willing to distrust — not DISMISS but ACTIVELY DISTRUST, which is a huge difference — by testing or requiring their vendors to test counterfactual claims, there will be no trust. Trust requires active distrust.

    Or as Descartes said in essence (as well as literally, in posthumously published work): I doubt, therefore I am.

  4. Tom was spot on, unsurprisingly. Buy-in will evolve from understanding the technology FIRST. And technologists/technicians/data scientists will have to begin by using less esoteric terms, among other approaches. Thanks for bringing Tom’s article to the fore.
