Egads! A Saturday morning post* from eDiscovery Today? Recently, I’ve started writing posts for EDRM to publish on its blog. But, of course, the readers of eDiscovery Today are my first priority (and my bread and butter), so I want to make sure you can enjoy that content as well. So, ICYMI, here are 3 reasons people avoid using TAR in their cases and 4 ways to promote using it.
Recently, I received an email from a reader of eDiscovery Today in which she asked: Why don’t people use Technology Assisted Review (TAR)/predictive coding (and related technologies) more?
It’s a great question, and the use of predictive coding (or lack thereof) is illustrated in the State of the Industry report that eDiscovery Today publishes every year, which is sponsored by EDRM. In this year’s report (which is summarized here), out of 281 respondents, only 25.9% said they use predictive coding in all or most of their cases, while 36.3% said they use it in very few or none of their cases.
Even worse, those numbers reflect less usage than the previous year’s, when 31.1% of respondents said they used predictive coding in all or most of their cases (5.2 percentage points higher than this year) and 32.8% said they used it in very few or none of their cases (3.5 percentage points lower than this year). In other words, a smaller percentage of respondents uses it most of the time, and a larger percentage hardly uses it at all.
Does this mean the use of TAR is regressing? Not necessarily. This year’s survey had over 50% more respondents than last year’s (281 versus 183), so I think it offers a broader, more realistic view of how the industry uses TAR. Keep in mind that these are still people who chose to take an eDiscovery survey, so the results are probably still more optimistic than the legal community overall.
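Because “5.2% lower” can be read two ways, here is a quick Python check of the figures quoted above. It only re-derives numbers already in the text (the helper names are mine) and shows the difference between a change in percentage points and a relative change.

```python
# Figures quoted above from this year's and last year's State of the Industry reports.
THIS_YEAR = {"respondents": 281, "all_or_most": 25.9, "few_or_none": 36.3}
LAST_YEAR = {"respondents": 183, "all_or_most": 31.1, "few_or_none": 32.8}


def point_change(new_pct: float, old_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return round(new_pct - old_pct, 1)


def relative_change(new: float, old: float) -> float:
    """Relative change from old to new, expressed as a percent of old."""
    return round((new - old) / old * 100, 1)


print(point_change(THIS_YEAR["all_or_most"], LAST_YEAR["all_or_most"]))     # prints -5.2 (points)
print(relative_change(THIS_YEAR["all_or_most"], LAST_YEAR["all_or_most"]))  # prints -16.7 (percent)
print(point_change(THIS_YEAR["few_or_none"], LAST_YEAR["few_or_none"]))     # prints 3.5 (points)
print(relative_change(THIS_YEAR["respondents"], LAST_YEAR["respondents"]))  # prints 53.6 (percent)
```

So the drop in “all or most” usage is 5.2 percentage points, which is roughly a one-sixth relative decline, and the respondent pool grew by about 54%.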
So, why don’t more people use TAR? Here are 3 reasons:
Lack of Understanding: Simply put, people avoid what they don’t understand. And – despite many efforts to educate the legal community on it – they still don’t understand how machine learning technologies work. To many, TAR is a “black box” that they feel ill-equipped to defend in court if they are required to do so.
Conversely, most legal professionals think they understand keyword search (because they learned it in law school with Westlaw or Lexis, or they know how to perform a Google search), but many don’t understand keyword search best practices when it comes to discovery, including testing search results and the null sets for those results. This recent example illustrates that many legal professionals don’t understand keyword search either. But most think they understand it, so they prefer that approach over something they know they don’t understand.
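Testing a null set usually means drawing a random sample from the documents the search terms did not hit and having reviewers code that sample. The minimal Python sketch below shows only the arithmetic, assuming a hypothetical `is_relevant` callable stands in for the reviewer’s decision (the function and document IDs are illustrative, not any platform’s API): the share of relevant documents in the sample (the elusion rate) is projected onto the whole null set. A real protocol would also report a confidence interval around that estimate.

```python
import random
from typing import Callable, Sequence


def estimate_elusion(null_set: Sequence[str], is_relevant: Callable[[str], bool],
                     sample_size: int, seed: int = 42) -> tuple[float, int]:
    """Estimate how many relevant documents a keyword search left behind.

    null_set:    IDs of documents the search terms did NOT hit (hypothetical input).
    is_relevant: stand-in for a human reviewer coding a sampled document.
    Returns (elusion rate in the sample, projected relevant docs in the null set).
    """
    if not null_set:
        return 0.0, 0
    rng = random.Random(seed)  # fixed seed so the same sample can be reproduced
    sample = rng.sample(list(null_set), min(sample_size, len(null_set)))
    relevant_hits = sum(1 for doc_id in sample if is_relevant(doc_id))
    rate = relevant_hits / len(sample)
    return rate, round(rate * len(null_set))


# Toy null set of 10,000 documents in which every 50th document is actually relevant.
null_set = [f"DOC-{i:05d}" for i in range(10_000)]
rate, projected = estimate_elusion(null_set, lambda d: int(d[4:]) % 50 == 0, sample_size=500)
print(f"elusion rate ~{rate:.1%}, roughly {projected} relevant documents missed")
```

If the projected number of missed relevant documents is large, that is a signal to revisit the search terms rather than assume they worked.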
Unwillingness to Change: Even if they do understand the technology and recognize the efficiency and effectiveness of TAR, many legal professionals are still unwilling to change their workflows to incorporate it. When there are a lot of documents to review, the mindset is to put more reviewers on it. In some cases, this may be because there is a concern that offering a different approach may reduce billable hours (there, I said it!), while in other cases, it may simply be a resistance to change.
The case we discussed in the April case law webinar illustrates this: the defendants were looking at reviewing 225,000 documents with a team of 15 reviewers, at a cost of between $140,000 and $235,000. Yet there was no mention anywhere of considering TAR. It’s natural for people to stick with what they know unless forced to change, even if they recognize there may be a more efficient way to do it.
More Transparency is Expected with TAR: Another reason that legal professionals avoid using TAR is that opposing parties tend to want more transparency on TAR processes than they do on the traditional keyword search approach. Why? That’s a good question and it probably relates to reasons 1 and 2 above.
Maura R. Grossman and Gordon V. Cormack, in their paper last year The eDiscovery Medicine Show (which I covered here), noted the “misconception…that only the TAR tool should be subject to validation, while keyword culling and manual review should be exempt, as they have always been.” Legal professionals tend to assume that their opposing counterparts know what they’re doing when it comes to keyword search (even though many don’t), but they tend to want proof of that only when the other side is using TAR.
In last year’s State of the Industry report, Judge Andrew Peck (ret.), who authored the first ruling to approve the use of TAR over ten years ago in Da Silva Moore, stated as a reason that more people don’t use TAR: “Part of the problem remains requesting parties that seek such extensive involvement in the process and overly complex verification that responding parties are discouraged from using TAR.”
It shouldn’t be that way, but it is. So, how do we get people to start using TAR? Here are 4 ways to promote using TAR in more cases.
Two things to note before I begin:
- Promoting the use of TAR is not the responsibility of just one group. Making a real difference involves action from lawyers, courts and the software providers that offer TAR solutions; and
- These are not just my thoughts – they are ideas and recommendations I’ve heard from other experts in the industry over the years.
What Lawyers Can Do
Of course, lawyers are the key to increased TAR use in litigation. But, as I mentioned above, there’s: 1) a lack of understanding of TAR, 2) an unwillingness to change what they’re doing, and 3) a desire to avoid having to be more transparent about their eDiscovery process when using TAR. Let’s talk about two ways to address these concerns:
Test TAR on a Prior Case First: It’s understandable that lawyers are reluctant to use TAR on a case that is active with tight deadlines – that’s not the best time to experiment and learn new workflows. Practicing the use of TAR with a case that’s already completed enables you to get used to the process without any deadlines or required outcomes to worry about, while giving you an outcome that has already occurred to measure against. If you’re missing important documents that you found in the case before, now is the time to adjust and learn how to identify them more accurately. Practicing TAR on a prior case may also uncover important documents you missed the first time, showing you just how powerful TAR can be!
Use TAR on Documents Produced to You: The biggest argument that lawyers may have regarding testing TAR on a prior case is the “ain’t nobody got time for that” argument. While you should make time to test TAR before actively using it on live cases, I get it – lawyers are busy and that’s not always feasible. What you can do is use TAR to conduct a review of documents that were produced to you. You have to review them anyway, right? Why not let the technology help you find the documents that are important?
One of the biggest advantages of testing TAR before using it on an active case and/or using TAR on documents produced to you is eliminating the defensibility requirement. You don’t have to be transparent when using it for your own needs and you can work out the process before using it for review and production in a case where you may have to be transparent.
What Courts Can Do
Speaking of transparency, let’s discuss what the courts can do to help promote the use of TAR. As Maura R. Grossman and Gordon V. Cormack noted in their paper The eDiscovery Medicine Show, “The Ball is in The Courts’ Court”. Maura and Gordon, while acknowledging Sedona Principle 6, stated “producing parties should show—and the courts should demand that they show—the reasonableness of their eDiscovery search and review processes, as well as the resulting production, by hewing closely to tools, methods, and procedures that have been scientifically vetted and shown to be valid and reliable.”
Not that every case requires (or should require) that level of defensibility. Absent a specific reason to question a party’s process, Sedona Principle 6, which states that “[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information”, should be a sufficient barometer for how parties handle discovery in the many cases where there are no disputes over potential deficiencies in a party’s discovery process. Heck, even Judge Andrew Peck (ret.), the author of the first ruling to approve the use of TAR (and a strong advocate of TAR approaches), invoked that principle in Hyles v. New York City in ruling the defendant didn’t have to use TAR because of Sedona Principle 6.
What courts can do, most of all, is:
Be Consistent: Don’t hold TAR-based approaches to a different standard than approaches based on keyword search. What’s good for the goose is good for the gander! You can quote me on that! 😉 If TAR-based approaches should be transparent, so should keyword search-based approaches.
Lawyers can help here as well, by (at least) demanding the same level of transparency in TAR and keyword search. Too many legal professionals get keyword search wrong to assume people know what they’re doing and give them a pass.
What Software Providers Can Do
Let’s not leave software providers out of the mix here. They need to:
Make TAR (and Other Machine Learning Processes) More Intuitive: Part of the reason that lawyers eschew TAR is that software platforms make the workflow so different from what they’re used to doing with a traditional keyword search and review approach.
To the extent possible, machine learning needs to be baked into existing workflows so that lawyers almost don’t even realize they’re using TAR. Nearly all lawyers use machine learning technology these days – in platforms ranging from Netflix to Pandora to Amazon (and many more). It’s intuitive for them there; why can’t it be intuitive in eDiscovery platforms too? The good news is that several platforms seem to be moving in that direction, so this way of promoting TAR is already well underway.
Making strides in the use of TAR within the legal profession will require more not only from lawyers, but also from courts and providers. Progress is always a group effort!
So, what do you think? How else can we get people to use TAR more? Please share any comments you might have, or let me know if you’d like to know more about a particular topic.
*Not to be confused with The Saturday Evening Post. See what I did there? 😉
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
Hey Doug:
Thanks for a good post on why people aren’t using TAR. You offer a good start on the reasons:
1. Lack of understanding;
2. Unwillingness to change; and
3. The demands of transparency.
But there is another elephant in the room: the billable hour. When you bill by the hour, all the incentives run away from efficiency. And when you are measuring risk, the heavy bias is to stick to established methods. Why take chances when the results cut billables?
Complexity is another key factor here. In the beginning, vendors were selling TAR 1.0 systems that were complex, confusing and seemed to require high-priced consultants. The process required front-end work, and the early algos didn’t work all that well, especially with the low-richness collections that were the norm in ediscovery.
The training methods invited debates and oversight, and the early programs were focused on silly issues like “can I use two SMEs for the training?” And there were all the statistics, like F1 measures, recall, precision, etc. Besides, human review was the gold standard, wasn’t it? And I don’t want to miss a single relevant document, do I?
I think we will see a new generation of search products where machine learning is fully integrated. Reviews in all cases, large or small, will be ordered by an AI algorithm which will provide the documents we need without a lot of extra effort. And, the lawyers will find what they need to move the case forward without endless discovery battles.
My views may be biased, of course, because we are launching a next generation search and TAR product, but others will quickly catch on and make all of this easy and painless. We helped move the ball forward with TAR 2.0 and continuous active learning, but that was only the beginning. The next generation of algorithms are thousands of times faster and more scalable than what is on the market today, and they will truly make finding information in large document populations as easy as Pandora and Spotify make finding great music. A lot can happen in the next year or so.
JT
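For readers who haven’t worked with the statistics John mentions, they are simple ratios. Here is a minimal Python sketch using the standard definitions (the document-ID sets are made-up illustrations): precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved, and F1 is the harmonic mean of the two.

```python
def precision_recall_f1(retrieved: set[int], relevant: set[int]) -> tuple[float, float, float]:
    """Standard retrieval metrics for one set of results against the true relevant set."""
    true_positives = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative numbers only: 100 documents retrieved, 80 truly relevant, 50 in common.
retrieved = set(range(0, 100))
relevant = set(range(50, 130))
p, r, f = precision_recall_f1(retrieved, relevant)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")  # precision=0.500 recall=0.625 F1=0.556
```

In discovery, recall (how much of the relevant material was found) is usually the measure parties argue about; precision mostly drives review cost.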
I appreciate the comment, John. And I did briefly allude to the billable hours resistance point above. I do believe that some lawyers out there are resistant to TAR because they’re concerned it will reduce billable hours, which I think is a short-sighted concern. Perhaps I should have added a “What Clients Can Do” section and discussed how they need to hold law firms accountable for leveraging technology to provide the best possible service.
As for the next generation of machine learning, I’m excited for that and what companies like yours can do, and are doing, to make it more intuitive. That’s huge!
Excellent points…and you can’t go wrong quoting Judge Peck on this! The point about software providers jumps out: any time a significant change in approach (in this case, in processes) occurs, resistance is sure to follow. Software providers making adjustments and TAR users adopting change are steps in the right direction.
Thanks, Aaron! I find I usually can’t go wrong quoting Judge Peck on anything! 😉
“To the extent possible, machine learning needs to be baked into existing workflows so that lawyers almost don’t even realize they’re using TAR.”
Yes and no. Now, speaking only for myself, and not my employer or any other entity, I see both sides of this coin. On the one hand, as Tredennick mentions above, we did bake the TAR 2.0 workflow into the platform in a completely natural way. Back in 2012, when we introduced TAR 2.0, we (after extensive testing to ensure that it worked) did away with control sets, the need for SME training, and other obstacles. All the review team had to do was just start reviewing documents. No different than if you were doing a linear review, which every lawyer is familiar with. So we baked that in.
At the same time, the workflow can’t be exactly the same as before, either. For example, with linear review, the review team would continue reviewing until the entire collection was reviewed. But with TAR baked in, the review team needs to stop after a reasonable recall target has been hit, and not review the entire collection. Otherwise, that defeats the purpose of having TAR at all.
Expecting to not have to change any aspect of one’s workflow is like moving from riding a horse to driving a car, and asking the vendor (the car manufacturer) to give you reins to steer the car, rather than a steering wheel. And stirrups rather than a brake pedal. No. At some point, even if the vendor has baked certain things in, one has to understand that a better technology will alter the workflow, too. I mean, right? There has to be give and take on both sides.
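Stopping “after a reasonable recall target has been hit” presupposes some way to estimate recall. One common approach, sketched here in Python under stated assumptions rather than as any vendor’s method, is to draw a random sample from the documents that won’t be reviewed, project how many relevant documents remain there, and compare that with the relevant documents already found. The 80% target, the `ready_to_stop` rule and all of the numbers are illustrative assumptions.

```python
def estimate_recall(relevant_found: int, unreviewed_count: int,
                    sample_relevant: int, sample_size: int) -> float:
    """Point estimate of recall from a random sample of the unreviewed documents.

    relevant_found:   relevant documents identified so far in the review.
    unreviewed_count: documents the team has not reviewed (and would not review).
    sample_relevant / sample_size: result of coding a random sample of the unreviewed set.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    elusion_rate = sample_relevant / sample_size
    estimated_missed = elusion_rate * unreviewed_count  # relevant docs projected to remain
    total = relevant_found + estimated_missed
    if total == 0:
        raise ValueError("no relevant documents found or projected; recall is undefined")
    return relevant_found / total


def ready_to_stop(recall_estimate: float, target: float = 0.80) -> bool:
    """Hypothetical stopping rule: stop once estimated recall meets the agreed target."""
    return recall_estimate >= target


# Illustrative numbers: 8,000 relevant found, 90,000 unreviewed, 15 relevant in a 1,500-doc sample.
recall = estimate_recall(8_000, 90_000, 15, 1_500)
print(f"estimated recall ~{recall:.1%}, stop: {ready_to_stop(recall)}")  # estimated recall ~89.9%, stop: True
```

This is only a point estimate; in practice the sampling uncertainty matters when deciding whether the target has really been met.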
“Heck, even Judge Andrew Peck (ret.), the author of the first ruling to approve the use of TAR (and a strong advocate of TAR approaches), invoked that principle in Hyles v. New York City in ruling the defendant didn’t have to use TAR because of Sedona Principle 6.”
A very interesting discussion could be had about the word “evaluation” inside of Principle 6.
Case in point: You started off the post by noting that “people avoid what they don’t understand. And – despite many efforts to educate the legal community on it – they still don’t understand how machine learning technologies work.”
If someone doesn’t understand the technology, then they haven’t really evaluated it, eh? The process of evaluating something gives you understanding, does it not? And does not Principle 6 state that responding parties are best situated to evaluate these technologies?
Agreed, Dr. J. And I think everybody who produces ESI SHOULD evaluate TAR. They also have a duty to understand the benefits and risks associated with relevant technology under Comment 8 of ABA Model Rule 1.1.
What I’m saying is that TAR shouldn’t be scrutinized any differently than keyword search and manual review. Either expect a heightened level of defensibility for both or give parties the benefit of the doubt for both under Sedona Principle 6 (unless deficiencies in their approach are identified). Don’t apply a different standard of defensibility to TAR when many people don’t know how to do keyword search and manual review either.
Oh for sure, they should be held to the same standard. And for sure, on the duty of technological competence.
I’m just saying that I don’t quite understand Principle 6 being quoted as a justification for not using TAR, unless the producing party had actually done an a priori evaluation of it and was able to convincingly show that, for this particular matter, TAR did not have a high chance of being better than the alternative.
Was that part of the Hyles v. New York City ruling? Did the responding party show all the evaluation that they had done?
Dr. J, the ruling (linked above) said: “The City declined, both because of cost and concerns that the parties, based on their history of scope negotiations, would not be able to collaborate to develop the seed set for a TAR process…At the conference, even after the Court’s ruling on the custodians and date range largely accepted the City’s scope parameters, the City still declined to agree to use TAR.”
That’s all I see, so the ruling doesn’t elaborate on any evaluation, which I suspect wasn’t presented to the Court (otherwise Judge Peck might have commented on it).
Well said.
“The City declined, both because of cost and concerns that the parties, based on their history of scope negotiations, would not be able to collaborate to develop the seed set for a TAR process…”
Ah, here we have it. An unintentional reveal that evaluation was NOT done. Why? Because I know, having done multiple publicly published experiments from 2012 to 2016 in which I examined the effect of starting with different seed sets under a TAR 2.0 (CAL) regimen, that variations in the seed set don’t matter. Start however you like, even with a single seed, and TAR 2.0 will get you to an efficient (high precision) endpoint (high target recall).
If the Principle 6 evaluation had been done, they would have known this. But because they think that seed sets are something that needs to be endlessly discussed and fought over, this tells me that they didn’t actually do the evaluation. The judge didn’t catch it, either.
Like I said above: A very interesting discussion could be had about the word “evaluation” inside of Principle 6.
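For readers unfamiliar with the TAR 2.0 (continuous active learning) workflow discussed in this thread, its basic shape can be sketched in a few lines of Python. This is a toy under heavy assumptions: the term-counting `toy_score` is a crude stand-in for a real classifier, and `cal_review`, the documents and the reviewer are all hypothetical. It illustrates only the loop structure (the seed merely starts the loop; every later batch comes from a ranking retrained on all judgments so far) and is not a reproduction of the experiments mentioned above.

```python
from collections import Counter
from typing import Callable


def toy_score(terms: set[str], rel: Counter, nonrel: Counter) -> int:
    """Crude stand-in for a classifier: favor terms seen in documents coded relevant."""
    return sum(rel[t] - nonrel[t] for t in terms)


def cal_review(docs: dict[int, set[str]], seed_ids: list[int],
               reviewer: Callable[[int], bool], batch_size: int,
               max_batches: int) -> dict[int, bool]:
    """Simplified continuous-active-learning loop: review a batch, re-rank, repeat."""
    rel, nonrel = Counter(), Counter()
    reviewed: dict[int, bool] = {}
    batch = list(seed_ids)  # the seed only starts the loop
    for _ in range(max_batches):
        for doc_id in batch:
            label = reviewer(doc_id)  # human coding decision
            reviewed[doc_id] = label
            (rel if label else nonrel).update(docs[doc_id])  # "retrain" on every judgment
        remaining = [d for d in docs if d not in reviewed]
        if not remaining:
            break
        remaining.sort(key=lambda d: toy_score(docs[d], rel, nonrel), reverse=True)
        batch = remaining[:batch_size]  # next batch: highest-ranked unreviewed documents
    return reviewed


# Toy collection: documents 1-5 are relevant and share the terms "merger" and "price".
docs = {i: ({"merger", "price"} if i <= 5 else {"lunch", "golf"}) for i in range(1, 21)}
result = cal_review(docs, seed_ids=[3], reviewer=lambda d: d <= 5, batch_size=4, max_batches=2)
print(sorted(d for d, is_rel in result.items() if is_rel))  # [1, 2, 3, 4, 5]
```

Real CAL tools use proper classifiers and recall-based stopping rather than a fixed number of batches.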
I don’t think it’s a matter of Judge Peck “catching it”; I don’t think courts usually “evaluate the evaluation”, but rather leave it to parties to decide how they want to conduct discovery. They can be as inefficient as they want (assuming that they don’t use that inefficiency to make burden arguments in proportionality disputes), as long as they get the job done of responding to the discovery requests.
If their response is inadequate, “[t]he requesting party has the burden on a motion to compel to show that the responding party’s steps to preserve and produce relevant electronically stored information were inadequate.” (Sedona Principle 7)
Let me be completely transparent here and point out that I’m intentionally feigning a bit of naiveté in this discussion in order to tease out a point. Clearly the way the courts are interpreting Principle 6 is in a “do whatever you want” manner. But that’s not what Principle 6 actually says. It actually says:
“Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”
Where, according to Merriam-Webster, to evaluate means “to determine the significance, worth, or condition of usually by careful appraisal and study”.
The implication, of course, being that the responding party will then pick the best option for their particular circumstance, data, goal, etc. based on that evaluation. I.e., that they will eventually do whatever they want, anyway. But, at least in how I read Principle 6 as it is written, they get to do whatever they want because they have first evaluated what it is that they’re doing. Evaluation is the precondition for getting to do whatever you want. I mean, right?
What I mean by the judge not catching it, is that there were un-evaluated claims being made. The precondition was missing. And again, does not Principle 6 imply that evaluation is a necessary party of the equation?
Urgh: Necessary _part_ of the equation. Not necessary _party_.
Ah, Dr. J, there you go again “intentionally feigning a bit of naiveté” to “tease out a point”. 😉
I agree that the parties should be truly evaluating the approaches, not just choosing to do what they want. If more outside counsel firms did that, they would be choosing the best way to provide services to their clients and we would see TAR used on a LOT more cases.
You’ve inspired another post topic from me on Sedona Principle 6 and the idea of “evaluation”. Look for that soon. 🙂
This point about evaluation is also something that we’re trying to address in one of the EDRM AI and Analytics groups. We’re moving slowly on getting material out, but go ahead and steal any thunder (i.e., write about it as much as you can/should/would like) because the whole point is to get people thinking about this issue. Principle 6 literally says “evaluate”, and everyone, judges included, seems to miss that fact.
> there you go again “intentionally feigning a bit of naiveté” to “tease out a point”. 😉
As long as one is transparent about it, eh?