Hat tip to Maura R. Grossman for this story! A lot of legal professionals eschew Technology Assisted Review (TAR) because they don’t understand it, preferring keyword search, which they say they do get. But do they really? This example, a keyword list sent to a lawyer for Donald Trump in connection with the January 6th attack on Congress last year, illustrates that many think they understand keywords but really don’t.
The Guardian article “Keyword list for Trump lawyer hints at focus of US Capitol attack investigation”, written by Hugo Lowell, states that the House select committee investigating the January 6 attack on the US Capitol is asking the former Trump lawyer John Eastman to prioritize turning over records with certain keywords as he complies with his subpoena – a list of terms that reveals the panel’s focus as it investigates a potential conspiracy.
The 82-term keyword list includes a Gmail address used by Donald Trump’s former chief of staff Mark Meadows and the names of various individuals involved in the effort to overturn the 2020 election, from top Trump aides to Republican members of Congress to the former justice department official Jeffrey Clark.
The search terms list – which the select committee transmitted to Eastman – provides a glimpse of what House investigators suspect might be contained among the thousands of emails and documents that Eastman is being forced to review to comply with his subpoena.
The keyword list is intended to act as a dragnet to catch his records from January 4th to 7th about efforts to overturn the 2020 election results between Eastman and individuals in different “centres of gravity”, according to a source with direct knowledge of the investigation.
Before I get started looking at the keyword list, I should note this is not a political post; it’s an eDiscovery best practices post to show mistakes people make when identifying search terms to use in discovery. Also note that the capitalized “AND” in the lists below is being used to separate distinct terms for comparison; it is not part of the terms themselves. Finally, forgive me if I’m a bit snarky with some of the comments below; it’s been a long week!
Likely Under-Inclusive Term Examples:
- stole* w/5 election*
Apparently, the committee cares if someone “stole” the election, but not if they were “stealing” it.
- (foreign OR election*) w/3 interfere*
They also care if someone does “interfere”, “interferes” or “interfered” with it, but not if they have been “interfering” with it (no “e”), which won’t be retrieved by this term.
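To make the under-inclusive problem concrete, here is a minimal Python sketch of how a trailing wildcard expands. It assumes the wildcard simply means "this prefix followed by any word characters"; real review platforms may differ (some apply stemming, for instance), and the helper names `wildcard_to_regex` and `matches` are made up for illustration.

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Turn a wildcard term such as 'stole*' into a whole-word regex.

    Assumption: '*' stands for zero or more word characters.
    """
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(rf"^{escaped}$", re.IGNORECASE)

def matches(term: str, word: str) -> bool:
    """Return True if the single word is retrieved by the wildcard term."""
    return wildcard_to_regex(term).match(word) is not None

# Word variations a reviewer would probably want each term to catch.
candidates = {
    "stole*": ["stole", "stolen", "steal", "stealing"],
    "interfere*": ["interfere", "interferes", "interfered",
                   "interfering", "interference"],
}

for term, words in candidates.items():
    hits = [w for w in words if matches(term, w)]
    misses = [w for w in words if not matches(term, w)]
    print(f"{term}: hits={hits} misses={misses}")
```

Under that assumption, “steal”, “stealing” and “interfering” all land in the misses, because none of them begins with the literal prefix in front of the asterisk.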
Likely Over-Inclusive Term Examples:
- Arizona, Georgia, Michigan, Nevada, “New Mexico”, Pennsylvania AND Wisconsin
Apparently, the Committee thinks that any document mentioning any of these states during that time period is conspiratorial.
- war w/2 gam*
This term runs the “gamut” of words that begin with “gam” – quite a “gamble”!
- rig* w/5 election*
If their goal is to retrieve documents that discuss rigging the election, they failed to use the proper “rigor” in testing it. It’s just not “right”!
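The over-inclusive side can be checked the same way: expand the wildcard against a word list and look at what comes back. The sketch below uses a tiny hand-picked vocabulary purely for illustration (in practice you would test against the words actually present in the collection), and `expand_wildcard` is a hypothetical helper that assumes the wildcard is a plain prefix match.

```python
def expand_wildcard(term, vocabulary):
    """List every vocabulary word retrieved by a trailing-wildcard term.

    Assumption: 'gam*' simply means "any word starting with 'gam'".
    """
    prefix = term.rstrip("*").lower()
    return sorted(w for w in vocabulary if w.lower().startswith(prefix))

# Illustrative vocabulary only; not drawn from any real collection.
vocabulary = [
    "game", "games", "gaming", "gamut", "gamble", "gamma",
    "rig", "rigged", "rigging", "right", "rights", "rigor", "rigid",
    "election",
]

# The variations each term was presumably meant to catch.
intended = {
    "gam*": {"game", "games", "gaming"},
    "rig*": {"rig", "rigged", "rigging"},
}

for term, wanted in intended.items():
    expansion = expand_wildcard(term, vocabulary)
    unintended = [w for w in expansion if w not in wanted]
    print(f"{term} -> {expansion}")
    print(f"  unintended: {unintended}")
```

Even with a word list this small, each wildcard pulls in more unintended words than intended ones.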
The “Department of Redundancy Department” Examples:
- Corrupt* w/5 election*, fraud* w/5 election*, rig* w/5 election*, stole* w/5 election* AND election*
- house.gov, justice.gov, senate.gov, usdoj.gov AND .gov
I’ve seen this mistake a lot. Parties identify specific proximity terms for searching, then also include a superset term that eliminates the need to search for those specific terms. In the first instance, “Election*” is a superset of the other four terms, so every document they retrieve is already retrieved by “Election*” alone. That makes them redundant (unless tracking documents responsive to each term in the production or prioritizing the more specific terms for review is important). In the second instance, “.gov” will retrieve not just national government websites, but state and local websites as well. There are over 5,300 websites with “.gov” domains.
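Here is a rough Python sketch of why the superset term swallows the proximity terms. The four documents are invented, and the `hits_within` helper is only a simplified stand-in for a “w/5” operator (it counts word positions and ignores direction); actual platforms define proximity in their own ways.

```python
import re

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def hits_prefix(text, prefix):
    """Simplified 'prefix*' search: any token starting with the prefix."""
    return any(t.startswith(prefix) for t in tokens(text))

def hits_within(text, left, right, distance):
    """Simplified 'left* w/N right*' search, in either direction."""
    toks = tokens(text)
    left_pos = [i for i, t in enumerate(toks) if t.startswith(left)]
    right_pos = [i for i, t in enumerate(toks) if t.startswith(right)]
    return any(abs(i - j) <= distance for i in left_pos for j in right_pos)

# Made-up documents for illustration only.
docs = {
    1: "They claimed fraud in the election results.",
    2: "The election certification is scheduled for Wednesday.",
    3: "Lunch order for the team.",
    4: "He said the election was rigged.",
}

broad = {d for d, text in docs.items() if hits_prefix(text, "election")}
narrow = {
    d for d, text in docs.items()
    if hits_within(text, "fraud", "election", 5)
    or hits_within(text, "rig", "election", 5)
}

print("election* hits:", sorted(broad))
print("proximity hits:", sorted(narrow))
print("proximity hits already covered:", narrow <= broad)
```

Because every proximity term requires a word starting with “election”, its hits can never fall outside the set that “election*” already retrieves on its own.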
You get the idea. I’ve been providing search term consulting for years and have seen mistakes like this more times than I can count. Also, on my very first day of daily blogging over 11 years ago, I wrote a post that warned people “don’t get wild with wildcards!” where I discussed how wild they can be. I also discussed it in this post from last October, referencing a case I covered a few years ago where the parties used a wildcard for “app” (“app”, “apps”, “application”, “applications”) to search for phone apps, which also retrieved all of the other 302 words in the English language that start with “app” in the collection. That post even provides a link for testing wildcard terms that is super easy to use.
Many people don’t use TAR because they don’t understand it, but they don’t realize that there are due diligence steps they need to take with keyword searches as well. Both involve a process of balancing recall and precision, which requires testing and refinement of the results. The use of keyword search doesn’t eliminate that requirement.
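For anyone who hasn’t worked with the two measures: precision is the share of retrieved documents that are actually relevant, and recall is the share of relevant documents that the search actually retrieved. A minimal Python sketch follows, using made-up document IDs; in a real matter the relevant set is not known up front and would typically be estimated by reviewing a sample.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for two sets of document IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical numbers for illustration only.
relevant = {1, 2, 3, 4}                  # documents that truly matter
retrieved = {2, 3, 4, 5, 6, 7, 8, 9}     # documents the search term hit

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy example the term finds three of the four relevant documents (recall of 0.75) but only three of its eight hits are relevant (precision of 0.375). Tightening the term tends to raise precision at the cost of recall, and loosening it does the reverse.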
As Maura said to me, “welcome to 2022 – you should be using TAR!” Examples like this happen every day and illustrate just that. Many people say they don’t understand TAR, but they don’t truly understand keyword search either. They just don’t realize it.
So, what do you think? Do you agree that many legal professionals think they understand keyword search, but don’t really understand it? Please share any comments you might have or if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.