During the Power Searching panel at the UF Law E-Discovery Conference (reminder, this is the last day to register to get access to the sessions on-demand!), one of the panelists (Paul H. McVoy, CEO of Meta-E Discovery LLC) discussed the importance of a search term report in helping you evaluate your search terms. That reminded me of how unique hits are great for identifying potentially overbroad terms in eDiscovery!
Unique hits for a term are the documents returned by that term and by no other term in the list of search terms you executed. If more than one term returns a particular document, that document is not counted as a unique hit for any of them.
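To make the definition concrete, here is a minimal sketch of how unique hits could be counted. It assumes you already have, for each term, the set of document IDs that term returned; the function name `unique_hit_counts` and the document IDs are made up for illustration and don't come from any particular review platform.

```python
from collections import Counter


def unique_hit_counts(hits_by_term):
    """Return, for each term, the number of documents hit by that term alone."""
    # Count how many different terms returned each document.
    terms_per_doc = Counter(
        doc_id for docs in hits_by_term.values() for doc_id in set(docs)
    )
    # A document is a unique hit for a term only if no other term returned it.
    return {
        term: sum(1 for doc_id in set(docs) if terms_per_doc[doc_id] == 1)
        for term, docs in hits_by_term.items()
    }


# Made-up document IDs, for illustration only.
hits_by_term = {
    "reimburs*": {1, 2, 3},
    "termin*": {3, 4},
    "mile*": {2, 5, 6, 7, 8},
}

unique_hits = unique_hit_counts(hits_by_term)
print(unique_hits)  # {'reimburs*': 1, 'termin*': 1, 'mile*': 4}
```

Documents 2 and 3 are each returned by two terms, so they don't count as a unique hit for either one.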
Paul used a list of search terms run against the ubiquitous Enron database that included the search term “Enron”, which was (of course) an obviously overbroad term, but it illustrated the point.
I once had a client who ran a bunch of search terms and got more than a hundred thousand documents with hits. They asked for my help in reducing the collection, and two of the terms in the list were “India” and “China”, which were responsible for more than 80% of the documents with hits. While it seems obvious those are overbroad terms, the unique hits are a great way to quantify just how overbroad they are.
Let’s look at a real-world example from a case I covered on Tuesday where the Court ordered the defendants to produce the entire email mailboxes for two individual defendants (something I’m not sure I’ve ever seen before). The Court also approved the plaintiffs’ search terms (the plaintiffs were drivers for Papa John’s) for 12 custodians of certain records. Those terms were:
“(Doug or Dougie) and delivery)”; “mile*”; “mile* and delivery”; “reimburs*”; “termin*”; “Wylie and deliver”; and “Wylie and driver”
Two of those terms are “mile*” and “mile* and delivery”. Ignoring the fact that the term “mile*” is a superset of “mile* and delivery” (which makes the second term redundant if you’re including both terms), let’s take a look at the term “mile*”. The words in the English language that begin with “mile” are:
mile, mileage, mileages, milepost, mileposts, miler, milers, miles, milesimo, milesimos, milestone and milestones
In a dispute about driver mileage, words like “mile”, “mileage”, “mileages” and “miles” are relevant, which is why “mile OR miles OR mileage*” would be a much more precise term to use here. The other words aren’t relevant. Imagine if there were scores of communications discussing the importance of “meeting our milestones” or “did you meet your milestone” for the week. These would be hits for the term but would not be responsive to the issue.
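Here is a rough sketch of that difference, using Python regular expressions as a stand-in for a wildcard search. The messages are invented, and the patterns are only my approximation of how “mile*” and “mile OR miles OR mileage*” might behave; wildcard syntax and matching rules vary from one eDiscovery platform to another.

```python
import re

# Hypothetical message snippets, invented for illustration.
messages = [
    "Please reimburse my mileage for last week.",
    "I drove 42 miles on Friday.",
    "Are we meeting our milestones this quarter?",
    "Did you meet your milestone for the week?",
]

# Approximates "mile*": any word beginning with "mile".
broad = re.compile(r"\bmile\w*", re.IGNORECASE)

# Approximates "mile OR miles OR mileage*": only the mileage-related forms.
precise = re.compile(r"\b(?:mile|miles|mileage\w*)\b", re.IGNORECASE)

broad_hits = [m for m in messages if broad.search(m)]
precise_hits = [m for m in messages if precise.search(m)]

# Messages the broad term picks up that the precise term leaves out.
false_hits = [m for m in broad_hits if m not in precise_hits]

print(len(broad_hits), len(precise_hits))  # 4 2
for message in false_hits:
    print(message)  # the two "milestone" messages
```

In this toy set, the broad pattern hits all four messages while the precise one hits only the two that are actually about mileage.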
So, assuming the other terms are on point (which is often a stretch), if the number of unique hits for “mile*” were considerably higher than for the rest of the terms, that could be an indication that the term is overly broad. That’s why unique hits are great for identifying potentially overbroad terms!
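As a simple sketch of that comparison, the snippet below flags any term whose unique hits make up more than a chosen share of all unique hits. The counts are invented (they are not from the case), and both the `flag_overbroad` helper and the 50% threshold are arbitrary assumptions of mine; a flagged term is a candidate for a closer look, not proof that it is overbroad.

```python
def flag_overbroad(unique_hits, share_threshold=0.5):
    """Return terms whose share of all unique hits exceeds share_threshold."""
    total = sum(unique_hits.values())
    if total == 0:
        return []
    return [
        term
        for term, count in unique_hits.items()
        if count / total > share_threshold
    ]


# Invented unique hit counts, for illustration only.
unique_hits = {
    "mile*": 9200,
    "mile* and delivery": 0,  # every hit here is also a hit for "mile*"
    "reimburs*": 310,
    "termin*": 540,
    "Wylie and driver": 25,
}

flagged = flag_overbroad(unique_hits)
print(flagged)  # ['mile*']
```

Note that “mile* and delivery” shows zero unique hits in this made-up report, which is what you would expect from a term that is fully covered by another term in the list.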
So, what do you think? Do you agree that unique hits are great for identifying potentially overbroad terms? If not, why not? Please share any comments you might have or if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.