Court Overrules Most Objections by Defendant on Search Terms: eDiscovery Case Law

In No Spill, LLC v. Scepter Canada, Inc., No. 18-cv-2681-HLT-KGG (D. Kan. Oct. 19, 2021), Kansas Magistrate Judge Kenneth G. Gale granted in part and denied in part the plaintiffs’ “Motion and Suggestions to Compel Use of Search Terms for Electronically Stored Information”, overruling the defendants’ objections based on the duty to meet and confer and undue burden, and finding that the 28 proposed search terms were relevant and related to the RFPs. Judge Gale also rejected the defendants’ cost-shifting request, but found the so-called “general search terms” not proportional and rejected those terms.

Case Background

In this dispute over claims of patent infringement, breach of contract, and unfair competition involving two patents held by the plaintiffs relating to preventing the explosion of portable fuel containers, the parties agreed to utilize mutually agreeable search terms pursuant to an ESI Protocol, but couldn’t agree on numerical search term limitations. The defendants proposed a maximum of ten custodians and five search terms per custodian, per the Federal Circuit E-Discovery Model Order, followed by a proposal of 12 custodians and eight search terms, and a condition that a search term could not return more than 1,000 non-duplicative hits – all of which were rejected by the plaintiffs.


Negotiations continued, and the plaintiffs planned to move forward with a motion to compel if the defendants did not agree to a final set of search terms that would have resulted in 342,375 de-duplicated documents. The plaintiffs rejected the defendants’ proposals that the plaintiffs identify which requests for production (RFPs) each search term was intended to address or agree to cost shifting for documents beyond the first 10,000, and filed the motion in August 2021. The defendants objected that the duty to meet and confer had not been met and that the proposed search terms were unduly burdensome and not proportional to the needs of the case.

Judge’s Ruling

As to the defendants’ objection regarding the duty to meet and confer, Judge Gale stated: “the Court is satisfied that there is sufficient good faith discussion. The parties have had several meetings concerning the matter, discussed the situation with the Court, and have communicated for over a year. Having a dispute regarding how the search terms relate to the RFPs does not necessarily rise to bad-faith discussion. And in any event, additional time to confer would likely be futile…Accordingly, Scepter’s objection that the duty to meet and confer has not been met is overruled.”

Noting that the plaintiffs “attached an exhibit which provided a summary of the search terms and how they relate to specific RFPs”, Judge Gale stated that the plaintiffs “breakdown the RFPs into four categories: (1) operation of the product; (2) Scepter’s knowledge of No Spill’s products; (3) Scepter’s financial information; and (4) Scepter’s fulfillment of No Spill’s orders.”  After a detailed review of all of the search terms, Judge Gale stated: “The Court has determined that all the search terms are relevant and seek discoverable information sought in the RFPs.”

Judge Gale also rejected the defendants’ undue burden argument, stating that “Scepter fails to provide evidence of its burdensome objection” and rejecting the defendants’ citation of Lawson v. Spirit Aerosystems, noting that in that case “the court repeatedly relied on several affidavits and declarations in the record”. He also weighed the review estimate of $416,635.50 against a potential damages award of more than $70 million in rejecting that argument.

Judge Gale did find that “the search terms sought are largely proportional”, but not the so-called “general search terms” (such as “flame mitigation device,” “No Spill,” or “nospill.com”).

Finally, Judge Gale considered the defendants’ request that the plaintiffs bear the cost of discovery after 10,000 documents, stating:

“In this case, the Court does not find it appropriate to apply any cost-shifting analysis. The Court is only compelling the use of search terms that are proportional and not overly burdensome. The advisory committee notes indicates that it should not be common or the norm. And most importantly, there is no agreement in place between the parties that addresses cost-shifting. As such, the Court is not ordering any cost-shifting. However, in the event No Spill wishes to use the search terms the Court has determined to be disproportionate, then they will be responsible for the costs. This is also consistent with the model order regarding e-discovery in patent cases.”

So, what do you think?  Should courts consider the assistance of an independent Special Master to resolve search term disputes?  Please share any comments you might have or if you’d like to know more about a particular topic.

Case opinion link courtesy of eDiscovery Assistant, an Affinity partner of eDiscovery Today.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

7 comments

  1. As one who makes his living as a court-appointed independent Special Master in matters involving ESI, I’ll pass on the question of using Masters to resolve search term disputes. At all events, that wasn’t an issue before the Magistrate, and why pay a Master for an analysis the Magistrate undertook for “free?” What’s missing here are solid metrics. All the fuss is about search terms when no evidence is supplied going to the effectiveness or ineffectiveness of same. A year of quarreling and they’ve yet to discover that a great many of the queries in question are not going to work in the way either side expects! Good luck with those numeric searches punctuated by hyphens! Has anybody TESTED these searches against representative samples of data? The number of hits being deemed arbitrarily “high” or “low” isn’t as meaningful as what evidence is implicated or excluded. How many of the 12,000 items Scepter deemed responsive would have been brought forward by these queries? What are examples of noise hits? Search is more than a game of numbers; there must be a qualitative component attendant to assessing search methodologies. Are the documents being hit upon truly proper candidates for review? Are the documents being excluded truly not? What technologies will be brought to bear on the next tier of culling and review? Hard to believe this is the best we can do on search in Court some 16 years after the “new” Rules. Sad, really.

  2. Let me second everything Craig just said.. specifically about evidence, data, analysis, real information involving both quality and coverage of the search term hits. If nothing is being measured, then all arguments (on all sides, judicial included) are moot.

    I mean, what does the following statement even mean? “The Court has determined that all the search terms are relevant and seek discoverable information sought in the RFPs.” I thought only documents were relevant, and that the producing party has an obligation to produce relevant (responsive) documents/information. What does it mean for a search term itself to be relevant? (A point Craig makes as well when he talks about the number of hits being high or low as not being meaningful vis-a-vis evidence.)

    Here’s the thing: This whole discussion seems to have lost the forest for the trees. Let’s back up to what we are (or should be) trying to do in discovery. And that is to determine a “just, speedy, and inexpensive” way of finding the information responsive to an RFP, correct?

    If we reframe the question from one of arguing about keywords to one of whether something is “just, speedy, and inexpensive”, then we might come to realize that this whole exercise is framed incorrectly. That is, we can step back for a moment and ask ourselves: Why are search terms being used to cull anything away in the first place? The answer: Because hosting and review are expensive. Right? But what we may have missed is that the expense of hosting is not the same as the expense of review. Over the years, the hosting expense has dropped significantly as technology has improved. But the review expense, a human-driven cost, has pretty much stayed the same. So it could very well be that the economics of needing to agree on search term hits may not make sense anymore, if we’re taking the big-picture view of what is overall just, speedy, and inexpensive.

    I can illustrate this with a concrete example, if anyone is interested.

    Great comments from both of you. Dr. J, I’m also glad you commented, because it reminded me that I forgot to respond to Craig’s comment — now I can respond to both of yours. And I would welcome the “concrete example” you propose to illustrate the issue! And Craig, the Special Master question at the end was partially designed to prompt a response from you, but it sounds like I didn’t need it!

    Every couple of years, I see a case like this where the Court is deciding how appropriate highly complicated search terms are. And I always wonder how they’re supposed to do that in a vacuum, and yet they still do it. Last year, I covered a case where Michigan Magistrate Judge Steven Whalen, citing US v. O’Keefe, declined to rule on search terms, ordering the parties to enlist an expert to help (https://ediscoverytoday.com/2020/08/12/court-grants-part-of-plaintiffs-motion-but-wont-go-where-angels-fear-to-tread-on-search-terms-ediscovery-case-week/).

    I praised that decision, but less than a month later, he ruled in another case (https://ediscoverytoday.com/2020/09/24/court-finds-angels-somewhat-less-afraid-to-tread-on-search-terms-this-time-ediscovery-case-law/) where, this time, he did determine some search terms were relevant, including the wildcard “black!” in a racial discrimination case — even though there are several words in the English language that begin with “black” that have nothing to do with race (https://www.morewords.com/words-that-start-with/black). Go figure.

    The point is, the search terms should be tested, results reviewed for responsiveness to the case, then optimized and run again until properly tuned. An expert can help with that, which was the other reason I mentioned a Special Master at the end. And even though it sounds complicated, it could have been done in a small fraction of the time with cooperating parties, instead of arguing over their scope for a year, even for the complicated searches illustrated in this case. (A rough sketch of that test-and-tune loop appears at the end of this comment.)

    Assuming, that is, that search terms are the best way to address this collection. We don’t know the makeup of the document collection, so it’s hard to say. Machine learning/TAR isn’t for every case. But too many attorneys still don’t even consider machine learning/TAR because they think they “get” search terms (because they learned them in Westlaw and Lexis), even though case research searching is completely different from searching in eDiscovery. Cases like this illustrate that many lawyers are still stuck in old methodologies when it comes to identifying a responsive document set for discovery.
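
    To make “test, review, tune, repeat” concrete, here’s a minimal sketch in Python against a toy corpus. Everything in it is made up for illustration (the corpus, the terms, the 50-document samples, the precision cutoff), and the “responsive” flag is just a stand-in for a human reviewer’s call:

    ```python
    import random

    random.seed(42)
    WORDS = ["spout", "vent", "flame", "invoice", "picnic", "recall", "order"]

    # Toy corpus: each "document" is a set of words plus a ground-truth flag
    # standing in for a human reviewer's responsiveness call.
    corpus = []
    for _ in range(10_000):
        words = set(random.sample(WORDS, 3))
        responsive = ("flame" in words or "vent" in words) and random.random() < 0.6
        corpus.append({"words": words, "responsive": responsive})

    def hits(term):
        """All documents matching a (single-word) search term."""
        return [d for d in corpus if term in d["words"]]

    def sampled_precision(docs, n=50):
        """'Review' a random sample of the hits; return the fraction responsive."""
        if not docs:
            return 0.0
        sample = random.sample(docs, min(n, len(docs)))
        return sum(d["responsive"] for d in sample) / len(sample)

    terms = ["spout", "vent", "flame", "picnic"]
    for rnd in range(3):  # test -> review a sample -> tune -> run again
        report = {t: (len(hits(t)), round(sampled_precision(hits(t)), 2)) for t in terms}
        print(f"round {rnd}: {report}")
        # Crude tuning step: drop terms whose sampled precision is mostly noise.
        # A real pass would also narrow overbroad terms, add qualifiers, etc.
        terms = [t for t in terms if report[t][1] >= 0.5]
    ```

    The hit counts alone (the thing the parties here spent a year fighting over) tell you almost nothing; it’s the reviewed samples that tell you whether a term is pulling in evidence or noise.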

    • Ok, lemme find some free time (might not be able to get to it until tomorrow), and I’ll share an example. More of a thought experiment, really. But I will share that soon.

      In the meantime, let me quickly point something out. You write:

      “The point is, the search terms should be tested, results reviewed for responsiveness to the case, then optimized and run again until properly tuned.”

      I totally agree with that statement. But what’s funny to me is that this cycle, this feedback loop that you describe.. there is a name for an architecture that carries out this cycle automatically. It’s called.. wait for it.. machine learning. Aka AI, aka the foundation of TAR.

      This is a point that continually seems to elude this industry. The question isn’t one of search terms vs TAR. When search terms are done properly, i.e. with testing and feedback loops, they’re expressing the same fundamental abstraction.

      Alright, consider the following example/thought experiment. Suppose you have a de-NISTed but otherwise unculled collection of 1 million documents, of which (say) 40,000 are responsive. Furthermore, let’s suppose that in one parallel universe you spent a bunch of time (and money) creating (and negotiating) keyword culling terms, and that those terms remove 800,000 documents from the collection, so that you only have to host/review 200,000.

      Let’s look at some cost comparisons. Specifically, there seems to be an emerging pattern, particularly from some governmental agencies, that you’re allowed a choice between keyword culling or TAR, but not both. So let’s hypothetically compare the costs.. the TOTAL costs.. of each.

      In this universe in which you’ve culled using keywords, you have the following costs:

      (1a) Hosting = 200,000 docs, which at 3,000 docs/GB is about 67 GB. Let’s pretend you’re getting hosting at $12/GB (insert your own number here) and that the case sticks around for 12 months. That’s a total hosting cost of roughly 67 GB * $12 * 12 ≈ $9,600.
      (1b) Review = 200,000 docs, at about $1/doc = $200,000
      (1c) Hours and days spent crafting keywords and fighting about them in court = $?
      TOTAL = $209,600 + unspecified court costs

      Now imagine another parallel universe, in which you don’t keyword cull first, host all million documents, and then use TAR. My 10+ years of experiments doing TAR 2 (CAL) have led me to expect reviewing about two documents for every responsive document found (a 2:1 ratio) at an acceptable recall level of about 80%, when the TAR user follows the proper workflow and doesn’t continue reviewing past the point that is necessary. So what would our costs be in this parallel un-keyword-culled TAR universe?

      (2a) Hosting = 1,000,000 documents, which using the same 3,000 docs/GB leaves us with about 333 GB, which at the same $12/GB and 12-month hosting period is 333 GB * $12 * 12 ≈ $48,000
      (2b) Review = 64,000 documents (40,000 responsive × 80% recall = 32,000 found, times 2 docs reviewed per responsive doc), which at the same $1/doc = $64,000
      (2c) Additional cost of court time fighting over TAR = $?, but probably less than fighting over keywords, given that TAR has precedent, but specific keywords are still fought over.
      TOTAL = $112,000 + lower unspecified court costs

      That’s a TOTAL cost of almost $210k for keywords, $112k for TAR.. EVEN with the additional 800,000 documents of hosting!

      Plug in your own numbers for GB hosting costs and $/doc review costs if you want. I’m just using those numbers as general examples. But even with slightly different numbers, the overall story remains true: All the keyword culling time, effort, and hosting savings is negligible, because your hosting is the least significant, lowest proportion cost in this whole endeavor. Losing sight of that is losing the forest for the trees.
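
      In fact, the whole model fits in a few lines of Python if you want to experiment. This is just a sketch wired up with the illustrative figures above; swap in your own defaults:

      ```python
      def total_cost(docs_hosted, docs_reviewed, months=12,
                     docs_per_gb=3_000, host_per_gb_month=12.0, review_per_doc=1.0):
          """Hosting + review cost for one culling strategy (court time excluded)."""
          hosting = docs_hosted / docs_per_gb * host_per_gb_month * months
          return hosting + docs_reviewed * review_per_doc

      # Universe 1: keyword culling -- 200,000 docs hosted AND reviewed.
      print(f"keywords: ${total_cost(200_000, 200_000):,.0f}")     # keywords: $209,600

      # Universe 2: no culling, TAR 2 (CAL) review.
      reviewed = int(40_000 * 0.80 * 2)  # 80% recall, 2 docs reviewed per responsive
      print(f"TAR:      ${total_cost(1_000_000, reviewed):,.0f}")  # TAR:      $112,000
      ```

      With these assumptions, hosting would have to run north of roughly $42/GB/month before the keyword-culling universe comes out cheaper, which tells you just how much review dominates the total.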

      See what I mean?

      So let’s keep in mind: The purpose of this whole endeavor is the just, speedy, and inexpensive discovery of responsive information, correct? And if we find ourselves at the point where a special master has to be called in to settle keyword searches, and keyword searches don’t even give you the biggest cost savings, then there is something seriously out of alignment.

      What say ye, Doug? Thoughts?

  4. Agreed, Dr. J. And I think the problem with the use of search terms by too many legal professionals out there (note that I said “legal professionals” and not “lawyers” as the problem isn’t just limited to lawyers) is that too many of them don’t apply that feedback loop to search terms. They seem to understand it’s necessary for TAR, but they don’t understand it’s necessary for search terms as well (as Grossman and Cormack noted in their article a few weeks ago).

    For me, the issue can be summed up in two points: 1) Legal professionals (for several reasons) aren’t considering the use of TAR enough, and 2) they aren’t applying best practices to keyword searching. Believe it or not, I think the first point is likely to be addressed long before the second, because most legal professionals doing keyword search don’t believe they’re doing it wrong. The only way to teach old dogs new tricks is not to try to get them to drop bad habits first.

    • But lawyers certify the productions, right? As well as have a duty of technological competence. So they must have some basis for believing that they.. or those doing the work for them.. are not doing it wrong. I mean, assuming that they’re unfamiliar with Blair and Maron and the problems inherent in keywords, the expense analysis alone is accessible to anyone who has ever realized that printers are cheap because the ink costs more than its weight in gold. That is to say, if you can do a total cost analysis of printer ownership, you should be able to do a total cost analysis of keywords.. and very quickly realize that the hosting (analogy: printer) costs pale in comparison to the review (analogy: ink) costs.
