Three New TAR Articles from Grossman, Cormack & Grant Thornton Ireland: eDiscovery Best Practices

Doug Austin

2 years ago

Here are not one, not two, but three new TAR articles from Maura R. Grossman, Gordon V. Cormack and three professionals from Grant Thornton Ireland!

The articles, which were provided to eDiscovery Today on Monday, discuss aspects of the process for Technology Assisted Review (TAR). All three were written based on testing conducted “[i]n a large-scale eDiscovery effort” by Maura R. Grossman and Gordon V. Cormack of the University of Waterloo and Tom O’Halloran, Bronagh McManus and Andrew Harbison of Grant Thornton Ireland. The three new TAR articles (with their Abstracts) are as follows*:

Technology-Assisted Review for Spreadsheets and Noisy Text (4 pages): In a large-scale eDiscovery effort, human assessors participated in a technology-assisted review (“TAR”) process employing a modified version of Grossman and Cormack’s Continuous Active Learning® (“CAL®”) tool to review Excel spreadsheets and poor-quality OCR text (defined as 30-50% Markov error rate). In the legal industry, these documents are typically considered inappropriate for the application of TAR and, consequently, are usually the subject of exhaustive manual review. Our results assuage this concern by showing that a CAL TAR process, using feature engineering techniques adapted from spam filtering, can achieve satisfactory results on Excel spreadsheets and noisy OCR text. Our findings are cause for optimism in the legal industry—adding these document classes to TAR datasets will make large reviews more manageable and less costly.

Limitations of the Utility of Categorization in eDiscovery Review efforts (6 pages): In large-scale litigation, an eDiscovery production request typically consists of numerous individual Requests for Production (“RFPs”), each specifying a category of information sought. In this study, we investigate the effects on assessor performance—measured in terms of review speed and consistency in categorization—when using 55 individual RFPs versus a lesser number of broader composite categories (“CCs”) covering the same underlying RFPs, applied to the same document population. Our results show that increasing the number of review categories has a substantial and significant negative impact on both review speed and the consistency in categorization achieved by assessors. The overall inability of assessors to achieve consistent categorizations raises serious questions about the utility of the practice of categorization in eDiscovery review efforts.

Comparison of Tools and Methods for Technology-Assisted Review (10 pages): In a large-scale eDiscovery effort in Irish litigation, human assessors participated in two technology-assisted reviews (“TAR”) employing continuous active learning (“CAL”) processes, one using Grossman and Cormack’s logistic regression CAL tool and the other using a leading eDiscovery provider’s support-vector-machine-based (“SVM”) tool. In this work, we investigate the extent to which the different tools and associated methods impacted the effectiveness and efficiency of the competing TAR reviews across the same document population, measured by recall, precision, review effort, and the average cost incurred per relevant document found. Our results show that the tool and method underlying the TAR model matters – the CAL process outperformed the provider’s process on all measures.
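For readers less familiar with the four measures named in that last abstract, here is a minimal Python sketch of how they are commonly computed. It is only an illustration: the `ReviewOutcome` class, the flat per-document cost, and every number below are hypothetical and are not taken from the papers, which may define the measures somewhat differently.

```python
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    """Hypothetical tallies for one TAR review (illustrative, not from the papers)."""
    docs_reviewed: int    # documents humans actually reviewed (the review effort)
    relevant_found: int   # reviewed documents that were coded relevant
    relevant_total: int   # estimated relevant documents in the whole population
    cost_per_doc: float   # assumed flat cost of reviewing one document

    @property
    def recall(self) -> float:
        # Share of all relevant documents that the review found.
        return self.relevant_found / self.relevant_total

    @property
    def precision(self) -> float:
        # Share of reviewed documents that turned out to be relevant.
        return self.relevant_found / self.docs_reviewed

    @property
    def cost_per_relevant(self) -> float:
        # Total review cost divided by the number of relevant documents found.
        return self.docs_reviewed * self.cost_per_doc / self.relevant_found


# Two made-up reviews that reach the same recall with different effort.
tool_a = ReviewOutcome(docs_reviewed=10_000, relevant_found=4_000,
                       relevant_total=5_000, cost_per_doc=1.0)
tool_b = ReviewOutcome(docs_reviewed=20_000, relevant_found=4_000,
                       relevant_total=5_000, cost_per_doc=1.0)

for name, outcome in (("tool_a", tool_a), ("tool_b", tool_b)):
    print(f"{name}: recall={outcome.recall:.2f} "
          f"precision={outcome.precision:.2f} "
          f"cost per relevant doc={outcome.cost_per_relevant:.2f}")
```

In this made-up comparison both reviews find the same number of relevant documents, but the second reviews twice as many documents to get there, so its precision is halved and its cost per relevant document doubles.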

All three new TAR articles discuss in depth the testing that was performed and the results that were achieved, and that testing involved 250 million(!) documents in an actual case. Very interesting findings on a very large-scale eDiscovery effort in terms of possibilities and challenges associated with a CAL TAR process (including Grossman and Cormack’s own CAL tool)!

So, what do you think? Do any of these three new TAR articles change any considerations regarding your approach to TAR? Please share any comments you might have or if you’d like to know more about a particular topic.

*Links to the last two documents are provided with permission as PDFs downloadable from eDiscovery Today prior to publishing. All papers are reviewed and accepted and will be presented and published as part of the International Conference on Information Management (ICIM 2024) conference proceedings.

Image created using GPT-4’s Image Creator Powered by DALL-E, using the term “robot holding up three fingers”.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
