AI Prompt to Improve Keyword Search by Craig Ball: eDiscovery Trends

Yesterday, Craig Ball blogged about an AI prompt he created to improve keyword search. So, I decided to try it out and see how it works!

In Craig’s post titled (wait for it!) AI Prompt to Improve Keyword Search (available here), he discusses how he once dreamed up a website where you would submit a list of eDiscovery keywords and queries and the site would critique the searches and suggest improvements to make them more efficient and effective. It would flag stop words, propose alternate spellings, and alert the user to pitfalls that make searches noisy or less effective. Craig even envisioned it testing queries against a benign dataset to identify overly broad terms and false hits.

The emergence of AI-powered Large Language Models like ChatGPT made Craig think that what he’d hoped to bring to life years ago might finally be feasible. So, he “dedicated a sunny Sunday morning to playing ‘prompt engineer,’ a whole cloth term for those who craft AI prompts to achieve desired outcomes.”

In his post, Craig goes on to discuss his lengthy (yet comprehensive) prompt, which provides an introduction and an analysis framework for stop word identification, synonyms and variants, industry-specific jargon and abbreviations, Boolean query structure and logic, search syntax and connectors, wildcards and stemming, special characters and indexing, spaces and punctuation, numeric values and short words, diacritical marks and case sensitivity. It then concludes with the objective and instructions of the prompt.
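To give a feel for the kind of checks that framework covers, here is a minimal Python sketch of a rule-based term "linter." To be clear, this is not Craig’s prompt and not what ChatGPT does under the hood; the `STOP_WORDS` set, the three-character threshold, and the `lint_term` function are all assumptions made up for illustration, and real review platforms have their own stop word lists and indexing rules.

```python
import re

# Assumed, illustrative stop word list; each search platform defines its own.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"}

def lint_term(term: str) -> list[str]:
    """Return plain-English warnings for a single search term (hypothetical helper)."""
    warnings = []

    # Stop words inside a phrase may be ignored by the index.
    stop_hits = [w for w in term.lower().split() if w in STOP_WORDS]
    if stop_hits:
        warnings.append(f"contains possible stop word(s): {', '.join(stop_hits)}")

    # Very short terms and acronyms tend to be noisy (threshold is an assumption).
    if len(term.replace("*", "")) <= 3:
        warnings.append("short term; may be noisy or not indexed")

    # Anything other than letters, digits, spaces or a wildcard counts as punctuation here.
    if re.search(r"[^\w\s*]", term):
        warnings.append("contains punctuation that may be indexed as a space")

    return warnings

for term in ["Star of the North", "CDS", "CAD/CAM Conference", "Invisalign"]:
    print(term, "->", lint_term(term) or "no warnings")
```

A script like this only catches mechanical issues; the appeal of the LLM approach is that it can also suggest synonyms, variants and industry jargon that no fixed rule would know about.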

Craig even provides instructions for using it with ChatGPT 4o, including pasting the prompt into the chat window, then uploading a discrete list of search terms for analysis using the paper clip. Nice!

Naturally, I thought: “I have to try this and share it with the eDiscovery Today readers!” But where do you find a set of public domain search terms that you can use and share?

Then I remembered this case I covered a couple of years ago, which had a particularly “gnarly” search term (shown in the case ruling itself). That search term is:

((“trade show” OR “tradeshow” OR “dental show” OR “AACD” OR “AAO” OR “CDS” OR “DLOAC” OR “GNYDM” OR Hinman OR “Lab Day East” OR “Lab Day West” OR “Pacific Dental Conference” OR “Yankee Dental” OR “IDS” OR “ADA” OR “CAD/CAM Conference” OR “California Dental Association” OR “CDA” OR “Greater New York Dental “ OR “Southwest Dental Conference” OR “Academy of General Dentistry” OR “AGD” or “Smiles at Sea” or “Mid-Continent Dental” OR “MCDC” OR “Rocky Mountain Dental Convention” OR “RMDC” OR “YDC” or Midwinter OR “CDS” OR “Pacific Northwest Dental Conference” OR “PNDC” OR “Star of the North” OR “TDA” or “NODC” OR “AAOMS” OR “Kois”) w/7 (itero OR “IOS” OR scanner* OR Invisalign OR Cadent OR aligner OR competit* OR trios OR 3Shape OR 3-shape OR promo)) /25 (“USA” or U.S.A. or “US” or U.S. or “United States” or “North America” or America* or “NA”)

I would like to say that search terms like this are highly unusual, but they’re more common than you might think.

To turn this single monstrosity into a list of terms, I stripped all the parentheses, the OR operators and the proximity connectors (note: their syntax wasn’t consistent in the original query) and broke the query out into 56 distinct terms. It’ll do.
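If you’d rather not do that by hand, the same flattening can be roughed out in a few lines of Python. This is only a sketch under some assumptions: the `flatten_query` function and the `OPERATORS` pattern are hypothetical names, the pattern only knows about `OR` (in any case) and proximity connectors shaped like `w/7` or `/25`, and it would wrongly split a quoted phrase that contains the word “or.”

```python
import re

# Matches " OR " / " or " and proximity connectors such as " w/7 " or " /25 ".
OPERATORS = re.compile(r"\s+(?:w?/\d+|or)\s+", re.IGNORECASE)

def flatten_query(query: str) -> list[str]:
    """Break a Boolean query into a de-duplicated list of bare terms (hypothetical helper)."""
    # Parentheses only group terms, so replace them with spaces.
    no_parens = query.replace("(", " ").replace(")", " ")

    terms = []
    for piece in OPERATORS.split(f" {no_parens} "):
        # Drop surrounding whitespace plus straight and curly double quotes.
        term = piece.strip().strip('"“”').strip()
        if term and term not in terms:
            terms.append(term)
    return terms

sample = "((“trade show” OR tradeshow or Hinman) w/7 (itero OR scanner*)) /25 (“USA” or U.S.A.)"
for term in flatten_query(sample):
    print(term)
```

Writing the result out one term per line gives you a plain text file that can be uploaded alongside the prompt. Note that de-duplication here is case-sensitive and exact, so a repeated entry like “CDS” collapses to one term while variants such as “3Shape” and “3-shape” stay separate.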

I saved the keyword list to a file, copied Craig’s prompt into the chat window as instructed, loaded in the list and hit Enter. The output from ChatGPT has been saved to a PDF file and is available here. Check it out – it’s very cool! And check out Craig’s blog post discussing his AI prompt to improve keyword search and try it yourself!

So, what do you think? Have you developed any AI prompts that can be useful in eDiscovery? Please share any comments you might have, or let me know if you’d like to know more about a particular topic.

Image created using GPT-4o’s Image Creator Powered by DALL-E, using the term “robot writing a keyword search on a computer”. That one’s for you, Craig! 😀

Disclaimer: The views represented herein are exclusively the views of the authors and speakers themselves, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.


Discover more from eDiscovery Today by Doug Austin

2 comments

  1. Hello Doug:
    I also wrote about this during the early months of GPT 3.5. These LLM tools are amazing for all kinds of purposes including refining keyword search. We found that an even better way to do this is to submit a lot of relevant documents to the LLM to read, analyze and suggest keywords from the documents. Essentially, let the relevant documents speak to you about how to find more.

    We let AI do more, taking the key terms and using them to find other relevant documents without being at the mercy of Boolean construct. We then integrate a classifier (e.g. TAR engine) to find even more relevant documents than would be practical with just keyword syntax alone.

    It is an exciting world these days.

    JT

  2. Here is more early work in this direction, in the systematic review (medical “ediscovery”) space. But still using LLMs to construct Boolean queries. To me, it all comes down to how well it works. Evaluation drives innovation. https://arxiv.org/abs/2302.03495
