Microsoft Copilot figured prominently in a recent fiduciary duty dispute: the judge used Copilot to check an expert’s work in the case. It did not go well.
The ruling was issued by Judge Jonathan G. Schopf of the Surrogate’s Court of Saratoga County. The Objectant in the case, Owen K. Weber, claimed that the trustee, Susan F. Weber, breached her fiduciary duty by retaining a particular property and using it for personal travel. The Objectant’s expert, Charles Ranson, argued the property should have been sold earlier and the proceeds reinvested. However, Judge Schopf found his testimony speculative and lacking in real estate expertise.
So, where does Copilot come into play? As Judge Schopf stated: “the testimony revealed that Mr. Ranson relied on Microsoft Copilot, a large language model generative artificial intelligence chatbot, in cross-checking his calculations.” Continuing, he said: “Despite his reliance on artificial intelligence, Mr. Ranson could not recall what input or prompt he used to assist him with the Supplemental Damages Report. He also could not state what sources Copilot relied upon and could not explain any details about how Copilot works or how it arrives at a given output. There was no testimony on whether these Copilot calculations considered any fund fees or tax implications.”
Despite the fact that Judge Schopf “has no objective understanding as to how Copilot works”, he stated: “To illustrate the concern with this, the Court entered the following prompt into Microsoft Copilot on its Unified Court System (UCS) issued computer: ‘Can you calculate the value of $250,000 invested in the Vanguard Balanced Index Fund from December 31, 2004 through January 31, 2021?’ and it returned a value of $949,070.97 — a number different than Mr. Ranson’s. Upon running this same query on two (2) additional UCS computers, it returned values of $948,209.63 and a little more than $951,000.00, respectively. While these resulting variations are not large, the fact there are variations at all calls into question the reliability and accuracy of Copilot to generate evidence to be relied upon in a court proceeding.”
In other words, the judge used Copilot to check Ranson’s work, asking it the same question three times and getting three different answers.
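Part of what makes the variation notable is that the judge’s question is ordinary deterministic arithmetic: given the same principal, dates, and the fund’s actual returns, it has exactly one correct answer. A minimal Python sketch (using hypothetical annual returns as placeholders, not the Vanguard Balanced Index Fund’s real history) illustrates that a conventional calculation returns the identical result every time it runs:

```python
# Future value by compounding a series of annual returns.
# The returns below are HYPOTHETICAL placeholders, not the
# Vanguard Balanced Index Fund's actual figures.
def future_value(principal: float, annual_returns: list[float]) -> float:
    value = principal
    for r in annual_returns:
        value *= (1 + r)
    return round(value, 2)

hypothetical_returns = [0.05, -0.02, 0.08, 0.11, 0.03]

# Run the identical calculation three times, as the judge ran
# his identical prompt on three computers.
results = [future_value(250_000, hypothetical_returns) for _ in range(3)]
print(results)  # all three values are identical
```

Deterministic code like this returns one value no matter how many times it is invoked; the variation the judge observed is a property of how the chatbot generates text, not of the underlying math.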
Continuing, Judge Schopf stated: “Interestingly, when asked the following question: ‘are you accurate’, Copilot generated the following answer: ‘I aim to be accurate within the data I’ve been trained on and the information I can find for you. That said, my accuracy is only as good as my sources so for critical matters, it’s always wise to verify.’” Judge Schopf provided a footnote to the answer, stating: “This brings to mind the old adage, ‘garbage in, garbage out’. Clearly a user of Copilot and other artificial intelligence software must be trained or have knowledge of the appropriate inputs to ensure the most accurate results.”
Judge Schopf also asked about reliability with a follow-up question of: “are your calculations reliable enough for use in court”, to which Copilot responded with “[w]hen it comes to legal matters, any calculations or data need to meet strict standards. I can provide accurate info, but it should always be verified by experts and accompanied by professional evaluations before being used in court…”
In noting that AI is “an emerging issue that trial courts are beginning to grapple with and for which it does not appear that a bright-line rule exists”, Judge Schopf stated: “The use of artificial intelligence is a rapidly growing reality across many industries. The mere fact that artificial intelligence has played a role, which continues to expand in our everyday lives, does not make the results generated by artificial intelligence admissible in Court.” While citing People v. Wakefield, 38 NY3d 367 [2022], Judge Schopf stated: “The Court of Appeals has found that certain industry specific artificial intelligence technology is generally accepted…However, Wakefield involved a full Frye hearing that included expert testimony that explained the mathematical formulas, the processes involved, and the peer-reviewed published articles in scientific journals.” [link added]
In his conclusion on the issue (and ruling against the Objectant and for the Petitioner), Judge Schopf stated: “In what may be an issue of first impression, at least in Surrogate’s Court practice, this Court holds that due to the nature of the rapid evolution of artificial intelligence and its inherent reliability issues that prior to evidence being introduced which has been generated by an artificial intelligence product or system, counsel has an affirmative duty to disclose the use of artificial intelligence and the evidence sought to be admitted should properly be subject to a Frye hearing prior to its admission, the scope of which should be determined by the Court, either in a pre-trial hearing or at the time the evidence is offered.”
One of the big concerns about large language models I’ve seen again and again is that they can be asked the same question multiple times and give a different answer each time. For AI-generated content to be used as evidence in the courtroom, that’s a concern that will need to be addressed. Otherwise, expect other courts to raise concerns similar to those Judge Schopf raised here.
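That variation has a known technical source: chatbots typically don’t retrieve a stored answer, they sample each piece of their output from a probability distribution, often with deliberate randomness mixed in. A minimal sketch (the candidate answers and weights are hypothetical illustrations, not Copilot’s actual mechanism) shows how identical prompts can yield different outputs:

```python
import random

# HYPOTHETICAL next-answer probabilities for illustration only;
# real models score many thousands of candidate tokens at each step.
candidates = ["$949,070.97", "$948,209.63", "$951,000.00"]
weights    = [0.5, 0.3, 0.2]

def sample_answer(rng: random.Random) -> str:
    # Sampling proportionally to weight, so the same prompt
    # can produce a different answer on each run.
    return rng.choices(candidates, weights=weights, k=1)[0]

# Three "runs" with different random states, like three computers.
answers = [sample_answer(random.Random(seed)) for seed in (1, 2, 3)]
print(answers)
```

Each run draws from the same distribution but may land on a different answer, which is exactly the behavior the judge documented across three computers.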
So, what do you think? Are you surprised that the judge used Copilot to check on the expert’s work and got a different answer each time? Please share any comments you might have or if you’d like to know more about a particular topic.
Hat tip to Maura R. Grossman for the heads up on this case!
Image created using GPT-4o’s Image Creator Powered by DALL-E, using the term “robot judge doing a faceplant when looking at a computer”.
Disclaimer: The views represented herein are exclusively the views of the authors and speakers themselves, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

