In the case In re Mosaic LLM Litig., No. 24-cv-01451-CRB (LJC) (N.D. Cal. August 8, 2025), California Magistrate Judge Lisa J. Cisneros, finding Plaintiffs’ request for production (RFP) was “overbroad and places an undue burden on nonparty Microsoft,” modified the subpoena to lessen the burden on Microsoft.
Case Discussion and Judge’s Ruling
In this case over the use of copyrighted works to train large language models, Plaintiffs served Microsoft with a subpoena seeking two categories of documents:
- Executed licensing agreements for AI training data (RFP No. 1).
- All documents and communications related to licensing books for AI training purposes (RFP No. 2).
Microsoft complied with the first request, producing its executed agreements, but objected to the second. It argued that the plaintiffs’ request for “all documents and communications” was irrelevant, overbroad, and burdensome. Plaintiffs countered that these materials were central to their claims, particularly regarding the fair use defense raised by defendants and the calculation of damages.
At the center of the dispute was whether Microsoft’s communications regarding licensing books were relevant to the case. The plaintiffs argued these documents were directly tied to factor four of the fair use analysis: the effect of use on the potential market for copyrighted works.
To lessen the burden on Microsoft, Plaintiffs offered to limit the scope of their request to Microsoft’s “documents that it has already produced in the ongoing In re OpenAI” MDL. Microsoft responded that this “compromise” would not “sufficiently relieve” the burden of production because (1) it would still have to search through the more than two hundred thousand documents it produced in the MDL to find documents responsive to RFP No. 2, and (2) production would require it to disclose confidential business information and communications with third parties unrelated to this litigation.
Judge Cisneros acknowledged the unsettled nature of this issue in the generative AI context, stating: “The undersigned is not aware of controlling authority that forecloses a finding that evidence of an existing market for licensing material for training AI bears on the factor four analysis. And, if Judge Breyer considers the market for licensing for AI-training purposes under factor four, documents responsive to RFP No. 2 would be highly important to resolving the fair use issue.”
Judge Cisneros declined to adopt a categorical rule, as decisions in Kadrey and Bartz had, that harms to a licensing market for AI training are not cognizable under fair use. Instead, she emphasized that discovery should remain broad, stating: “Finding that documents relating to the licensing market are not relevant at this point in litigation would prematurely foreclose Plaintiffs’ ability to advance their claim and shift what is likely a central issue in this case from resolution at summary judgment to a definitive resolution through a discovery dispute.” She also added: “for purposes of discovery, relevance is defined broadly… Relevant information for purposes of discovery is information ‘reasonably calculated to lead to the discovery of admissible evidence.’”
The plaintiffs also argued the materials were relevant to damages, particularly the calculation of hypothetical-license damages. Judge Cisneros acknowledged the point, explaining: “One measure of actual damages is the fair market value of a hypothetical license…Evidence that a copyright holder ‘would have reached a licensing agreement with the infringer’ or of ‘benchmark licenses in the industry’ may help the rightsholder establish the amount of such damages without undue speculation.”
Still, Judge Cisneros found the plaintiffs had not shown a “substantial need” for negotiations beyond the executed licenses already produced. For damages purposes, the agreements themselves were sufficient benchmarks.
While recognizing the documents’ relevance, Judge Cisneros also weighed Microsoft’s burden as a non-party and agreed that the request as written was overbroad, stating: “The undersigned is cognizant that Microsoft, as a nonparty, is afforded additional protection against overbroad discovery requests… Requests to non-parties should be narrowly drawn to meet specific needs for information.”
As a result, to lessen the burden on Microsoft, Judge Cisneros narrowed the subpoena, as follows:
“The Court declines to order Microsoft to produce all documents and communications ‘related to licensing books for use as AI training data.’ Instead, the undersigned modifies the subpoena to limit Microsoft’s production to its communications with and documents regarding the entities with which it entered into licensing agreements for AI Training Data, where the communications or documents indicate potential or proposed terms for licensing.”
She also ruled that the materials were to be designated “Highly Confidential – Attorneys’ Eyes Only,” offering protection for Microsoft and third-party partners.
So, what do you think? Are you surprised that the Court narrowed the subpoena? Please share any comments you might have, or let me know if you’d like to know more about a particular topic.
Case opinion link courtesy of Minerva26, an Affinity partner of eDiscovery Today.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
Discover more from eDiscovery Today by Doug Austin




Oops! The magistrate applied the wrong standard! She wrote “…for purposes of discovery, relevance is defined broadly… Relevant information for purposes of discovery is information ‘reasonably calculated to lead to the discovery of admissible evidence.’” She focused on admissibility. That’s wrong. The 2015 revisions focus on nonprivileged information, relevance to claims or defenses, and proportionality (importance of the issues, amount in controversy, etc.). Here’s what 26(b)(1) says:
(b) Discovery Scope and Limits.
(1) Scope in General. Unless otherwise limited by court order, the scope of discovery is as follows: Parties may obtain discovery regarding any nonprivileged matter that is relevant to any party’s claim or defense and proportional to the needs of the case…
Here’s what the Committee Notes on Rules – 2015 Amendment say:
“Rule 26(b)(1) is changed in several ways. Information is discoverable under revised Rule 26(b)(1) if it is relevant to any party’s claim or defense and is proportional to the needs of the case.
“The former provision for discovery of relevant but inadmissible information that appears “reasonably calculated to lead to the discovery of admissible evidence” is also deleted. The phrase has been used by some, incorrectly, to define the scope of discovery. As the Committee Note to the 2000 amendments observed, use of the “reasonably calculated” phrase to define the scope of discovery “might swallow any other limitation on the scope of discovery.” The 2000 amendments sought to prevent such misuse by adding the word “Relevant” at the beginning of the sentence, making clear that “‘relevant’ means within the scope of discovery as defined in this subdivision . . . .” The “reasonably calculated” phrase has continued to create problems, however, and is removed by these amendments. It is replaced by the direct statement that “Information within this scope of discovery need not be admissible in evidence to be discoverable.” Discovery of nonprivileged information not admissible in evidence remains available so long as it is otherwise within the scope of discovery.” Committee Notes on Rules—2015 Amendment.
Milton Robinson, Senior Attorney, US EPA
Thanks, Milton. Yes, I noticed that. It’s amazing how many lawyers and even judges still reference that standard. There are a whopping 2,094 cases that reference that phrase in Minerva26 since 12/1/2015. Many are used as boilerplate objections (“not reasonably calculated to lead to the discovery of admissible evidence”), which is bad too.
Regardless, it’s surprising that some judges still reference that standard – even an experienced judge like Judge Cisneros (who has had some terrific rulings on hyperlinked files). Go figure.