Former OpenAI Researcher Says the Company Broke Copyright Law: Artificial Intelligence Trends

A former OpenAI researcher said he helped the company gather enormous amounts of internet data — and later concluded that much of that data was copyrighted.

So, guess who decided to tell his story? The publication that sued OpenAI for “billions of dollars in statutory and actual damages” back in December.

As reported by The New York Times (Former OpenAI Researcher Says the Company Broke Copyright Law, written by Cade Metz and available here), Suchir Balaji spent nearly four years as an artificial intelligence researcher at OpenAI. Among other projects, he helped gather and organize the enormous amounts of internet data the company used to build its online chatbot, ChatGPT.

At the time, he did not carefully consider whether the company had a legal right to build its products in this way. He assumed the San Francisco start-up was free to use any internet data, whether it was copyrighted or not.

But after the release of ChatGPT in late 2022, he thought harder about what the company was doing. He came to the conclusion that OpenAI’s use of copyrighted data violated the law and that technologies like ChatGPT were damaging the internet.

In August, he left OpenAI because he no longer wanted to contribute to technologies that he believed would bring society more harm than benefit.

“If you believe what I believe, you have to just leave the company,” he said during a recent series of interviews with The New York Times.

Many researchers who have worked inside OpenAI and other tech companies have cautioned that A.I. technologies could cause serious harm. But most of those warnings have been about future risks, like A.I. systems that could one day help create new bioweapons or even destroy humanity.

Balaji believes the threats are more immediate. ChatGPT and other chatbots, he said, are destroying the commercial viability of the individuals, businesses and internet services that created the digital data used to train these A.I. systems.

“This is not a sustainable model for the internet ecosystem as a whole,” he told The Times.

OpenAI disagrees with Balaji, saying in a statement: “We build our A.I. models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”

It’s not clear whether Balaji concluded back in 2022 that all that data he gathered was copyrighted and waited until he left the company to voice that concern, or whether he left because he only recently came to that conclusion. Regardless, he couldn’t have found a more willing publication to hear his story! 😀

So, what do you think? Should the use of published content by AI models be considered fair use or should the use of it be prohibited without consent? Please share any comments you might have or if you’d like to know more about a particular topic.

Image created using GPT-4o’s Image Creator Powered by DALL-E, using the term “robot reading a newspaper”.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.


Discover more from eDiscovery Today by Doug Austin
