Degenerative AI & What Happens When AI Trains on AI Data: Artificial Intelligence Trends

AI models need so much data that they are beginning to be trained on AI-generated data. A New York Times article shows how this could lead to “degenerative AI.”

According to The New York Times (When A.I.’s Output Is a Threat to A.I. Itself, written by Aatish Bhatia), Sam Altman of OpenAI wrote in February that the company generated about 100 billion words per day — a million novels’ worth of text, every day, an unknown share of which finds its way onto the internet.

A.I.-generated text may show up as a restaurant review, a dating profile or a social media post. And it may show up as a news article, too: NewsGuard, a group that tracks online misinformation, recently identified over a thousand websites that churn out error-prone A.I.-generated news articles.


All this A.I.-generated information can make it harder for us to know what’s real. And it also poses a problem for A.I. companies. As they trawl the web for new data to train their next models on — an increasingly challenging task — they’re likely to ingest some of their own A.I.-generated content, creating an unintentional feedback loop in which what was once the output from one A.I. becomes the input for another.

In the long run, this cycle may pose a threat to A.I. itself. Research has shown that when generative A.I. is trained on a lot of its own output, it can get a lot worse. Here are a couple of illustrations of that (based on research by Ilia Shumailov and others):

This exercise illustrates how training an AI to mimic handwritten digits, using previous AI outputs as the training data, causes all the digits to converge into a single shape after 30 generations:

[Image: handwritten digits converging into a single shape over 30 generations]

This exercise shows how an AI that the researchers asked to complete a sentence starting with “To cook a turkey for Thanksgiving, you…” began to repeat phrases incoherently after just four generations:

[Image: AI-generated text repeating phrases incoherently after four generations]
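That repetition failure can be mimicked with a toy experiment (my own sketch to illustrate the idea, not the researchers’ actual models, and the sentence used here is invented): fit a simple bigram model to a short text, sample new text from it, retrain only on that sample, and repeat. Because each generation can only reuse words from the previous one, the vocabulary can only shrink, and the output grows more repetitive with each generation:

```python
import random
from collections import Counter, defaultdict

def fit_bigrams(words):
    """Count word -> next-word transitions in a word sequence."""
    model = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        model[a][b] += 1
    return model

def sample(model, start, length, rng):
    """Sample a word sequence from the bigram model, starting at `start`."""
    out = [start]
    for _ in range(length - 1):
        nxt = model.get(out[-1])
        if not nxt:  # dead end: the last word was never followed by anything
            break
        words, counts = zip(*nxt.items())
        out.append(rng.choices(words, counts)[0])
    return out

rng = random.Random(0)
corpus = ("to cook a turkey for thanksgiving you need to thaw the turkey "
          "then you season the turkey and roast the turkey until done").split()

vocab_sizes = []
for generation in range(10):
    vocab_sizes.append(len(set(corpus)))
    model = fit_bigrams(corpus)
    # Each new generation trains only on the previous generation's output.
    corpus = sample(model, "to", 40, rng)

print(vocab_sizes)  # vocabulary size per generation: it can never grow
```

Real language models are vastly more capable than a bigram counter, but the underlying dynamic is similar: rare continuations drop out of the training data first, and what remains gets amplified.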

Another team of researchers at Rice University studied what would happen when the kinds of A.I. that generate images are repeatedly trained on their own output — a problem that could already be occurring as A.I.-generated images flood the web.

They found that glitches and image artifacts started to build up in the A.I.’s output, eventually producing distorted images with wrinkled patterns and mangled fingers.

And even after addressing that issue, another one emerged – a lack of diversity in AI-generated images. The initial set of generated faces (which already seemed to be notably lacking some ethnic groups) converged, after four generations, into faces that all appeared nearly alike.
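This loss of diversity can be illustrated with a minimal toy model (my own sketch, not the Rice or Shumailov teams’ actual setup): repeatedly fit a normal distribution to a finite sample drawn from the previous generation’s fit. Because each fit is made from a small sample of synthetic data, estimation error compounds, and the fitted spread tends to drift toward zero — the statistical analogue of every face converging to one face:

```python
import random
import statistics

def self_training_loop(generations=200, sample_size=10, seed=42):
    """Fit a normal distribution to samples drawn from the previous
    generation's fit. With finite samples, the fitted spread (sigma)
    tends to drift toward zero, so the 'model' loses diversity."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real data" distribution
    spreads = []
    for _ in range(generations):
        # Generate synthetic data from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then train the next generation on that synthetic data alone.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        spreads.append(sigma)
    return spreads

spreads = self_training_loop()
print(f"spread after generation 1:   {spreads[0]:.4f}")
print(f"spread after generation 200: {spreads[-1]:.6f}")
```

The spread is not guaranteed to shrink on any single step, but over many generations the drift dominates, and the distribution narrows drastically compared to the real data it started from.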

This “degenerative AI” not only degrades the quality of the output; it also drives up energy use and cost. Computer scientists at N.Y.U. found that when there is a lot of A.I.-generated content in the training data, it takes more computing power to train A.I. — which translates into more energy and more money.

“Models won’t scale anymore as they should be scaling,” said Julia Kempe, the N.Y.U. professor who led this work.

The leading A.I. models already cost tens to hundreds of millions of dollars to train, and they consume staggering amounts of energy, so this can be a sizable problem.

Potential solutions include having AI companies pay for the data they train on to ensure quality levels – which may improve quality but limit the available data (not to mention that they currently use a lot of data for free and are litigating over what they claim is the “fair use” of it) – and the use of “watermarking” to flag AI-generated content (which can’t always be reliably detected and can be easily subverted).

This is certainly an issue that could jeopardize the improvement of AI models over time. The age-old expression “garbage in, garbage out” is more significant than ever when it comes to the advancement of AI models.

So, what do you think? Are you concerned about the potential for “degenerative AI”? Please share any comments you might have or if you’d like to know more about a particular topic.

Image Copyright © Columbia Pictures Some of you will “see what I did there”! 😉

Disclaimer: The views represented herein are exclusively the views of the authors and speakers themselves, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

