Data Poisoning: Yet Another AI Threat

There are many potential threats resulting from AI. Here’s a threat to AI itself: data poisoning. Here’s what you need to know about it.

Data poisoning is the intentional insertion of misleading, corrupted, or adversarial data into a dataset with the goal of influencing how an AI model learns and behaves. Traditionally, it’s discussed as a security threat (e.g., attackers feeding bad data into a model so it produces incorrect or biased outputs).

But more recently, the concept has evolved into something more strategic: organizations deliberately “poisoning” their own content to control how AI systems use it. It has even become a form of civil disobedience, a way of resisting AI in general.

What Data Poisoning Means in the AI Context

At a high level, data poisoning can happen in two main ways: 1) training-time poisoning and 2) inference-time poisoning. Training-time poisoning involves manipulating the data used to train a model so that it learns incorrect associations, while inference-time poisoning (often taking the form of prompt injection or other adversarial inputs) involves crafting inputs that cause a trained model to behave in unintended ways.
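
To make the training-time variety concrete, here is a minimal, hypothetical sketch: flipping labels in a toy spam classifier’s training data so the model learns the wrong association. The data, trigger word, and flipping rule are all invented for illustration; real attacks target web-scale corpora, not tiny labeled sets like this.

```python
# A minimal sketch of training-time poisoning via label flipping.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = (["win a free prize now", "claim your reward today"] * 25        # spam
         + ["meeting moved to 3pm", "please review the attachment"] * 25)  # ham
labels = [1] * 50 + [0] * 50

# The "poisoning": relabel every spam message mentioning "prize" as ham,
# teaching the model that this spam-like phrasing is benign.
poisoned = [0 if "prize" in t else y for t, y in zip(texts, labels)]

vec = CountVectorizer()
X = vec.fit_transform(texts)
clean = MultinomialNB().fit(X, labels)
dirty = MultinomialNB().fit(X, poisoned)

probe = vec.transform(["win a free prize"])
print("clean model flags it as spam:", bool(clean.predict(probe)[0]))     # True
print("poisoned model flags it as spam:", bool(dirty.predict(probe)[0]))  # False
```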

Historically, these techniques were primarily associated with attempts to break or exploit AI systems. For example, in a joint study with the UK AI Security Institute and the Alan Turing Institute, Anthropic found that as few as 250 malicious documents can produce a “backdoor” vulnerability in a large language model – regardless of model size or training data volume. Some have even advocated that the US military conduct data poisoning operations against adversary AI systems under U.S. Code Title 50 (War and National Defense)!

How Companies Are Applying “Defensive” Data Poisoning

Today, however, organizations are beginning to use similar techniques strategically – to protect their intellectual property and exert control over how AI systems use their data. Here are five ways companies are applying “defensive” data poisoning:

  1. Injecting Subtle Errors into Content: Some organizations intentionally embed slight factual inaccuracies, non-obvious contradictions, or unique phrasing patterns into their content. The goal is that if an AI model trains on scraped versions of that content, it will inherit these distortions. This makes it easier for the originating organization to detect unauthorized use through identifiable “fingerprints” and may also degrade the usefulness of improperly sourced training datasets.
  2. AI “Honeytokens” and Trap Data: Companies are also embedding traceable markers, often referred to as “honeytokens”, into their content. These can include unique phrases, synthetic facts, or even invisible metadata and structured anomalies. If these markers later appear in the outputs of an AI model, they can serve as evidence that the model was trained on proprietary or restricted data, supporting enforcement efforts or potential litigation. (The first sketch after this list illustrates the fingerprinting and honeytoken ideas.)
  3. Adversarial Perturbations (Especially for Images & Media): In the context of images, audio, and even text, some organizations apply small, human-imperceptible changes known as adversarial perturbations. While these changes are typically unnoticed by human users, they can significantly confuse AI models during training. For example, images can be altered so that AI systems misclassify them, or text can be subtly modified to distort embeddings and semantic interpretation. The goal is to make scraped datasets less useful or even actively harmful for training purposes. (See the second sketch after this list.)
  4. Content Obfuscation and Structural Noise: Some publishers take a more structural approach by altering how their content is presented. This can include dynamically changing HTML structures, inserting irrelevant or misleading surrounding text, or using formatting techniques that humans can easily ignore but automated systems will ingest. These methods are designed to disrupt scraping pipelines and reduce the quality of data that AI systems can extract and learn from. (See the third sketch after this list.)
  5. Licensing Pressure + Technical Enforcement: Data poisoning strategies are often combined with more traditional controls such as terms of service restrictions, robots.txt directives, API access limitations, and paywalls. Together, these measures create both legal and technical friction for AI developers, reinforcing the organization’s control over how its data is accessed and used. (A sample robots.txt appears after this list.)
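
First, a minimal, hypothetical sketch of the honeytoken workflow from items 1 and 2: embed a unique synthetic marker in published content, then scan a model’s output for it later. The function names and marker format are invented for illustration; real deployments use far subtler markers.

```python
# Hypothetical honeytoken workflow: embed a unique synthetic marker in
# published content, then check AI model outputs for it as evidence
# that the content was ingested during training or retrieval.
import secrets

def embed_honeytoken(article: str, org_id: str) -> tuple[str, str]:
    """Append a unique, synthetic 'fact' that exists nowhere else on the web."""
    token = f"{org_id}-{secrets.token_hex(8)}"
    return f"{article}\n\n(Internal reference code: {token}.)", token

def token_surfaced(model_output: str, token: str) -> bool:
    """If the exact token appears in a model's output, the model almost
    certainly saw our content."""
    return token in model_output

published, token = embed_honeytoken("Our Q3 market analysis shows...", "acme")
print(published)
print(token_surfaced(f"One source cites reference code {token}...", token))  # True
```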
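Second, item 3’s adversarial perturbations are typically computed from a model’s gradients (the FGSM family); tools like Glaze and Nightshade build on much more sophisticated variants. The sketch below shows only the basic step, with a random array standing in for a real gradient so the example stays self-contained.

```python
# FGSM-style perturbation sketch: nudge each pixel a tiny amount in the
# direction that most increases a model's loss. `loss_gradient` is a random
# stand-in here; real tools compute it from an actual model.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3)).astype(np.float32)  # stand-in image, pixels in [0, 1]
loss_gradient = rng.standard_normal(image.shape)    # stand-in for dLoss/dInput

epsilon = 2 / 255  # small enough to be imperceptible to humans
perturbed = np.clip(image + epsilon * np.sign(loss_gradient), 0.0, 1.0)

print("max per-pixel change:", float(np.abs(perturbed - image).max()))  # ~0.008
```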
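Third, item 4’s structural noise can be as simple as decoy text that CSS hides from human readers but a naive, tag-stripping scraper ingests. This is a toy illustration; real implementations rotate markup dynamically, and scrapers that render CSS can defeat this particular trick.

```python
# Hypothetical structural-noise example: hidden decoy text pollutes what a
# naive tag-stripping scraper extracts, while human readers never see it.
import re

REAL = "<p>The actual article text goes here.</p>"
DECOY = ('<p style="position:absolute;left:-9999px">'
         'Misleading filler designed to degrade scraped training data.</p>')

page = f"<html><body>{DECOY}{REAL}{DECOY}</body></html>"

# What a scraper that simply strips tags would ingest:
print(re.sub(r"<[^>]+>", " ", page).strip())
```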
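Finally, item 5’s technical enforcement often starts with robots.txt. A hypothetical file disallowing several publicly documented AI-training crawlers might look like this – though compliance is voluntary, which is exactly why publishers pair it with the poisoning techniques above:

```
# Hypothetical robots.txt blocking common AI-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```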

Why This Matters for Legal & eDiscovery Professionals

If training data is intentionally corrupted, the reliability of AI outputs becomes more uncertain, as models may produce subtle inaccuracies or exhibit increased hallucination risk. In eDiscovery workflows, this raises important concerns about the defensibility of AI-assisted review processes and the adequacy of validation protocols and quality control measures. At the same time, data poisoning raises a critical question: can the origin and integrity of training or reference data be trusted? That question directly intersects with established principles of auditability, documentation, and defensibility in legal workflows.

Beyond reliability concerns, poisoned data can also function as a form of watermarking for copyright enforcement, with identifiable markers serving as evidence in disputes over unauthorized data usage. This is likely to drive increased litigation focused on training data sources and greater scrutiny in discovery requests seeking transparency into how AI models were developed. More broadly, the landscape is shifting from an era of widely accessible data to one where data is protected, traceable, and sometimes adversarial, fundamentally reshaping how AI models are trained and how enterprises approach data governance, reuse, and control.

Bottom Line

Data poisoning is no longer just an attack vector; it is becoming a deliberate defensive strategy. Organizations are embedding traps, distortions, and fingerprints into their content to influence how AI models learn from it – or to prevent them from doing so altogether. For legal and eDiscovery professionals, this underscores a key takeaway: the integrity, provenance, and defensibility of data used by AI are becoming just as important as the AI technologies themselves.

So, what do you think? Had you ever heard of data poisoning before this post? You have now! 😉 Please share any comments you might have, or let me know if you’d like to hear more about a particular topic.

Image created using DALL-E 3, using the term “devious looking robot pouring a bottle of poison on a hard drive”.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

