A new report from a plagiarism detector found that nearly 60 percent of GPT 3.5 outputs contained some form of plagiarized content.
In the wake of several lawsuits alleging that AI infringes on copyright and may plagiarize, educational institutions and enterprises across the globe are questioning the authenticity of AI text: Where did it originate? Is it safe to use as original content? Does AI plagiarize?
To find out, plagiarism detector Copyleaks analyzed the degree to which AI-generated content is original and free of potential plagiarism. For the analysis, Copyleaks asked GPT 3.5 to generate 1,045 outputs across 26 subjects, averaging 412 words per output. The resulting report is available here.
The findings? Nearly 60 percent of GPT 3.5 outputs (59.7 percent to be exact) contained plagiarized content. Across all outputs, 45.7% contained identical text, 27.4% contained minor changes, and 46.5% contained paraphrased text.
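The three category percentages add up to more than the 59.7 percent headline figure, which only works if a single output can fall into more than one category. The short Python sketch below just redoes that arithmetic with the numbers quoted above. The output counts it derives are rounded approximations, not figures from the report, and the helper name `approx_count` is made up for illustration.

```python
# Figures quoted in this post from the Copyleaks report.
TOTAL_OUTPUTS = 1045
SHARE_ANY_MATCH = 59.7  # percent of outputs flagged with any plagiarized content
CATEGORY_SHARES = {
    "identical text": 45.7,
    "minor changes": 27.4,
    "paraphrased text": 46.5,
}


def approx_count(percent, total=TOTAL_OUTPUTS):
    """Approximate number of outputs behind a percentage (rounded, not a reported figure)."""
    return round(total * percent / 100)


# Sum of the per-category shares, rounded to one decimal to avoid float noise.
category_sum = round(sum(CATEGORY_SHARES.values()), 1)

# If the categories were mutually exclusive, their sum could not exceed
# the share of outputs with any match at all.
categories_overlap = category_sum > SHARE_ANY_MATCH

print(f"Outputs with any match: about {approx_count(SHARE_ANY_MATCH)} of {TOTAL_OUTPUTS}")
for name, share in CATEGORY_SHARES.items():
    print(f"  {name}: about {approx_count(share)} outputs ({share}%)")
print(f"Sum of category shares: {category_sum}%")
print(f"Categories overlap: {categories_overlap}")
```

In other words, roughly 624 of the 1,045 outputs were flagged, and many of those were flagged for more than one type of match.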
Copyleaks uses a specific scoring method (called the Similarity Score) that aggregates the rate of identical text, minor changes, paraphrased text, and more. A score of 0% signifies that all the content is original, whereas a score of 100% means that none of the content is original.
Among the 26 subjects, the subject with the highest average Similarity Score was Physics at 31.3%, followed closely by Psychology at 27.7% and Science at 26.7%. The subjects with the lowest average Similarity Score were Theater at 0.9%, Humanities at 2.8%, and English Language at 5.4%.
Of course, this was a study of GPT 3.5 outputs (which is what most people have used as part of the free version of ChatGPT), not GPT 4.0. Will they do a study of 4.0 as well? We’ll see.
So, what do you think? Are you concerned that the report found nearly 60 percent of GPT 3.5 outputs contained some form of plagiarized content? Please share any comments you might have, or let me know if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

