Bots fool the experts: Professional linguists struggle to spot AI-generated writing

TAMPA, Fla. — Between the rise of chatbots like ChatGPT, AI-generated songs being submitted to the Grammys, and Hollywood’s striking actors and writers citing major studios’ recent liberal use of automation as a top complaint, it’s safe to say 2023 has been the year of artificial intelligence. Whether that’s more scary than exciting is up to you, but researchers from the University of South Florida say AI-generated written content may already be indistinguishable from human writing.

Study authors recruited linguistics experts for this project, and despite studying language patterns professionally, even this group of specialists from the world's top linguistics journals found it challenging to pick out AI-written content. All in all, they made the correct distinction just 39 percent of the time.

“We thought if anybody is going to be able to identify human-produced writing, it should be people in linguistics who’ve spent their careers studying patterns in language and other aspects of human communication,” says Matthew Kessler, a scholar in the USF Department of World Languages, in a media release.

In collaboration with J. Elliott Casal, assistant professor of applied linguistics at The University of Memphis, Kessler asked 72 linguistics experts to review a series of research abstracts and determine which were written by humans and which were generated by AI.


More specifically, each participating expert had to examine four writing samples. Not a single expert was able to correctly identify all four, and 13 percent got all of them wrong. Based on these results, study authors had no choice but to conclude most modern professors would be unable to distinguish a student’s own writing from content generated by an AI-powered language model like ChatGPT. Software may have to be developed in the near future to help professors identify AI-written content, researchers theorize.

The experts tried to justify their judgments by pointing to specific linguistic and stylistic features in the writing samples. Ultimately, however, these approaches largely proved unsuccessful, resulting in an overall positive identification rate of just 38.9 percent.

“What was more interesting was when we asked them why they decided something was written by AI or a human,” Kessler adds. “They shared very logical reasons, but again and again, they were not accurate or consistent.”

In sum, Kessler and Casal conclude that chatbots like ChatGPT can indeed produce short-form writing on the same level as most humans, if not better in some cases, in part because AI rarely makes grammatical errors. However, the study authors point out that humans still reign supreme when it comes to longer forms of writing.

“For longer texts, AI has been known to hallucinate and make up content, making it easier to identify that it was generated by AI,” Kessler concludes.

Kessler hopes this work will encourage a larger conversation centered on the pressing need to establish clear ethics and guidelines regarding the use of AI in research and education.

The study is published in the journal Research Methods in Applied Linguistics.

About the Author

John Anderer

Born blue in the face, John has been writing professionally for over a decade and covering the latest scientific research for StudyFinds since 2019. His work has been featured by Business Insider, Eat This Not That!, MSN, Ladders, and Yahoo!

Studies and abstracts can be confusing and awkwardly worded. He prides himself on making such content easy to read, understand, and apply to one’s everyday life.
