How “Towards AI” Detects AI-Generated Articles

On the science of detecting whether a text was written by a human or by an AI.

Ahmad Mustapha
Towards AI

Photo by Brett Jordan on Unsplash

The last time I submitted an article to the “Towards AI” publication, I was surprised by their reply. It said:

Your blog has been flagged as containing AI-generated content. Double and triple-check any statements and facts for mistakes and hallucinations before resubmitting.

I went back and forth with them on this. They made it clear that they do accept content in which AI has helped the writer, as you would expect from a publication called “Towards AI”, but only if the content is clear and free of hallucinations.

So yes, my article was written with ChatGPT’s help, though it was not ChatGPT’s brainchild. I worked with GPT in a back-and-forth fashion, from the introduction through the supporting details to the conclusion. Still, it was flagged as AI-generated. The question is: how did they figure it out?

Not Plagiarism

One might assume there is some sort of plagiarism checker under the hood. But you won’t find my AI-assisted article anywhere on the internet; it was original. So how did they manage to detect that it was AI-generated? And what are hallucinations, anyway?

AI Detectors

If you are familiar with how neural networks operate, you will know that they can often succeed where even humans fail. Neural networks are pattern-hungry mathematical models: when given the task of distinguishing between two classes of data (authentic and synthetic text, in our case), they can learn even subtle statistical differences between the two.

Large language models (LLMs) like ChatGPT have been trained on huge amounts of text written in many different styles, so the distribution of the text they generate can differ from that of human writing. They have absorbed more styles, and in that sense they are more capable: their language model is superior to any individual human’s, even though they cannot understand or pinpoint what they are generating. Think of generation as a purely probabilistic mathematical process.

To understand what I mean by text distribution, consider the following snippet generated by ChatGPT, together with some toy reasoning:

This ambition is widespread, and a wealth of general wisdom, as well as the lessons from businesses that faltered in their attempts to build AI from scratch, underscores the pitfalls of such an approach.

Take “This ambition is widespread.” What is the probability of the word “widespread” coming after the phrase “This ambition is”? For some of us, it feels exotic; it has a low probability. Across the many human-written texts that contain “This ambition is”, only a few continue with the word “widespread”. The sentence is grammatically correct, but we tend to write things like “This ambition is common”, “This ambition is not uncommon”, or “This ambition is not unique”.
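The toy reasoning above boils down to estimating a conditional probability, P(next word | preceding words), by counting. The sketch below assumes a tiny, made-up corpus of “human” sentences (not real data) and a hypothetical helper called `next_word_probability`; real language models are neural networks trained on vast corpora, not raw counts over five sentences.

```python
from collections import Counter


def tokenize(sentence):
    # Lowercase and drop periods so "common." and "common" count as the same word.
    return sentence.lower().replace(".", "").split()


def next_word_probability(corpus, context, word):
    """Estimate P(word | context) as the fraction of times `word` follows
    `context` among all continuations of `context` seen in `corpus`."""
    context_tokens = tokenize(context)
    n = len(context_tokens)
    followers = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        # Slide a window of length n and record the token right after each match.
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == context_tokens:
                followers[tokens[i + n]] += 1
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


# Hypothetical "human-written" sentences, made up for illustration only.
human_corpus = [
    "This ambition is common.",
    "This ambition is common among founders.",
    "This ambition is not unique.",
    "This ambition is not uncommon.",
    "This ambition is widespread.",
]

print(next_word_probability(human_corpus, "This ambition is", "widespread"))  # 0.2
print(next_word_probability(human_corpus, "This ambition is", "common"))      # 0.4
```

In this made-up corpus, “widespread” follows “This ambition is” once out of five times, while “common” does twice. A text that keeps choosing low-probability continuations like this one drifts away from the human distribution.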

If we trained another model to distinguish human text from AI text, it would be able to tell them apart. Why? Because they come from different distributions. Different minds. One is a collection of individual humans, each with their own style. The other is a superior meta-style that has been dealt a bad hand by being trained on all kinds of styles. The second is less consistent and “hallucinates” by mixing a spaghetti of styles within a single paragraph.
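A real detector would be a large neural network trained on many labeled examples, but the idea of a model learning the statistical differences between two classes of text can be sketched with a much simpler classifier. The sketch below swaps in a multinomial naive Bayes model with add-one smoothing, trained on six made-up sentences labeled “human” or “ai”. The data, labels, and function names are hypothetical, and this is not a claim about how “Towards AI” or any commercial detector actually works.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and strip basic punctuation before splitting on whitespace.
    return text.lower().replace(".", "").replace(",", "").split()


def train(examples):
    """Count words per label. `examples` is a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training texts
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_texts)
        # Add-one (Laplace) smoothing so unseen words do not zero out the score.
        denominator = sum(counts.values()) + len(vocab)
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
examples = [
    ("This ambition is common.", "human"),
    ("This idea is not unique.", "human"),
    ("This plan is not uncommon.", "human"),
    ("This ambition is widespread and underscores the pitfalls.", "ai"),
    ("A wealth of wisdom underscores the pitfalls.", "ai"),
    ("This trend is widespread and multifaceted.", "ai"),
]

model = train(examples)
print(classify(model, "This ambition is common."))     # human
print(classify(model, "The ambition is widespread."))  # ai
```

With only six training sentences the verdicts are fragile; the point is simply that a model can pick up word-choice differences between two labeled sets of text and use them to score new text.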

Eventually

Some people argue that it will eventually become impossible to tell the two apart [1]. For now, however, this is not the case. OpenAI, the company behind ChatGPT, released a model trained to detect GPT-2-generated text alongside GPT-2 itself [2]. Today, many commercial services offer AI-text detection, such as Copyleaks, Scribbr, GPTZero, and Undetectable.

Takeaway

First, try not to submit AI-generated articles to “Towards AI”. Second, “keep calm” and “don’t panic yet”: for a while, at least, AI detectors will keep AI from taking over writers’ jobs.

[1] Sadasivan et al. Can AI-Generated Text be Reliably Detected?

[2] OpenAI. GPT-2: 1.5B Release. November 2019.

A computer engineer with a master’s degree in engineering from AUB, who has worked on AI projects of many different kinds.