With the growth of tools like OpenAI’s GPT-3, a lot more text is being generated by computers. That presents a problem in many spaces.
Professors may struggle to know whether their students’ papers are actually written by students or by a machine, publishers may struggle to know whether the writers are using AI, and as a consumer, you might not know whether a webpage you’re reading was really written by an expert or by GPT-3.
Here are some things you can look for to help you determine whether text was written by AI. As we’ll cover towards the end of the article, one of the easiest ways to detect ChatGPT or GPT-3 text is by using a tool like Originality AI. If you want to skip ahead to an easy solution, check Originality out.
Even if you use a tool like Originality, though, it helps to understand the kinds of patterns you might see in AI-generated text, so you can spot them even without machine assistance.
A Bland, Informational Voice
AI systems create text by predicting which word should come next, based on patterns in the huge amounts of language they were trained on. Almost by definition, the output of these systems is relatively predictable. Because they rely on statistical analysis, they rarely disobey conventions or do strange things with language.
Even if you ask an AI system to write with a strong voice or in a strange way, it usually accomplishes this less by messing with form and more by including strange words. It’s almost like the AI takes a piece of text and uses a thesaurus to randomly swap in different words, rather than changing up the predictable structure.
One way to tell if something was written by a machine, therefore, is to look for text that has a bland and consistent voice. AI systems rarely write in the first person, don’t often share personal anecdotes, and tend not to use extreme words like love, hate, terrible, beautiful, and the like.
Imagine the voice you would use if you were writing a dictionary entry. That's the kind of voice that AI usually writes in. If you see that content is written in this voice, it's more likely to be AI-generated.
Of course, sometimes humans mimic this voice when they want to sound authoritative, so this method isn’t perfect! But a bland, overly consistent, and predictable voice can be one telltale sign of an AI.
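If you want to eyeball this at scale, a quick script can count the signals described above. This is only an illustrative sketch, not a validated detector — the word lists below are hypothetical examples you'd want to tune yourself:

```python
import re

# Hypothetical word lists for illustration only -- tune for your own use.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our"}
EMOTIVE = {"love", "hate", "terrible", "beautiful", "awful", "amazing"}

def voice_signals(text):
    """Count first-person pronouns and emotionally charged words.

    Low counts in a long text are consistent with (but don't prove)
    a bland, AI-style voice.
    """
    words = re.findall(r"[a-z']+", text.lower())
    first_person = sum(w in FIRST_PERSON for w in words)
    emotive = sum(w in EMOTIVE for w in words)
    return {"words": len(words), "first_person": first_person, "emotive": emotive}

print(voice_signals("I love this tool, but my results were terrible."))
# → {'words': 9, 'first_person': 2, 'emotive': 2}
```

A long article scoring near zero on both counts reads like that dictionary-entry voice — worth a closer look, but never conclusive on its own.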
Strange Repetitions
Again, AI systems are trained on huge amounts of existing text. In many cases, they pick up some kind of pattern associated with a particular type of content, and then repeat it ad nauseam.
For example, GPT-3 seems to have learned that blog posts often start with a question like, “Are you looking for…” or “Have you ever wondered…” If someone asks GPT-3 to write a blog post, it will often start the post with these exact words, or some predictable variant of them.
Humans are different. Human writers often work harder to write a compelling introduction or something more specific to the topic they’re covering.
It can be hard to detect these kinds of patterns in a single piece of content. But if you’re looking at an entire website, you can start to see the pattern emerge. For example, if you’re wondering if a website is created with AI, take a look at a few articles.
If they all begin essentially the same way, with a generic “Have you ever wondered…”, it could be a sign the content is generated by an AI to which a human operator has fed a consistent prompt. If you ask GPT-3 to “Write a blog post about ______” one hundred times, it has no problem starting each post in exactly the same way.
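To check a whole site for this pattern, you could compare article openings programmatically. Here's a minimal sketch of the idea — the `articles` list is a made-up example, and the four-word window is an arbitrary choice:

```python
from collections import Counter

def opening_phrase(article, n_words=4):
    """Return the first n_words of an article, normalized to lowercase."""
    return " ".join(article.lower().split()[:n_words])

def shared_openings(articles, n_words=4):
    """Count how many articles start with the same few words.

    Many identical openings across one site can hint at a single
    templated prompt being fed to a text generator.
    """
    return Counter(opening_phrase(a, n_words) for a in articles)

articles = [
    "Have you ever wondered how solar panels work?",
    "Have you ever wondered why cats purr?",
    "A weird thing happened at the hardware store last week.",
]
print(shared_openings(articles).most_common(1))
# → [('have you ever wondered', 2)]
```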
Text That’s Too Perfect
The old saying goes that to err is human. Human writers make tons of mistakes. But because AI systems are machines, they rarely make dumb grammatical errors. Likewise, they might choose the wrong word, but they very rarely misspell things.
According to MIT Technology Review, the presence of typos is often a good sign the text was written by a person and not an AI. If the text you’re considering is too perfect, and follows conventions too closely, that’s a clue that a computer may have written it.
Of course, it’s also possible that the writer is just really pedantic when it comes to grammar, or they have a great copy editor working with them! However, if text follows conventions perfectly and includes zero typos, it’s possible that a computer wrote it.
Structural Errors, Not Just Typos
Here, it’s helpful to look less for outright typos and more at the kinds of weird choices that humans make, but that computers don’t.
Ending a sentence with a preposition, starting with “and” or “but,” including “however” in strange places, using “which” instead of “that,” and other innocent grammatical mistakes are all evidence of the human hand. Because AI is trained to write properly and follow grammatical and spelling rules, AI systems rarely make these kinds of human errors.
Likewise, structural issues in a piece of writing are another sign of human involvement. When AI goes off on a tangent, it tends to repeat the same information over and over again. When humans go off on a tangent, they tend to talk about something that’s topically unrelated or doesn’t contribute to the overall flow of the piece.
If it feels like someone is rambling about an unrelated topic, or sharing something that happened to catch their interest while they were writing, that’s evidence that a person wrote what you’re reading. If you see several paragraphs in a row that essentially repeat the same point or even recycle language, you might be looking at AI-generated content instead.
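Recycled language of this kind is easy to surface with a short script that counts repeated word sequences. This is a rough sketch, not a production detector — the five-word window and the sample text are just illustrative choices:

```python
from collections import Counter

def repeated_ngrams(text, n=5, min_count=2):
    """Find word n-grams that occur more than once in a text.

    Recycled five-word phrases across paragraphs can be a sign of
    the repetitive looping described above.
    """
    words = text.lower().split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}

sample = (
    "Solar power is a great source of clean energy. "
    "In short, solar power is a great source of clean energy for homes."
)
print(repeated_ngrams(sample))
```

Human ramblings will mostly come back empty here; paragraph after paragraph of repeated five-word runs is the machine-style looping.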
As we pointed out before, it’s a paradox, but bad or imperfect writing is often a welcome sign that a person was involved.
Using AI to Catch AI
Although these techniques can help, the gold standard at the moment for AI detection is to use an AI tool. It’s ironic, we know. But in many cases, the predictable and statistically consistent outputs of AI systems are easier to detect with other AI systems than they are with human ingenuity.
The best system we’ve tested for detecting AI-generated content at the moment is Originality AI. Their detector is trained on GPT-3 output and can also flag text from ChatGPT.
In our testing, Originality was quite successful at detecting content written by the systems. It did struggle a bit with content that was a hybrid between human and AI generation. But then, many people would argue that once a person has become involved in the content creation process, it’s no longer really AI-generated content.
We recommend using Originality.AI if you want a simple way to do at least a first pass of detecting AI-generated content. Keep in mind, of course, that Originality.AI can produce false positives. Don’t give your student an F if the system says their text was AI-generated; it might just be that they wrote something really generic!
If you run a writer’s work or a student’s paper through Originality.AI, and it points to AI-generated content with high statistical probability, you may at least want to pursue a conversation with that person to ensure that the writing really is theirs.
Conclusion
AI systems will likely get better over time and write in a way that humans find indistinguishable from the writing of other humans. ChatGPT and GPT-3 are both big steps in this direction.
The good news for human writers, and for people looking for AI-generated content, is that they’re not there yet. These simple techniques, as well as the tool we discussed here, can help a lot in separating AI-written text from the work of humans.