Google Can Likely Detect When a Blog Post is Written by AI

Written By Thomas Smith

The New York Times called Thomas Smith a "veteran programmer." For over a decade, Smith has led Gado Images, an AI-driven visual content company.

AI text generators like GPT-3, as well as products built on top of them, have exploded over the last year. Tons of writers now use AI writing assistants like Jasper AI to give them a leg up in generating content quickly.

I agree that generating lots of content is important, but I’ve never been sold on the value of an AI writing assistant. I love tools like Grammarly that help you write faster and better by detecting and fixing dumb mistakes. But even as an OpenAI beta tester, I’ve never felt that using an AI writing assistant was necessarily a good idea.

A fascinating article by Nick Nolan confirms what I’ve long suspected: because AI writing assistants are designed to detect and follow patterns, the writing they generate follows patterns of its own.

Although humans might not be able to tell the difference between AI-generated text and human-generated text, machines certainly can.

Machines Detecting Machines

In his article, Nolan shares a fascinating resource: an OpenAI text detector hosted by Hugging Face.

To use the detector, you paste in some text that might or might not have been generated by a computer. The detector is built on the same transformer architecture that powers GPT-3, but instead of generating text, it’s trained to classify it. That allows Hugging Face to quickly estimate whether a piece of text may have been generated by an AI system like GPT-3.
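
If you’d rather script the check than paste text into the demo by hand, the underlying model is also published on the Hugging Face Hub. Here’s a minimal sketch using the transformers library; I’m assuming the roberta-base-openai-detector checkpoint and its “Real”/“Fake” labels, so check the model card before relying on it:

    # Minimal sketch: score a passage with a publicly hosted OpenAI detector model.
    # Assumes the "roberta-base-openai-detector" checkpoint; label names may differ.
    from transformers import pipeline

    detector = pipeline("text-classification", model="roberta-base-openai-detector")

    text = "Paste the paragraph you want to check here."
    result = detector(text, truncation=True)[0]

    # On this checkpoint, "Real" means human-written and "Fake" means machine-generated.
    print(f"{result['label']}: {result['score']:.2%}")

Running your own copy like this is handy if you want to batch-check a whole site’s worth of posts rather than one paragraph at a time.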

Nolan shares some tests in his article, but I performed several of my own as well. In my testing, Hugging Face’s OpenAI detector was remarkably good at detecting AI-generated text.

To test it, I took the transcript of a recent YouTube video that I made and used that as a seed in GPT-3 to create an AI-written blog post that summarizes my conclusions in the video.

I thought this would be a good test, since in this case GPT-3 isn’t creating wholly new text — it’s just summarizing text that’s already in my voice.
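
If you want to try a similar experiment, the call looks roughly like this with OpenAI’s older (pre-1.0) Python library. The model name and prompt below are illustrative placeholders, not the exact ones I used:

    # Illustrative sketch: turn a video transcript into a short blog-style summary
    # with GPT-3, using the older openai<1.0 Completion API.
    import openai

    openai.api_key = "YOUR_API_KEY"

    transcript = open("video_transcript.txt").read()

    response = openai.Completion.create(
        model="text-davinci-002",  # placeholder; use whichever GPT-3 engine you have access to
        prompt="Summarize this video transcript as a short blog post:\n\n" + transcript,
        max_tokens=300,
        temperature=0.7,
    )

    print(response.choices[0].text.strip())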

Here’s the AI-written summary it created:

I recently had the chance to try the limoncello flavor of Lacroix water, and I have to say, I was impressed! The flavor is complex and interesting, with a creamy mouthfeel that is surprisingly similar to the traditional Italian liqueur. Lacroix water is naturally calorie- and sweetener-free, and this flavor is no exception. The water is infused with the flavors of lemon and vanilla, which combine to create a refreshing and unique drink. I would definitely recommend trying the limoncello flavor of Lacroix water if you are looking for something a little different than your usual water or soda. Thanks again to Lacroix for sending it to me!

If you watch the original video, that’s a pretty good summary of my main points and conclusions. Perhaps it’s not going to win a Pulitzer, but most people would probably read that paragraph — especially after seeing my video — and conclude that a human probably wrote it.

Not Hugging Face, though. Their tool returned a 99.8% probability that my text was generated by a machine.

I then decided to test the opposite case. I took a chunk of text from an article that I wrote by hand. For that article, I also worked from the transcript of one of my own YouTube videos, but in this case I wrote the text manually, with no AI assistance.

The verdict? A 96.57% probability that the text was written by a human.

As Nolan points out, the system isn’t perfect. If you write something formulaic, the detector is more likely to flag it as AI-generated than if you write something voicy and original.

Basically, if you write like a robot, the system thinks you might be one!

Still, Hugging Face’s tool is remarkably good at flagging AI-generated text that might otherwise fool a human.

The Takeaway? Google Knows

That finding has big implications for SEOs and website owners who use AI-generated text. You’ve probably never heard of Hugging Face. I still have no idea who they are.

If they can build a detector that’s pretty good at sniffing out AI-generated text, you can bet that Google has their own detectors, which are 1000 times better.

Google has the resources and the machine learning knowledge to create fantastic detectors. They also have their own AI text generation systems — and access to billions of pages of text that were created before the systems came out, and thus can’t contain AI text.

Training a machine learning system that sniffs out the output of other machine learning systems should thus be a piece of cake for them.
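
To make that concrete, here’s a toy sketch of the idea: label text you know is human-written (say, pages published before these generators existed) as one class, text you generated with an AI model as the other, and fit a classifier. The data and model below are stand-ins; a real detector would be a large transformer trained on vastly more examples:

    # Toy sketch of training a detector: human-written text vs. AI-generated text.
    # The tiny example lists are placeholders; a real system needs huge labeled corpora.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    human_texts = [
        "An archived blog post written years before AI text generators existed.",
        "A newspaper article pulled from an old crawl of the web.",
    ]
    ai_texts = [
        "A paragraph you generated yourself with GPT-3 for training purposes.",
        "Another machine-written sample produced by an AI writing assistant.",
    ]

    X = human_texts + ai_texts
    y = [0] * len(human_texts) + [1] * len(ai_texts)  # 0 = human, 1 = machine

    detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    detector.fit(X, y)

    # Probability that a new passage is machine-generated (class 1).
    print(detector.predict_proba(["Some new blog paragraph to score."])[0][1])

Google’s version would obviously be far more sophisticated, but the shape of the problem is the same: a supervised classifier with no shortage of labeled data on either side.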

In short, if you use AI to write content for you, Google almost certainly knows.

Some people reference a conversation with Google’s John Mueller, where he appeared to say that Google can’t detect AI-generated content. But those weren’t his exact words. He said he “couldn’t claim that,” and addressed AI-generated content in the context of a broader conversation about webspam.

But Do They Care?

Even if Google knows that you’re using AI in your blog posts or website content, do they care?

Right now, the answer is “Yes.”

In the same conversation, Mueller said:

“Currently it’s all against the webmaster guidelines. So from our point of view, if we were to run across something like that, if the webspam team were to see it, they would see it as spam.”

In reality, this prohibition probably applies mainly to the most egregious misuses of AI. If you create a 10,000-page website using solely output from GPT-3, Google might take manual action against it.

If you use AI to summarize a product description as part of a longer review article, you’re probably not going to get blacklisted by the search engine. Google makes it clear that they favor helpful content, but that helpfulness is just one among many signals the search engine evaluates.

But because of the scale of today’s Internet, Google is all about signals. Using AI to help you write a blog post might not immediately get you banned. But if Google can detect your robot content, that’s a signal suggesting that the content is lower quality, and might even be spam. That’s not something you want to invite onto your site.

The Future

In the same conversation, Mueller made it clear that Google’s stance on AI content might change.

In his words,

“I think over time maybe this is something that will evolve in that it will become more of a tool for people. Kind of like you would use machine translation as a basis for creating a translated version of a website, but you still work through it manually.

And maybe over time these AI tools will evolve in that direction that you use them to be more efficient in your writing or to make sure that you’re writing in a proper way like the spelling and the grammar checking tools, which are also based on machine learning.”

So Google is open to the idea that AI might one day serve as a helpful tool for assisting human writers. (They also explicitly call out grammar-checking tools as being okay to use.)

Google has also indicated that — perhaps surprisingly — it’s pretty okay with many uses of AI image generation.

But today’s AI text writing tools aren’t there yet, and thus you can’t use them on your site without running afoul of Google.

Takeaways

Here are the big takeaways for SEOs:

  • Google can almost certainly tell if you use AI-generated content
  • Currently, all AI-generated content is against Google’s webmaster guidelines. If you use it, you risk a manual action, or at least sending a “low quality” signal.
  • Text should be helpful and written by people, for people.

Thanks to Nolan for his great piece that shared the Hugging Face tool.

By the way, wondering how this article does on the tool? It returned a 99.98% probability of being human-written.
