New OpenAI tool detects if text is written by ChatGPT or another AI

Researchers stress that the classifier is "not fully reliable."
Deena Theresa

Two months after OpenAI introduced ChatGPT to the public, students flocked to the AI tool to write their assignments, exams, and even software code. Universities began to crack down on its use, and a few science journals banned the chatbot outright. 

In what sounds like a solution, OpenAI itself has released a tool designed to detect whether a piece of text was written by artificial intelligence. In a blog post on Tuesday, OpenAI described the tool, a classifier trained to distinguish human-written text from text generated by AI systems, including ChatGPT.

"We recognize that identifying AI-written text has been an important point of discussion among educators, and equally important is recognizing the limits and impacts of AI-generated text classifiers in the classroom," the researchers wrote in a blog post. "We have developed a preliminary resource on the use of ChatGPT for educators, which outlines some of the uses and associated limitations and considerations."

However, the research laboratory warns that the new classifier is not yet "fully reliable." In evaluations, it correctly identified only 26 percent of AI-written English texts (true positives), and it incorrectly labeled human-written text as AI-written 9 percent of the time (false positives).
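To make those two figures concrete, here is a small illustrative calculation. The counts below are invented for the example; only the resulting rates (26 percent and 9 percent) come from OpenAI's reported numbers.

```python
# Illustrative arithmetic: how the reported rates relate to a confusion
# matrix. The sample counts here are hypothetical, not OpenAI's data.

def true_positive_rate(tp: int, fn: int) -> float:
    """Share of AI-written texts the classifier correctly flags."""
    return tp / (tp + fn)

def false_positive_rate(fp: int, tn: int) -> float:
    """Share of human-written texts the classifier wrongly flags."""
    return fp / (fp + tn)

# Hypothetical sample: 100 AI-written and 100 human-written texts.
print(true_positive_rate(26, 74))   # 0.26 -> the reported 26 percent
print(false_positive_rate(9, 91))   # 0.09 -> the reported 9 percent
```

In other words, roughly three out of four AI-written texts slip past the tool, while about one in eleven human-written texts is wrongly accused.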

The new classifier trained on a dataset of human-written and AI-generated texts

OpenAI also noted that, compared with its previously released classifier, the new one is more reliable on text from more recent AI systems.

The new classifier is a language model fine-tuned on a dataset of pairs of human-written and AI-written text on the same topic. The researchers collected human-written text from a range of sources and divided each text into a prompt and a response. 

"On these prompts, we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate low; in other words, we only mark text as likely AI-written if the classifier is very confident," according to the blog post.
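The thresholding idea in that quote can be sketched in a few lines. This is a hypothetical illustration of the general technique, not OpenAI's implementation; the function name, the probability input, and the cutoff value are all assumptions.

```python
# Hypothetical sketch of confidence thresholding as OpenAI describes it:
# the classifier outputs a probability that text is AI-written, and only
# a very high probability produces a "likely AI-written" label, keeping
# the false-positive rate low. The 0.98 cutoff is an invented value.

def label_text(ai_probability: float, threshold: float = 0.98) -> str:
    """Flag text as likely AI-written only at very high confidence."""
    if ai_probability >= threshold:
        return "likely AI-written"
    return "unclear"

print(label_text(0.99))  # above the cutoff -> flagged
print(label_text(0.60))  # moderately AI-like, but below the cutoff -> not flagged
```

The trade-off is the one the researchers acknowledge: raising the threshold reduces false accusations against human writers at the cost of letting more AI-written text go unflagged.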

The classifier should not be used as a primary decision-making tool

In their blog post, the researchers stress that the classifier should not be used as a "primary decision-making tool." It is unreliable on short texts (below 1,000 characters), on code, and on languages other than English, so they recommend applying it only to longer English prose. 

Note also that AI-written text can be edited to evade the classifier. "Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long-term," the blog post reads.

The classifier is currently publicly available so that OpenAI can gather feedback on whether imperfect tools like this one are useful.