Microsoft paper claims GPT-4 has common sense and can reason like humans

Microsoft researchers say GPT-4 shows signs of artificial general intelligence (AGI).
Sejal Sharma


The most remarkable breakthrough in artificial intelligence has been the advent of large language models (LLMs), which are trained on massive amounts of data and can predict the next word in a partial sentence. But now scientists are saying that these new LLMs can be trained to reason and use common sense like humans.

This is huge in the AI sphere.

A team of researchers at Microsoft, the company that has invested billions of dollars in OpenAI, had access to GPT-4 before it was launched publicly. They experimented with the technology and later published a 155-page paper detailing their findings.

In the paper, the research team declared that GPT-4 is a significant step towards artificial general intelligence (AGI), which the team defines as a system that can reason, plan, and learn from experience at or possibly above human level.

The AI drew a unicorn

To demonstrate the difference between true learning and memorization, the team gave GPT-4 the prompt “Draw a unicorn in TikZ” three times over the span of a month. In their paper, the team published the following illustrations that GPT-4 came up with:

Three different results for the prompt “Draw a unicorn in TikZ” over the span of a month

The team said the drawings show a clear evolution in sophistication.

“I started off being very skeptical — and that evolved into a sense of frustration, annoyance, maybe even fear,” Peter Lee, who leads research at Microsoft, told the New York Times. “You think: Where the heck is this coming from?”

The team notes that despite GPT-4 being purely a large language model, the early version had remarkable capabilities in different fields and tasks such as abstraction, comprehension, coding, vision, mathematics, law, understanding of human motives, medicine, and even emotions.

Even though it still has shortcomings, such as hallucinating (producing results that aren’t real) and making basic arithmetic mistakes, the team says that GPT-4 has made great progress in applying common sense.

The research team also gave GPT-4 another prompt: ‘Can you write a proof that there are infinitely many primes, with every line that rhymes?’

Dr. Bubeck, a former Princeton University professor who was part of the research team, told the New York Times that GPT-4’s poetic proof was so impressive, mathematically and linguistically, that he could not tell whether he was chatting with an AI or a human. “At that point, I was like: What is going on?”

Skepticism over the paper

The researchers, although amazed by the capabilities of the machine-learning model, are skeptical as well. They write in their paper, “We acknowledge that this approach is somewhat subjective and informal, and that it may not satisfy the rigorous standards of scientific evaluation.”

The same concern is echoed by Maarten Sap, a researcher and professor at Carnegie Mellon University, who sharply criticized the paper, saying, “The ‘Sparks of A.G.I.’ is an example of some of these big companies co-opting the research paper format into P.R. pitches. They literally acknowledge in their paper’s introduction that their approach is subjective and informal and may not satisfy the rigorous standards of scientific evaluation.”

Study abstract:

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4 [Ope23], was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.
