Sometimes, ChatGPT made "surprising" mistakes in school-level math.
Microsoft-backed OpenAI's AI chatbot ChatGPT has been making headlines ever since it was released to the public on November 30, 2022. It can break down complex scientific concepts, compose poems, write stories and code, and even create malware. The list goes on. OpenAI has also released a paid version of the chatbot. Known as 'ChatGPT Professional', it is available at $42 per month.

ChatGPT has even raised alarms within Google. According to a recent report by The New York Times, 'Google now intends to unveil more than 20 new products and demonstrate a version of its search engine with chatbot features this year'.

Students have also been using the chatbot to complete assignments. It turns out it can clear examinations, too, with flying colors. Christian Terwiesch, a professor at the Wharton School of the University of Pennsylvania, tested the performance of ChatGPT on an MBA exam. He questioned the chatbot on Operations Management, a core MBA subject.

The paper, titled 'Would Chat GPT3 Get a Wharton MBA? A Prediction Based on Its Performance in the Operations Management Course', delved into the implications of ChatGPT's 'academic performance'.

ChatGPT gave superb answers but was not great at math

Based on the test, Terwiesch found that ChatGPT's answers to basic operations management and process analysis questions were correct and that the chatbot provided excellent explanations. However, ChatGPT also made surprising mistakes in relatively simple calculations at the level of sixth-grade math, and those mistakes could be massive in magnitude.

Terwiesch also noted that the current version of ChatGPT cannot handle more advanced process analysis questions, even if they are based on fairly standard templates. 

Terwiesch concluded by stating in his study that ChatGPT has "remarkable skills in handling problems as used extensively in the training and testing of our MBA students. Combining the results of the questions, I would grade this performance as a B to B-". 

The professor also added a reference point to put the chatbot's performance into perspective: "Until Wharton allowed students more flexibility in which courses they take, this Operations Management course was a required course that every student had to take. However, we did allow students to waive this course if they could demonstrate content mastery on a waiver exam. The performance of Chat GPT3 reported above would have been sufficient to pass the waiver exam, though by a very small margin."

Could ChatGPT develop a question paper?

Terwiesch didn't stop there. The professor wanted to find out if Chat GPT3 could develop a question paper. "By now, I have written thousands of questions, and, at times, I feel I have exhausted my imagination for new problems. Can I turn to Chat GPT3 to come up with new exam questions?" he asks in the study.

Terwiesch found the generated questions plausible and humorous, and noted that they were good enough to be used in upcoming question papers.

Professor Terwiesch compared the effect electronic calculators had on the corporate world to the impact ChatGPT could have on academia. "Prior to the introduction of calculators and other computing devices, many firms employed hundreds of employees whose task it was to manually perform mathematical operations such as multiplications or matrix inversions. Obviously, such tasks are now automated, and the value of the associated skills has dramatically decreased. In the same way, any automation of the skills taught in our MBA programs could potentially reduce the value of an MBA education," he said in the study.

Andrew Karolyi, dean of Cornell University’s SC Johnson College of Business, echoed the sentiment. He told the Financial Times: "One thing we all know for sure is that ChatGPT is not going away. If anything, these AI techniques will continue to get better and better. Faculty and university administrators need to invest to educate themselves."

Study Abstract:

OpenAI’s Chat GPT3 has shown a remarkable ability to automate some of the skills of highly compensated knowledge workers in general and specifically the knowledge workers in the jobs held by MBA graduates including analysts, managers, and consultants. Chat GPT3 has demonstrated the capability of performing professional tasks such as writing software code and preparing legal documents. The purpose of this paper is to document how Chat GPT3 performed on the final exam of a typical MBA core course, Operations Management. Exam questions were uploaded as used in a final exam setting and then graded. The “academic performance” of Chat GPT3 can be summarized as follows. First, it does an amazing job at basic operations management and process analysis questions including those that are based on case studies. Not only are the answers correct, but the explanations are excellent. Second, Chat GPT3 at times makes surprising mistakes in relatively simple calculations at the level of 6th grade Math. These mistakes can be massive in magnitude. Third, the present version of Chat GPT is not capable of handling more advanced process analysis questions, even when they are based on fairly standard templates. This includes process flows with multiple products and problems with stochastic effects such as demand variability. Finally, ChatGPT3 is remarkably good at modifying its answers in response to human hints. In other words, in the instances where it initially failed to match the problem with the right solution method, Chat GPT3 was able to correct itself after receiving an appropriate hint from a human expert. Considering this performance, Chat GPT3 would have received a B to B- grade on the exam. 
This has important implications for business school education, including the need for exam policies, curriculum design focusing on collaboration between humans and AI, opportunities to simulate real-world decision-making processes, the need to teach creative problem-solving, improved teaching productivity, and more.

