Date: 2023-07-20

It seems that the honeymoon phase for large language models (LLMs), introduced in the rush to make inroads in the generative AI space, is over.

According to a study by researchers at Stanford and UC Berkeley, the performance of OpenAI’s LLMs has decreased significantly over time.

The researchers wanted to determine whether these LLMs were improving over time, since they can be updated based on data, user feedback, and design changes.

The team evaluated the behavior of the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four tasks: solving math problems, answering sensitive or dangerous questions, generating code, and visual reasoning.