Ex-Uber employee designs new approach to evaluating AI capabilities

The Turing Test, developed in 1950, has become largely obsolete.
Ameya Paleja
Stock photo: 3D rendered image, AI concept.


Chris Saad, the former head of product development at Uber, has designed a new framework to benchmark the intelligence of artificial intelligence (AI) systems at a time when the field is undergoing a sea change. The framework, based on the theory that intelligence is not a monolithic construct, was recently shared on TechCrunch.

AI has been a trending topic for the past few months, ever since OpenAI made its conversational chatbot, ChatGPT, public. Users have tested the chatbot in many different areas, ranging from writing poetry to code and even sales pitches, and the bot hasn't disappointed.

It would have been a surprise if the bot did not pass the Turing Test, which was designed in 1950 and tests an AI's ability to produce human-like replies. Beyond the Turing Test, chatbots such as Google's LaMDA have even convinced some of their testers that they are sentient.

AI Classification Framework

Saad argues that the Turing Test operates on a simple pass/fail basis and focuses only on one aspect of human intelligence - linguistic ability. Researchers have been attempting to crack this one-faceted test since the 1960s, when they designed a chatbot named ELIZA that mimicked a psychotherapist. With conversational chatbots now set to become commonplace, it is high time to set a new benchmark.

In 1983, psychologist Howard Gardner argued that intelligence is a collection of different abilities that can manifest in a variety of ways, and he went on to classify them into eight types.

Multiple dimensions of the AI Classification Framework

Saad borrows from the "Theory of Multiple Intelligences" for his AI Classification Framework, which evaluates AI tools across multiple dimensions, viz., linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, and intrapersonal intelligence.

For each dimension, the framework provides a scale of 1 to 5, with one denoting infant-like capability and five denoting 'self-agency' or capability beyond human ability, something that could be referred to as 'Super Intelligence.'

To add some context on how this scale can be used, Saad applied it to ChatGPT in his write-up. ChatGPT demonstrates expert-level (Level 3) capability in the logical-mathematical and verbal-linguistic dimensions. However, it fares much like an infant in other areas, which include visual-spatial and musical-rhythmic abilities, to name a few.

OpenAI's other product, DALL-E 2, which can generate art using just text prompts, is also at Level 3 in the visual-spatial department but has no other capabilities to boast about when compared to human intelligence.
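
For readers who want to see the scoring scheme in concrete terms, here is a minimal Python sketch of how such per-dimension ratings could be recorded. It is an illustration only, not Saad's implementation: the `Profile` class and its `rate` and `strongest` methods are hypothetical names, only the level labels mentioned in this article are included (levels 2 and 4 are left unnamed), and only the ratings the article states outright are entered.

```python
from dataclasses import dataclass, field

# The seven dimensions named in the article.
DIMENSIONS = (
    "linguistic",
    "logical-mathematical",
    "musical",
    "spatial",
    "bodily-kinesthetic",
    "interpersonal",
    "intrapersonal",
)

# Only the labels the article mentions; levels 2 and 4 are not described there.
LEVEL_LABELS = {1: "infant-like", 3: "expert", 5: "self-agency / super intelligence"}


@dataclass
class Profile:
    """Hypothetical container for one AI tool's ratings (1-5 per dimension)."""

    name: str
    scores: dict = field(default_factory=dict)

    def rate(self, dimension, level):
        # Reject anything outside the framework's dimensions and 1-5 scale.
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        if level not in range(1, 6):
            raise ValueError("level must be between 1 and 5")
        self.scores[dimension] = level

    def strongest(self):
        """Return the dimensions with the highest recorded level, in framework order."""
        if not self.scores:
            return []
        top = max(self.scores.values())
        return [d for d in DIMENSIONS if self.scores.get(d) == top]


# Ratings stated in the article; dimensions it does not rate are left unset.
chatgpt = Profile("ChatGPT")
chatgpt.rate("linguistic", 3)
chatgpt.rate("logical-mathematical", 3)
chatgpt.rate("spatial", 1)
chatgpt.rate("musical", 1)

dalle = Profile("DALL-E 2")
dalle.rate("spatial", 3)
```

Keeping one score per dimension, rather than a single overall number, mirrors the framework's central point: a tool can sit at the expert level in one area and at the infant level in another.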

While some may argue that this amounts to moving the goalposts now that AI has started delivering better output, it is also a necessary step at a time when everybody seems to be incorporating AI into their offerings.

With companies like Tesla aiming to deploy humanoid robots in households in the near future, it would be nice to know what level of AI we are actually bringing into our homes and what it is capable of.
