AI Can Now Pass School Tests but Still Falls Short on the Turing Test

From winning at Go to passing eighth-grade-level multiple-choice tests, AI is making rapid advances. But its creativity still leaves much to be desired.
Ariella Brown

On September 4, 2019, Peter Clark, along with several other researchers, published “From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project.” The Aristo project named in the title is hailed for the rapid improvement it has demonstrated when tested the way eighth-grade students in New York State are tested on their knowledge of science.

The researchers concluded that this is an important milestone for AI: "Although Aristo only answers multiple choice questions without diagrams, and operates only in the domain of science, it nevertheless represents an important milestone towards systems that can read and understand. The momentum on this task has been remarkable, with accuracy moving from roughly 60% to over 90% in just three years."

The Aristo project is powered by the financial resources and vision of Paul G. Allen, the founder of the Allen Institute for Artificial Intelligence (AI2). As the site explains, there are several parts to making AI capable of passing a multiple-choice test.

Aristo's most recent solvers include:

  • The Information Retrieval, PMI, and ACME solvers that look for answers in a large corpus using statistical word correlations. These solvers are effective for "lookup" questions where an answer is explicit in text.
  • The Tuple Inference, Multee, and Qualitative Reasoning solvers that attempt to answer questions by reasoning, where two or more pieces of evidence need to be combined to derive an answer.
  • The AristoBERT and AristoRoBERTa solvers that apply the recent BERT-based language-models to science questions. These systems are trained to apply relevant background knowledge to the question, and use a small training curriculum to improve their performance. Their high performance reflects the rapid progress made by the NLP field as a whole.
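To give a feel for what "statistical word correlations" can mean in the first group of solvers, here is a minimal sketch of pointwise mutual information (PMI) scoring. It is not Aristo's code: the five-sentence corpus, the function names, and the averaging scheme are all assumptions made for illustration, and a real solver would draw on a very large text collection.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (an assumption for illustration only).
CORPUS = [
    "plants use sunlight to make food through photosynthesis",
    "photosynthesis in plants needs sunlight water and carbon dioxide",
    "animals get energy through respiration",
    "respiration in animals releases carbon dioxide",
    "magnets attract iron objects",
]

def build_counts(sentences):
    """Count in how many sentences each word and each word pair appears."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = set(sentence.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts, len(sentences)

def pmi(a, b, word_counts, pair_counts, n):
    """PMI = log( P(a, b) / (P(a) * P(b)) ); 0.0 if the pair never co-occurs."""
    joint = pair_counts[frozenset((a, b))]
    if joint == 0 or a == b:
        return 0.0
    return math.log((joint / n) / ((word_counts[a] / n) * (word_counts[b] / n)))

def score_option(question, option, stats):
    """Average PMI between every question word and every option word."""
    scores = [pmi(q, o, *stats)
              for q in question.lower().split()
              for o in option.lower().split()]
    return sum(scores) / len(scores) if scores else 0.0

def answer(question, options, stats):
    """Pick the option whose words correlate most strongly with the question."""
    return max(options, key=lambda o: score_option(question, o, stats))

STATS = build_counts(CORPUS)
QUESTION = "which process do plants use to make food from sunlight"
OPTIONS = ["respiration", "photosynthesis", "magnets"]
print(answer(QUESTION, OPTIONS, STATS))
```

On this toy data the solver picks "photosynthesis" simply because that word shares sentences with "plants," "food," and "sunlight," which is why this kind of approach suits "lookup" questions and not questions that require combining evidence.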

While Aristo’s progress is, indeed, impressive, and, no doubt, there are some eighth graders who wish they could find some way to carry the AI along with them to the test, it is still far from capable of passing a Turing test. In fact, the Allen Institute for Artificial Intelligence admitted that it was deliberately testing its AI in a different way when it set out to develop it in 2016.

The explanation was given in an article entitled “Moving Beyond the Turing Test with the Allen AI Science Challenge.” Admitting that the test would not be “a full test of machine intelligence,” the institute still considered it worthwhile for showing “several capabilities strongly associated with intelligence - capabilities that our machines need if they are to reliably perform the smart activities we desire of them in the future - including language understanding, reasoning, and use of commonsense knowledge.”

There’s also the practical consideration that makes testing with ready-made tests so appealing: “In addition, from a practical point of view, exams are accessible, measurable, understandable, and compelling.” Come to think of it, that’s why some educators love having standardized tests, while others decry them for the very fact that they give the false impression they are measuring intelligence when all they can measure is performance of a very specific nature. 

When it comes to more creative intelligence in which the answer is not simply out there to be found or even intuited, AI still has quite a way to go. We can see that in its attempts to create a script. 

Making movies with AI

Benjamin (formerly known as Jetson) is the self-chosen name of “the world’s first automated screenwriter.” Benjamin is “a self-improving LSTM RNN [long short-term memory recurrent neural network] machine intelligence trained on human screenplays.”

Benjamin has his/its own Facebook page. Benjamin also used to have a site under that name, but now he/it shares the credit on a more generally named one, which offers links to all three of the films based on AI-generated scripts. Each film was made within just two days to qualify for Sci-Fi London's 48hr Film Challenge.

Benjamin’s first foray into film was the script for “Sunspring.” However, even that required a bit of prompting from Ross Goodwin, “creative technologist, artist, hacker, data scientist,” as well as the work of the filmmaker Oscar Sharp, and three human actors.

The film was posted to YouTube, and you can see it in its entirety by sitting through all nine minutes. See if you share the assessment expressed by the writer Neil Gaiman, whose tweet appears on the Benjamin site: “Watch a short SF film gloriously fail the Turing Test.”

What makes the AI film distinct from a human creation

From a human perspective, it makes less than perfect sense. The main character (identified as H) declares at some point: “It’s a damn thing scared to say. Nothing is going to be a thing but I was the one that got on this rock with a child and then I left the other two.”

The film producer also had to show what the script indicated in a stage direction that is physically impossible: “He is standing in the stars and sitting on the floor.” One cannot be in two different places at the same time and cannot both be standing and sitting at the same time.

I contacted Goodwin via email back in 2016 to ask him to explain how the algorithm operated. He explained that the one he selected, “an LSTM recurrent neural network, can be influenced in certain ways -- for example, by selecting only science fiction materials for the corpus, the output will have a science fiction feel to it.”

He went on to explain its limitations in creating text that “doesn't lend itself to control over story structure” because it is solely based on “a statistical model with millions of parameters and predicting which letter comes next over and over again.”
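The "predicting which letter comes next over and over again" loop can be illustrated without a neural network. The sketch below swaps in a much simpler technique, a character-level Markov model that counts which letter follows each three-letter context, so it is a stand-in for the idea and not Benjamin's LSTM. The tiny corpus and all function names are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def train(text, order=3):
    """For every `order`-letter context, count the letters that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, seed, length, rng):
    """Extend `seed` one letter at a time, sampling in proportion to the counts.

    `seed` must be exactly as long as the order the model was trained with.
    """
    order = len(seed)
    out = seed
    for _ in range(length):
        counts = model.get(out[-order:])
        if not counts:  # context never seen in training: stop early
            break
        letters, weights = zip(*sorted(counts.items()))
        out += rng.choices(letters, weights=weights, k=1)[0]
    return out

# Toy stand-in for a corpus of science fiction screenplays (an assumption).
CORPUS = "the ship left the star. the star left the ship. the child saw the star. "
MODEL = train(CORPUS, order=3)
print(generate(MODEL, "the", 60, random.Random(0)))
```

Every short stretch of the output looks like the training text, because each letter was chosen only from what followed the previous three. Nothing in the loop tracks plot or character, though, which mirrors the lack of "control over story structure" that Goodwin describes.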

One thing Benjamin does have in common with his human counterparts is making sequels. There have been two follow-ups to “Sunspring” to date.

Will Benjamin put human writers out of work?

The answer offered to that question is yes, but with a kind of nightmarish twist, in a follow-up film that refers to “Sunspring.” In 2017, Benjamin generated a new script that became the basis of another film with a different set of actors, called “It’s No Game,” that you can see here:


The writing credits include three entities: Benjamin 2.0, Oscar Sharp (writer of Benjamin 2.0), and Ross Goodwin. Either Benjamin is writing about himself and being very meta, or some of the humans chose to insert that kind of self-referential quality into the film about human writers losing their jobs as AI takes over and machines direct everything.

Additional credits that appear at the end of the film include “BALLETRON,” which is also explained as “a recursive context-free algorithm using a dictionary of French ballet terminology and English words to generate choreographic from input initials.” That is what would have dictated the dance moves appearing near the end of the film.
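The credit gives no detail beyond that one-line description, so the following is only a guess at the general shape of such a generator, not BALLETRON itself. It uses a toy recursive grammar (SEQUENCE -> STEP, or STEP followed by another SEQUENCE) in which each STEP is chosen by the next input initial. The two small dictionaries, the fallback word, and the function names are all hypothetical.

```python
# Hypothetical mini-dictionaries keyed by first letter (not BALLETRON's data).
BALLET_TERMS = {"a": "arabesque", "b": "battement", "g": "glissade", "p": "pirouette"}
ENGLISH_WORDS = {"a": "again", "b": "backward", "h": "hold", "s": "slowly"}

def step(letter):
    """STEP -> ballet term if one starts with this letter, else an English word."""
    letter = letter.lower()
    if letter in BALLET_TERMS:
        return BALLET_TERMS[letter]
    return ENGLISH_WORDS.get(letter, "rest")  # "rest" is an assumed fallback

def sequence(initials):
    """SEQUENCE -> STEP | STEP ', then ' SEQUENCE (the recursive rule)."""
    if not initials:
        return ""
    head = step(initials[0])
    if len(initials) == 1:
        return head
    return head + ", then " + sequence(initials[1:])

print(sequence("PH"))  # pirouette, then hold
```

Because the rule for SEQUENCE refers to itself, any string of initials, however long, expands into a chain of moves.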

Credit for the closing speech of the star, David Hasselhoff, is given to “THE SOLILOQUIZER,” which is explained only as “The Cornell Movie Database.” That would mean yet another training data set was used to generate the speech that doesn’t necessarily fit the narrative buildup any more than the dance moves do.

The basic narrative line for the first two thirds or so of the film does kind of make sense. You see the writers being told they are to be supplanted. Also, the shifts in dialogue style are contextualized and explained as the results of a particular output from 1980s shows or Shakespearian plays used to train Benjamin’s writing for those instances. 

However, it also slips off into non-sequiturs, despite the fact that humans are clearly involved in the writing and could correct for the failure of coherence that results from pure machine-generated text. It could be that Goodwin wants to retain that off-kilter feel for the film to reflect where AI-generated text is at for the present. 

However, back in 2016, Goodwin insisted that he does expect that AI will one day attain the same type of coherence found in films made by humans. The 2017 film actually touches on exactly the point he raised in that email correspondence: that people have to think about the way they will “use such technology well in advance of its arrival.”

Is the third time the charm?

The humans behind Benjamin certainly tried, though alas, the third film they produced, “Zone Out,” didn’t make the top 3 (as the second film did) or even top 10 (as the first film did) in the competition for the Sci-Fi London's 48hr Film Challenge in 2018. It was disqualified for excessive use of existing footage.

You can watch it here, but that will be just over six and a half minutes of your life that you won’t get back.

What makes this film somewhat more painful than the previous two is the combination of mechanized-sounding voices, footage that doesn’t quite match the words, and not-quite-believable mouth movements imposed on the faces of the actors from the public domain movies that make up the footage, The Last Man on Earth and The Brain That Wouldn’t Die.

All this is due to the ambition to have Benjamin not only write but also “appear” in the movie. And despite the idea of an AI taking on a body that was portrayed in Avengers: Age of Ultron, artificial intelligence does not actually have any physical characteristics, robotic or human. So Benjamin had to substitute him/itself with characterizations already captured on film with the AI-generated words taking the place of the original scripts.

The results are rather messy. The best that can be said about it, as one of the most popular comments put it, is “Still better love story than Twilight.”

Faking it

What the commenters failed to grasp, judging by the answers to a question in the comments about moustaches appearing on the female characters, was how the effect was achieved. It’s all because the actor Thomas Middleditch served as the AI’s model for mouthing speech for all the actors appearing in the film.

The way it worked is explained in this video:


We’ve seen far more polished results in the recreation of young Princess Leia and Grand Moff Tarkin for the 2016 Star Wars film Rogue One. Obviously, though, the producers of what is a guaranteed blockbuster have much bigger budgets and much more time to spend on achieving such realistic recreations through the combination of actors and technology as explained in the video below:

However, the impressive effect achieved through those massive efforts on the Star Wars film indicates that in the future, the combination of machines and humans can produce creative results that are worth watching. You can have generated effects and even generated faces, so long as the filmmakers remember to tell a good story that has some internal logic that appeals to humans.
