AI beats self, learns to play Atari game 6000 times faster than before
There are many things AI models are good at, but learning efficiently is not one of them. It takes them huge amounts of time and data to solve problems humans can pick up almost instantaneously.
Now, researchers have figured out that getting AI to read instruction manuals before attempting a task may speed up its learning. The underlying training technique is called reinforcement learning, and it involves setting a goal and rewarding the AI for taking actions that help reach that goal.
As effective as the technique is, it relies on trial and error to find a strategy that works. This means these algorithms may take years to find a winning formula.
Now a team from Carnegie Mellon University has devised a way to help reinforcement learning algorithms learn much faster by combining them with a language model that can read instruction manuals, according to a report published on Friday by Singularity Hub.
So far, they have been successful in teaching an AI to play a challenging Atari video game thousands of times faster than a model developed by DeepMind.
“Our work is the first to demonstrate the possibility of a fully-automated reinforcement learning framework to benefit from an instruction manual for a widely studied game,” said Yue Wu, who led the research.
“We have been conducting experiments on other more complicated games like Minecraft, and have seen promising results. We believe our approach should apply to more complex problems.”
Summarizing key information
The team began by training a language model to extract and summarize key information from the game’s official instruction manual. This summary was then used to pose questions about the game to a pre-trained language model.
The resulting answers were then used to create additional rewards, which were fed into a well-established reinforcement learning algorithm to help it learn the game faster.
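The idea of adding manual-derived rewards on top of the game's own reward can be illustrated with a minimal Python sketch. It is not the researchers' code: the names `MANUAL_ANSWERS`, `auxiliary_reward` and `shaped_reward`, the object labels, and the +1/-1 values are all hypothetical and chosen only for illustration.

```python
# Hypothetical answers from the question-answering step: for each object,
# +1.0 if the manual suggests touching it helps, -1.0 if it hurts.
# These entries are illustrative assumptions, not taken from the paper.
MANUAL_ANSWERS = {"flag": 1.0, "tree": -1.0}


def auxiliary_reward(touched_objects, answers=MANUAL_ANSWERS, scale=1.0):
    """Sum the manual-derived bonus or penalty for each object the agent touched."""
    return scale * sum(answers.get(name, 0.0) for name in touched_objects)


def shaped_reward(env_reward, touched_objects):
    """Add the auxiliary reward to the reward returned by the game itself."""
    return env_reward + auxiliary_reward(touched_objects)


# The game gave no reward, but the manual-derived bonus fills the gap.
print(shaped_reward(0.0, ["flag"]))
# Objects the manual says nothing about, such as "rock", contribute nothing.
print(shaped_reward(-1.0, ["tree", "rock"]))
```

In a game where the built-in reward arrives rarely, a signal like this gives the learning algorithm feedback on individual interactions rather than only at the end of a run.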
To assess their approach, the researchers tested it on Skiing, an Atari game where the leading AI had to run through 80 billion frames of the game to achieve performance comparable to a human's.
They found that the new approach required just 13 million frames to get the hang of the game.
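As a quick check, the roughly 6,000-fold figure in the headline follows from the two frame counts reported above. The short Python calculation below uses only those two numbers; the variable names are illustrative.

```python
# Frame counts as reported in this article.
baseline_frames = 80_000_000_000  # previous leading AI
new_frames = 13_000_000           # manual-assisted approach

# Ratio of frames needed: how many times fewer frames the new approach used.
speedup = baseline_frames / new_frames
print(round(speedup))  # 6154
```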
Now they have moved on to more complex 3D games like Minecraft, with promising early results, and are seeking to evaluate how rapid improvements in AI language models could act as a catalyst for progress elsewhere in the field, noted the report.
The research is published in a pre-print paper on arXiv.
Study abstract:
High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and reward structures. Therefore, we hypothesize that the ability to utilize human-written instruction manuals to assist learning policies for specific tasks should lead to a more efficient and better-performing agent. We propose the Read and Reward framework. Read and Reward speeds up RL algorithms on Atari games by reading manuals released by the Atari game developers. Our framework consists of a QA Extraction module that extracts and summarizes relevant information from the manual and a Reasoning module that evaluates object-agent interactions based on information from the manual. Auxiliary reward is then provided to a standard A2C RL agent, when interaction is detected. When assisted by our design, A2C improves on 4 games in the Atari environment with sparse rewards, and requires 1000x less training frames compared to the previous SOTA Agent 57 on Skiing, the hardest game in Atari.