A self-taught artificial intelligence (AI) system called DeepCube has mastered solving the Rubik's Cube puzzle in just 44 hours without human intervention. The system's inventors have detailed their design in a paper titled “Solving the Rubik's Cube Without Human Knowledge”.
“A generally intelligent agent must be able to teach itself how to solve problems in complex domains with minimal human supervision,” write the paper’s authors. “Indeed, if we’re ever going to achieve a general, human-like machine intelligence, we’ll have to develop systems that can learn and then apply those learnings to real-world applications.”
Rubik's Cube Proved More Challenging than Go or Chess
While many AI systems have been taught to play games, mastering the complexity of a Rubik's Cube posed a unique set of challenges. Games such as Go and chess are usually taught by learning a strategy that distinguishes ‘good’ moves from ‘bad’ ones and rewards positive decision-making.
However, this type of learning doesn’t work for the Rubik's Cube, as it is difficult to determine whether a single move has brought the puzzle closer to the solution. If the system can’t be rewarded for incremental steps, then it can’t learn.
A 3×3×3 Rubik’s Cube has a total “state space” of 43,252,003,274,489,856,000 combinations (that’s 43 quintillion). To solve the puzzle, all six sides of the cube have to be a single color.
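That 43-quintillion figure follows from counting the cube's pieces directly. A short calculation reproduces it:

```python
# Number of reachable states of a 3x3x3 Rubik's Cube:
# the 8 corner pieces can be arranged in 8! ways with 3^7 independent
# orientations, the 12 edge pieces in 12! ways with 2^11 independent
# orientations, and only half of all permutation parities are reachable.
from math import factorial

corner_positions = factorial(8)   # 40,320
corner_twists = 3 ** 7            # 2,187
edge_positions = factorial(12)    # 479,001,600
edge_flips = 2 ** 11              # 2,048

states = corner_positions * corner_twists * edge_positions * edge_flips // 2
print(states)  # 43252003274489856000
```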
Many algorithms and strategies can reach this magic moment, the first of which took the puzzle's inventor, Ernő Rubik, several months to figure out. It has since been determined that any scramble can be unscrambled in at most 26 quarter-turn moves.
Since the game's invention, many ways to solve the puzzle have been developed, and fans of the toy are eager to share them with newcomers. However, the researchers were determined to teach the system to solve the puzzle without giving it access to this prior knowledge and list of tips.
New AI Technique Developed
To solve the learning problem the research team from the University of California, Irvine, developed a new AI technique known as Autodidactic Iteration. “In order to solve the Rubik’s Cube using reinforcement learning, the algorithm will learn a policy,” write the researchers in their study.
“The policy determines which move to take in any given state.” To create this policy, DeepCube developed its own reward system and, using only the changes in the cube, learned to evaluate the likely success of its proposed moves. It does this in a super clever but incredibly time-consuming (for mere humans at least) way.
When DeepCube evaluates a move, it works all the way forward to the completed cube and then all the way back to its proposed adjustment. This lets DeepCube judge the overall success of the move.
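The core of the paper's Autodidactic Iteration update is that a state's training target is the best reward-plus-value among its children. As a hypothetical, heavily simplified sketch, the same update can be shown on a toy puzzle (an integer position on a ring, solved at 0, with two shift moves) using a lookup table in place of DeepCube's neural network:

```python
# Toy sketch of the Autodidactic Iteration value target: the training
# target for a state is max over moves of (reward(child) + value(child)),
# with reward 1 for reaching the solved state and -1 otherwise.
# A tabular value function stands in for DeepCube's value network, and a
# full sweep over states stands in for the paper's random scrambles.

N = 10
SOLVED = 0

def moves(state):
    # the toy puzzle's two moves: shift left or right around the ring
    return [(state - 1) % N, (state + 1) % N]

def reward(state):
    return 1.0 if state == SOLVED else -1.0

value = [0.0] * N  # tabular stand-in for the learned value function

for _ in range(50):  # repeated ADI-style sweeps until values settle
    targets = []
    for s in range(N):
        if s == SOLVED:
            targets.append(0.0)  # the solved state needs no further moves
            continue
        targets.append(max(reward(c) + value[c] for c in moves(s)))
    value = targets

# States closer to the solved position end up with higher estimated value.
print(value[1], value[5])
```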
Once it has collected enough data, it uses a tree search method to examine the possible moves before deciding on which path to take. “Our algorithm is able to solve 100 percent of randomly scrambled cubes while achieving a median solve length of 30 moves — less than or equal to solvers that employ human domain knowledge,” write the researchers.
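The paper pairs its learned value function with Monte Carlo Tree Search; as a hypothetical, much simpler stand-in, the idea of letting value estimates steer the search can be sketched as a best-first search on the same kind of toy ring puzzle (position 0 is solved; the value heuristic plays the role of the trained network):

```python
# Best-first search guided by a value heuristic: always expand the most
# promising frontier state. A simplified stand-in for the paper's MCTS.
import heapq

N = 10
SOLVED = 0

def moves(state):
    # the toy puzzle's two moves: shift left or right around the ring
    return [(state - 1) % N, (state + 1) % N]

def value(state):
    # stand-in for the trained network: negated distance to solved
    return -min(state, N - state)

def solve(start):
    # frontier ordered by -value, so the highest-value state pops first
    frontier = [(-value(start), start, [])]
    seen = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == SOLVED:
            return path
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-value(nxt), nxt, path + [nxt]))
    return None

print(solve(7))  # [8, 9, 0]: three moves around the ring
```

With a perfect value function, the search walks straight to the goal; DeepCube's network is only an estimate, which is why the full system needs a tree search rather than pure greedy moves.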
The researchers will soon elevate the challenge and test the new Autodidactic Iteration technique on harder, 16-sided cubes.