Did a drone beat human pilots? Scientists reveal the answer

The drone took a different route from the human pilots, not only flying faster but also passing much closer to the gates.
Ameya Paleja
Time-lapse illustrations of a high-performance racing drone controlled by our RL policy.

Robotics and Perception Group, University of Zurich 

For most of their relatively short existence, drones have been piloted primarily by humans. Equipped with cameras and sensors, a drone relays information to a pilot, who makes the critical decisions while the drone hovers in the air.

This has been changing with autonomous drones and swarms built for military purposes. Still, a real triumph of machines over humans came recently, when an autonomous drone beat not one but three human pilots who were champions at drone racing. The drone flew at speeds exceeding 60 miles per hour (about 100 km/h) and navigated the obstacles far better than its human counterparts, a feat the researchers attribute to neural networks.

What is a neural network?

A neural network is a way of teaching a machine how to process information. Inspired by the human brain, the approach uses a network of interconnected nodes to create an adaptive system: the computer receives feedback on its wrong decisions and uses it to improve its performance.
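The feedback loop described above can be sketched with a toy example. This is not the researchers' code; it is a minimal single-neuron illustration in which all names, the data, and the learning rate are made up. The "neuron" predicts an output, measures how wrong it was, and nudges its weights to shrink that error:

```python
# Minimal sketch of learning from feedback: a single artificial
# neuron fits y = 2x + 1 by repeatedly correcting its own errors.

def train_neuron(samples, epochs=200, lr=0.1):
    """Adjust weight w and bias b to reduce prediction error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x + b
            err = pred - y        # feedback: how wrong was the output?
            w -= lr * err * x     # nudge the weight against the error
            b -= lr * err         # nudge the bias the same way
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1
w, b = train_neuron(data)
```

After training, `w` and `b` end up close to 2 and 1. A real neural network stacks many such units in layers, but the correct-by-feedback principle is the same.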

This approach has been used for many problems in medicine, infrastructure, finance, and, more recently, natural language processing. But crunching numbers or text differs greatly from participating in a high-speed race and defeating human champions.

The research team, led by David Scaramuzza at the University of Zurich, trained the drone and published a paper today detailing how it achieved this feat.

Reinforcement learning versus optimal control

The researchers trained the drone using both optimal control (OC) and reinforcement learning (RL) approaches. OC might seem better suited to this application, since it has long been used for agile flight operations. The researchers' OC training, optimized for trajectory tracking and contouring control, showed promising results in simulations. When it came to battling it out with humans, however, RL came out on top.

Neural networks consist of nodes that work much like the human brain

In their paper, the researchers argue that OC assumes the system is deterministic: the task is first converted into a reference trajectory, which the controller then tracks. When unmodeled dynamics enter this system, it becomes erratic and no longer performs optimally.

RL, on the other hand, can handle both deterministic and stochastic systems. In the RL approach, the drone learned a control policy through offline optimization and then used feedback for real-time adaptation. Unlike the OC approach, where following the planned path is the main objective, RL directly optimizes the task objective itself, which is to fly faster than the human racers.
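The difference in objectives can be illustrated with a toy, one-dimensional sketch. This is not the paper's code: the positions, gates, and tolerance below are invented. OC scores a flight by how closely it tracks a precomputed reference path; RL scores it directly on the task, here simplified to "how many gates did the flight clear?":

```python
# Toy contrast between a tracking objective (OC-style) and a
# task-level objective (RL-style). All values are illustrative.

def oc_cost(flown, reference):
    """Tracking error: sum of squared deviations from the reference path."""
    return sum((f - r) ** 2 for f, r in zip(flown, reference))

def rl_reward(flown, gates, tol=0.5):
    """Task-level reward: number of gates the flown path passes near."""
    return sum(1 for g in gates if any(abs(f - g) < tol for f in flown))

reference = [0.0, 1.0, 2.0, 3.0, 4.0]   # planned path (1-D positions)
gates     = [1.0, 3.0]                  # gate positions along the track
shortcut  = [0.0, 0.9, 2.8, 3.2, 3.6]   # faster line that still clears gates

tracking_error = oc_cost(shortcut, reference)   # poor by the OC metric
gates_cleared  = rl_reward(shortcut, gates)     # perfect by the RL metric
```

The shortcut looks bad to the tracking objective yet earns full task reward, which is why a policy free to optimize the task itself can discover faster racing lines than one bound to a planned trajectory.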

While the RL-based drone was trained in only 10 minutes on a standard workstation, the authors note that it benefitted from the near-perfect state estimation conducted by a motion capture system as well as lower latency when compared to human pilots.

The drone flew at much higher speeds than the human racers and took a distinctly different path, which was shorter and closer to the gates.

The research findings were published in the journal Science Robotics.


Study abstract: A central question in robotics is how to design a control system for an agile, mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.
