Researchers at Carnegie Mellon University have combined videos shot on separate iPhones to create 4D visualizations that let viewers watch fast-moving action from various angles, according to a new press release. The method can even remove people or objects from view.
This new method, which gives video editors a fresh set of tricks in real time, could have implications at a time when face-swapping technology already threatens a seismic shift in how far video content can be trusted.
Creating 4D scenes from multiple visualizations
Imagine a live event, such as a concert or a sports game, captured by every smartphone in the theater or arena, with any obstruction in the field of view quickly removed.
It's dizzying, but also exciting.
Aayush Bansal, a Ph.D. student in CMU's Robotics Institute, explained in the press release that videos shot independently by guests at a wedding reception, each from a different vantage point, could be combined to put viewers right in the middle of moments that last forever.
Another application lies in recording actors in one setting and then inserting them into another, he added.
"We are only limited by the number of cameras," Bansal said, explaining that there is no upper limit on how many video feeds can be combined.
Bringing movie studios to iPhones
'Virtualized reality,' as the Carnegie Mellon press release calls it, is nothing new. What makes this new video capture method significant is that it is both easy to use and accessible. Until now, virtualized reality was confined to studio setups like CMU's Panoptic Studio, which boasts more than 500 video cameras embedded in its geodesic walls.
However, it had not previously been possible to combine visual information from real-world scenes shot by multiple independent, handheld cameras into a single comprehensive model that reconstructs the scene in 3D.
To develop their method, Bansal and his colleagues used convolutional neural networks (CNNs), a type of deep learning model well suited to analyzing visual data. The team found that scene-specific CNNs can effectively compose different video snippets into a full 4D scene.
'The world is our studio'
To demonstrate their method, the researchers used up to 15 iPhones to capture several scenes, including dances, martial arts demonstrations, and even flamingos at the National Aviary in Pittsburgh, Pennsylvania.
"The point of using iPhones was to show that anyone can use this system," Bansal said. "The world is our studio."
Bansal and his colleagues presented their 4D visualization method at the virtual Computer Vision and Pattern Recognition (CVPR) conference last month. Unprecedented as it is, this technology may be only the beginning of a new era for media and video capture.