Machine learning could help complete research tasks usually given to citizen scientists. A new study shows how a computer taught specific image recognition skills can be used in projects that require classifying large amounts of image data.
For years, scientists have relied on volunteers to help them sort through datasets that are too large for small research teams. Until now, this work had to be done by humans because the technology for a machine to do it didn’t exist.
Researchers teamed up with ecologists
But that is starting to change. To test the machine learning idea, the researchers partnered with ecologists who study wildlife with camera traps. These ‘traps’ are hidden cameras, triggered by motion and infrared sensors, that provide images for the ecologists to use in their research.
However, all of the resulting images need to be reviewed and classified before they can provide useful data for analysis. Often this task is given to trained volunteers who can complete it within the required timeframe. The new research hands much of that work to computers.
Citizen scientists will always be valuable
"In the past, researchers asked citizen scientists to help them process and classify the images within a reasonable time-frame," said the study's lead author Marco Willi, a recent graduate of the University of Minnesota master's program in data science and researcher in the University's School of Physics and Astronomy.
"Now, some of these recent camera trap projects have collected millions of images. Even with the help of citizen scientists, it could take years to classify all of the images. This new study is a proof of concept that machine learning techniques can help significantly reduce the time of classification."
To test their theory that machine learning techniques could be valuable in these instances, the scientists gathered three datasets of images from Africa--Snapshot Serengeti, Camera CATalogue, and Elephant Expedition--and one dataset from Snapshot Wisconsin with images collected in North America.
Computer begins to learn with outlines and colors
Each dataset contained between nine and fifty-five species. The datasets also varied in how each species was photographed, the camera placement, camera configuration, and species coverage. The computer was then taught how to classify the images by being shown images from a dataset already classified by humans. For example, the machine would be shown full and partial images of a warthog. The computer would then start to recognize the edges and colors of warthogs in images before being able to classify them correctly.
The computer also learned to identify photographs with no animals present, which happen when wind triggers the camera. Being able to quickly eliminate these ‘empty’ photographs can greatly speed up the overall classification effort.
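The step of setting aside empty photographs can be illustrated with a minimal sketch. It assumes a trained model has already produced a probability for each label on each image. The label name "empty", the 0.9 threshold, the file names, and the probabilities are all made up for illustration and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical label name and threshold; neither comes from the study.
EMPTY_LABEL = "empty"
EMPTY_THRESHOLD = 0.9


def split_empty(
    predictions: Dict[str, Dict[str, float]],
    threshold: float = EMPTY_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Separate images the model confidently calls empty from the rest."""
    empty: List[str] = []
    needs_review: List[str] = []
    for image_id, probs in predictions.items():
        # An image counts as empty only if the model is confident enough.
        if probs.get(EMPTY_LABEL, 0.0) >= threshold:
            empty.append(image_id)
        else:
            needs_review.append(image_id)
    return empty, needs_review


# Made-up model outputs for three images.
example = {
    "img_001.jpg": {"empty": 0.97, "warthog": 0.03},
    "img_002.jpg": {"empty": 0.10, "warthog": 0.90},
    "img_003.jpg": {"empty": 0.55, "warthog": 0.45},
}

empty, needs_review = split_empty(example)
print(empty)         # ['img_001.jpg']
print(needs_review)  # ['img_002.jpg', 'img_003.jpg']
```

In this sketch, only images the model is unsure about, or that it believes contain an animal, would be passed on for further classification. A stricter threshold sends more images to review, while a looser one risks discarding photographs that do contain animals.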
Classification projects greatly sped up
"Our machine learning techniques allow ecology researchers to speed up the image classification process and pave the way for even larger citizen science projects in the future," Willi said. "Instead of every image having to be classified by multiple volunteers, one or two volunteers could confirm the computer's classification."
While this test of machine learning techniques in image classification focused on animal images from camera traps, the researchers say the same ideas could be applied to other areas of science that engage citizen scientists, such as space and biology.
"Data in a wide range of science areas is growing much faster than the number of citizen science project volunteers," said study co-author Lucy Fortson, a University of Minnesota physics and astronomy professor and co-founder of Zooniverse, the largest citizen science online platform that hosted the projects in the study.
"While there will always be a need for human effort in these projects, combining these efforts with the help of Big Data techniques can help researchers process more data even faster and allows the volunteers to focus on the harder, rarer classifications."