Sight is a miracle: an interplay of reflection, refraction, and messages decoded by nerves within the brain.
When you look at an object, you're staring at light reflected off it, entering your eye as waves. As the light passes through the cornea, it is refracted, or bent, toward the crystalline lens, which refracts it further. The lens is a fine-tuner: it focuses the light more precisely onto the retina, forming a smaller, sharper beam. At the retina, the light stimulates photoreceptor cells called rods and cones. Think of the rods and cones as microscopic translators: they convert the light into electrical impulses that are sent to the brain.
The impulses travel down the optic nerve toward the visual cortex, where the inverted retinal image is flipped right-side up. The cortex then interprets these signals, allowing you to make meaningful judgments about what you see: "Look, it's a dog!"
Sight is obviously nothing new for humans, but now computers are also learning to see. In fact, they are at the dawn of a new age — an age of vision.
Computer vision is a form of artificial intelligence (AI) focused on teaching computers to comprehend and interpret images.
The history of computer vision dates back to the late 1950s, with two scientists, a firing neuron, and a cat.
David Hubel and Torsten Wiesel were investigating how neurons in a cat's visual cortex responded to what it saw: small spots of light, or a black dot on a clear glass slide projected onto a screen. After many frustrating trials without any helpful readings, the two made an accidental discovery. As the cats watched, one of the researchers moved the glass slide a little too far, bringing its faint edge into view. That single line moving across the screen at a particular angle caused the cat's neuron to fire. This one mistake changed how we understand visual processing.
How? The researchers found that particular neurons in the visual cortex respond to specific orientations, such as lines and angles. These and later studies showed how the visual system builds an image from simple stimuli into progressively more complex representations. That happy accident laid the conceptual groundwork for deep learning models, particularly those used in computer vision.
By the 1980s, progress in computer vision was accelerating. In 1982, David Marr proposed an algorithmic framework for vision that could identify corners, edges, and other distinct visual features. Kunihiko Fukushima's Neocognitron, a self-organizing neural network with alternating layers of simple and complex cells, offered a model that could recognize patterns. These early convolutional neural networks proved effective at image recognition; however, they were hard to apply to high-resolution images, making training very time-consuming.
So what made computer vision really take off?
An AI competition in 2012.
At the time, typical top-5 error rates for visual recognition hovered around 26% (the top-5 error rate is the fraction of test images for which the correct label is not among the model's 5 most likely guesses), and it looked like there was no changing that percentage. Then AlexNet came along. A team from the University of Toronto built a convolutional neural network, a deep learning model that classifies images by assigning learned weights and biases to elements of an image, and it obliterated past results with a top-5 error rate of 15.3%.
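As a concrete illustration, the top-5 error rate can be computed directly from a model's class scores. This is a minimal sketch; the array shapes and toy scores are hypothetical, not AlexNet's actual outputs:

```python
import numpy as np

def top5_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of examples whose true label is NOT among the
    five highest-scoring predicted classes."""
    top5 = np.argsort(scores, axis=1)[:, -5:]  # indices of the 5 largest scores per row
    hits = [label in row for row, label in zip(top5, labels)]
    return 1.0 - float(np.mean(hits))

# Toy example: 2 images, 10 classes. The first true label (0) is among the
# top 5 scores; the second (9) is not, so the top-5 error rate is 50%.
scores = np.array([
    [10, 9, 8, 7, 6, 0, 0, 0, 0, 0],
    [10, 9, 8, 7, 6, 0, 0, 0, 0, 0],
])
labels = np.array([0, 9])
print(top5_error_rate(scores, labels))  # 0.5
```

On the 2012 ImageNet benchmark, this metric was computed over tens of thousands of test images across 1,000 classes; the logic is the same.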
We've reached the point where, like humans, computers have vision. But the issue in computer vision isn't what computers can see, but rather what they can't.
Computer vision depends on deep learning, a subfield of machine learning. To fine-tune a computer's "sight," it needs to be fed data, and a lot of it. But there's an issue with this data: it's often biased.
This is a major problem, one that, in the most extreme cases, could even lead to death. Case in point: it's estimated that some 33 million autonomous vehicles will be on the road by 2040, potentially eliminating some of the dangers posed by fallible human motorists. The problem? The computer vision systems in these vehicles are significantly worse at recognizing pedestrians with darker skin tones.
In 2018, Joy Buolamwini published "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." It sounds like a mouthful, but Buolamwini's Gender Shades project changed how we think about skin tone and computer vision. The study measured the accuracy of three widely used commercial gender classification systems, from Microsoft, IBM, and Face++, across four groups: lighter-skinned males, lighter-skinned females, darker-skinned males, and darker-skinned females. Buolamwini found that every system was more accurate on lighter-skinned individuals, with the gap in error rates between lighter and darker skin ranging from 11.8% to 19.2%. This in itself was concerning: the software could not perform nearly as accurately on darker-skinned individuals as on lighter-skinned ones.
Then Buolamwini broke the accuracy down by gender and skin tone together. The Microsoft and IBM algorithms had the highest accuracy on lighter-skinned males: Microsoft reached 100% accuracy, and even the lowest performer on this group, Face++, was 99.2% accurate.
But then the programs revealed a more troubling trend.
Among darker-skinned females, accuracy rates were as much as 34% lower than the rates for lighter-skinned males. In fact, of the faces Microsoft misgendered, 93.6% were dark-skinned.
Buolamwini went on to break the results down by more specific tones using the Fitzpatrick skin type scale and found that for the darkest-skinned women, accuracy was essentially a coin toss: roughly 50%.
Image-identifying AI can also absorb harmful stereotypes in its classifications. A 2021 study from Carnegie Mellon University and George Washington University developed an approach for detecting biased associations between concepts such as race, gender, and occupation in image models. The researchers analyzed two computer vision models: iGPT and SimCLRv2. In the Gender-Career test, which measures the association between sexes and career attributes, men were matched with concepts like "office" and "business," while women were matched with "children" and "home." The measured bias was strikingly strong.
Both models also displayed statistically significant racial biases. When testing races for association with objects, iGPT and SimCLRv2 associated white people with tools, while Black people were matched with weapons. Both models rated "Arab-Muslim" individuals as more "unpleasant" than "European Americans," and iGPT rated lighter skin tones as more "pleasant."
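Association tests like these follow the same logic as word-embedding association tests: measure how much closer one target group sits to one attribute set than to another in the model's embedding space. Here is a minimal sketch of that idea; the 2-D vectors and the `association_score` helper are made-up illustrations, not the study's actual code or data:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association_score(targets_x, targets_y, attrs_a, attrs_b) -> float:
    """Differential association: how much more strongly group X associates
    with attribute set A (vs. B) than group Y does. Zero means no bias."""
    def s(w):
        return (np.mean([cosine(w, a) for a in attrs_a])
                - np.mean([cosine(w, b) for b in attrs_b]))
    return float(np.mean([s(x) for x in targets_x])
                 - np.mean([s(y) for y in targets_y]))

# Made-up 2-D "embeddings": group X lies exactly along attribute axis A and
# group Y along axis B, so the association score comes out maximal.
attrs_a = [np.array([1.0, 0.0])]    # e.g. "career" attribute embeddings
attrs_b = [np.array([0.0, 1.0])]    # e.g. "home" attribute embeddings
targets_x = [np.array([1.0, 0.0])]  # image embeddings of one group
targets_y = [np.array([0.0, 1.0])]  # image embeddings of the other group
print(association_score(targets_x, targets_y, attrs_a, attrs_b))  # 2.0
```

In practice the vectors come from the model under audit, and a score far from zero on many sampled images is what signals a biased association.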
This idea of lighter skin tones supposedly being more “pleasant” has also faced scrutiny on many social media platforms, and reflects a deeper issue of colorism in society. In 2017, the popular photo editing app FaceApp came under fire for its “hot” filter — which claimed to make users look “hotter”— by lightening skin tone. In other words, to make people look better, the AI system was making people lighter.
Colorism, a form of discrimination in which lighter-skinned individuals are treated more favorably than darker-skinned ones, has a long history of harming BIPOC groups and still plays an active and destructive role in society today. Much of this discrimination arose from ideas of white supremacy and Eurocentrism. Research suggests that during slavery in the United States, lighter-skinned slaves with typically "European" features were treated less harshly or given slightly more "favorable" treatment (as if any treatment as a slave could be considered favorable).
One of the most infamous instances of this discrimination in the United States was the paper bag test. If a Black person’s skin was darker than a paper bag, then they would not be permitted into certain spaces or afforded working opportunities; if their skin was lighter, then these opportunities would magically open up to them. Over time, these notions of colorism have seeped into every aspect of American life, harming job prospects, mental health, court proceedings, and more.
And AI is perpetuating and continuing this stereotyping and poor treatment.
So how can we fix this? How are we working toward making computer vision more inclusive and less biased?
The answer lies in fixing the databases.
The accuracy of machine learning-based AI depends entirely on the data it's fed. Feed a program millions of images of turtles and it will get very good at identifying turtles. But show it a single image of a snake, and the model won't know what it is.
The same holds for race. Many image databases, including ImageNet, one of the most widely used, are overwhelmingly white and lighter-skinned. In Gender Shades, Buolamwini found that some datasets were over 85% light-skinned, in a world where billions of people have darker skin. Put simply, our databases lack diversity, and artificial intelligence is failing because of it. The color scale commonly used in AI, the Fitzpatrick skin type scale, wasn't even created for identifying skin tone; it was designed to classify which skin types were most at risk of sunburn. It grossly oversimplifies color, sorting all shades into just six groups.
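A back-of-the-envelope calculation shows why such skew is dangerous: with an 85/15 split like the one reported in Gender Shades, a model can post an impressive overall accuracy while failing the minority group badly. The per-group accuracies below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Dataset split reported in Gender Shades: over 85% lighter-skinned.
n_light, n_dark = 850, 150

# Hypothetical per-group accuracies (illustrative, not measured values).
acc_light, acc_dark = 0.99, 0.65

# Overall accuracy is a weighted average dominated by the majority group.
overall = (n_light * acc_light + n_dark * acc_dark) / (n_light + n_dark)
print(f"overall: {overall:.1%}")          # 93.9% -- looks strong...
print(f"darker-skinned: {acc_dark:.0%}")  # 65% -- ...but hides this gap
```

This is why audits that report only a single aggregate number can miss exactly the failures Buolamwini documented; accuracy has to be broken out by group.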
Currently, Google and other groups are reworking skin classification software in hopes of fine-tuning how computers see different races.
Now, more than ever, we're finally acknowledging the importance of diversity in our society and in our machine systems. In the 1960s and 1970s, students fought to bring ethnic studies into universities. Cultural parks like the San Pedro Creek Culture Park celebrate diverse heritage. And workforce diversity is at an all-time high in the United States.
To ensure equality and safety for all, we need to bring this diversity to AI.