Hate speech can create a toxic environment for people online and is a growing problem, so much so that social media platforms are under increasing pressure to respond.
However, responding is not straightforward: automated removal of such content can create further problems, because it tends to suppress already marginalized voices. In short, the process is tricky.
Nevertheless, back in 2016 Google set out to create an artificial intelligence algorithm to monitor and prevent hate speech on social media platforms and websites. However, a recent study by researchers at the University of Washington found that the tool was racially biased, disproportionately flagging tweets posted by African Americans.
The biased hate speech detector
Now, when training any machine learning tool, the right data set is important, and the Google hate speech algorithm was no different. Developers from the company compiled a database of well over 100,000 tweets that were labeled "toxic" by Google's API, called Perspective. This toxic content then became the guiding light for the algorithm, which used what it “learned” to tell “good content” apart from anything distasteful, rude or disrespectful.
The University of Washington paper found that the Google tool flagged tweets by Black people at a suspiciously high rate, even though most of the language in those tweets was identified as not harmful.
Interestingly, when the tool was tested against 5.4 million tweets, the researchers found that it was twice as likely to flag posts written by African Americans. It seems the Google tool struggled with tweets written in African-American Vernacular English.
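A "twice as likely" comparison of this kind can be read as a ratio of flag rates between two groups of tweets. The sketch below is only an illustration of that arithmetic, not the method used in the University of Washington paper: the function names, the group labels "aae" and "other", and the toy records are all made up for the example.

```python
from collections import defaultdict


def flag_rates(records):
    """Return the share of flagged tweets for each group.

    `records` is an iterable of (group, was_flagged) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in records:
        totals[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def rate_ratio(rates, group, reference):
    """How many times more often `group` is flagged than `reference`."""
    return rates[group] / rates[reference]


# Hypothetical toy data: 10 tweets per group, invented for illustration only.
sample = (
    [("aae", True)] * 4 + [("aae", False)] * 6
    + [("other", True)] * 2 + [("other", False)] * 8
)

rates = flag_rates(sample)
ratio = rate_ratio(rates, "aae", "other")
print(rates, ratio)
```

With this invented sample, 4 of 10 tweets in one group and 2 of 10 in the other are flagged, so the ratio comes out to 2. A ratio near 1 would indicate that both groups are flagged at similar rates.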
Picking the right data
As noted above, the right data is very important, and even more so where race is involved. The overall issue with the Google algorithm is that it lacked the understanding and cultural awareness needed to properly handle African-American Vernacular English (AAE). In short, AAE was not adequately represented in the data set, which in turn led to the bias.
The solution? As stated in the report, “We introduced dialect and race priming, two ways to reduce annotator bias by highlighting the dialect of a tweet in the data annotation, and show that it significantly decreases the likelihood of AAE tweets being labeled as offensive.”
The University of Washington team believes that extra attention should be paid to the confounding effects of dialect so as to avoid unintended racial biases in hate speech detection.