Hate Speech Detection and Reclaimed Language: Mitigating False Positives and Compounded Discrimination

Abstract

While minimising false negatives in hate speech classification remains an important goal for reducing discrimination and increasing fairness in online communities, there is a growing need for models that are sensitive to nuanced language use. This is particularly true for terms that may be hateful in some contexts but not in others. The LGBTQ+ community has long faced stigmatisation and hate, and continues to do so online. At the same time, there is growing appreciation and understanding of the community's use of mock impoliteness and its reclamation of language that has traditionally been used derogatorily against it. Reclaimed language in particular presents a challenge for hate speech detection. In a first-of-its-kind study of the impact of reclaimed language on hate speech detection models, we create a novel dataset, the Reclaimed Hate Speech Dataset (RHSD), which enables investigation of this phenomenon. Using a state-of-the-art hate speech detection model, we demonstrate that such models may inadvertently discriminate against the LGBTQ+ community's reclaimed language use by misclassifying it as hateful. This risks compounding discrimination against this population by restricting its language use and self-expression. In response, we produce a fine-tuned hate speech detection model that aims to minimise false positive classifications of reclaimed language. By creating and publishing the first dataset focused on reclaimed language and investigating its impact on hate speech detection models, our research takes a first step towards hate speech detection that reduces discrimination without restricting the LGBTQ+ community's language use and self-expression.
