Last year, when I was preparing a grant application to develop applicable solutions to tackle online extreme speech, I briefly ventured into “Perspective API.” A joint creation of Google’s Counter Abuse Technology project and Jigsaw, Perspective aims to fine tune machine learning models for online discussions. The web interface of Perspective, when I saw it first, had a sleek, minimalist design, displaying an aesthetic of business-as-usual. The overall aesthetic remains the same to this day. Cast in blocks of purple and stripes of white as thin as hairpins, the design belies a more ambitious agenda that underlies the project. The aim is to involve an unlimited number of ordinary users and coders in the company’s efforts to improve machine learning models that can detect “toxicity” and “online negative behavior.”

“By directly involving enthusiastic coders and users, they promise to enlarge the sphere of human moderators.”

In the area of hate speech moderation, these efforts are no doubt promising. By directly involving enthusiastic coders and users, they promise to enlarge the sphere of human moderators. However, it is hard to look away from its limitations. At one end of the problem is the stated or implied motivation that drives such efforts. Companies might be responding to mounting regulatory pressures to take action against hateful exchange online, but it is not the public interest value of inclusive dialogue that drives these efforts, but the anxiety to ensure online users do not flee their websites because of unpleasantness. For a profit model that hinges on user participation as the pivot of value creation, reticence could be devastating. Perspective’s definition of “toxicity” says it all: “Toxicity classifies rude, disrespectful, or unreasonable comments that are likely to make people leave a discussion.” People then should be encouraged and prodded on to continue discussing, so that the pipeline does not run out of “data.”

The anxieties that lie behind public declarations on issues as severe as “online toxicity” reveal social media companies’ eagerness to ensure there is continuous user engagement. In other words, the more users share and create content, and the more content they share and create, the better it is for data capitalism. This approach to content as something inherently expanding because of technological growth and the accompanying evaluative stance that content expansion is in itself positive could be defined as the corporate logic of positive abundance via participation. This logic stands in contrast to regulatory values of restraint and containment in speech governance, i.e., to see content as something that has to be delimited or contained to achieve specific democratic goals. Corporate efforts to enhance user participation through content creation and sharing conflict with the need for paying prudent attention to the meanings, intents, and repercussions of speech—a need that we as a society owe to ourselves.

“Studies have revealed that Perspective API has shown a large number of false positives as well as racial bias.”

When companies have attempted to address extreme speech moderation, there have been limitations of implementation and design of AI assisted models. Studies have revealed that Perspective API has shown a large number of false positives as well as racial bias.1Maarten Sap et al., “The Risk of Racial Bias in Hate Speech Detection,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Florence, Italy: Association for Computational Linguistics, 2019), 1668–1678. To test it briefly, I entered the expression, “You are an antinational” on the system,2These entries were done in the “create tools for commenters” dialogue box of Perspective API. following my research into online nationalism. The system gave a score of 0.59 (on a scale of 0–1, with 1 as the highest). The accompanying description read, “Unsure if this follows community guidelines.” In the context of India, where nationalism has regained momentum as part of majoritarian politics, “antinational” is an accusation that can have dire consequences both online and on the ground. I tried another phrase, “You are a nationalist moron.” The system gave a higher score of 0.97 with the description, “May violate community guidelines.” This was because the model identified words such as “morons” as toxic. Contextually, one might think of the expression “nationalist moron” as a challenge to regressive nationalist jingoism. An expression like, “tukde tukde gang” (in Hindi and English, which translates into “the gang that wants India to split up”) gave the result, “Error scoring text. Please try again.” “Tukde tukde gang” is a common expression that nationalist actors invoke in Indian online spheres to criticize activists and students who demand fair and secular governance across the country. Perspective models did not assign a score to this expression because of language and context.

A single, and admittedly simplistic, foray into existing corporate efforts to address online extreme speech revealed the enormity of the challenge researchers face. While extreme forms of speech circulate widely in regional languages, corporate AI systems are currently limited to major world languages. While Facebook’s AI models are better equipped to flag problematic content in English, Spanish, and Mandarin, Perspective’s toxicity scores are currently available only for six languages. A recent Time magazine story pointed out that “languages spoken by minorities are the hardest-hit by this disparity.”

Furthermore, the problem is not limited to semantics, i.e., how machine learning can pin down meanings and demarcate culpability within the world of words. It has erupted at different levels, including political misuse, commercialization of digital manipulation services, as well as digital cultural practices of vitriolic exchange, which have together facilitated the resurgence of populist regimes in many parts of the world.

AI as remedy

“Often, assertions of privilege go hand in hand with the shared pleasures of transgressing acceptable speech norms online, and the tendency to say things that are unacceptable elsewhere.”

Evidence has shown that authoritarian regimes, extremist groups, and right-wing xenophobic groups have weaponized online hate speech and disinformation to take aim against minoritized communities, including immigrants, women, and advocates for inclusive societies.3Sahana Udupa and Matti Pohjonen, “Extreme Speech and Global Digital Cultures,” International Journal of Communication 13 (2019): 3049–3067. Some ordinary online users are drawn to hate speech under the influence of intentional political sabotage and organized disinformation campaigns, or the threat of noncompliance. But a large number of users spew hate because they are able to. They are powered by the structures of historical privilege and anger around imaginary wounds. Often, assertions of privilege go hand in hand with the shared pleasures of transgressing acceptable speech norms online, and the tendency to say things that are unacceptable elsewhere.4Sahana Udupa, “Nationalism in the Digital Age: Fun as a Metapractice of Extreme Speech,” Journal of Communication 13 (2019): 3143–3163.

Governments, companies, and civil society groups have responded to this phenomenon by increasingly turning to artificial intelligence (AI) as a tool that can detect, decelerate, and remove online extreme speech. Deployment of AI is assumed to bring scalability, reduce costs, and decrease human discretion and emotional labor. However, as the anecdotal case of Perspective signals and mounting empirical evidence attests, such efforts face many challenges.

One of the key challenges is the quality, scope, and inclusivity of training data sets. Several studies have shown that classification algorithms are limited by the homogenous work forces of technology companies that employ disproportionately fewer women and people of color.5New York: New York University Press, 2018More Info → Language-based asymmetries and uneven allocation of corporate and state resources for extreme-speech moderation that affects different communities within the nation-state, as well as on a global scale, are another reason for quality issues in training datasets.

“The dynamic nature of online hate speech—where hateful expressions keep changing—adds to the complexity.”

The second challenge is the lack of procedural guidelines and frameworks that can bring cultural contextualization to these systems. There is obviously no catch-all algorithm that can work for different contexts. Lack of cultural contextualization has resulted in false positives and overapplication. In addition, hate groups have managed to escape keyword-based machine detection through clever combinations of words, misspellings, satire, and coded language. The dynamic nature of online hate speech—where hateful expressions keep changing—adds to the complexity.

A more foundational problem cuts through the above two challenges. This is about the definitional problem of hate speech. There is no consensus either legally or culturally, let alone globally, around what comprises hate speech. This increases the difficulty of deploying AI-assisted systems for content moderation in diverse national, linguistic, and cultural contexts. The challenge is further compounded by the fact that groups that are directly involved in flagging online hate-speech content at local and regional levels lack the technological tools that can expedite and scale up their work, although they come with the cultural knowledge about what constitutes hate speech within specific contexts.

“Algorithms are always the product of social, technical, and political decisions and negotiations that occur throughout their development and implementation.”

Although these challenges are widely acknowledged, corporate imaginations still position AI as a magical wand that can generate best decisions, because the inherently “neutral” machine-learning model will only become more robust, training itself with more and more data, in the march toward perfecting what Silicon Valley “high priests” ambitiously define as “social physics.”6New York: Public Affairs, 2019More Info → This utopian vision of AI conceals the materialities and politics behind AI technologies. As Ed Finn writes, “there is no such thing as ‘just code’.”7Cambridge, MA: MIT Press, 2017More Info → Algorithms are always the product of social, technical, and political decisions and negotiations that occur throughout their development and implementation.8Laura Forlano, “Invisible Algorithms, Invisible Politics,” Public Books blog, February 2, 2018. Even the “high priests” of Silicon Valley would not refute this well-founded criticism.

The process of choosing and labeling information that feeds the “machine” is never neutral. The challenge is to ensure that this decision making is embedded within democratic processes, however messy these processes might be.

How then do we operationalize an open, nuanced, and decentralized ecosystem to combat hate?

Fact checkers as a critical community

There are no easy answers. One approach is to involve communities so that we are able to place politics and perspective into AI models. An important study in this direction is what Haji Mohammad Saleem and colleagues have described as “leveraging a community-based classification of hateful language.”9Haji Mohammed Saleem et al., “A Web of Hate: Tackling Hateful Speech in Online Social Spaces,” arXiv, September 28, 2017. Rather than keyword-based methods, they have pioneered a model that incorporates expressions used by self-identifying hateful communities as training data. The key here is to source passages from hateful communities that openly identify themselves as right-wing or antiwomen, rather than starting with the words selected by annotators.

“If human supervision is critical, it is then important that we devise ways to connect, support, and mobilize existing communities who have gained reasonable access to meaning and context of speech because of their involvement in online speech moderation of some kind.”

One other way to implement a community-based approach is to involve critical intermediaries, such as independent fact-checkers. We have adopted this approach in the AI4Dignity project.10The project is funded by the European Research Council Proof of Concept Grant (2020). The idea is to address the immense problem of bringing cultural contextualization to big data sets by rescaling the challenge. Independent fact-checkers could be seen as a key stakeholder community that can provide a feasible and meaningful gateway into cultural variation in online extreme speech. While scalability could come from some form of automation, the problem of meaning and context looms large. If human supervision is critical, it is then important that we devise ways to connect, support, and mobilize existing communities who have gained reasonable access to meaning and context of speech because of their involvement in online speech moderation of some kind. Fact-checkers are distinct from other antihate groups because of their professional proximity to journalism. Exposed to volumes of disinformation data containing hateful expressions, they use—directly or indirectly—journalistic practices associated with checking and categorizing content. The plan to involve fact-checkers in this exercise comes with the risk of conflating two seemingly discordant objectives of extreme-speech detection and antidisinformation tasks. However, while they appear discordant, these speech forms come closely packed in actual practice.

Without doubt, fact-checkers are already overburdened with verification related tasks, but they might still benefit from flagging extreme speech as a critical subsidiary to their core activities. Moreover, for fact-checkers, this collaboration also offers the means to foreground their own grievances as a target community of extreme speech. Our interactions with fact-checkers have shown how their inboxes are filled with hateful messages, because their public role in verification invariably upsets groups that seek to (re)shape public discourse for exclusionary ideologies. By involving fact-checkers, we therefore aim to draw upon the professional competence of a relatively independent group of experts who are confronted with extreme speech both as part of the data they sieve for disinformation and as targets of extreme speech. In this way, we are creating a mechanism where the “close cousin” of disinformation, namely hate-filled speech, is spotted during the course of fact-checkers’ daily routines, without interrupting their everyday activities as much as possible.

“Through this triangulation, we are developing a process model that can create collaborative spaces beyond the purview of corporate high priests.”

Building spaces of direct dialogue and collaboration between AI developers and relatively independent fact-checkers who are not part of large media corporations, political parties, or social media companies is a key component in AI4Dignity. Even more, this dialogue will involve academic researchers specialized in particular regions as facilitators. Through this triangulation, we are developing a process model that can create collaborative spaces beyond the purview of corporate high priests. AI4Dignity’s process model aims to stabilize a more encompassing collaborative structure in which “hybrid” models of human-machine filters are able to incorporate dynamic reciprocity between critical communities, such as independent fact-checkers, AI developers, and academic researchers.

These efforts by no means escape the widening net of data capitalism. But they might at least temper its hubris and address its undeclared bias by establishing a grassroots people-centric intervention that avows the principle of restraint and containment—to remove rather than add to the pile.

Banner photo: Austin Distel/Unsplash.

References:

1
Maarten Sap et al., “The Risk of Racial Bias in Hate Speech Detection,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Florence, Italy: Association for Computational Linguistics, 2019), 1668–1678.
2
These entries were done in the “create tools for commenters” dialogue box of Perspective API.
3
Sahana Udupa and Matti Pohjonen, “Extreme Speech and Global Digital Cultures,” International Journal of Communication 13 (2019): 3049–3067.
4
Sahana Udupa, “Nationalism in the Digital Age: Fun as a Metapractice of Extreme Speech,” Journal of Communication 13 (2019): 3143–3163.
5
New York: New York University Press, 2018More Info →
6
New York: Public Affairs, 2019More Info →
7
Cambridge, MA: MIT Press, 2017More Info →
8
Laura Forlano, “Invisible Algorithms, Invisible Politics,” Public Books blog, February 2, 2018.
9
Haji Mohammed Saleem et al., “A Web of Hate: Tackling Hateful Speech in Online Social Spaces,” arXiv, September 28, 2017.
10
The project is funded by the European Research Council Proof of Concept Grant (2020).