To fight extremism and hate speech online, invest in AI-powered content moderation


This piece was originally published for the Reboot Democracy blog.

Last month, Forbes covered a report from Australia’s online safety commissioner, which revealed that Elon Musk – since his reign at X (formerly known as Twitter) began – has fired over 80% of the platform’s “trust and safety” (T&S) engineers and a third of the non-engineers working on the same T&S team. Musk’s firing of over 1,000 of these employees since 2022 and his crusade to reinstate banned accounts have created, in the watchdog’s own words, a “perfect storm” for the spread of abusive content online.

This is not an update to ignore. Online hate has only grown since the pandemic, and we should remember that hate crimes surge during election years. After 2020’s Trump-Biden showdown, anti-Black hate crimes increased by 14%, anti-Hispanic or anti-Latino hate crimes increased by 35%, and anti-Asian hate crimes increased by 168%.


The approaching 2024 election promises new waves of political extremism, while the pandemic has forced us to find community in our online spaces more than ever before. Yet the digital commons remain unsafe. Against the backdrop of significant layoffs at major social media platforms, a reduction in the workforce dedicated to content moderation, and a rollback of policies aimed at safeguarding online spaces, the question arises: how can we hope to combat this digital tide of hate and extremism?


The answer may just lie in AI-powered content moderation.


Old Dog, New Updates

Traditional content moderation relies heavily on human moderators to analyze and manage the vast majority of content that goes against a platform’s community guidelines – whether hate speech, disinformation, harassment, or other inappropriate behaviors. Not only does manual oversight require a vast “suck” of resources from the platform, but trawling through offensive content is incredibly draining and difficult for human moderators.


To supplement these workers, many social media platforms have used some form of algorithmic or machine-learning moderator for decades. Helping sift through the tidal wave of content, some tools use keywords, blocklists, and flags to remove content as it’s posted and reduce the load on human moderators. Certain algorithms count flesh-toned pixels to filter nudity or pornography. However, research has shown that simpler, “rule-based” algorithms are “inherently fragile to the nuances of natural language.”
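
To make that fragility concrete, here is a minimal, purely illustrative Python sketch of a keyword-blocklist filter; the terms and example posts are placeholders, not any platform’s real rules.

```python
# Illustrative only: a toy rule-based moderator of the kind described above.
BLOCKLIST = {"badword1", "badword2"}  # placeholder terms; real lists are far larger

def rule_based_flag(post: str) -> bool:
    """Flag a post if any blocklisted keyword appears verbatim."""
    words = post.lower().split()
    return any(word in BLOCKLIST for word in words)

# Why such rules are "inherently fragile": trivial obfuscation slips through,
# while quoting or counter-speech can be wrongly flagged.
print(rule_based_flag("this post contains badword1"))   # True
print(rule_based_flag("this post contains b@dword1"))   # False; obfuscation evades the rule
```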


It begs the question: what would happen if these platforms instead more heavily deputized data-driven, deep learning methods? Such deep neural networks, more similar to the AI tools we’ve seen flourish this year, are able to learn richer, more precise representations and extrapolate them to new data. Data can teach them to recognize patterns, nuances, and anomalies indicative of hate speech, disinformation, and extremist propaganda – drawing, say, on open-source datasets like HateCheck, compiled for research on AI-based hate speech detection.
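
As a rough sketch of what that could look like, the snippet below scores text with a transformer classifier via the Hugging Face `pipeline` API. The model id is a hypothetical placeholder; a real deployment would substitute a checkpoint fine-tuned for hate speech detection and validate it against suites like HateCheck.

```python
# Sketch of the deep-learning approach, assuming the Hugging Face Transformers
# library is installed. The model id below is a placeholder, not a real checkpoint.
from transformers import pipeline

MODEL_ID = "your-org/hate-speech-classifier"  # hypothetical fine-tuned model
classifier = pipeline("text-classification", model=MODEL_ID)

def score(post: str) -> dict:
    """Return the model's predicted label and confidence for one post."""
    return classifier(post)[0]  # e.g. {"label": "hateful", "score": 0.97}

# Unlike keyword rules, a trained model can generalize to paraphrases and
# misspellings it never saw verbatim, because it learns representations of meaning.
```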


AI-powered content moderation offers several compelling benefits over the resource-intensive human system. Scalability easily tops that list, allowing moderation-focused programs to keep pace with the explosive growth of online content while digital first-responders struggle to stay ahead. The urgency is made even more pressing by the new need to combat tidal waves of AI-generated spam and disinformation. Why not “fight fire with fire”?


AI systems can also review content as it’s posted. The speed of AI tools is particularly crucial for moderating live streams, an area where social media platforms have consistently struggled to detect and mitigate harmful content. An AI-powered approach can automatically flag many harmful cases before they go live. Consider online shopping in China, where the government is struggling to keep up with the multitude of gamers and influencers now selling products through digitally created avatars in 24/7 live streams.
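
A pre-moderation gate of this kind might look like the sketch below; the stub scoring function and threshold are assumptions for illustration, standing in for a trained classifier and a policy-tuned cutoff.

```python
# Self-contained sketch of pre-moderation: content is scored before it is
# published, and anything over a threshold is held back for human review.

def model_score(post: str) -> float:
    """Stand-in for a trained classifier; returns a policy-violation probability."""
    return 0.0  # placeholder value

HOLD_THRESHOLD = 0.90  # illustrative; real thresholds are tuned per policy area

def pre_moderate(post: str) -> str:
    """Decide whether a post goes live immediately or is held before publication."""
    if model_score(post) >= HOLD_THRESHOLD:
        return "held_for_review"  # flagged before it ever goes live
    return "published"

print(pre_moderate("hello world"))  # "published"
```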


To be sure, LLMs are expensive to run and operate, so it’s unlikely any platform could be meaningfully moderated exclusively via generative AI. Instead, social media companies should construct and rely on a series of smaller, targeted tools that use machine learning to precisely moderate one aspect of disallowed content. For example, Facebook’s content moderation rules have recently shifted from the company’s former “colorblind” approach to one that weighs identity categories differently; Facebook could further this effort by separately training a series of targeted AI tools on the complex dynamics of how different minoritized populations experience oppression and hate online, with each tool tasked to protect a different identity group.
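
One way to picture that series of smaller, targeted tools is the sketch below, where each classifier protects a different identity group and a post is escalated if any of them fires; the model names, stub scorers, and threshold are illustrative assumptions, not real systems.

```python
# Sketch of an ensemble of narrow, separately trained moderation models.
from typing import Callable, Dict

# Maps each targeted harm to a scoring function returning a value in [0, 1].
# The lambdas are stand-ins for separately trained classifiers.
TARGETED_MODELS: Dict[str, Callable[[str], float]] = {
    "anti_black_hate":  lambda post: 0.0,
    "anti_asian_hate":  lambda post: 0.0,
    "antisemitic_hate": lambda post: 0.0,
}

def moderate(post: str, threshold: float = 0.8) -> Dict[str, float]:
    """Return every targeted model whose score crosses the threshold."""
    return {
        name: s
        for name, model in TARGETED_MODELS.items()
        if (s := model(post)) >= threshold
    }

# An empty result means no targeted tool flagged the post; a non-empty result
# names which group-specific model(s) found likely hate speech.
print(moderate("hello world"))  # {}
```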


Learning Systems

Bias in these models can be mitigated, but it cannot be entirely eliminated. That’s why it’s hugely consequential that these AI-powered systems can learn. Through continuous feedback loops with human moderators, they can adjust, reducing bias and increasing effectiveness over time. AI systems can then turn around and immediately implement that feedback, ensuring a quicker and fairer route to a more impartial moderation process. It’s not a cure for bias, but it is a salve that only improves over time.
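
A minimal sketch of such a feedback loop, assuming nothing more than a log of moderator decisions, might look like this; the data structure and metric are illustrative, not a production design.

```python
# Sketch of a human-in-the-loop feedback store: moderators confirm or overturn
# AI decisions, and the disagreements become signal for retraining or re-tuning.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FeedbackStore:
    examples: List[Dict] = field(default_factory=list)

    def record(self, post: str, ai_label: str, human_label: str) -> None:
        """Log each human review; disagreements are future training data."""
        self.examples.append({
            "post": post,
            "ai": ai_label,
            "human": human_label,
            "disagreement": ai_label != human_label,
        })

    def disagreement_rate(self) -> float:
        """A simple health metric: how often humans overturn the AI."""
        if not self.examples:
            return 0.0
        return sum(e["disagreement"] for e in self.examples) / len(self.examples)

# Periodically, the logged disagreements would be folded back into fine-tuning,
# which is how the system "learns" and reduces bias over time.
```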


Additionally, we must be wary of over-censorship, especially in a pre-moderation system. Having AI systems rely on years of existing data and guidelines might lead to excessive homogenization, potentially stifling free expression. Again, that’s why our “AI on tap, not on top” approach is necessary: it keeps decision-making and course correction in human hands while outsourcing task-based functions to AI-powered assistants. Over-reliance on previous data can be mitigated by human feedback on outdated patterns and new threats – challenges that AI systems are particularly well suited to meet quickly.


The automation of natural language processing represents a vital addition to the digital arsenal. AI-powered content moderation systems that are built and trained thoughtfully can scale to meet the ever-growing volume of online content, filter out hate speech and disinformation ahead of the 2024 elections, protect human moderators, and sharpen our ability to adapt to new harms.


