Content Moderation Platform – a fight against hate speech on the Internet

Natural language processing has great potential in IT. We used it to build a tool that fights hate speech on the Internet!

Natural language processing has great potential in modern software development. The technology is used, among other things, to power smart speakers, autocomplete text in search engines, and generate keywords. But its possibilities do not end there! It also works well in content moderation, and a good example is the Content Moderation Platform, a tool created by our team.

"WP Moderacje" project

The Content Moderation Platform is part of the WP Moderacje project, created in cooperation with Wirtualna Polska and financed by Google. The project aims to improve the internal comment moderation system used by the Wirtualna Polska team.

The WEBSENSA team, as the technology partner, was responsible for creating a new web application for manual comment moderation and for building an automated comment moderation platform, called the Content Moderation Platform.

About Content Moderation Platform

The Content Moderation Platform was created to help moderate comments appearing on the Wirtualna Polska website. It is a tool that uses natural language processing to classify submitted content. Based on an in-depth analysis, the system determines whether a particular comment should be published on the website.

Project Purposes

Our tool is intended to help in the fight against hatred on the Internet, which is unfortunately common today. Furthermore, it is designed to ease the work of moderators, who would otherwise need to approve or reject comments manually. Thanks to our tool, moderators have time for new, more creative tasks, such as engaging in discussion and community building. As a result, the new scope of their duties will increase the loyalty and engagement of Wirtualna Polska users.

A dataset of comments

To build the model, we received from Wirtualna Polska:

  • about 18 million comments, which served as the corpus for building the language model;
  • about 235 thousand comments moderated manually by Wirtualna Polska employees, which we used to train the model (this is how our algorithm learned to moderate comments according to company policy).

We did not use additional texts for this task, such as articles from Wikipedia. – The available literature on natural language processing for Polish suggested that such a combination does not help. This is because "pure" Polish is very different from the language of the comments, which is full of borrowings, shorthand, "Internet dialects" and, most of all, linguistic errors – explains Konrad Krawczyk, Machine Learning Engineer, who took part in building the CMP.

To teach our system to read the intent of the comments, we stripped them of unnecessary attributes that only added noise to the sentence itself. – We removed such noise or replaced it with special tokens that added no noise themselves but told us that such an attribute had been there – explains Konrad Krawczyk.
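
The article does not say which attributes were treated as noise or what the special tokens looked like. As a rough illustration only, the sketch below assumes that links, user mentions, and numbers are replaced with placeholder tokens; the token names and the function `normalize_comment` are hypothetical and are not taken from the CMP code.

```python
import re

# Hypothetical placeholder tokens; the real CMP tokens are not described in the article.
URL_TOKEN = "<url>"
USER_TOKEN = "<user>"
NUM_TOKEN = "<num>"

_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_USER_RE = re.compile(r"@\w+")
_NUM_RE = re.compile(r"\d+")
_SPACE_RE = re.compile(r"\s+")


def normalize_comment(text: str) -> str:
    """Replace noisy attributes with tokens that record they were present."""
    text = _URL_RE.sub(URL_TOKEN, text)    # links carry little meaning as raw text
    text = _USER_RE.sub(USER_TOKEN, text)  # keep the fact that someone was addressed
    text = _NUM_RE.sub(NUM_TOKEN, text)    # keep the fact that a number appeared
    return _SPACE_RE.sub(" ", text).strip()  # collapse repeated whitespace


print(normalize_comment("@jan  zobacz https://example.com/a?b=1 teraz 123!"))
```

The point of a token such as `<url>` is that the model still sees that a link was there, without having to learn anything from the link's characters.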

Estimation of the certainty of decisions

As we know, neither machines nor people are infallible. That is why our platform, during the evaluation, determines how sure it is of its decision.

And what if the model is uncertain? – A comment submitted for prediction can be classified as accepted or rejected, and if the model is uncertain, it is redirected to a moderator – explains Konrad Krawczyk. – The decision on the comment is made by assessing the probability, i.e., how confident the model is in its decision.

In the system, we have set a certainty threshold, which determines whether a particular comment requires additional verification. If the level of certainty falls below the threshold, the final decision is made by a moderator.
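
The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the CMP implementation: the function `route_comment`, the `Decision` names, and the default threshold of 0.9 are all assumptions, since the article does not give the actual threshold value.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    MANUAL_REVIEW = "manual_review"


def route_comment(p_reject: float, threshold: float = 0.9) -> Decision:
    """Route a comment based on the model's predicted probability of rejection.

    `threshold` is a hypothetical certainty level; it should be at least 0.5.
    """
    if not 0.0 <= p_reject <= 1.0:
        raise ValueError("p_reject must be a probability between 0 and 1")
    # Certainty is the probability of whichever class the model prefers.
    certainty = max(p_reject, 1.0 - p_reject)
    if certainty < threshold:
        return Decision.MANUAL_REVIEW  # too uncertain: a moderator decides
    return Decision.REJECT if p_reject >= 0.5 else Decision.ACCEPT


for p in (0.97, 0.02, 0.60):
    print(p, route_comment(p).value)
```

Raising the threshold sends more comments to moderators and lowers the risk of automated mistakes; lowering it does the opposite.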

More than blocking vulgarisms

As you can guess, our system blocks all sorts of vulgarities, but it does more than that. It is an advanced solution with many more features used to determine whether a comment is full of negative emotions and can be qualified as hate speech. – It is not a tool for catching curses; for that, a simple list of keywords would be enough – explains Jan Twardowski, the project coordinator. – The strength of this solution lies in the fact that our model can understand the user's intentions, i.e., whether a particular comment contains negative emotions, even if there are no vulgar words.

CMP model can understand the user's intentions, even if there are no vulgar words in the comment

To create such an advanced tool, we focused primarily on analyzing the meaning of the content and the emotions of its authors. Processing the entire content of the comments allowed the system to learn to recognize emotions expressed not only in words but also through additional attributes such as special characters or emoticons.

Moreover, our platform also draws on behavioral data from other WP systems, which assist the CMP in its assessment. This allowed us to introduce even more advanced capabilities, such as blocking bot traffic.

Technologies used in the project

We built this project in Python. As Jan Twardowski explains, it is currently the best choice for this kind of work: the language has many useful libraries for data analysis, data processing, and model building, so it covers every step of a project related to machine learning or natural language processing.

See also: Top 5 Programming Languages for Artificial Intelligence

Python libraries were extremely useful in developing the CMP. The model we created was based on the PyTorch library, while for data processing we used the pandas library.

Neural network

Our system's critical element is the Polish language model, which allows it to analyze the meaning of a particular comment and determine whether it should be published on the page.

Our model was based on a neural network that turned each comment into a numerical representation, using a vocabulary built from the corpus with the SentencePiece tool.

The network consisted of about 7,000 neurons and enabled, among other things:

  • proper division of the text into words or fragments of words,
  • embedding words in the overall language model,
  • evaluation of the "level of offense" of a particular comment.

Content Moderation Platform - effectiveness

Our model moderated comments from the validation set with an accuracy of about 95%. This means that about 95% of its decisions on comments were consistent with the decisions of human moderators.
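
The figure above is an agreement rate: the share of comments on which the model and the moderators made the same decision. A minimal sketch of that calculation follows; the function name `agreement_rate` and the `"accept"`/`"reject"` labels are assumptions made for the example, and the sample data is invented purely to show the arithmetic.

```python
from typing import Sequence


def agreement_rate(model_decisions: Sequence[str],
                   moderator_decisions: Sequence[str]) -> float:
    """Fraction of comments where the model matched the human moderator."""
    if len(model_decisions) != len(moderator_decisions):
        raise ValueError("both sequences must cover the same comments")
    if not model_decisions:
        raise ValueError("at least one decision is required")
    matches = sum(m == h for m, h in zip(model_decisions, moderator_decisions))
    return matches / len(model_decisions)


# Toy data: 20 comments, the model disagrees with the moderator on one of them.
human = ["accept"] * 12 + ["reject"] * 8
model = ["accept"] * 11 + ["reject"] * 9
print(f"{agreement_rate(model, human):.0%}")
```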

Moreover, our research shows that the model made fewer mistakes than human moderators. This is because people who manually classify several hundred comments a day tend to lose focus and let through comments that should not have been published.

Our CMP model turned out to be more effective than the solution of PolEval contest winners

Comparing the results, when our model did make a mistake, it more often blocked a comment than let inappropriate content through. The system therefore helps avoid "human errors" caused by fatigue or oversight.
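
One way to check which kind of mistake dominates is to split the disagreements into two groups: comments the model blocked although a moderator accepted them, and comments the model published although a moderator rejected them. The sketch below is illustrative only; `error_breakdown`, the label strings, and the sample data are assumptions and do not come from the project.

```python
from collections import Counter
from typing import Dict, Sequence


def error_breakdown(model_decisions: Sequence[str],
                    moderator_decisions: Sequence[str]) -> Dict[str, int]:
    """Count the two kinds of disagreement between model and moderator."""
    if len(model_decisions) != len(moderator_decisions):
        raise ValueError("both sequences must cover the same comments")
    counts: Counter = Counter()
    for model, human in zip(model_decisions, moderator_decisions):
        if model == "reject" and human == "accept":
            counts["false_reject"] += 1  # blocked a comment that was fine
        elif model == "accept" and human == "reject":
            counts["false_accept"] += 1  # let inappropriate content through
    return {"false_reject": counts["false_reject"],
            "false_accept": counts["false_accept"]}


model = ["reject", "reject", "accept", "accept", "reject"]
human = ["accept", "reject", "accept", "reject", "accept"]
print(error_breakdown(model, human))
```

A model that leans toward `false_reject` errs on the side of caution, which matches the behavior described above.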

The effectiveness of the Content Moderation Platform is impressive. Interestingly, participants in last year's PolEval contest were asked to create the best natural language processing solution for detecting hate speech in Polish-language tweets. We tested how our solution would perform on the validation set from this contest. The result exceeded our expectations: our project turned out to be more effective than the winners' solution.

An impact of the CMP on the work of moderators

Did our system take jobs away from the people who worked as moderators? Of course not!

Only their responsibilities have changed. Instead of rejecting hateful comments, they can focus on other, more engaging activities. People working as moderators now take on the role of animators, activating the discussion and building a community around Wirtualna Polska. Their current work, therefore, requires more engagement and creativity, and thus also gives more satisfaction.

The Content Moderation Platform is a tool that we will continue to develop. It's excellent for moderating comments, but it can also be much more widely used.

Are you interested in solutions based on natural language processing and machine learning? Do you have an idea where else this technology could be applied? Share your thoughts under this post on our Facebook fan page!
