Date of Defense

7-11-2024 10:00 AM

Location

E1-1021

Document Type

Dissertation Defense

Degree Name

Doctor of Philosophy (PhD)

College

CIT

Department

Information Security

First Advisor

Dr. Marton Gergely

Keywords

Social media, online gaming, artificial intelligence, natural language processing, large language models, content moderation, hate speech, harassment.

Abstract

With the increase in popularity of online communities, such as social media platforms, online games, and chatroom servers, there is a need to improve chat and content moderation. Platforms have reported an increase in the prevalence of toxic behavior and hate speech. Meanwhile, moderators report difficulties in keeping up with the amount of data to review, as well as the type of content they are exposed to, which further harms their own mental health. The main objective of this work is to address the challenges that exist within online communities given the rising prevalence of hate speech, while lifting some of the burden of reviewing incidents and applying accountability off human moderators. Leveraging the evolution of Large Language Models (LLMs), a Natural Language Processing (NLP)-based solution is prototyped that monitors chat messages passing through a server and classifies them according to their content. To do this, several LLMs are selected based on the reviewed literature and tested against a dataset containing chat messages sourced from various platforms. The best-performing LLM is then selected to run on a simulated chat server, where it monitors chat messages and maintains a record of classifications. The model comparison showed that DistilBERT, trained and evaluated on two separate datasets dedicated to training and testing, performed the best with an accuracy of 79%, higher than the other models achieved. When embedded server-side within the chat simulation, the model maintained its 79% accuracy when tested again on a separate test set containing 1,000 entries. These results suggest that leveraging LLMs for content moderation is promising, as demonstrated by the prototype moderator presented in this work. With further optimization and more data, the model's performance can improve further and adapt to the environment it is monitoring.
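
As a rough illustration of the server-side moderation step described above, the sketch below loads a DistilBERT text-classification model through the Hugging Face transformers pipeline, flags incoming chat messages, and keeps a log of classifications. The checkpoint, label names, and threshold are stand-ins (a publicly available sentiment model rather than the toxicity classifier trained in this work), so the snippet shows the shape of the approach rather than the dissertation's actual implementation.

# Illustrative sketch only: a DistilBERT checkpoint fine-tuned for sentiment
# (POSITIVE/NEGATIVE) stands in for the toxicity classifier described in the
# abstract; the pipeline usage and logging pattern are what the sketch shows.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # stand-in checkpoint
)

moderation_log = []  # server-side record of classifications

def moderate(message: str, threshold: float = 0.8) -> bool:
    """Classify one chat message and return True if it should be flagged."""
    result = classifier(message)[0]  # e.g. {"label": "NEGATIVE", "score": 0.97}
    flagged = result["label"] == "NEGATIVE" and result["score"] >= threshold
    moderation_log.append({"message": message, **result, "flagged": flagged})
    return flagged

# Example: run a couple of messages through the moderator.
for msg in ["good game everyone", "you are worthless, quit playing"]:
    print(msg, "->", "flagged" if moderate(msg) else "ok")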

Title

TACKLING TOXICITY AND HARASSMENT IN ONLINE ENVIRONMENTS THROUGH THE USE OF ARTIFICIAL INTELLIGENCE
