Filtering Toxic Comments using NLP

Worked with Garima Sharma, Hongjia Xu, Meng (Mona) Xu, Jinwei Wu

(UC Irvine MSBA ‘21)


As more people use online platforms to share information, thoughts, and opinions with the public, oversight becomes necessary, because some individuals post with the intent to hurt, insult, or stir up hatred toward another person or group. This project addresses that problem with Natural Language Processing at its core. Several machine learning models have been developed and deployed to filter out abusive language and protect internet users from becoming victims of online harassment.

The background for this problem comes from the numerous online forums where people actively participate and post comments. According to a 2014 Pew Research Center study, 22% of Internet users had been victims of online harassment in the comment section of a website. Toxic comments can be abusive, insulting, or even hateful, and it is the host institution’s responsibility to keep these conversations from turning negative.


The dataset contains 1,048,576 comments. There are six categories for multi-label classification: toxic, severe toxic, obscene, threat, insult, and identity hate. The text of each comment is found in the comment_text column. Each comment in the Training file has a toxicity label, and models should predict the target toxicity for the Test data. These attributes are fractional values that represent the fraction of human raters who believed the attribute applied to the given comment. For evaluation, test set examples with target >= 0.5 are considered to be in the positive class.

High-Level Goal: To protect users from being exposed to offensive language on online forums or social media sites, companies have started flagging comments and blocking users who are found guilty of using unpleasant language.

GOAL: To build a multi-headed model that’s capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate, and thereby help address cyberbullying.

Data Descriptions:

There is a clear class imbalance, as only about 8% of the comments are toxic. It is reasonable to assume that, given homogeneous sampling, there will always be a bigger proportion of non-toxic comments.

Insults appear more often than the other toxicity types, while threats appear far less often. This imbalance may lead to poor classification performance for the rarer classes. In the model evaluation stage, we’ll pay attention to the ratios of False Negatives and False Positives for the minority subgroups.

The first chart shows that most input texts are shorter than 400 characters. The second chart shows that the distribution for toxic comments has a higher mean than the one for non-toxic comments, which suggests that such comments tend to be more negative on average.

Topic Modeling

Used pandas to create data frames and remove missing values in the test data.

Converted sentences to words by removing punctuation, distracting single quotes, newline characters, and emails.

Defined functions to remove stopwords and build bigrams and trigrams.

Lemmatized the text, keeping only nouns, adjectives, verbs, and adverbs.
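The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the stopword list is a tiny hypothetical one, the function names are ours, and the bigram step uses a simple count threshold rather than the scoring a phrase-detection library would apply. Lemmatization is left out because it needs a part-of-speech tagger from a library such as spaCy.

```python
import re
import string
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; a real run would
# use a full list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "me", "at"}

EMAIL_RE = re.compile(r"\S*@\S*\s?")  # anything that looks like an email
WHITESPACE_RE = re.compile(r"\s+")    # newlines and repeated spaces
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def sentence_to_words(text):
    """Strip emails, newlines, single quotes, and punctuation; return lowercase tokens."""
    text = EMAIL_RE.sub(" ", text)
    text = WHITESPACE_RE.sub(" ", text)
    text = text.replace("'", "")
    return text.translate(PUNCT_TABLE).lower().split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def join_frequent_bigrams(docs, min_count=2):
    """Merge adjacent token pairs seen at least min_count times into one token."""
    counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and counts[(doc[i], doc[i + 1])] >= min_count:
                merged.append(doc[i] + "_" + doc[i + 1])
                i += 2
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


tokens = remove_stopwords(sentence_to_words("Email me at bob@example.com,\nyou're the worst!!"))
```

Running the bigram step a second time on its own output would produce trigrams, since a merged pair can then merge with a neighbouring token.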

LDA Model

Created a dictionary, corpus, and document frequency table to build the LDA model.

Next, we computed the perplexity and coherence scores, which are shown below.

After analyzing the toxic comments, we found that they mainly concern the following topics: Trump, the Canadian government, the USA, jobs, and family.

Classification Algorithms Used/Methodology:

To speed up the run time, we reduced the problem to a single binary label, Toxic. For each comment, if target >= 0.5 we set Toxic = 1; otherwise we set Toxic = 0. This new Toxic column is the y variable for our models.
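As a small illustration of this labeling rule, the sketch below applies the 0.5 threshold to a few made-up rater fractions. The function name and the sample values are hypothetical; only the threshold and the target column come from the text.

```python
def to_binary_label(target, threshold=0.5):
    """Return 1 (Toxic) when the rater fraction reaches the threshold, else 0."""
    return 1 if target >= threshold else 0


# Hypothetical rater fractions, for illustration only.
targets = [0.0, 0.2, 0.5, 0.83]
toxic = [to_binary_label(t) for t in targets]

# With a pandas DataFrame `df` holding a `target` column, the equivalent
# one-liner would be: df["Toxic"] = (df["target"] >= 0.5).astype(int)
```

Note that a comment sitting exactly at 0.5 counts as Toxic, which matches the >= in the evaluation rule.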

To shorten model run time, we drew a random 20% sample of the data. The sample contains 360,975 entries. The new dataset is shown below:
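A reproducible random sample of this kind can be sketched as follows. The seed value and function name are assumptions made for the example; the write-up does not say whether a seed was fixed.

```python
import random


def sample_fraction(rows, fraction=0.2, seed=42):
    """Return a random sample (without replacement) of the given fraction of rows."""
    rng = random.Random(seed)  # fixed seed so the same rows are drawn every run
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


rows = list(range(1000))       # stand-in for the comment rows
subset = sample_fraction(rows)

# With a pandas DataFrame `df`, the equivalent call would be:
# df.sample(frac=0.2, random_state=42)
```

Fixing the seed keeps the sample stable across runs, so model comparisons are made on the same rows.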

Model Preprocessing:

Split the data into training and test sets, 75% and 25% respectively.

Used TF-IDF to weight term importance.

Fit and transformed the training and validation X variables.
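To make the preprocessing steps above concrete, here is a minimal pure-Python sketch of a 75/25 split followed by TF-IDF weighting. It is a simplification under stated assumptions: the function names are ours, the smoothed IDF formula is one common convention, no vector normalization is applied, and the IDF weights are learned from the training documents only and then reused for the validation documents, which is our assumption about the workflow.

```python
import math
import random
from collections import Counter


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_idf(train_docs):
    """Learn smoothed inverse document frequencies from the training docs only."""
    n_docs = len(train_docs)
    doc_freq = Counter(term for doc in train_docs for term in set(doc.split()))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def transform(doc, idf):
    """Map one document to {term: tf * idf}, ignoring terms unseen in training."""
    counts = Counter(t for t in doc.split() if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


# Hypothetical mini-corpus, for illustration only.
train_docs = ["you are great", "you are awful", "great day"]
idf = fit_idf(train_docs)
vector = transform("awful awful unseen", idf)
```

Terms that occur in fewer training documents get a larger weight, so a rare word such as "awful" counts for more than a common one such as "you".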


We chose logistic regression and SVM models and first ran both on the original imbalanced dataset.

The accuracy is 0.92. However, the recall for class 1 is 0.01 for logistic regression and 0.00 for SVM, which means that the models predict almost no Toxic comments. The 92% accuracy comes mainly from the 92% of comments labeled non-toxic.
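The effect described above is easy to reproduce. The sketch below uses made-up labels with the same 92/8 split and a "model" that never predicts Toxic; the helper functions are ours and are not tied to any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions recover."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels with the 92/8 class split described above.
y_true = [0] * 92 + [1] * 8
y_pred = [0] * 100  # always predicts non-toxic

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
```

The always-negative baseline reaches 92% accuracy while catching none of the Toxic comments, which is why recall on class 1 is the more informative metric here.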

To address the imbalance, we decided to apply resampling methods before modeling:

Resampling Method 1: Random Oversampling

This method oversamples the minority group (Toxic=1) by picking samples at random with replacement, that is, by duplicating randomly chosen minority examples.
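A minimal sketch of random oversampling, assuming a binary label list and that resampling is applied to the training split only. The function name and seed are illustrative, and the input must contain at least one minority example.

```python
import random


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate random minority examples until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    n_extra = len(majority_idx) - len(minority_idx)
    # Sampling with replacement: the same minority row may be picked repeatedly.
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 8 non-toxic rows and 2 toxic rows.
X = list(range(10))
y = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X, y)
```

No new information is created; the classifier simply sees each minority example more often, which raises its weight during training.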

Accuracy is 0.67 and recall for class 1 is 0.61 for both LR and SVM. The models trade overall accuracy for higher recall on the Toxic class.

Resampling Method 2: SMOTE

The synthetic minority oversampling technique (SMOTE) balances the class distribution by generating new minority class examples rather than duplicating existing ones: each synthetic entry is created by interpolating between a minority example and one of its nearest minority neighbors.
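The interpolation idea behind SMOTE can be sketched as follows. This is a simplified variant for illustration: it works on small dense feature vectors, picks the base example at random, and needs at least two minority examples. The function name and the parameters n_new, k, and seed are ours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority points, for illustration only.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority_points, n_new=20)
```

Every synthetic point lies on a line segment between two real minority points, so the new examples stay inside the region the minority class already occupies.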

For this method, accuracy is 0.67 for both LR and SVM. Recall for the Toxic class is 0.50 for LR and 0.51 for SVM, so SMOTE performs worse than random oversampling on this metric.

Resampling Method 3: Random Undersampling

Random undersampling involves randomly selecting examples from the majority class and deleting them from the training dataset.
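A minimal sketch of random undersampling, again assuming a binary label list and that only the training split is resampled. The function name and seed are illustrative.

```python
import random


def random_undersample(X, y, minority=1, seed=0):
    """Drop random majority examples until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Sampling without replacement: each kept majority row appears once.
    n_keep = min(len(minority_idx), len(majority_idx))
    kept_majority = rng.sample(majority_idx, n_keep)
    keep = minority_idx + kept_majority
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data with the 92/8 class split described earlier.
X = list(range(100))
y = [0] * 92 + [1] * 8
X_under, y_under = random_undersample(X, y)
```

The trade-off is that most majority examples are thrown away, so the model trains on far less data than with either oversampling method.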

Accuracy is 0.66 and recall for the Toxic class is 0.62 for both models. We consider this the best-performing sampling method overall, since it gives the highest Toxic recall at only a slightly lower accuracy.

Conclusion/Key Findings:

After evaluating the results of each resampling method, we found that random undersampling gives the best result for this dataset. As a next step, we want to try embedded models and different parameters to see whether we can improve overall performance, since accuracy, F1, and recall for the Toxic class remain low for all sampling methods. We also plan to try further resampling techniques, such as ENN (Edited Nearest Neighbours) or combined resampling, and to explore cost-sensitive learning to quantify the cost of misclassification errors.

Our key takeaway from this project is that we were able to apply several different resampling techniques to handle an imbalanced dataset.