Offensive Language Detection Using Soft Voting Ensemble Model

Date
2023-06-30
Authors
Fieri, Brillian
Suhartono, Derwin
Publisher
Institute of Automation and Computer Science, Brno University of Technology
Abstract
Offensive language is a problem that has grown increasingly severe with the rise of the internet and social media. Such language can be used to attack individuals or specific groups. Automatic moderation, for example with machine learning, can help detect and filter this language for users who need it. This study focuses on improving the performance of a soft voting classifier for offensive language detection by experimenting with different combinations of soft voting estimators. The model was applied to a Twitter dataset that was augmented using several augmentation techniques. Features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), sentiment analysis, and GloVe embeddings. Two types of soft voting models were built: a machine learning-based model, whose best estimator combination was Random Forest, Decision Tree, Logistic Regression, Naïve Bayes, and AdaBoost, and a deep learning-based model, whose best estimator combination was a Convolutional Neural Network, Bidirectional Long Short-Term Memory, and Bidirectional Gated Recurrent Unit. The results show that the soft voting classifiers outperformed classic machine learning and deep learning models on both the original and the augmented datasets.
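The abstract does not include the implementation, so the following is only a minimal sketch of the general soft voting rule in plain Python, not the authors' code. Soft voting averages the class-probability vectors produced by each estimator and predicts the class with the highest mean probability; hard voting, shown for contrast, takes a majority vote over each estimator's own predicted label. The helper names and all probability values below are hypothetical. (scikit-learn's `VotingClassifier` with `voting="soft"` applies the same averaging, but nothing here claims the study used it.)

```python
from collections import Counter
from typing import List, Optional, Sequence

# probs[m][i][c]: probability that estimator m assigns class c to sample i.
# All names here are hypothetical, chosen for illustration only.
Probs = Sequence[Sequence[Sequence[float]]]


def average_probabilities(probs: Probs,
                          weights: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Weighted mean of the class-probability vectors across estimators."""
    n_models = len(probs)
    if weights is None:
        weights = [1.0] * n_models  # plain (unweighted) average
    if len(weights) != n_models:
        raise ValueError("need one weight per estimator")
    total = float(sum(weights))
    n_samples, n_classes = len(probs[0]), len(probs[0][0])
    return [
        [sum(w * p[i][c] for w, p in zip(weights, probs)) / total
         for c in range(n_classes)]
        for i in range(n_samples)
    ]


def soft_vote(probs: Probs, weights: Optional[Sequence[float]] = None) -> List[int]:
    """Soft voting: predict the class with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__)
            for row in average_probabilities(probs, weights)]


def hard_vote(probs: Probs) -> List[int]:
    """Hard (majority) voting on each estimator's own argmax, for comparison."""
    result = []
    for i in range(len(probs[0])):
        votes = [max(range(len(p[i])), key=p[i].__getitem__) for p in probs]
        result.append(Counter(votes).most_common(1)[0][0])
    return result


# Hypothetical outputs of three estimators for one tweet;
# classes are 0 = not offensive, 1 = offensive.
probs = [
    [[0.45, 0.55]],  # estimator A weakly leans "offensive"
    [[0.40, 0.60]],  # estimator B weakly leans "offensive"
    [[0.95, 0.05]],  # estimator C is confident it is "not offensive"
]
print(hard_vote(probs))  # [1]: two of three estimators say "offensive"
print(soft_vote(probs))  # [0]: mean probabilities are about [0.60, 0.40]
```

The example shows why the choice matters: soft voting lets a confident estimator outweigh two uncertain ones, whereas majority voting ignores confidence. This assumes each estimator produces reasonably calibrated probabilities, which is a property of the individual models rather than of the voting rule.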
Citation
Mendel. 2023, vol. 29, no. 1, pp. 1-6. ISSN 1803-3814
https://mendel-journal.org/index.php/mendel/article/view/211
Document type
Peer-reviewed
Document version
Published version
Language of document
en
Document licence
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license
http://creativecommons.org/licenses/by-nc-sa/4.0