Enhancing Hate Speech Detection: Leveraging Emoji Preprocessing with BI-LSTM Model
Abstract
Microblogging platforms like Twitter enable users to rapidly share opinions, information, and viewpoints. However, the vast volume of daily user-generated content makes it difficult to keep the platform safe and inclusive. One key concern is the prevalence of hate speech, which must be addressed to foster a respectful and open environment. This study explores the effectiveness of the Emoji Description Method (EMJ DESC), which enhances tweet classification by converting emojis into descriptive text or sentences. These descriptions are then encoded into numerical vector matrices that capture the meaning and emotional tone of each emoji. Integrated into a basic text classification model, these vectors help improve detection performance. The research examines how different emoji preprocessing strategies affect the performance of a BI-LSTM model for hate speech classification. Results show that removing emojis reduces accuracy to 68% and weakens the model’s ability to distinguish between hate and non-hate speech, because valuable semantic context is lost. In contrast, retaining emoji semantics, either through textual descriptions or through embeddings, raises classification accuracy to 93% and 94%, respectively. The highest performance is achieved with emoji embedding, highlighting its ability to capture subtle non-verbal cues that are critical for accurate hate speech detection. Overall, the findings emphasize the importance of emoji-aware preprocessing for effective social media content classification.
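To make the contrast between emoji removal and emoji description concrete, the following is a minimal sketch, not the authors' EMJ DESC implementation. It assumes that official Unicode character names can stand in for emoji descriptions and that the Unicode category "So" (Symbol, other) is an acceptable rough test for emojis. The function names `is_emoji`, `remove_emojis`, and `describe_emojis` are hypothetical.

```python
import unicodedata


def is_emoji(ch: str) -> bool:
    # Assumption: characters in Unicode category "So" (Symbol, other) are
    # treated as emojis. This is a rough heuristic; it also matches symbols
    # such as the copyright sign and ignores modifiers and joiners.
    return unicodedata.category(ch) == "So"


def remove_emojis(text: str) -> str:
    # Strategy 1: drop every emoji, then collapse leftover whitespace.
    kept = "".join(ch for ch in text if not is_emoji(ch))
    return " ".join(kept.split())


def describe_emojis(text: str) -> str:
    # Strategy 2: replace each emoji with its lowercased Unicode name,
    # used here as a stand-in for a textual emoji description.
    parts = []
    for ch in text:
        if is_emoji(ch):
            name = unicodedata.name(ch, "")
            parts.append(f" {name.lower()} " if name else " ")
        else:
            parts.append(ch)
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    tweet = "go away \U0001F620"  # U+1F620 is the angry face emoji
    print(remove_emojis(tweet))
    print(describe_emojis(tweet))
```

In this sketch the removal strategy discards the emotional cue entirely, while the description strategy keeps it as ordinary tokens that a downstream tokenizer and a BI-LSTM classifier could consume. The embedding strategy reported in the abstract would instead map each emoji to a learned vector, which is outside the scope of this example.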


Copyright (c) 2025 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.