Integration of Hash Encoding Technique with Machine Learning for Employee Turnover Prediction
DOI:
https://doi.org/10.51519/journalisi.v7i2.1129
Keywords:
Hash Encoding, Machine learning, Turnover Prediction, Random Forest
Abstract
Employee turnover refers to the replacement of employees within an organization, which can lead to losses such as recruitment costs and decreased productivity. Predicting turnover is crucial because it lets companies anticipate departures and take appropriate action to retain employees who might otherwise leave. This study aims to optimize an employee turnover prediction model by integrating hash encoding with machine learning. The dataset used in this study is an open-source dataset obtained from Kaggle. It consists of 14,994 rows and 10 columns (features) representing employee-related information such as satisfaction level, evaluation score, number of projects, average monthly hours, and whether the employee left the company. Some of these features are of object data type. Since machine learning algorithms generally cannot work directly with object-type features, hash encoding is proposed to convert them into numerical data. The technique is applied during preprocessing, with the aim of reducing memory usage, speeding up preprocessing, and improving model performance. After preprocessing, the prediction model is trained with the Random Forest algorithm to predict employee turnover. The model is evaluated using accuracy, recall, precision, and F1-score, yielding an accuracy of 0.988, a recall of 0.961, a precision of 0.988, and an F1-score of 0.974. These results indicate that integrating hash encoding with machine learning can produce a well-performing model for predicting employee turnover.
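The abstract does not describe the implementation, so the following is only a minimal sketch of how hash encoding can turn object-type columns into a fixed number of numeric columns. The column names (`department`, `salary`), the sample rows, the choice of SHA-256, and the width of 8 hashed columns are all assumptions made for illustration and are not taken from the study. A small helper also shows how an F1-score follows from precision and recall.

```python
import hashlib


def hash_encode(value: str, n_components: int = 8) -> list:
    """Map one categorical string to a fixed-length indicator vector."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_components  # bucket chosen by the hash
    vector = [0] * n_components
    vector[index] = 1
    return vector


def encode_rows(rows, categorical_keys, n_components=8):
    """Keep numeric fields as-is and replace each categorical field
    with its hashed columns."""
    encoded = []
    for row in rows:
        features = [v for k, v in row.items() if k not in categorical_keys]
        for key in categorical_keys:
            # Prefix with the column name so that equal strings in
            # different columns are hashed as different tokens.
            features.extend(hash_encode(f"{key}={row[key]}", n_components))
        encoded.append(features)
    return encoded


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical rows; names and values are illustrative only.
rows = [
    {"satisfaction_level": 0.38, "number_project": 2,
     "department": "sales", "salary": "low"},
    {"satisfaction_level": 0.80, "number_project": 5,
     "department": "technical", "salary": "medium"},
]
X = encode_rows(rows, categorical_keys=["department", "salary"])

# Each row now has 2 numeric features plus 2 * 8 hashed columns.
print(len(X[0]))
# Precision and recall values as reported in the abstract.
print(round(f1_score(0.988, 0.961), 3))
```

The output width depends only on `n_components`, not on the number of distinct categories, which is the property that keeps memory usage bounded. The trade-off is that two different categories can collide in the same bucket. The encoded rows could then be passed to any classifier that accepts numeric input, such as a Random Forest.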
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not received prior publication, is not under consideration for publication elsewhere, and has not been previously published.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.