Ensemble Learning for Pediatric Stunting Detection: A Comparative Study of XGBoost, Random Forest, and LightGBM with Oversampling Techniques
DOI: https://doi.org/10.63158/journalisi.v8i2.1568

Keywords: Stunting Detection, Ensemble Learning, Imbalanced Classification, Oversampling, SMOTE

Abstract
Stunting, driven by chronic childhood malnutrition, remains a critical global public health concern. Early detection is persistently challenged by class imbalance in pediatric health datasets and the absence of systematic comparisons between oversampling strategies and ensemble classifiers. This study develops and evaluates an ensemble learning pipeline for stunting detection, benchmarking XGBoost, Random Forest, and LightGBM across five oversampling configurations — Original, SMOTE, ADASYN, Borderline-SMOTE, and SMOTE-ENN — using 10,000 pediatric health records from posyandu (integrated community health post) activities in Bangka Belitung Province, Indonesia. Seven anthropometric and demographic features were utilized, with stratified 80:20 train-test splitting and five-fold cross-validation. XGBoost with the original imbalanced data achieved the highest Recall (0.9573) and a competitive F1-Score (0.9158), while LightGBM with SMOTE delivered the strongest balanced performance (F1-Score: 0.9160, ROC-AUC: 0.8431). SMOTE-ENN consistently underperformed across all classifiers. To our knowledge, this is the first study to simultaneously compare five oversampling strategies across three ensemble models within a unified framework, offering a foundation for high-sensitivity stunting surveillance in resource-constrained healthcare settings.
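The oversamplers compared in the abstract (SMOTE, ADASYN, Borderline-SMOTE, SMOTE-ENN) all build on the same core step introduced by Chawla et al. (2002): synthesizing minority-class samples by interpolating between a minority point and one of its k nearest minority neighbours. The sketch below is illustrative only — the study itself presumably used library implementations such as imbalanced-learn — and the function name `smote` and toy array `X_min` are hypothetical:

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between each
    minority point and a randomly chosen one of its k nearest minority
    neighbours (the core SMOTE step; illustrative, not the study's code)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    k = min(k, n - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest indices per point
    synth = np.empty((n_synthetic, X_min.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)                     # random minority point
        nb = X_min[rng.choice(neighbours[base])]   # one of its neighbours
        gap = rng.random()                         # interpolation factor in [0, 1)
        synth[i] = X_min[base] + gap * (nb - X_min[base])
    return synth

# Toy minority class: four points in a 2-D feature space.
X_min = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2], [0.9, 0.8]])
new_samples = smote(X_min, n_synthetic=6, k=2, rng=0)
print(new_samples.shape)  # (6, 2)
```

Because each synthetic point lies on a segment between two existing minority points, SMOTE never extrapolates outside the minority region; ADASYN and Borderline-SMOTE change only which base points are favoured, and SMOTE-ENN adds a cleaning pass that removes noisy samples afterwards.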
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not been previously published, and is not under consideration for publication elsewhere.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights, including commercial rights, to the publisher.