A Comparative Study of Drug Prediction Models using KNN, SVM, and Random Forest

Authors

Susi Eva Maria Purba, Institut Teknologi Del, Indonesia

DOI:

https://doi.org/10.51519/journalisi.v7i1.1013

Keywords:

Drug classification, Machine Learning, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest, Predictive Modeling

Abstract

Accurate drug classification is essential in medical decision-making, because it helps ensure that patients receive prescriptions appropriate to their physiological and biochemical characteristics. This study compares the performance of K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest models in predicting drug prescriptions from patient attributes such as age, sex, blood pressure, cholesterol level, and sodium-to-potassium ratio. The dataset, obtained from Kaggle, was preprocessed and split into training and testing sets, and the models were evaluated with accuracy as the primary metric. Random Forest outperformed KNN and SVM, achieving a test accuracy of 100%, which indicates strong generalization on the held-out data. SVM also performed well, with a test accuracy of 97.50%, while KNN achieved the lowest accuracy, 70%, suggesting that it is limited in handling complex feature interactions. These findings highlight the effectiveness of ensemble learning methods in medical classification tasks and suggest that Random Forest is the most suitable of the three models for drug prediction. Applied in clinical settings, such models could support prescribing decisions and thereby improve treatment outcomes and patient care. Future research should explore feature engineering techniques, larger datasets, and additional machine learning approaches to improve predictive accuracy and applicability in real-world healthcare settings.
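The evaluation protocol described in the abstract (encode the patient attributes, split into training and testing sets, fit a classifier, report test accuracy) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the records are synthetic, the labelling rule and class names are made up, and the sample size, 80/20 split, ordinal encodings, min-max scaling, and k = 5 are arbitrary choices. Only a small from-scratch KNN is shown. SVM and Random Forest would be scored with the same split and the same accuracy function.

```python
import math
import random
from collections import Counter

# Hypothetical ordinal encodings for the categorical attributes.
SEX = {"F": 0.0, "M": 1.0}
LEVEL = {"LOW": 0.0, "NORMAL": 1.0, "HIGH": 2.0}


def encode(patient):
    """Turn one patient record (a dict) into a numeric feature vector."""
    return [float(patient["age"]), SEX[patient["sex"]], LEVEL[patient["bp"]],
            LEVEL[patient["cholesterol"]], float(patient["na_to_k"])]


def make_synthetic_patients(n, seed=0):
    """Generate toy records with a made-up labelling rule (not real data)."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        patient = {
            "age": rng.randint(15, 74),
            "sex": rng.choice(["F", "M"]),
            "bp": rng.choice(["LOW", "NORMAL", "HIGH"]),
            "cholesterol": rng.choice(["NORMAL", "HIGH"]),
            "na_to_k": round(rng.uniform(6.0, 38.0), 2),
        }
        # Illustrative rule only: Na/K ratio first, then blood pressure.
        if patient["na_to_k"] > 15:
            label = "class_high_ratio"
        else:
            label = "class_bp_" + patient["bp"].lower()
        features.append(encode(patient))
        labels.append(label)
    return features, labels


def train_test_split(features, labels, test_ratio=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a test fraction."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([features[i] for i in train_idx], [features[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def fit_min_max(rows):
    """Compute per-column (min, max) on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, bounds):
    """Rescale each column with the training bounds (guarding zero range)."""
    return [[(v - lo) / ((hi - lo) or 1.0) for v, (lo, hi) in zip(row, bounds)]
            for row in rows]


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training rows closest to the query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


features, labels = make_synthetic_patients(200)
train_x, test_x, train_y, test_y = train_test_split(features, labels)
bounds = fit_min_max(train_x)
train_x = apply_min_max(train_x, bounds)
test_x = apply_min_max(test_x, bounds)
predictions = [knn_predict(train_x, train_y, row) for row in test_x]
test_accuracy = accuracy(test_y, predictions)
print(f"KNN test accuracy on synthetic data: {test_accuracy:.2%}")
```

The scaling bounds are fitted on the training rows only, so no information from the test set leaks into the model. The printed figure describes the toy data and says nothing about the accuracies reported in the study. In practice, a library such as scikit-learn (`KNeighborsClassifier`, `SVC`, `RandomForestClassifier`) would typically replace the hand-written classifier.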


