Machine Learning-Based E-Archive for Archives Management of South Sumatra Province
DOI: https://doi.org/10.51519/journalisi.v5i4.566
Keywords: Information Retrieval, TF-IDF, BM25, Archives
Abstract
Archives play a crucial role in institutional operations, yet retrieving specific information from them efficiently can be challenging. This research addresses the problem by developing an information retrieval system that ranks archive documents against a user query with two methods. The first is TF-IDF (Term Frequency-Inverse Document Frequency), which weights a term by how often it occurs in a document and how rare it is across the document set. The second is BM25, a ranking function that scores each document by its relevance to the input query. Documents and queries pass through a preprocessing stage before either method is applied, so that the relevance of each document to the query can be calculated consistently. The system is evaluated with three metrics: precision (the share of retrieved documents that are relevant), recall (the share of relevant documents that are retrieved), and the F1 score (the harmonic mean of precision and recall). In tests with various keywords on a dataset of 350 documents, BM25 achieved an average precision of 0.75, recall of 0.6, and an F1 score of 0.6665. TF-IDF scored lower, with a precision of 0.33, recall of 0.2, and an F1 score of 0.2500.
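The two ranking methods and the three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy corpus, the whitespace tokenizer, the function names, the TF-IDF variant (raw term frequency times log(N/df)), the BM25 IDF variant, and the parameters k1 = 1.2 and b = 0.75 are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the 350-document dataset from the study is not reproduced here.
DOCS = {
    "d1": "surat keputusan gubernur tentang arsip daerah",
    "d2": "laporan keuangan tahunan dinas kearsipan",
    "d3": "arsip surat masuk dan surat keluar",
}


def tokenize(text):
    """Stand-in preprocessing (assumed): lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Return per-document tokens, document frequencies, and average document length."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    avgdl = sum(len(toks) for toks in tokens.values()) / len(tokens)
    return tokens, df, avgdl


def tfidf_score(query, toks, df, n_docs):
    """Sum of tf * log(N / df) over query terms (one common TF-IDF variant)."""
    tf = Counter(toks)
    return sum(tf[q] * math.log(n_docs / df[q]) for q in tokenize(query) if q in df)


def bm25_score(query, toks, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 with a non-negative IDF variant; k1 and b are assumed defaults."""
    tf = Counter(toks)
    score = 0.0
    for q in tokenize(query):
        if tf[q] == 0:
            continue
        idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score += idf * tf[q] * (k1 + 1) / (tf[q] + length_norm)
    return score


def rank(query, docs, method="bm25"):
    """Return ids of documents with a positive score, best first."""
    tokens, df, avgdl = build_index(docs)
    n_docs = len(docs)
    scores = {}
    for doc_id, toks in tokens.items():
        if method == "bm25":
            scores[doc_id] = bm25_score(query, toks, df, n_docs, avgdl)
        else:
            scores[doc_id] = tfidf_score(query, toks, df, n_docs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ranked if scores[doc_id] > 0]


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 (harmonic mean of the two)."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    results = rank("surat arsip", DOCS, method="bm25")
    print(results)
    # The relevance judgment below is made up for the toy corpus.
    print(precision_recall_f1(results, relevant=["d3"]))
```

As a consistency check on the reported figures, the harmonic mean of precision 0.75 and recall 0.6 is 2(0.75)(0.6)/(0.75 + 0.6) ≈ 0.6667, and that of 0.33 and 0.2 is ≈ 0.249. Both are close to the reported 0.6665 and 0.2500; the small differences may come from rounding or from averaging F1 per query, which the abstract does not state.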
License
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not received prior publication, is not under consideration for publication elsewhere, and has not been previously published.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.














