COMPARATIVE STUDY OF CLASSIFICATION MODELS IN PROCESSING STUDENT TEST SCORES DATASETS

Authors

  • Rico Pramestiawan STKIP Rosalia
  • Arry Verdian STKIP Rosalia
  • Chindu Lintang Bhuana STKIP Rosalia
  • Lilik Joko Susanto STKIP Rosalia

DOI:

https://doi.org/10.69916/jkbti.v5i2.475

Keywords:

machine learning, classification, student test scores, model comparison, model evaluation

Abstract

Advances in Machine Learning (ML) have contributed significantly to the field of education, particularly in analyzing student academic data to support data-driven decision-making. Predicting student exam results is important for identifying academic performance patterns, detecting students at risk of failing, and improving learning interventions. However, variations in student characteristics and dataset complexity mean that the classification model must be chosen carefully to achieve good prediction performance. This study compares the effectiveness of several ML classification models in predicting student exam results using a student academic dataset. The dataset consists of 306 records with seven attributes, including attendance, quiz scores, midterm examination scores, final examination scores, and assignment scores, and five grade classes (A, B, C, D, and E). Data preprocessing was conducted to handle missing values, duplicate records, inconsistencies, and outliers. The dataset was split into training and testing data at a ratio of 75:25, and the models were evaluated using 10-fold cross-validation. Several classification models were applied, including k-Nearest Neighbour (kNN), Decision Tree, Naive Bayes, Support Vector Machine (SVM), and Random Forest. Model performance was evaluated using accuracy, precision, recall, and F1-score. The experimental results showed that Random Forest achieved the best performance, with an accuracy of 73.9%, precision of 74.0%, recall of 73.9%, and F1-score of 73.9%, followed by Naive Bayes and Decision Tree. SVM produced the lowest performance among the tested models. The findings indicate that Random Forest was the most effective of the tested models for predicting student exam results on this dataset and has strong potential to support educational decision-making systems.
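The evaluation protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The study's dataset is not available here, so the data are a synthetic stand-in generated with `make_classification`, with five feature columns standing in for the five named attributes, and the printed scores will not reproduce the reported figures. The hyperparameters, the feature scaling for kNN and SVM, the weighted averaging of the metrics, and running the 10-fold cross-validation on the training portion only are all assumptions. The helper names `build_models` and `evaluate` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the study's data (306 records, 5 classes).
# The five columns loosely represent attendance, quiz, midterm, final, and
# assignment scores; they are NOT the real student records.
X, y = make_classification(
    n_samples=306, n_features=5, n_informative=4, n_redundant=1,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)


def build_models(seed=0):
    """Return the five model families named in the abstract (default-ish settings)."""
    return {
        # Scaling is an assumption; distance- and margin-based models usually need it.
        "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Naive Bayes": GaussianNB(),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def evaluate(models, X, y, seed=0):
    """75:25 stratified split, 10-fold CV on the training part, metrics on the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Mean accuracy over the 10 folds of the training portion.
        cv_acc = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy").mean()
        # Refit on the full training portion, then score on the held-out 25%.
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average="weighted", zero_division=0
        )
        results[name] = {
            "cv_accuracy": cv_acc,
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


results = evaluate(build_models(), X, y)
for name, m in results.items():
    print(
        f"{name:14s} cv_acc={m['cv_accuracy']:.3f} acc={m['accuracy']:.3f} "
        f"prec={m['precision']:.3f} rec={m['recall']:.3f} f1={m['f1']:.3f}"
    )
```

With weighted averaging, recall is mathematically equal to accuracy. That is consistent with the identical 73.9% reported for both, although the abstract does not state which averaging scheme was used.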

Author Biographies

Arry Verdian, STKIP Rosalia

Department of Education Informatics

Chindu Lintang Bhuana, STKIP Rosalia

Department of Education Informatics

Lilik Joko Susanto, STKIP Rosalia

Department of Education Informatics


Published

2026-05-11

How to Cite

[1]
R. Pramestiawan, A. Verdian, C. L. Bhuana, and L. J. Susanto, “COMPARATIVE STUDY OF CLASSIFICATION MODELS IN PROCESSING STUDENT TEST SCORES DATASETS”, JKBTI, vol. 5, no. 2, pp. 263–269, May 2026.

Section

Articles