COMPARATIVE ANALYSIS OF PERFORMANCE OF MACHINE LEARNING FEATURE SELECTION IN EARLY DETECTION OF DIABETES
DOI:
https://doi.org/10.69916/jkbti.v5i2.473
Keywords:
Diabetes, Machine Learning, Feature Selection, ANOVA
Abstract
Diabetes is one of the most serious global health problems, and its prevalence continues to rise worldwide. Early detection is essential to reduce complications and improve patient survival rates. Machine Learning (ML) has recently shown great potential for early diabetes prediction through data-driven analysis. However, irrelevant and redundant features can degrade model performance and increase computational complexity. This study therefore evaluates the effectiveness of feature selection techniques and ML algorithms for early diabetes detection using the PIMA Indians Diabetes Dataset, which consists of 768 records, 8 features, and two classes. During preprocessing, missing values and outliers were handled using mean imputation and data cleaning techniques. Three feature selection methods, Information Gain (IG), Gain Ratio (GR), and ANOVA, were applied to identify the most relevant features. Several ML algorithms, including k-Nearest Neighbor (k-NN), Random Forest, Support Vector Machine (SVM), Naive Bayes, and Neural Network, were then evaluated using 10-fold cross-validation. The results showed that feature selection improved classification performance compared with using all features. Glucose, BMI, Age, and Insulin were identified as the most influential features for diabetes prediction. Among all evaluated models, Random Forest combined with ANOVA achieved the best performance, with an accuracy of 0.753. Overall, feature selection increased model accuracy by up to 3.82%. These findings indicate that combining effective feature selection methods with robust ML algorithms can enhance the performance of early diabetes detection systems.
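The abstract names ANOVA as one of the feature selection methods but does not show how the scores are computed. As a rough illustration only, the sketch below ranks features by the standard one-way ANOVA F statistic (between-class variance divided by within-class variance). The function names `anova_f_score` and `rank_features` and the toy values are hypothetical; they are not taken from the study or from the PIMA dataset.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single numeric feature.

    Assumes at least two classes and some within-class variation
    (otherwise the denominator is zero).
    """
    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_total = len(values)
    n_groups = len(groups)
    grand_mean = mean(values)

    # Between-group sum of squares: class means around the grand mean.
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group sum of squares: samples around their own class mean.
    ss_within = sum(
        (v - mean(g)) ** 2 for g in groups.values() for v in g
    )

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


def rank_features(columns, labels):
    """Return (feature name, F score) pairs, highest score first."""
    scores = {
        name: anova_f_score(vals, labels) for name, vals in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up values for illustration (NOT from the PIMA dataset).
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "glucose": [85.0, 90.0, 95.0, 140.0, 150.0, 160.0],
    "skin_thickness": [20.0, 30.0, 25.0, 22.0, 28.0, 26.0],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: F = {score:.3f}")
```

A feature whose class means are far apart relative to the spread inside each class gets a high F score and is ranked first. In this toy data that feature is `glucose`. With scikit-learn, the same statistic is available as `sklearn.feature_selection.f_classif`.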
License
Copyright (c) 2026 Lilik Joko Susanto, Agus Wantoro

This work is licensed under a Creative Commons Attribution 4.0 International License.