COMPARATIVE ANALYSIS OF PERFORMANCE OF MACHINE LEARNING FEATURE SELECTION IN EARLY DETECTION OF DIABETES

Lilik Joko Susanto; Agus Wantoro

doi:10.69916/jkbti.v5i2.473

Authors

Lilik Joko Susanto STKIP Rosalia
Agus Wantoro Universitas Aisyah Pringsewu

DOI:

https://doi.org/10.69916/jkbti.v5i2.473

Keywords:

Diabetes, Machine Learning, Feature Selection, ANOVA

Abstract

Diabetes is one of the most serious global health problems, and its prevalence continues to rise worldwide. Early detection is essential to reduce complications and improve patient survival rates. Machine Learning (ML) has recently shown great potential for supporting early diabetes prediction through data-driven analysis. However, irrelevant and redundant features can degrade model performance and increase computational cost. This study therefore evaluates the effectiveness of feature selection techniques combined with ML algorithms for early diabetes detection on the PIMA Indians Diabetes Dataset, which consists of 768 records described by 8 features and two classes. During preprocessing, missing values and outliers were handled using mean imputation and data cleaning techniques. Three feature selection methods, Information Gain (IG), Gain Ratio (GR), and ANOVA, were applied to identify the most relevant features. Several ML algorithms, including k-Nearest Neighbor (k-NN), Random Forest, Support Vector Machine (SVM), Naive Bayes, and Neural Network, were then evaluated using 10-fold cross-validation. The results show that feature selection improved classification performance compared with using all features. Glucose, BMI, Age, and Insulin were identified as the most influential features for diabetes prediction. Among all evaluated models, Random Forest combined with ANOVA achieved the best performance, with an accuracy of 0.753. Overall, applying feature selection increased model accuracy by up to 3.82%. These findings indicate that combining effective feature selection methods with robust ML algorithms can improve the performance of early diabetes detection systems.
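The abstract says only that missing values were handled by mean imputation. For the PIMA data a common convention, assumed here and not confirmed by the paper, is that a 0 in columns such as Glucose, BloodPressure, SkinThickness, Insulin and BMI marks a missing measurement. The minimal Python sketch below replaces those zeros with the mean of the non-zero values in the same column; the helper `impute_zero_as_missing`, the column list and the toy rows are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean

# Columns in which a value of 0 is ASSUMED to mean "not measured".
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zero_as_missing(rows, columns=ZERO_AS_MISSING):
    """Return new rows in which zeros in `columns` are replaced by the mean
    of the non-zero values observed in that column."""
    out = [dict(r) for r in rows]  # copy each row so the input is not modified
    for col in columns:
        observed = [r[col] for r in rows if r[col] != 0]
        if not observed:
            continue  # no valid values to average; leave the column unchanged
        col_mean = mean(observed)
        for r in out:
            if r[col] == 0:
                r[col] = col_mean
    return out


# Toy rows invented for illustration; they are not records from the dataset.
toy = [
    {"Glucose": 140, "BloodPressure": 70, "SkinThickness": 30, "Insulin": 0, "BMI": 30.0},
    {"Glucose": 90, "BloodPressure": 60, "SkinThickness": 20, "Insulin": 0, "BMI": 25.0},
    {"Glucose": 0, "BloodPressure": 80, "SkinThickness": 0, "Insulin": 100, "BMI": 20.0},
]
clean = impute_zero_as_missing(toy)
# clean[2]["Glucose"] becomes the mean of 140 and 90; both Insulin zeros become 100.
```

Computing the column means on the full dataset before cross-validation lets information from the test folds leak into training; a stricter variant fits the means on each training fold and applies them to the matching test fold.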
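The abstract compares Information Gain, Gain Ratio and ANOVA as feature rankers without giving their formulas. The sketch below assumes the standard textbook definitions: IG(Y; X) = H(Y) - H(Y | X), Gain Ratio = IG(Y; X) / H(X), and the one-way ANOVA F statistic, i.e. the ratio of the between-class to the within-class mean square. IG and Gain Ratio need a discrete feature, so the continuous value is binned first; the Glucose cut-off of 130 is an arbitrary choice for illustration, and the function names and toy values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information H(X); 0 if X is constant."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def anova_f(feature, labels):
    """One-way ANOVA F statistic of a numeric feature across the class groups."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(y, []).append(x)
    n, k = len(feature), len(groups)
    grand_mean = sum(feature) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((x - means[c]) ** 2 for c, g in groups.items() for x in g)
    if ss_within == 0:
        return float("inf")  # classes separated with zero within-class spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy example (values invented for illustration, not from the PIMA dataset).
labels = [0, 0, 0, 1, 1, 1]
glucose = [90, 100, 110, 150, 160, 170]
glucose_bin = ["high" if g >= 130 else "low" for g in glucose]  # arbitrary cut-off

scores = {
    "IG": information_gain(glucose_bin, labels),
    "GR": gain_ratio(glucose_bin, labels),
    "ANOVA F": anova_f(glucose, labels),
}
```

Each score is computed per feature; the features are then ranked by score and the top-scoring subset is kept. How many features the study retained is not stated in the abstract.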
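The abstract reports that the classifiers were evaluated with 10-fold cross-validation. The sketch below shows that procedure in plain Python with k-NN, one of the listed algorithms. It assumes shuffled (non-stratified) folds, k = 3 neighbours and squared Euclidean distance; none of these settings comes from the paper, and the function names and the two-cluster toy data are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    if n < k:
        raise ValueError("need at least as many rows as folds")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))

    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, folds=10, k=3, seed=0):
    """Mean k-NN accuracy over `folds`-fold cross-validation."""
    scores = []
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy data: two well-separated clusters of 20 points each (illustration only).
X = [(i * 0.1, i * 0.1) for i in range(20)] + [(10 + i * 0.1, 10 + i * 0.1) for i in range(20)]
y = [0] * 20 + [1] * 20
accuracy = cross_val_accuracy(X, y)  # the clusters are trivially separable
```

When feature selection is part of the pipeline, the feature scores should be computed on the training folds only; scoring on the full dataset first makes the cross-validated accuracy optimistic.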

Author Biography

Lilik Joko Susanto, STKIP Rosalia

Department of Education Informatics

Jurnal Kecerdasan Buatan dan Teknologi Informasi
Publisher: Ninety Media Publisher
Address: Perumahan Green Asia, Blok i2-04, Desa Bagik Polak, Kecamatan Labuapi, Kabupaten Lombok Barat
Website: https://ninetyjournal.com/ | https://ojs.ninetyjournal.com/index.php/JKBTI
E-Mail: journal.jkbti@gmail.com | admin@ninetyjournal.com
Contact: 085337626083

This work is licensed under a Creative Commons Attribution 4.0 International License.
