COMPARATIVE ANALYSIS OF PERFORMANCE OF MACHINE LEARNING FEATURE SELECTION (GINI DECREASE AND RELIEF-F) IN HEART DISEASE DATASET

Authors

  • Chindu Lintang Bhuana, STKIP Rosalia
  • Rico Pramestiawan, STKIP Rosalia
  • Lilik Joko Susanto, STKIP Rosalia
  • Arry Verdian, STKIP Rosalia

DOI:

https://doi.org/10.69916/jkbti.v5i2.477

Keywords:

Machine Learning, Feature Selection, ReliefF, Gini Decrease

Abstract

Heart disease remains one of the leading causes of mortality worldwide and a major challenge for healthcare systems. Early detection is essential for improving survival rates and minimizing complications through timely intervention. Recent advances in Machine Learning (ML) have created new opportunities for accurate and efficient heart disease prediction systems. However, a key challenge in ML-based prediction is identifying the most relevant features, so that classification performance improves while computational complexity and noise are reduced. This study evaluates the effectiveness of two feature selection techniques, Gini Decrease (GD) and ReliefF, combined with four ML models: Support Vector Machine (SVM), Tree, Naïve Bayes, and Random Forest. The study used the UCI Heart Disease Dataset, which consists of 303 records and 14 attributes. Missing values were handled with mean imputation, followed by feature selection and classification using 10-fold cross-validation with an 80:20 training-testing ratio. ReliefF outperformed GD, achieving the highest average accuracy (0.796), compared with 0.767 for GD and 0.771 for the full feature set. The SVM model achieved the highest accuracy using GD (0.833), while Random Forest performed best with ReliefF (0.817). The Tree model had the shortest computation time among all evaluated models. These findings indicate that pairing a suitable feature selection method with an ML model enhances heart disease classification, improving predictive accuracy and computational efficiency for early medical diagnosis applications.
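The three preprocessing and scoring steps named in the abstract (mean imputation, Gini Decrease, ReliefF) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and the abstract does not state which tooling was used. All function names (`mean_impute`, `gini_decrease`, `relieff_weights`) and the toy data are hypothetical. Two simplifications are assumed: Gini Decrease is shown for a single threshold split rather than averaged over a tree ensemble, and ReliefF is shown in a two-class form that visits every record and uses the k nearest hits and misses.

```python
from collections import Counter


def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


def relieff_weights(rows, labels, k=1):
    """Two-class ReliefF-style weights (simplified; assumes numeric features)."""
    n, m = len(rows), len(rows[0])
    # Feature ranges normalise differences to [0, 1]; constant features use 1.0.
    spans = [(max(r[j] for r in rows) - min(r[j] for r in rows)) or 1.0
             for j in range(m)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def mean_diff(row, neighbours, j):
        if not neighbours:
            return 0.0
        return sum(diff(row, rows[t], j) for t in neighbours) / len(neighbours)

    weights = [0.0] * m
    for i, row in enumerate(rows):
        # Rank all other records by normalised Manhattan distance.
        ranked = sorted(
            (sum(diff(row, rows[t], j) for j in range(m)), t)
            for t in range(n) if t != i
        )
        hits = [t for _, t in ranked if labels[t] == labels[i]][:k]
        misses = [t for _, t in ranked if labels[t] != labels[i]][:k]
        for j in range(m):
            # Reward features that differ on misses and agree on hits.
            weights[j] += (mean_diff(row, misses, j) - mean_diff(row, hits, j)) / n
    return weights


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
labels = [0, 0, 1, 1]
imputed = mean_impute([1.0, None, 3.0])
gd_scores = [
    gini_decrease([r[0] for r in rows], labels, 5.0),
    gini_decrease([r[1] for r in rows], labels, 3.0),
]
weights = relieff_weights(rows, labels, k=1)
print(imputed, gd_scores, weights)
```

On this toy data both scores rank feature 0 above feature 1. In a setup like the one the abstract describes, the top-ranked features would then be passed to each classifier inside the cross-validation loop.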

Author Biography

Chindu Lintang Bhuana, STKIP Rosalia

Department of Education Informatics

Published

2026-05-12

How to Cite

[1]
C. L. Bhuana, R. Pramestiawan, L. J. Susanto, and A. Verdian, “COMPARATIVE ANALYSIS OF PERFORMANCE OF MACHINE LEARNING FEATURE SELECTION (GINI DECREASE AND RELIEF-F) IN HEART DISEASE DATASET”, JKBTI, vol. 5, no. 2, pp. 270–278, May 2026.

Section

Articles