EVALUATION OF IMBALANCE CLASS HANDLING STRATEGIES ON MACHINE LEARNING MODEL PERFORMANCE
DOI:
https://doi.org/10.69916/jkbti.v5i2.459
Keywords:
Breast Cancer, Imbalance Class, Machine Learning, SMOTE
Abstract
Breast cancer is a critical health problem because of its increasing prevalence and the importance of detecting recurrence early. Machine Learning (ML) approaches have been widely applied to support diagnosis and prediction; however, class imbalance remains a major challenge. In the Breast Cancer Dataset (BCD), the majority class (“no-recurrence-events”) significantly outnumbers the minority class (“recurrence-events”), which can bias models toward the majority class so that they fail to detect recurrence cases. This study evaluates the effectiveness of handling class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) on four ML models: Decision Tree, Naïve Bayes, k-Nearest Neighbors (k-NN), and Random Forest. The dataset consists of 286 records with 9 features obtained from the UCI Machine Learning Repository. Preprocessing included handling missing values and outliers, followed by class balancing with SMOTE. Models were evaluated using 10-fold cross-validation, with accuracy, precision, recall, and F1-score as performance metrics. The results show that applying SMOTE significantly improves model performance, with an average accuracy increase of 11.85%. Among the evaluated models, Random Forest combined with SMOTE performed best, with an accuracy of 79.79%, whereas Naïve Bayes and k-NN performed relatively worse. Overall, the study confirms that handling class imbalance with SMOTE can enhance classification performance, particularly the detection of the minority class in breast cancer recurrence prediction.
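To make the class-balancing step in the abstract concrete, the following is a minimal sketch of the core idea behind SMOTE: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the study's implementation. The function name `smote`, its parameters, and the toy data are assumptions chosen for illustration, and the sketch assumes the features have already been encoded as numbers.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy numeric data (hypothetical, not the study's records).
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [
    (10.0, 10.0), (11.0, 10.5), (10.5, 11.5),
    (12.0, 11.0), (11.5, 12.0), (10.2, 11.1),
]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "20 20"
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating records exactly.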
License
Copyright (c) 2026 Agus Wantoro, Arry Verdian

This work is licensed under a Creative Commons Attribution 4.0 International License.