EVALUATION OF IMBALANCE CLASS HANDLING STRATEGIES ON MACHINE LEARNING MODEL PERFORMANCE

Arry Verdian; Agus Wantoro

doi:10.69916/jkbti.v5i2.459

Authors

Arry Verdian, STKIP Rosalia
Agus Wantoro, Universitas Aisyah Pringsewu

DOI:

https://doi.org/10.69916/jkbti.v5i2.459

Keywords:

Breast Cancer, Imbalance Class, Machine Learning, SMOTE

Abstract

Breast cancer is a critical health problem because of its increasing prevalence and the importance of detecting recurrence early. Machine Learning (ML) approaches have been widely applied to support diagnosis and prediction; however, class imbalance remains a major challenge. In the Breast Cancer Dataset (BCD), the majority class (“no-recurrence-events”) significantly outnumbers the minority class (“recurrence-events”), which can bias models so that they fail to detect recurrence cases accurately. This study evaluates the effectiveness of handling class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE) on several ML models: Decision Tree, Naïve Bayes, k-Nearest Neighbors (k-NN), and Random Forest. The dataset consists of 286 records with 9 features and was obtained from the UCI Machine Learning Repository. Preprocessing included handling missing values and outliers, followed by class balancing using SMOTE. Models were evaluated using 10-fold cross-validation with accuracy, precision, recall, and F1-score as performance metrics. The results show that applying SMOTE significantly improves model performance, with an average accuracy increase of 11.85%. Among the evaluated models, Random Forest combined with SMOTE achieved the best performance, with an accuracy of 79.79%, whereas Naïve Bayes and k-NN performed relatively worse. Overall, this study confirms that handling class imbalance with SMOTE can enhance classification performance, particularly the detection of the minority class in breast cancer recurrence prediction.
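To make the balancing step named in the abstract concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is an illustration only, not the authors' implementation; the abstract does not say which tool was used. The function names `smote` and `balance`, the parameter defaults, and the toy data are all hypothetical, and the sketch assumes a binary problem whose features have already been encoded as numbers.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (assumed encoded).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)  # binary problem assumed
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


if __name__ == "__main__":
    # Toy data (hypothetical): four majority samples, two minority samples.
    X = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.5, 0.5], [10.0, 10.0], [11.0, 10.0]]
    y = ["no", "no", "no", "no", "yes", "yes"]
    X_bal, y_bal = balance(X, y, "yes")
    print(y_bal.count("no"), y_bal.count("yes"))
```

When oversampling is combined with k-fold cross-validation, a common practice is to oversample only the training portion of each fold so that synthetic points do not leak into the held-out split; the abstract does not state which variant was used.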


Author Biography

Arry Verdian, STKIP Rosalia

Department of Informatics Education

Jurnal Kecerdasan Buatan dan Teknologi Informasi
Publisher: Ninety Media Publisher
Address: Perumahan Green Asia, Blok i2-04, Desa Bagik Polak, Kecamatan Labuapi, Kabupaten Lombok Barat
Website: https://ninetyjournal.com/ | https://ojs.ninetyjournal.com/index.php/JKBTI
E-Mail: journal.jkbti@gmail.com | admin@ninetyjournal.com
Contact: 085337626083

This work is licensed under a Creative Commons Attribution 4.0 International License.
