Comparison of Support Vector Machine and K-Nearest Neighbors in Breast Cancer Classification

Anita Desiani; Adinda Ayu Lestari; M Al-Ariq; Ali Amran; Yuli Andriani

doi:10.30598/pijmathvol1iss1pp33-42

Anita Desiani, Sriwijaya University
Adinda Ayu Lestari, Sriwijaya University
M Al-Ariq, Sriwijaya University
Ali Amran, Sriwijaya University
Yuli Andriani, Sriwijaya University

Keywords: Support Vector Machine, Data Mining, Breast Cancer, Classification

Abstract

Cancer is one of the leading causes of death, and breast cancer is the second leading cause of cancer death in women. One way to determine the malignancy of breast cancer at an early stage is to classify it using data mining. Two widely used data mining methods with good accuracy are the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN). Percentage split and cross-validation were used to evaluate and compare the SVM and KNN classification models. With cross-validation, SVM was more accurate than KNN, reaching 95.7081%. With percentage split, KNN was more accurate than SVM, reaching 95.4220%. The comparison shows that both KNN and SVM work well for breast cancer classification.
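
The two evaluation techniques named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' setup: the data are synthetic two-feature points, the 66% training fraction and 10 folds are assumed defaults, and all function names are hypothetical. Only a small KNN is implemented here; an SVM would be passed in through the same `predict` argument.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training rows."""
    # Squared Euclidean distance gives the same neighbor ordering as Euclidean.
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def count_correct(predict, X, y, train_idx, test_idx):
    """Count test rows that predict() labels correctly from the training rows."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    return sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)

def percentage_split(predict, X, y, train_fraction=0.66, seed=0):
    """Shuffle once, train on the first part, test on the remainder."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)  # assumed split, not from the paper
    train_idx, test_idx = idx[:cut], idx[cut:]
    return count_correct(predict, X, y, train_idx, test_idx) / len(test_idx)

def cross_validation(predict, X, y, folds=10, seed=0):
    """Hold out each fold once and pool the correct predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]  # every index lands in exactly one fold
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        correct += count_correct(predict, X, y, train_idx, test_idx)
    return correct / len(X)

# Synthetic stand-in data: two well-separated clusters (hypothetical labels).
rng = random.Random(1)
X, y = [], []
for label, center in (("benign", 0.0), ("malignant", 5.0)):
    for _ in range(100):
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)

split_acc = percentage_split(knn_predict, X, y)
cv_acc = cross_validation(knn_predict, X, y)
print(f"percentage split accuracy: {split_acc:.4f}")
print(f"cross-validation accuracy: {cv_acc:.4f}")
```

Percentage split scores the model on a single held-out portion, while cross-validation scores every record exactly once, so the two techniques can rank the same pair of classifiers differently, as the reported results do.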
