Mammographic mass discrimination using K-Nearest Neighbor and BIRADS attribute




breast cancer, KNN, machine learning, prognostic


Mammography is the most effective method for detecting breast cancer; however, its predictive value is low, which can lead to unnecessary biopsies with benign results. This research aims to develop a predictive model for discriminating mammographic masses using K-Nearest Neighbor (KNN) and BIRADS attributes with acceptable levels of accuracy, precision, recall, and F1-score. To this end, we carried out the following phases: data cleaning, training of the KNN algorithm, and model selection. The result was a mammographic mass discrimination model with an accuracy of 85% and acceptable levels of precision, recall (sensitivity), and F1-score. We conclude that the model can be used as an element of judgment in the diagnosis of breast cancer, and that the error rate can be used to find optimal KNN models.
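The phases named in the abstract (data cleaning, KNN training, and selection by error rate) can be illustrated with a minimal sketch in standard-library Python. Everything specific here is an assumption made for illustration, since the abstract does not give the details: the toy rows are invented and are not taken from the study's data, "?" is assumed to mark a missing value, and the min-max scaling, Euclidean distance, fixed train/validation split, and candidate values of k are hypothetical choices rather than the authors' configuration.

```python
import math
from collections import Counter

# Hypothetical toy rows: (BIRADS, age, shape, margin, density, severity).
# Invented for illustration; "?" is assumed to mark a missing value.
RAW = [
    (5, 66, 4, 5, 3, 1), (4, 35, 1, 1, 3, 0), (5, 71, 4, 4, 3, 1),
    (4, 42, 2, 1, "?", 0), (3, 39, 1, 1, 2, 0), (5, 62, 3, 5, 3, 1),
    (4, 48, 2, 1, 3, 0), (4, 59, 4, 4, 3, 1), (4, 31, 1, 1, 3, 0),
    (5, "?", 4, 5, 3, 1), (5, 77, 4, 5, 3, 1), (4, 45, 1, 2, 3, 0),
    (4, 68, 3, 4, 2, 1), (3, 52, 2, 1, 3, 0),
]

def clean(rows):
    """Data cleaning: drop rows with missing values, split features/label."""
    out = []
    for row in rows:
        if "?" in row:
            continue
        *features, label = row
        out.append(([float(v) for v in features], int(label)))
    return out

def scale(train, rows):
    """Min-max scale `rows` using feature ranges taken from `train` only."""
    cols = list(zip(*(f for f, _ in train)))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    def one(f):
        return [(v - l) / (h - l) if h > l else 0.0
                for v, l, h in zip(f, lo, hi)]
    return [(one(f), y) for f, y in rows]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

data = clean(RAW)
train_raw, valid_raw = data[:8], data[8:]   # fixed split, for illustration
train = scale(train_raw, train_raw)
valid = scale(train_raw, valid_raw)
y_valid = [y for _, y in valid]

# Model selection: error rate (1 - accuracy) for each candidate k.
errors = {}
for k in (1, 3, 5):                         # odd k avoids tied votes
    preds = [knn_predict(train, x, k) for x, _ in valid]
    errors[k] = 1.0 - metrics(y_valid, preds)["accuracy"]

best_k = min(errors, key=errors.get)
best_preds = [knn_predict(train, x, best_k) for x, _ in valid]
print("error rate by k:", errors)
print("selected k:", best_k, metrics(y_valid, best_preds))
```

With so few toy rows the printed numbers carry no meaning; the sketch only shows how an error-rate curve over k can drive the choice of a KNN model, which is then summarized with accuracy, precision, recall, and F1-score.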





How to Cite

Lévano-Rodriguez, D., & Cerdán-León, F. E. (2022). Mammographic mass discrimination using K-Nearest Neighbor and BIRADS attribute. Revista Científica De Sistemas E Informática, 2(1), e225.