Predictive analysis and modeling of academic performance using machine learning techniques in a secondary education institution
DOI:
https://doi.org/10.51252/rcsi.v6i1.1212Keywords:
machine learning, data science, secondary education, pythonAbstract
Academic performance is a key indicator for evaluating educational quality and identifying areas for improvement in teaching and learning processes. This study analyzes a dataset of first-year lower secondary students from an educational institution in the province of Salta, Argentina, with the aim of identifying variables that influence student performance and supporting decision-making to mitigate low academic achievement. Following the CRISP-DM methodology, an exploratory analysis was conducted to identify relevant patterns in grades, unsupervised learning models were applied to detect student profiles, and supervised models were used to predict year completion based on second-term grades. The best-performing model achieved an F1-score of 0.80 for the minority class and an overall accuracy of 89%. The results enable early identification of students at academic risk and the segmentation of student profiles, providing valuable insights for more effective pedagogical interventions.
Downloads
References
Amalia, N. L. R., Supianto, A. A., Setiawan, N. Y., Zilvan, V., Yuliani, A. R., & Ramdan, A. (2021). Student Academic Mark Clustering Analysis and Usability Scoring on Dashboard Development Using K-Means Algorithm and System Usability Scale. Jurnal Ilmu Komputer Dan Informasi, 14(2), 137–143. https://doi.org/10.21609/jiki.v14i2.980
Belete, D. M., & Huchaiah, M. D. (2022). Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. International Journal of Computers and Applications, 44(9), 875–886. https://doi.org/10.1080/1206212X.2021.1974663
Bellaj, M., Ben Dahmane, A., Boudra, S., & Lamarti Sefian, M. (2024). Educational Data Mining: Employing Machine Learning Techniques and Hyperparameter Optimization to Improve Students’ Academic Performance. International Journal of Online and Biomedical Engineering (IJOE), 20(03), 55–74. https://doi.org/10.3991/ijoe.v20i03.46287
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods, 3(1), 1–27. https://doi.org/10.1080/03610927408827101
Chapman, P. (2000). Chapman, P. (2000). CRISP-DM 1.0: Step-by-step data mining guide. https://www.semanticscholar.org/paper/CRISP-DM-1.0%3A-Step-by-step-data-mining-guide-Chapman/54bad20bbc7938991bf34f86dde0babfbd2d5a72
Chen, T., & Guestrin, C. (2016). XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
García, A. M. (2014). Rendimiento académico y abandono universitario modelos, resultados y alcances de la producción académica en la Argentina. Revista Argentina de Educación Superior. http://hdl.handle.net/11336/35674
Ghahramani, Z. (2003). Unsupervised Learning. ML Summer Schools. https://doi.org/https://doi.org/10.1007/978-3-540-28650-9_5
Guanin-Fajardo, J. H., Guaña-Moya, J., & Casillas, J. (2024). Predicting Academic Success of College Students Using Machine Learning Techniques. Data, 9(4), 60. https://doi.org/10.3390/data9040060
Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70(1–3), 489–501. https://doi.org/10.1016/j.neucom.2005.12.126
Ibarra, C. S. (2020). TÉCNICAS DE DATA MINING APLICADAS A LA DESERCIÓN DE LOS ESTUDIANTES DE LA FACULTAD DE CIENCIAS EXACTAS [Universidad del Norte Santo Tomás de Aquino]. https://doi.org/https://doi.org/10.13140/RG.2.2.29986.66244
Kohavi, R. (2001). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence, 2, 1137–1143. https://www.researchgate.net/publication/2352264_A_Study_of_Cross-Validation_and_Bootstrap_for_Accuracy_Estimation_and_Model_Selection
Leng, Q., Guo, J., Tao, J., Meng, X., & Wang, C. (2024). OBMI: oversampling borderline minority instances by a two-stage Tomek link-finding procedure for class imbalance problem. Complex & Intelligent Systems, 10(4), 4775–4792. https://doi.org/10.1007/s40747-024-01399-y
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Berkeley Symp. on Math. Statist. and Prob. University of California, Los Angeles.
Martínez, C. A., Hohl, D. M., Gutiérrez, M. de los A., Palmal, S., Faux, P., Adhikari, K., Gonzalez-Jose, R., Bortolini, M. C., Acuña-Alonzo, V., Gallo, C., Linares, A. R., Rothhammer, F., Catanesi, C. I., & Barrientos, R. J. (2025). DNA-based prediction of eye color in Latin American population applying Machine Learning models. Computers in Biology and Medicine, 194, 110404. https://doi.org/10.1016/j.compbiomed.2025.110404
Menacho Chiok, C. H. (2017). Predicción del rendimiento académico aplicando técnicas de minería de datos. Anales Científicos, 78(1), 26. https://doi.org/10.21704/ac.v78i1.811
Mohamed Nafuri, A. F., Sani, N. S., Zainudin, N. F. A., Rahman, A. H. A., & Aliff, M. (2022). Clustering Analysis for Classifying Student Academic Performance in Higher Education. Applied Sciences, 12(19), 9467. https://doi.org/10.3390/app12199467
Ogunsanya, M., Isichei, J., & Desai, S. (2023). Grid search hyperparameter tuning in additive manufacturing processes. SME North American Manufacturing Research Conference. https://doi.org/https://doi.org/10.1016/j.mfglet.2023.08.056
Plathottam, S. J., Rzonca, A., Lakhnori, R., & Iloeje, C. O. (2023). A review of artificial intelligence applications in manufacturing operations. Journal of Advanced Manufacturing and Processing, 5(3). https://doi.org/10.1002/amp2.10159
Rainio, O., Teuho, J., & Klén, R. (2024). Evaluation metrics and statistical tests for machine learning. Scientific Reports, 14(1), 6086. https://doi.org/10.1038/s41598-024-56706-x
Romero, C., & Ventura, S. (2020). Educational data mining and learning analytics: An updated survey. WIREs Wiley Interdisciplinary Reviews, 10(3). https://doi.org/https://doi.org/10.1002/widm.1355
Ros, F., Riad, R., & Guillaume, S. (2023). PDBI: A partitioning Davies-Bouldin index for clustering evaluation. Neurocomputing, 528, 178–199. https://doi.org/10.1016/j.neucom.2023.01.043
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Saltos-Mero, J., & Cruz-Felipe, M. (2024). Análisis del rendimiento académico de estudiantes de las carreras Economía y Turismo con Power BI en los periodos (2021). 593 Digital Publisher CEIT, 9(1), 762–772. https://doi.org/10.33386/593dp.2024.1.2162
Shobha, G., & Rangaswamy, S. (2018). Machine Learning (pp. 197–228). https://doi.org/10.1016/bs.host.2018.07.004
Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Cornell University. https://doi.org/https://doi.org/10.48550/arXiv.1206.2944
Syakur, M. A., Khotimah, B. K., Rochman, E. M. S., & Satoto, B. D. (2018). Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster. IOP Conference Series: Materials Science and Engineering, 336, 012017. https://doi.org/10.1088/1757-899X/336/1/012017
Thorndike, R. L. (1953). Who Belongs in the Family? Psychometrika, 18(4), 267–276. https://doi.org/10.1007/BF02289263
Tukey, J. W. (1977). Exploratory Data Analysis, Volumen 2 (18th ed.). Addison-Wesley Publishing Company.
Wang, J., Lu, S., Wang, S.-H., & Zhang, Y.-D. (2022). A review on extreme learning machine. Multimedia Tools and Applications, 81(29), 41611–41660. https://doi.org/10.1007/s11042-021-11007-7
Yang, S. J. H., Lu, O. H. T., Huang, A. Y. Q., Huang, J. C. H., & Hiroaki Ogata, A. J. Q. L. (2018). Predicting Students’ Academic Performance Using Multiple Linear Regression and Principal Component Analysis. J-Stage, 26, 170–176. https://doi.org/https://doi.org/10.2197/ipsjjip.26.170
Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH. ACM SIGMOD Record, 25(2), 103–114. https://doi.org/10.1145/235968.233324
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2026 Alejandro Miguel Zalasar, Ramón Aramayo, Cristian Alejandro Martínez

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors retain their rights:
a. The authors retain their trademark and patent rights, as well as any process or procedure described in the article.
b. The authors retain the right to share, copy, distribute, execute and publicly communicate the article published in the Revista Científica de Sistemas e Informática (RCSI) (for example, place it in an institutional repository or publish it in a book), with an acknowledgment of its initial publication in the RCSI.
c. Authors retain the right to make a subsequent publication of their work, to use the article or any part of it (for example: a compilation of their works, notes for conferences, thesis, or for a book), provided that they indicate the source of publication (authors of the work, journal, volume, number and date).





