Predictive analysis and modeling of academic performance using machine learning techniques in a secondary education institution

Authors

Zalasar, A. M., Aramayo, R., & Martínez, C. A.

DOI:

https://doi.org/10.51252/rcsi.v6i1.1212

Keywords:

machine learning, data science, secondary education, Python

Abstract

Academic performance is a key indicator for evaluating educational quality and identifying areas for improvement in teaching and learning processes. This study analyzes a dataset of first-year lower secondary students from an educational institution in the province of Salta, Argentina, with the aim of identifying variables that influence student performance and supporting decision-making to mitigate low academic achievement. Following the CRISP-DM methodology, an exploratory analysis was conducted to identify relevant patterns in grades, unsupervised learning models were applied to detect student profiles, and supervised models were used to predict year completion based on second-term grades. The best-performing model achieved an F1-score of 0.80 for the minority class and an overall accuracy of 89%. The results enable early identification of students at academic risk and the segmentation of student profiles, providing valuable insights for more effective pedagogical interventions.
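The abstract reports both a minority-class F1-score and an overall accuracy. The sketch below illustrates why the two are reported together when classes are imbalanced. It is a minimal pure-Python example, not the authors' pipeline: the labels, the class encoding (1 = did not complete the year, 0 = completed), and both prediction lists are hypothetical and chosen only for illustration, so the printed values are not results from the study.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # By convention, return 0.0 when the class is never present nor predicted.
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 4 minority-class students (1) and 16 majority-class (0).
y_true = [1] * 4 + [0] * 16

# Hypothetical classifier: finds 3 of the 4 at-risk students, with 1 false alarm.
y_pred_model = [1, 1, 1, 0] + [0] * 15 + [1]

# Trivial baseline: always predicts the majority class.
y_pred_baseline = [0] * 20

acc_model = accuracy(y_true, y_pred_model)
f1_model = f1_for_class(y_true, y_pred_model, positive=1)
acc_baseline = accuracy(y_true, y_pred_baseline)
f1_baseline = f1_for_class(y_true, y_pred_baseline, positive=1)

print(f"model:    accuracy={acc_model:.2f}, minority F1={f1_model:.2f}")
print(f"baseline: accuracy={acc_baseline:.2f}, minority F1={f1_baseline:.2f}")
# model:    accuracy=0.90, minority F1=0.75
# baseline: accuracy=0.80, minority F1=0.00
```

In this toy case the baseline reaches 80% accuracy without identifying a single at-risk student, which only the minority-class F1-score exposes.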

Published

2026-01-20

How to Cite

Zalasar, A. M., Aramayo, R., & Martínez, C. A. (2026). Predictive analysis and modeling of academic performance using machine learning techniques in a secondary education institution. Revista Científica De Sistemas E Informática, 6(1), e1212. https://doi.org/10.51252/rcsi.v6i1.1212