SynKGen: A Kernel PCA-based oversampling method for enhanced credit card fraud detection
DOI:
https://doi.org/10.51252/rcsi.v5i2.952
Keywords:
ADASYN, class imbalance, ensemble learning, financial security, Kernel PCA, machine learning, SMOTE
Abstract
Credit card fraud detection is a growing challenge in the financial domain because of class imbalance: fraudulent transactions are far fewer than legitimate ones. This study presents SynKGen, a data augmentation method that uses Kernel PCA with Gaussian perturbations to generate synthetic samples of the minority class, and compares it with ADASYN and SMOTE. By combining variance analysis with controlled perturbations of the minority class, the proposed approach mitigates the overfitting risk associated with traditional interpolation-based techniques. Four classifiers (XGBoost, RandomForest, AdaBoost and VotingClassifier) were evaluated on the original dataset and on variants produced by data augmentation. The RandomForest classifier achieved the best performance when using data generated with SynKGen (accuracy: 0.9949, precision: 0.9899), outperforming the results obtained with ADASYN and SMOTE. The experimental results indicate that SynKGen improves the effectiveness of credit card fraud detection. These findings highlight the importance of data augmentation strategies for optimizing classifier performance in financial contexts with imbalanced data.
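The abstract does not spell out the SynKGen procedure, so the following is only a minimal sketch of the general idea it names: project the minority class with Kernel PCA, perturb the projections with Gaussian noise, and map them back to input space. The function name `kpca_gaussian_oversample`, the RBF kernel, the `noise_scale` parameter and the use of scikit-learn's `KernelPCA` are all assumptions made for illustration, not details taken from the article.

```python
import numpy as np
from sklearn.decomposition import KernelPCA


def kpca_gaussian_oversample(X_min, n_new, n_components=5, gamma=None,
                             noise_scale=0.1, seed=0):
    """Illustrative sketch (not the authors' code): create n_new synthetic
    minority samples by adding Gaussian noise in Kernel PCA space."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Fit Kernel PCA on the minority class only. fit_inverse_transform=True
    # makes scikit-learn learn an approximate mapping back to input space.
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True)
    Z = kpca.fit_transform(X_min)

    # Choose existing minority points (with replacement) as starting points.
    idx = rng.integers(0, len(Z), size=n_new)

    # Assumed perturbation rule: zero-mean Gaussian noise whose standard
    # deviation is a fraction of each component's observed spread.
    sigma = noise_scale * Z.std(axis=0)
    Z_new = Z[idx] + rng.normal(size=(n_new, Z.shape[1])) * sigma

    # Map the perturbed projections back to the original feature space.
    return kpca.inverse_transform(Z_new)


# Toy usage with random data standing in for the fraud (minority) class.
demo_rng = np.random.default_rng(42)
X_fraud = demo_rng.normal(size=(40, 6))
X_syn = kpca_gaussian_oversample(X_fraud, n_new=100)
print(X_syn.shape)  # (100, 6)
```

In a sketch like this the synthetic rows would be appended to the training split only, labelled as the minority class, before fitting a classifier. Unlike SMOTE-style interpolation, the new points are not constrained to lie on segments between existing minority samples.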

License
Copyright (c) 2025 Fray L. Becerra-Suarez, Luciani J. Jiménez-Fernández, Estrella D. Ticona-Tapia, José Rolando Cárdenas-Gonzáles, Pepe Humberto Bustamante-Quintana

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors retain their rights:
a. The authors retain their trademark and patent rights, as well as any process or procedure described in the article.
b. The authors retain the right to share, copy, distribute, execute and publicly communicate the article published in the Revista Científica de Sistemas e Informática (RCSI) (for example, place it in an institutional repository or publish it in a book), with an acknowledgment of its initial publication in the RCSI.
c. The authors retain the right to make a subsequent publication of their work, to use the article or any part of it (for example: a compilation of their works, notes for conferences, a thesis, or a book), provided that they indicate the source of publication (authors of the work, journal, volume, number and date).