SynKGen: A Kernel PCA-based oversampling method for enhanced credit card fraud detection

Fray L. Becerra-Suarez; Luciani J. Jiménez-Fernández; Estrella D. Ticona-Tapia; José Rolando Cárdenas-Gonzáles; Pepe Humberto Bustamante-Quintana

doi:10.51252/rcsi.v5i2.952

Authors

Fray L. Becerra-Suarez Universidad Norbert Wiener https://orcid.org/0000-0001-7445-7132
Luciani J. Jiménez-Fernández Universidad Tecnológica del Perú https://orcid.org/0009-0001-0145-5168
Estrella D. Ticona-Tapia Universidad Señor de Sipán https://orcid.org/0009-0009-7768-3384
José Rolando Cárdenas-Gonzáles Universidad Señor de Sipán https://orcid.org/0000-0002-8141-9086
Pepe Humberto Bustamante-Quintana Universidad Señor de Sipán https://orcid.org/0000-0001-9842-8432


Keywords:

ADASYN, class imbalance, ensemble learning, financial security, Kernel PCA, machine learning, SMOTE

Abstract

Credit card fraud detection is a growing challenge in the financial domain because of class imbalance: fraudulent transactions are far fewer than legitimate ones. This study presents SynKGen, a data augmentation method that uses Kernel PCA with Gaussian perturbations to generate synthetic samples of the minority class, and compares it with ADASYN and SMOTE. By introducing variance analysis with controlled perturbations in the minority class, the proposed approach mitigates the overfitting risk associated with traditional interpolation-based techniques. Four classifiers (XGBoost, RandomForest, AdaBoost, and VotingClassifier) were evaluated on the original dataset and on variants with data augmentation. The RandomForest classifier achieved the best performance when using data generated with SynKGen (accuracy: 0.9949, precision: 0.9899), outperforming the results obtained with ADASYN and SMOTE. The experimental results indicate that SynKGen improves the effectiveness of credit card fraud detection. These findings highlight the importance of data augmentation strategies for optimizing classifier performance in financial contexts with imbalanced data.
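
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: project the minority class with Kernel PCA, add Gaussian noise in the projected space, and map the perturbed points back to the input space. It should not be read as the authors' SynKGen implementation. The function name `kpca_gaussian_oversample`, the RBF kernel, the `noise_scale` parameter, and the per-component noise scaling are all assumptions made for illustration; the only library calls used are scikit-learn's `KernelPCA` and NumPy's random generator.

```python
import numpy as np
from sklearn.decomposition import KernelPCA


def kpca_gaussian_oversample(X_min, n_new, n_components=2, gamma=None,
                             noise_scale=0.1, random_state=0):
    """Hypothetical helper: synthesize minority samples by perturbing
    Kernel PCA scores with Gaussian noise (an assumed design, not the paper's)."""
    rng = np.random.default_rng(random_state)
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True, random_state=random_state)
    # Project the minority-class samples into the kernel PCA space.
    Z = kpca.fit_transform(X_min)
    # Pick seed points with replacement, one per synthetic sample.
    idx = rng.integers(0, len(Z), size=n_new)
    # Assumption: noise is scaled by each component's standard deviation.
    sigma = noise_scale * Z.std(axis=0)
    Z_new = Z[idx] + rng.normal(size=(n_new, Z.shape[1])) * sigma
    # Map perturbed scores back to approximate pre-images in input space.
    return kpca.inverse_transform(Z_new)


# Toy minority class: 30 samples with 5 features (synthetic, for illustration).
toy_rng = np.random.default_rng(42)
X_min = toy_rng.normal(size=(30, 5))
X_syn = kpca_gaussian_oversample(X_min, n_new=70)
print(X_syn.shape)  # (70, 5)
```

In a workflow like the one the abstract outlines, the synthetic rows would be appended to the training split only, with the test split left untouched, before fitting a classifier such as RandomForest.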

