SynKGen: A kernel PCA-Based oversampling method for enhanced credit card fraud detection

DOI:

https://doi.org/10.51252/rcsi.v5i2.952

Keywords:

ADASYN, class imbalance, ensemble learning, financial security, Kernel PCA, machine learning, SMOTE

Abstract

Credit card fraud detection is a growing challenge in the financial domain because of class imbalance: fraudulent transactions are far fewer than legitimate ones. This study presents SynKGen, a data augmentation method that uses Kernel PCA with Gaussian perturbations to generate synthetic samples of the minority class, and compares it with ADASYN and SMOTE. By combining variance analysis with controlled perturbations in the minority class, the proposed approach mitigates the risk of overfitting associated with traditional interpolation-based techniques. Four classifiers (XGBoost, RandomForest, AdaBoost and VotingClassifier) were evaluated on the original dataset and on variants with data augmentation. The RandomForest classifier achieved the best performance when trained on data generated with SynKGen (accuracy: 0.9949, precision: 0.9899), outperforming the results obtained with ADASYN and SMOTE. Experimental results demonstrate that SynKGen improves the effectiveness of credit card fraud detection. These findings highlight the importance of data augmentation strategies for optimizing classifier performance in financial contexts with imbalanced data.
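The abstract does not spell out the SynKGen procedure, so the following is only a minimal sketch of one plausible reading of "Kernel PCA with Gaussian perturbations", not the authors' implementation. It assumes scikit-learn's `KernelPCA` with an RBF kernel and its learned approximate inverse map. The function name `kpca_gaussian_oversample`, the `noise_scale` parameter, and the choice to scale the noise by each component's standard deviation are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import KernelPCA


def kpca_gaussian_oversample(X_min, n_new, n_components=2, gamma=None,
                             noise_scale=0.1, random_state=0):
    """Return n_new synthetic minority-class rows (illustrative sketch only)."""
    rng = np.random.default_rng(random_state)

    # Fit Kernel PCA on minority samples only; fit_inverse_transform=True
    # makes scikit-learn learn an approximate map back to input space.
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma,
                     fit_inverse_transform=True)
    Z = kpca.fit_transform(X_min)

    # Assumption: perturbation size follows the spread of each component.
    sigma = Z.std(axis=0)

    # Pick random minority points in the projected space and add Gaussian
    # noise to them, instead of interpolating between neighbours.
    idx = rng.integers(0, len(Z), size=n_new)
    noise = rng.normal(0.0, noise_scale * sigma, size=(n_new, Z.shape[1]))
    Z_new = Z[idx] + noise

    # Map the perturbed projections back to the original feature space.
    return kpca.inverse_transform(Z_new)


# Usage with made-up data: 30 minority rows, 4 features.
demo_rng = np.random.default_rng(42)
X_min = demo_rng.normal(size=(30, 4))
X_syn = kpca_gaussian_oversample(X_min, n_new=50)
```

In a full pipeline, the synthetic rows would be appended to the training split only, labelled as the minority class, before fitting a classifier such as RandomForest. Applying the oversampler before the train/test split would leak synthetic copies of test information into training.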

Published

2025-07-20

How to Cite

Becerra-Suarez, F. L., Jiménez-Fernández, L. J., Ticona-Tapia, E. D., Cárdenas-Gonzáles, J. R., & Bustamante-Quintana, P. H. (2025). SynKGen: A kernel PCA-Based oversampling method for enhanced credit card fraud detection. Revista Científica De Sistemas E Informática, 5(2), e952. https://doi.org/10.51252/rcsi.v5i2.952