pith.

arxiv: 1907.03355 · v1 · pith:5W54ENAJ · submitted 2019-07-07 · 💻 cs.LG · q-fin.RM

Improving Detection of Credit Card Fraudulent Transactions using Generative Adversarial Networks

Pith reviewed 2026-05-25 01:09 UTC · model grok-4.3

classification 💻 cs.LG q-fin.RM
keywords Generative Adversarial Networks · credit card fraud · oversampling · imbalanced classification · Wasserstein GAN · fraud detection

The pith

Training GANs on fraudulent credit card transactions improves classifier detection of real fraud.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that Generative Adversarial Networks can serve as an oversampling tool by creating synthetic fraudulent transactions to augment imbalanced training sets for fraud classifiers. A generator and discriminator compete so that the generator produces data realistic enough to strengthen downstream models. Experiments identify the Wasserstein-GAN variant as more stable during training and better at yielding usable samples than standard GANs. Conditional GANs whose labels are assigned by k-means clustering do not reliably outperform their non-conditional counterparts. A reader would care because credit card fraud remains a high-volume problem where even modest gains in detection reduce losses for issuers and cardholders.

Core claim

The authors claim that training GANs on a set of credit card fraudulent transactions generates artificial data that improves the discriminatory power of classifiers, with the Wasserstein-GAN proving more stable in training and producing more realistic samples than other GAN variants, while conditional versions using k-means labels do not necessarily improve performance.
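The review does not say how many clusters or which features the authors fed to k-means. As a minimal sketch under those unknowns, the code below runs plain Lloyd's k-means over the fraud rows only and turns each cluster id into a one-hot condition vector of the kind a conditional GAN receives alongside its noise input. `kmeans_labels`, `one_hot` and the toy rows are hypothetical illustrations, not the paper's code.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's k-means on the fraud rows; returns a cluster id in [0, k) per row."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct rows as initial centroids
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its rows.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def one_hot(label: int, k: int) -> List[float]:
    """Condition vector a conditional generator/discriminator would receive."""
    return [1.0 if i == label else 0.0 for i in range(k)]


if __name__ == "__main__":
    # Hypothetical fraud rows forming two well-separated groups.
    fraud_rows = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    labels = kmeans_labels(fraud_rows, k=2)
    print(labels, [one_hot(lab, 2) for lab in labels])
```

Because the labels come from unsupervised clustering rather than from any fraud taxonomy, there is no guarantee they carry information the generator can exploit, which is one plausible reading of why the conditional variants did not reliably help.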

What carries the argument

Generative Adversarial Networks (GANs) used for oversampling, in which a generator creates synthetic fraud examples to outwit a discriminator and thereby augment the minority class for training classifiers.
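For reference, the adversarial game described here is conventionally written as the minimax objective of Goodfellow et al. (2014). The Wasserstein variant replaces the discriminator with a critic f_w restricted to 1-Lipschitz functions (Arjovsky et al., 2017), and the conditional variant feeds a label y (in this paper, a k-means cluster id) to both networks. These are the standard textbook forms, with p_data taken to be the distribution of real fraudulent transactions; the review does not reproduce the paper's own loss definitions.

```latex
% Standard GAN: D maximizes, G minimizes
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Wasserstein GAN: critic f_w restricted to 1-Lipschitz functions
\min_G \max_{w:\, \|f_w\|_L \le 1} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[f_w(x)\big]
  - \mathbb{E}_{z \sim p_z}\big[f_w(G(z))\big]

% Conditional GAN: both networks also receive the label y
\min_G \max_D \;
  \mathbb{E}_{(x, y) \sim p_{\text{data}}}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_z,\, y}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
```

The original WGAN enforces the Lipschitz constraint by clipping critic weights, while Gulrajani et al. (2017) replace clipping with a gradient penalty; which of the two the authors used is not stated here.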

Load-bearing premise

The synthetic transactions produced by the trained GAN must be realistic enough to improve downstream classifier performance without introducing artifacts or distribution shift that would degrade accuracy on real test data.
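One inexpensive probe of this premise, offered as an assumption rather than as the authors' procedure, is to compare real and synthetic fraud rows one feature at a time. For two equal-sized one-dimensional samples, the empirical Wasserstein-1 distance reduces to the mean absolute difference between their sorted values, so it needs nothing beyond a sort. `wasserstein_1d`, `per_feature_w1` and the toy rows are hypothetical.

```python
from typing import Dict, List, Sequence


def wasserstein_1d(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Empirical W1 between two equal-sized 1-D samples via sorted matching."""
    if len(real) != len(synthetic) or not real:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    a, b = sorted(real), sorted(synthetic)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def per_feature_w1(real_rows: List[Sequence[float]], synth_rows: List[Sequence[float]]) -> Dict[int, float]:
    """W1 for each feature column; large values flag columns the generator got wrong."""
    n_features = len(real_rows[0])
    return {
        j: wasserstein_1d([r[j] for r in real_rows], [s[j] for s in synth_rows])
        for j in range(n_features)
    }


if __name__ == "__main__":
    real = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0)]
    synth = [(0.5, 10.0), (1.5, 20.0), (2.5, 31.5)]
    print(per_feature_w1(real, synth))  # {0: 0.5, 1: 0.5}
```

Small per-feature distances are necessary but not sufficient: a generator can match every marginal while still getting the correlations between features wrong, and features on different scales should be standardized before their distances are compared.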

What would settle it

An experiment in which classifiers trained on the real-plus-GAN data achieve lower precision, recall, or AUC on a held-out set of real transactions than classifiers trained only on the original real data.
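The protocol this test requires can be sketched as follows, assuming a binary fraud label. The essential constraints are that synthetic rows enter only the training split and that both models are scored on the same held-out real transactions. To stay dependency-free, the sketch swaps in hypothetical stand-ins: `fit_knn` (a 5-nearest-neighbour vote) in place of the paper's classifier, which Figure 2's title suggests was XGBoost, and `toy_generator`, which merely jitters real fraud rows and is closer to random oversampling than to a trained GAN.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], int]  # (features, label); label 1 = fraud


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Fraud-class precision, recall and F1; 0.0 when a ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def fit_knn(train: List[Row], k: int = 5) -> Callable[[Sequence[float]], int]:
    """Hypothetical stand-in classifier: majority vote of the k nearest training rows."""
    def predict(x: Sequence[float]) -> int:
        nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r[0])))[:k]
        return 1 if 2 * sum(y for _, y in nearest) > k else 0
    return predict


def toy_generator(fraud: List[Row], n: int, rng: random.Random) -> List[Row]:
    """Hypothetical stand-in for a trained GAN: Gaussian-jittered copies of real fraud rows."""
    synthetic = []
    for _ in range(n):
        features, _label = rng.choice(fraud)
        synthetic.append((tuple(v + rng.gauss(0.0, 0.1) for v in features), 1))
    return synthetic


def make_toy_split(seed: int = 1) -> Tuple[List[Row], List[Row]]:
    """Imbalanced toy data: legitimate rows near (0, 0), fraud rows near (2.5, 2.5)."""
    rng = random.Random(seed)

    def sample(n: int, centre: float, label: int) -> List[Row]:
        return [((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label) for _ in range(n)]

    return sample(200, 0.0, 0) + sample(10, 2.5, 1), sample(100, 0.0, 0) + sample(5, 2.5, 1)


def compare(train: List[Row], test: List[Row], n_synthetic: int, seed: int = 0) -> Dict[str, Tuple[float, float, float]]:
    rng = random.Random(seed)
    fraud = [r for r in train if r[1] == 1]
    augmented = train + toy_generator(fraud, n_synthetic, rng)  # synthetic rows: training split only
    y_true = [y for _, y in test]  # test split: real rows only, shared by both models
    results = {}
    for name, data in (("real only", train), ("real + synthetic", augmented)):
        predict = fit_knn(data)
        results[name] = precision_recall_f1(y_true, [predict(x) for x, _ in test])
    return results


if __name__ == "__main__":
    train_rows, test_rows = make_toy_split()
    print(compare(train_rows, test_rows, n_synthetic=190))
```

Under this setup, the pith's claim would be contradicted if the "real + synthetic" scores came out below the "real only" scores on the shared test split. AUC and AUPRC are omitted to keep the sketch short; they need ranked scores rather than hard predictions.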

Figures

Figures reproduced from arXiv: 1907.03355 by Hung Ba.

Figure 1: Loss of Generative Adversarial Networks [PITH_FULL_IMAGE:figures/full_fig_p006_1.png]
Figure 2: XGB Loss [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]
Figure 3: Additional Data vs Out-of-Sample Performance [PITH_FULL_IMAGE:figures/full_fig_p007_3.png]
Figure 4: ROC Curves [PITH_FULL_IMAGE:figures/full_fig_p008_4.png]

Method   AUC    AUPRC  Recall  Precision  F1-Score  Rank
None     0.933  0.745  0.581   0.908      0.680     3.8
ROS      0.949  0.750  0.882   0.067      0.123     3.2
SMOTE    0.944  0.750  0.876   0.062      0.113     4.4
ADASYN   0.941  0.730  0.901   0.018      0.035     5.2
GAN      0.940  0.637  0.502   0.777      0.501     5.6
CGAN     0.901  0.631  0.564   0.643      0.444     6.4
WGAN     0.942  0.723  0.803   0.500      0.583     4.2
WCGAN    0.948  0.717  0.642   0.852      0.710     3.2
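The results table does not define its Rank column. One hypothesis, checked below against the published numbers, is that Rank is each method's mean rank over the five metrics (1 = best on each), with the exact ROS/SMOTE tie on AUPRC broken in table order (possibly reflecting unrounded values). Under that assumption the sketch reproduces every Rank value, which would mean lower is better: ROS and WCGAN tie for best at 3.2 and CGAN is worst at 6.4. This averaging weights precision and recall equally, so ROS and SMOTE rank well despite precision below 0.07. `mean_ranks` is a hypothetical helper.

```python
from typing import Dict, List

METRICS = ["AUC", "AUPRC", "Recall", "Precision", "F1"]
# Values transcribed from the results table; higher is better for every metric.
TABLE: Dict[str, List[float]] = {
    "None":   [0.933, 0.745, 0.581, 0.908, 0.680],
    "ROS":    [0.949, 0.750, 0.882, 0.067, 0.123],
    "SMOTE":  [0.944, 0.750, 0.876, 0.062, 0.113],
    "ADASYN": [0.941, 0.730, 0.901, 0.018, 0.035],
    "GAN":    [0.940, 0.637, 0.502, 0.777, 0.501],
    "CGAN":   [0.901, 0.631, 0.564, 0.643, 0.444],
    "WGAN":   [0.942, 0.723, 0.803, 0.500, 0.583],
    "WCGAN":  [0.948, 0.717, 0.642, 0.852, 0.710],
}


def mean_ranks(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Average per-metric rank (1 = best) for each method."""
    methods = list(table)  # dicts keep insertion order, i.e. table order
    totals = {m: 0 for m in methods}
    for j in range(len(METRICS)):
        # sorted() is stable, so exact ties keep table order (ordinal ranking).
        order = sorted(methods, key=lambda m: -table[m][j])
        for rank, m in enumerate(order, start=1):
            totals[m] += rank
    return {m: totals[m] / len(METRICS) for m in methods}


if __name__ == "__main__":
    print(mean_ranks(TABLE))  # matches the table's Rank column
```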
Original abstract

In this study, we employ Generative Adversarial Networks as an oversampling method to generate artificial data to assist with the classification of credit card fraudulent transactions. GANs is a generative model based on the idea of game theory, in which a generator G and a discriminator D are trying to outsmart each other. The objective of the generator is to confuse the discriminator. The objective of the discriminator is to distinguish the instances coming from the generator and the instances coming from the original dataset. By training GANs on a set of credit card fraudulent transactions, we are able to improve the discriminatory power of classifiers. The experiment results show that the Wasserstein-GAN is more stable in training and produce more realistic fraudulent transactions than the other GANs. On the other hand, the conditional version of GANs in which labels are set by k-means clustering does not necessarily improve the non-conditional versions of GANs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that Generative Adversarial Networks can be used as an oversampling method for credit card fraud detection by generating synthetic fraudulent transactions. Training GANs (including Wasserstein-GAN and conditional k-means variants) on fraud data is asserted to improve downstream classifier discriminatory power. The abstract states that WGAN training is more stable and yields more realistic samples than other GAN variants, while conditional k-means labeling does not necessarily improve results.

Significance. If the central empirical claim were supported by quantitative evidence, the work could offer a data-augmentation approach for severe class imbalance in fraud detection that avoids some artifacts of traditional methods such as SMOTE. No machine-checked proofs, reproducible code, or parameter-free derivations are present.
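For contrast with GAN sampling, the core step of SMOTE (Chawla et al., 2002) places each synthetic point on the line segment between a minority example and one of its k nearest minority neighbours. The minimal pure-Python version below draws the base example at random rather than cycling through every minority row as the original algorithm does; `smote` and its parameters are this sketch's own names, not a library API.

```python
import random
from typing import List, Tuple

Vec = Tuple[float, ...]


def smote(minority: List[Vec], n_new: int, k: int = 5, seed: int = 0) -> List[Vec]:
    """Generate n_new synthetic points by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest minority neighbours of x, excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        # New point on the segment from x towards the chosen neighbour.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


if __name__ == "__main__":
    fraud = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    print(smote(fraud, n_new=3, k=2))
```

Because every output is a convex combination of two real fraud rows, SMOTE cannot place samples outside the segments joining nearby minority points, whereas a GAN generator is not constrained in that way.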

major comments (2)
  1. [Abstract] Abstract: The claim that 'we are able to improve the discriminatory power of classifiers' is unsupported by any reported metrics (AUC, F1, PR-AUC, or accuracy deltas), baseline comparisons (e.g., SMOTE, ADASYN, or no oversampling), statistical significance tests, or protocol details such as train/test splits and confirmation that the test set contains only real transactions unseen by the GAN.
  2. [Abstract] Abstract: The statements that 'the Wasserstein-GAN is more stable in training and produce more realistic fraudulent transactions' and that 'conditional version of GANs ... does not necessarily improve' lack any quantitative backing such as training-loss curves, distribution-distance metrics, or downstream classifier performance on held-out real data.
minor comments (1)
  1. [Abstract] Grammatical issues: 'GANs is a generative model' should read 'GANs are generative models'; 'produce more realistic' should be 'produces more realistic'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments on our manuscript. We address the major comments point-by-point below and will revise the abstract accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that 'we are able to improve the discriminatory power of classifiers' is unsupported by any reported metrics (AUC, F1, PR-AUC, or accuracy deltas), baseline comparisons (e.g., SMOTE, ADASYN, or no oversampling), statistical significance tests, or protocol details such as train/test splits and confirmation that the test set contains only real transactions unseen by the GAN.

    Authors: We agree the abstract is too terse and does not include the requested quantitative details. The body of the manuscript reports classifier performance improvements when using the generated samples, but we will revise the abstract to explicitly state key metrics (AUC/F1 deltas versus no-oversampling and SMOTE baselines), note the train/test protocol, and confirm the test set consists of real unseen transactions. revision: yes

  2. Referee: [Abstract] Abstract: The statements that 'the Wasserstein-GAN is more stable in training and produce more realistic fraudulent transactions' and that 'conditional version of GANs ... does not necessarily improve' lack any quantitative backing such as training-loss curves, distribution-distance metrics, or downstream classifier performance on held-out real data.

    Authors: We acknowledge that the abstract lacks explicit quantitative backing for the stability and realism claims. The manuscript body compares training behavior and downstream performance across GAN variants. We will revise the abstract to reference these comparisons (e.g., lower loss variance for WGAN and held-out classifier metrics) and qualify the conditional-GAN observation. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical claim with no derivation or self-referential steps.

Full rationale

The paper reports an experimental application of GANs (including WGAN) for oversampling fraud data to improve downstream classifiers. No mathematical derivation, equations, or fitted parameters are presented as 'predictions.' No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim is tested via classifier performance on real data and does not reduce to its own inputs by construction. This is a standard empirical ML paper whose validity rests on reported metrics and protocols, not on any circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that GAN-generated samples lie close enough to the real fraud distribution to aid classification. No free parameters or invented entities are stated in the abstract.

axioms (1)
  • Domain assumption: GAN training on the fraud subset produces samples that are distributionally close enough to real fraud to improve classifier decision boundaries.
    Invoked when the abstract states that training GANs improves discriminatory power.

pith-pipeline@v0.9.0 · 5678 in / 1194 out tokens · 24886 ms · 2026-05-25T01:09:54.512690+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 3 internal anchors

  1. Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv:1701.07875 [cs, stat].
  2. Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16:321–357.
  3. Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), pages 785–794, San Francisco, California, USA. ACM Press.
  4. Douzas, G. and Bacao, F. (2018). Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications, 91:464–471.
  5. Fedus, W., Rosca, M., Lakshminarayanan, B., Dai, A. M., Mohamed, S., and Goodfellow, I. (2017). Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step. arXiv:1710.08446 [cs, stat].
  6. Gauthier, J. (2014). Conditional generative adversarial nets for convolutional face generation. Class project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester.
  7. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Nets. In Ghahramani, Z., Welling, M.,
  8. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. (2017). Improved Training of Wasserstein GANs. arXiv:1704.00028 [cs, stat].
  9. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., and Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73:220–239.
  10. He, H. and Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284.
  11. Menardi, G. and Torelli, N. (2014). Training and assessing classification rules with imbalanced data. Data Mining and Knowledge Discovery, 28(1):92–122.
  12. Pozzolo, A. D., Caelen, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. In 2015 IEEE Symposium Series on Computational Intelligence, pages 159–166.