Improving Detection of Credit Card Fraudulent Transactions using Generative Adversarial Networks

Hung Ba

arXiv: 1907.03355 · v1 · pith:5W54ENAJnew · submitted 2019-07-07 · cs.LG · q-fin.RM

Pith reviewed 2026-05-25 01:09 UTC · model grok-4.3

keywords: Generative Adversarial Networks · credit card fraud · oversampling · imbalanced classification · Wasserstein GAN · fraud detection

The pith

Training GANs on fraudulent credit card transactions improves classifier detection of real fraud.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that Generative Adversarial Networks can serve as an oversampling tool by creating synthetic fraudulent transactions to augment imbalanced training sets for fraud classifiers. A generator and discriminator compete so that the generator produces data realistic enough to strengthen downstream models. Experiments identify the Wasserstein-GAN variant as more stable during training and better at yielding usable samples than standard GANs. Conditional GANs that assign labels through k-means clustering do not reliably outperform their non-conditional counterparts. A reader would care because credit card fraud remains a high-volume problem where even modest gains in detection reduce losses for issuers and cardholders.

Core claim

The authors claim that training GANs on a set of credit card fraudulent transactions generates artificial data that improves the discriminatory power of classifiers, with Wasserstein-GAN proving more stable and realistic than other GAN variants while conditional versions using k-means labels do not necessarily improve performance.

What carries the argument

Generative Adversarial Networks (GANs) used for oversampling, in which a generator creates synthetic fraud examples to outwit a discriminator and thereby augment the minority class for training classifiers.
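As a rough illustration of what augmenting the minority class means in terms of sample counts, the sketch below computes how many synthetic fraud rows would have to be drawn from a trained generator to reach a chosen minority-to-majority ratio. The function name, the `target_ratio` parameter, and the example counts are hypothetical; the augmentation sizes actually used in the paper are not given on this page.

```python
import math

def synthetic_rows_needed(n_majority: int, n_minority: int, target_ratio: float) -> int:
    """Synthetic minority rows needed so that
    (n_minority + synthetic) / n_majority >= target_ratio."""
    if n_majority <= 0 or n_minority < 0:
        raise ValueError("n_majority must be positive and n_minority non-negative")
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    required_minority = math.ceil(target_ratio * n_majority)
    # Never return a negative count if the data is already balanced enough.
    return max(0, required_minority - n_minority)

# Hypothetical counts, for illustration only.
extra = synthetic_rows_needed(n_majority=1000, n_minority=20, target_ratio=0.5)
print(extra)
```

The ratio is a tuning choice rather than a fixed target; pushing it towards 1.0 trades more synthetic influence on the decision boundary against a larger risk of learning generator artifacts.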

Load-bearing premise

The synthetic transactions produced by the trained GAN must be realistic enough to improve downstream classifier performance without introducing artifacts or distribution shift that would degrade accuracy on real test data.

What would settle it

An experiment in which classifiers trained on the real-plus-GAN data achieve lower precision, recall, or AUC on a held-out set of real transactions than classifiers trained only on the original real data.
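The protocol behind such an experiment can be sketched as follows, assuming the real data is held out before any augmentation so that the test set contains only real transactions. The function `split_then_augment` and its arguments are hypothetical names introduced for illustration; no classifier or GAN is trained here, and a strict version would also fit the generator on the training-side fraud rows only.

```python
import random

def split_then_augment(real_rows, synthetic_fraud_rows, test_fraction=0.3, seed=0):
    """Hold out real rows first, then add synthetic fraud to the training part only.

    real_rows: list of (features, label) pairs from the original data.
    synthetic_fraud_rows: list of feature tuples from a generator
        (hypothetical input; every one is labelled 1 = fraud).
    Returns (train_real_only, train_augmented, test_real).
    """
    rng = random.Random(seed)
    shuffled = list(real_rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test_real = shuffled[:n_test]          # evaluation data: real rows only
    train_real_only = shuffled[n_test:]    # baseline training set
    train_augmented = train_real_only + [(x, 1) for x in synthetic_fraud_rows]
    return train_real_only, train_augmented, test_real
```

Training the same classifier once on `train_real_only` and once on `train_augmented`, then scoring both on `test_real`, gives the comparison described above.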

Figures

Figures reproduced from arXiv: 1907.03355 by Hung Ba.

**Figure 2.** XGB Loss

**Figure 3.** Additional Data vs Outsample Performance

**Figure 4.** ROC Curves

| Method | AUC | AUPRC | Recall | Precision | F1-Score | Rank |
|---|---|---|---|---|---|---|
| None | 0.933 | 0.745 | 0.581 | 0.908 | 0.680 | 3.8 |
| ROS | 0.949 | 0.750 | 0.882 | 0.067 | 0.123 | 3.2 |
| SMOTE | 0.944 | 0.750 | 0.876 | 0.062 | 0.113 | 4.4 |
| ADASYN | 0.941 | 0.730 | 0.901 | 0.018 | 0.035 | 5.2 |
| GAN | 0.940 | 0.637 | 0.502 | 0.777 | 0.501 | 5.6 |
| CGAN | 0.901 | 0.631 | 0.564 | 0.643 | 0.444 | 6.4 |
| WGAN | 0.942 | 0.723 | 0.803 | 0.500 | 0.583 | 4.2 |
| WCGAN | 0.948 | 0.717 | 0.642 | 0.852 | 0.710 | 3.2 |
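The figures reported under Figure 4 list recall, precision, and F1 separately. As a reminder of how the three relate for a single confusion matrix, here is a minimal sketch of the F1 formula. The helper name is an assumption, and the reported F1 values need not equal this formula applied to the reported precision and recall if they are averages over several folds or runs, which this page does not state.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall taken from the ROS row above.
print(round(f1_score(precision=0.067, recall=0.882), 3))
```

Because the harmonic mean is dominated by the smaller of the two inputs, a method with high recall but very low precision ends up with a low F1.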

Original abstract

In this study, we employ Generative Adversarial Networks as an oversampling method to generate artificial data to assist with the classification of credit card fraudulent transactions. GANs is a generative model based on the idea of game theory, in which a generator G and a discriminator D are trying to outsmart each other. The objective of the generator is to confuse the discriminator. The objective of the discriminator is to distinguish the instances coming from the generator and the instances coming from the original dataset. By training GANs on a set of credit card fraudulent transactions, we are able to improve the discriminatory power of classifiers. The experiment results show that the Wasserstein-GAN is more stable in training and produce more realistic fraudulent transactions than the other GANs. On the other hand, the conditional version of GANs in which labels are set by k-means clustering does not necessarily improve the non-conditional versions of GANs.
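For reference, the two-player game the abstract describes is usually written as the following objectives, taken from the cited Goodfellow et al. (2014) and Arjovsky et al. (2017). Here $p_{\text{data}}$ would be the distribution of real fraudulent transactions and $p_z$ the noise prior. The exact losses used in the paper, including any weight clipping or gradient penalty, are not shown on this page, so this is a sketch of the standard formulations only.

```latex
% Standard GAN value function (Goodfellow et al., 2014)
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Wasserstein GAN objective (Arjovsky et al., 2017); f ranges over 1-Lipschitz critics
\min_G \max_{\|f\|_L \le 1} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[f(x)\big]
  - \mathbb{E}_{z \sim p_z}\big[f(G(z))\big]
```

In the first objective the discriminator outputs a probability. In the second the critic outputs an unbounded score, and the objective estimates a Wasserstein-1 distance between real and generated samples.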

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This applies GAN oversampling to credit-card fraud but the abstract gives no metrics or baselines to show any real improvement.

The paper's main move is to train GANs on the fraud class and use the synthetic examples to help downstream classifiers cope with severe imbalance. The authors compare a standard GAN, a WGAN, and a conditional version whose labels come from k-means; they note that WGAN trains more stably and yields more realistic samples, while the conditional version adds little. That stability observation is consistent with what is already known about Wasserstein training, so the domain application is at least reasonable on that narrow point. The work is straightforward and does not claim a new algorithm.

The central problem is that the abstract asserts improved classifier performance with no numbers: no AUC or PR-AUC deltas, no comparison to SMOTE or other established oversamplers, and no description of the train-test split or of whether the test set stayed entirely real. The key assumption, that the generated samples are close enough to real fraud to help without introducing artifacts, is therefore untested in the text provided.

If the full paper contains those experiments with proper controls and they hold up, the note could be useful to practitioners who already work on fraud detection pipelines and want to try data augmentation. A reader looking for new theory or broad methodological advances will not find much. I would bring the full version to a reading group only if the results section is solid; otherwise it stays too thin. I would not cite it on the basis of the abstract. It could merit peer review if the experiments are reproducible and include the missing baselines, since the idea is simple enough for referees to evaluate quickly.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that Generative Adversarial Networks can be used as an oversampling method for credit card fraud detection by generating synthetic fraudulent transactions. Training GANs (including Wasserstein-GAN and conditional k-means variants) on fraud data is asserted to improve downstream classifier discriminatory power. The abstract states that WGAN training is more stable and yields more realistic samples than other GAN variants, while conditional k-means labeling does not necessarily improve results.

Significance. If the central empirical claim were supported by quantitative evidence, the work could offer a data-augmentation approach for severe class imbalance in fraud detection that avoids some artifacts of traditional methods such as SMOTE. No machine-checked proofs, reproducible code, or parameter-free derivations are present.

major comments (2)

[Abstract] The claim that 'we are able to improve the discriminatory power of classifiers' is unsupported by any reported metrics (AUC, F1, PR-AUC, or accuracy deltas), baseline comparisons (e.g., SMOTE, ADASYN, or no oversampling), statistical significance tests, or protocol details such as train/test splits and confirmation that the test set contains only real transactions unseen by the GAN.
[Abstract] The statements that 'the Wasserstein-GAN is more stable in training and produce more realistic fraudulent transactions' and that 'conditional version of GANs ... does not necessarily improve' lack any quantitative backing such as training-loss curves, distribution-distance metrics, or downstream classifier performance on held-out real data.
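One of the metrics requested in the comments above, ROC AUC, can be computed on held-out real data with a few lines of code. The sketch below uses the pairwise (Mann-Whitney) formulation; the function name is an assumption, and the quadratic loop is meant for clarity on small samples rather than for production-sized transaction logs.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels and classifier scores, for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Reporting the difference between this value for a classifier trained with synthetic data and one trained without it, on the same real test set, would give the kind of delta the referee asks for.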

minor comments (1)

[Abstract] Grammatical issues: 'GANs is a generative model' should read 'GANs are generative models'; 'produce more realistic' should be 'produces more realistic'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments on our manuscript. We address the major comments point-by-point below and will revise the abstract accordingly.

Referee: [Abstract] The claim that 'we are able to improve the discriminatory power of classifiers' is unsupported by any reported metrics (AUC, F1, PR-AUC, or accuracy deltas), baseline comparisons (e.g., SMOTE, ADASYN, or no oversampling), statistical significance tests, or protocol details such as train/test splits and confirmation that the test set contains only real transactions unseen by the GAN.

Authors: We agree the abstract is too terse and does not include the requested quantitative details. The body of the manuscript reports classifier performance improvements when using the generated samples, but we will revise the abstract to explicitly state key metrics (AUC/F1 deltas versus no-oversampling and SMOTE baselines), note the train/test protocol, and confirm the test set consists of real unseen transactions.

Revision: yes

Referee: [Abstract] The statements that 'the Wasserstein-GAN is more stable in training and produce more realistic fraudulent transactions' and that 'conditional version of GANs ... does not necessarily improve' lack any quantitative backing such as training-loss curves, distribution-distance metrics, or downstream classifier performance on held-out real data.

Authors: We acknowledge that the abstract lacks explicit quantitative backing for the stability and realism claims. The manuscript body compares training behavior and downstream performance across GAN variants. We will revise the abstract to reference these comparisons (e.g., lower loss variance for WGAN and held-out classifier metrics) and qualify the conditional-GAN observation.

Revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical claim with no derivation or self-referential steps.

The paper reports an experimental application of GANs (including WGAN) for oversampling fraud data to improve downstream classifiers. No mathematical derivation, equations, or fitted parameters are presented as 'predictions.' No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim is tested via classifier performance on real data and does not reduce to its own inputs by construction. This is a standard empirical ML paper whose validity rests on reported metrics and protocols, not on any circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that GAN-generated samples lie close enough to the real fraud distribution to aid classification. No free parameters or invented entities are stated in the abstract.

axioms (1)

Domain assumption: GAN training on the fraud subset produces samples that are distributionally close enough to real fraud to improve classifier decision boundaries. Invoked when the abstract states that training GANs improves discriminatory power.
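One simple way to probe this assumption is to compare real and generated values feature by feature. The sketch below computes the empirical 1-Wasserstein distance between two equal-size one-dimensional samples, which for sorted samples reduces to the mean absolute difference between matched order statistics. The function name and the equal-size restriction are assumptions made for brevity; a per-feature check like this says nothing about joint structure across features, and it is not claimed to be the check used in the paper.

```python
def wasserstein_1d(sample_a, sample_b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    the mean absolute difference between their sorted values."""
    if len(sample_a) != len(sample_b) or not sample_a:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(sample_a), sorted(sample_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Toy values standing in for one feature of real vs. generated fraud rows.
print(wasserstein_1d([0, 1, 3], [5, 6, 8]))
```

A small distance on every feature is necessary but not sufficient for realism, so the downstream classifier comparison on real held-out data remains the deciding test.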

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 3 internal anchors

[1] Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv:1701.07875 [cs, stat].

[2] Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357.

[3] Chen, T. and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, pages 785–794, San Francisco, California, USA. ACM Press.

[4] Douzas, G. and Bacao, F. (2018). Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications, 91:464–471.

[5] Fedus, W., Rosca, M., Lakshminarayanan, B., Dai, A. M., Mohamed, S., and Goodfellow, I. (2017). Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step. arXiv:1710.08446 [cs, stat].

[6] Gauthier, J. (2014). Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter semester.

[7] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Nets. In Ghahramani, Z., Welling, M.,

[8] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. (2017). Improved Training of Wasserstein GANs. arXiv:1704.00028 [cs, stat].

[9] Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., and Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73:220–239.

[10] He, H. and Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284.

[11] Menardi, G. and Torelli, N. (2014). Training and assessing classification rules with imbalanced data. Data Mining and Knowledge Discovery, 28(1):92–122.

[12] Pozzolo, A. D., Caelen, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. In 2015 IEEE Symposium Series on Computational Intelligence, pages 159–166.
