arxiv: 1907.07225 · v1 · pith:AYPHH7OPnew · submitted 2019-07-16 · 💻 cs.LG · cs.SI · stat.ML

DeepTrax: Embedding Graphs of Financial Transactions

Pith reviewed 2026-05-24 20:46 UTC · model grok-4.3

classification 💻 cs.LG · cs.SI · stat.ML
keywords graph embeddings · bipartite graphs · transaction networks · link prediction · fraud detection · representation learning · financial data

The pith

Graph representation learning on bipartite transaction graphs produces entity embeddings that support accurate link prediction and downstream fraud detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats financial transactions as edges in a large, sparse bipartite graph connecting accounts and merchants. It applies representation learning to map these entities into vector space while preserving the graph's topological properties. The resulting embeddings achieve strong performance on link prediction measured by AUC and F1 score. Qualitative checks show that the vectors reflect intuitive semantic similarities between entities. These vectors can then serve directly as input features for business machine learning tasks such as fraud detection.

Core claim

Representation learning applied to bipartite credit-card transaction graphs yields account and merchant embeddings that preserve topological structure. The method, trained on internal transaction datasets, produces vectors whose quality is confirmed by high link-prediction AUC and F1 scores together with visualizations that display expected semantic groupings. The same vectors function as ready-made features that improve machine-learning models for downstream tasks including fraud detection.

What carries the argument

A graph embedding framework, inspired by standard node-embedding techniques, that maps entities in a bipartite transaction graph to Euclidean vectors so that transaction edges correspond to proximity in vector space.
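The idea of mapping both sides of a bipartite graph into one vector space can be sketched in a few lines. The block below is a toy skip-gram-style trainer with negative sampling, not the authors' pipeline: the function names, the dot-product similarity, the hyperparameter defaults, and the direct use of (account, merchant) edges as training pairs are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_embeddings(edges, dim=8, negatives=2, epochs=300, lr=0.1, seed=0):
    """Toy trainer (hypothetical): one vector per account and per merchant."""
    rng = random.Random(seed)
    accounts = sorted({a for a, _ in edges})
    merchants = sorted({m for _, m in edges})
    observed = set(edges)
    acc = {a: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for a in accounts}
    mer = {m: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for m in merchants}
    for _epoch in range(epochs):
        for a, m in edges:
            # One observed edge (label 1) plus sampled non-edges (label 0).
            targets = [(m, 1.0)]
            for _k in range(negatives):
                cand = rng.choice(merchants)
                if (a, cand) not in observed:
                    targets.append((cand, 0.0))
            for t, label in targets:
                u, v = acc[a], mer[t]
                # Gradient step on the logistic log-likelihood of the pair.
                g = lr * (label - sigmoid(dot(u, v)))
                for i in range(dim):
                    u[i], v[i] = u[i] + g * v[i], v[i] + g * u[i]
    return acc, mer
```

Here `dim` and `negatives` play the role of the two free parameters listed in the ledger further down (embedding dimension and negative sampling rate). A graph with millions of edges would need a vectorized or distributed implementation rather than this pure-Python loop.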

If this is right

  • Link prediction between accounts and merchants reaches high AUC and F1 scores.
  • Entity vectors display intuitive semantic similarity in visualizations and nearest-neighbor checks.
  • The vectors serve as input features that support improved performance in fraud-detection models.
  • The same embedding approach scales to graphs containing millions or billions of transaction edges.
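For readers who want the two link-prediction metrics made concrete, the following sketch computes them from scratch. It assumes binary labels (1 for a true account-merchant edge, 0 for a sampled non-edge) and a score per candidate edge; the function names and the 0.5 threshold are illustrative choices, not values taken from the paper.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of examples, so
    suitable only for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 of the binary predictions obtained by thresholding the scores."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

AUC is threshold-free while F1 depends on the chosen cut-off, which is one reason the two numbers are usually reported together.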

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar embeddings could replace hand-crafted features in other sparse bipartite settings such as payment networks or supply-chain graphs.
  • Periodic retraining on fresh transaction windows would likely be needed to keep the vectors current with changing merchant and account behavior.
  • The vectors might enable transfer of learned structure across institutions if privacy-preserving alignment techniques are applied.

Load-bearing premise

The learned vectors capture patterns that remain useful when the same embedding method is applied to new transaction data or to downstream tasks rather than fitting only the training graphs.

What would settle it

Embeddings from the method fail to raise AUC on a held-out link-prediction task drawn from a later time period or produce no measurable lift when added as features to an existing fraud-detection classifier.
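The held-out test described above depends on a split by time rather than a random split. A minimal sketch follows, assuming transactions arrive as (account, merchant, timestamp) tuples with comparable timestamps; the tuple layout and the function name are hypothetical. Test edges are restricted to entities already seen in training, since a transductive embedding has no vector for an unseen account or merchant.

```python
def temporal_split(transactions, cutoff):
    """Split (account, merchant, timestamp) rows at a cutoff time.

    Returns (train_edges, test_edges). Test edges are new edges from the
    later period whose endpoints both appear in the training period.
    """
    train = {(a, m) for a, m, t in transactions if t < cutoff}
    known_accounts = {a for a, _ in train}
    known_merchants = {m for _, m in train}
    test = set()
    for a, m, t in transactions:
        if t < cutoff or (a, m) in train:
            continue  # training-period row, or an edge already observed
        if a in known_accounts and m in known_merchants:
            test.add((a, m))
    return sorted(train), sorted(test)
```

Embeddings would then be fitted on the first list only and scored on the second, together with sampled non-edges as negatives.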

Figures

Figures reproduced from arXiv: 1907.07225 by Anish Khazane, Antonia Gogoglou, C. Bayan Bruss, Jonathan Rider, Keegan E. Hines, Richard Serpe.

Figure 1: Model pipeline, from data pre-processing to training with Skip-gram. Using stringent time windows and pairs of transaction pairs (example time ...
Figure 2: Impact of embedding dimensionality. While increasing embedding ...
Figure 3: Example clusters found in a two dimensional t-SNE projection of brand-level embeddings. Best viewed digitally.
Figure 4: Merchant embeddings tend to encode typical price point of goods.

Original abstract

Financial transactions can be considered edges in a heterogeneous graph between entities sending money and entities receiving money. For financial institutions, such a graph is likely large (with millions or billions of edges) while also sparsely connected. It becomes challenging to apply machine learning to such large and sparse graphs. Graph representation learning seeks to embed the nodes of a graph into a Euclidean vector space such that graph topological properties are preserved after the transformation. In this paper, we present a novel application of representation learning to bipartite graphs of credit card transactions in order to learn embeddings of account and merchant entities. Our framework is inspired by popular approaches in graph embeddings and is trained on two internal transaction datasets. This approach yields highly effective embeddings, as quantified by link prediction AUC and F1 score. Further, the resulting entity vectors retain intuitive semantic similarity that is explored through visualizations and other qualitative analyses. Finally, we show how these embeddings can be used as features in downstream machine learning business applications such as fraud detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DeepTrax, a graph embedding framework for learning vector representations of accounts and merchants in large, sparse bipartite transaction graphs. It trains on two internal credit-card datasets using an approach inspired by existing graph embedding methods, reports strong link-prediction performance via AUC and F1, demonstrates semantic coherence through visualizations, and positions the embeddings as features for downstream tasks such as fraud detection.

Significance. If the embeddings prove transferable, the work could supply a practical feature-generation pipeline for financial ML on transaction graphs. The absence of external benchmarks, quantitative downstream evaluation, and comparison against strong baselines (raw aggregates or other embeddings) limits the strength of the generalization claim.

major comments (3)
  1. [Abstract] Abstract and evaluation sections: the central claim that embeddings are 'highly effective' and useful for fraud detection rests solely on link-prediction AUC/F1 plus qualitative visualizations performed on the same internal training graphs; no quantitative ablation measuring lift in a fraud classifier (versus transaction statistics or other embeddings) is reported.
  2. [Abstract] Evaluation protocol: training and test splits are drawn from the authors' internal datasets with no external public benchmarks or held-out cross-institution data; this raises the risk that reported AUC/F1 reflect dataset-specific artifacts rather than transferable entity semantics.
  3. [Abstract] Downstream utility: the statement that embeddings 'can be used as features in downstream machine learning business applications such as fraud detection' is presented without any numerical results, baseline comparisons, or ablation study, making the claim unsupported by the supplied evidence.
minor comments (2)
  1. The method description states it is 'inspired by popular approaches' but does not specify the exact base model, loss function, or negative-sampling details; the embedding dimension and negative-sampling rate, for example, are left as free parameters.
  2. No error bars, statistical significance tests, or sensitivity analysis to hyper-parameters are mentioned for the link-prediction results.
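One common way to attach error bars to a link-prediction AUC is a percentile bootstrap over the evaluation pairs. The sketch below is one such procedure under simple assumptions (independent evaluation pairs, binary labels); the function names, the resample count, and the 95% level are illustrative defaults and not anything reported in the paper.

```python
import random


def roc_auc(labels, scores):
    """Pairwise AUC with ties counted as 0.5; None if a class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _r in range(n_resamples):
        # Resample evaluation pairs with replacement.
        idx = [rng.randrange(n) for _j in range(n)]
        auc = roc_auc([labels[i] for i in idx], [scores[i] for i in idx])
        if auc is not None:  # skip resamples that contain a single class
            stats.append(auc)
    if not stats:
        raise ValueError("no resample contained both classes")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

Because edges that share an account or merchant are not independent, an interval computed this way is likely to be too narrow; resampling whole entities instead of individual pairs would be a more conservative variant.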

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback. The comments correctly identify areas where the current manuscript's claims on downstream utility require stronger quantitative support. We address each point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract and evaluation sections: the central claim that embeddings are 'highly effective' and useful for fraud detection rests solely on link-prediction AUC/F1 plus qualitative visualizations performed on the same internal training graphs; no quantitative ablation measuring lift in a fraud classifier (versus transaction statistics or other embeddings) is reported.

    Authors: We agree that the manuscript would be strengthened by quantitative evidence for the fraud detection use case. In the revision we will add an ablation experiment that measures the performance lift obtained by adding the learned embeddings as features to a fraud classifier, with comparisons against baselines using raw transaction aggregates. revision: yes

  2. Referee: [Abstract] Evaluation protocol: training and test splits are drawn from the authors' internal datasets with no external public benchmarks or held-out cross-institution data; this raises the risk that reported AUC/F1 reflect dataset-specific artifacts rather than transferable entity semantics.

    Authors: Financial transaction data are subject to strict privacy and proprietary constraints that preclude the use of public external benchmarks or cross-institution held-out sets. We will expand the manuscript with additional dataset statistics and an explicit discussion of this limitation and its implications for generalizability. revision: partial

  3. Referee: [Abstract] Downstream utility: the statement that embeddings 'can be used as features in downstream machine learning business applications such as fraud detection' is presented without any numerical results, baseline comparisons, or ablation study, making the claim unsupported by the supplied evidence.

    Authors: We concur that the claim currently lacks supporting numerical evidence. As indicated in the response to the first comment, the planned revision will include the requested quantitative ablation and baseline comparisons to substantiate the downstream utility statement. revision: yes
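The ablation promised in these responses amounts to training the same classifier twice, once on baseline features and once on baseline features plus embeddings, and comparing AUC on held-out data. The sketch below shows that mechanic with a small logistic regression written from scratch. Every name in it is hypothetical, and the choice of classifier is an assumption; the paper does not specify one.

```python
import math


def roc_auc(labels, scores):
    """Pairwise AUC with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_logistic(rows, labels, epochs=200, lr=0.5):
    """Batch gradient-descent logistic regression; returns a logit scorer."""
    dim = len(rows[0])
    w = [0.0] * dim
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        gw = [0.0] * dim
        gb = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = y - 1.0 / (1.0 + math.exp(-z))
            gb += err
            for i in range(dim):
                gw[i] += err * x[i]
        b += lr * gb / n
        w = [wi + lr * g / n for wi, g in zip(w, gw)]
    return lambda x: b + sum(wi * xi for wi, xi in zip(w, x))


def auc_lift(base_train, emb_train, y_train, base_test, emb_test, y_test):
    """Return (baseline AUC, baseline+embedding AUC, difference)."""
    def evaluate(train_rows, test_rows):
        score = fit_logistic(train_rows, y_train)
        return roc_auc(y_test, [score(x) for x in test_rows])

    def join(base, emb):
        # Concatenate baseline features with embedding features row by row.
        return [list(bf) + list(ef) for bf, ef in zip(base, emb)]

    base_auc = evaluate(base_train, base_test)
    full_auc = evaluate(join(base_train, emb_train), join(base_test, emb_test))
    return base_auc, full_auc, full_auc - base_auc
```

A real ablation would use the institution's existing fraud features as the baseline, a time-based train/test split, and repeated runs; any lift measured on constructed toy inputs says nothing about the paper's embeddings.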

standing simulated objections not resolved
  • Provision of external public benchmarks or cross-institution data, which is precluded by privacy and proprietary restrictions on the transaction datasets.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an application of existing graph embedding methods to internal transaction graphs, with evaluation via standard link-prediction AUC/F1 on (presumably split) data from those graphs and qualitative analysis of embeddings. No equations, derivations, or claims are shown that reduce by construction to fitted inputs renamed as predictions, nor do any self-citations serve as the sole justification for load-bearing uniqueness or ansatzes. The downstream fraud-detection mention is framed as a demonstration of use rather than a quantitative result derived from the same fit. The work is therefore self-contained as an empirical study without the enumerated circular patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the standard assumption that Euclidean embeddings preserve graph topology, plus the modeling choice to treat transactions as bipartite edges; both are inherited from prior graph embedding literature rather than derived here.

free parameters (2)
  • embedding dimension
    Standard hyperparameter in representation learning models; value chosen to fit the scale of the internal graphs.
  • negative sampling rate
    Common training hyperparameter for link prediction objectives; tuned on the private data.
axioms (1)
  • domain assumption: Graph topological properties are preserved after mapping nodes to Euclidean vectors.
    Invoked when the authors state that the framework is inspired by popular graph embedding approaches.

pith-pipeline@v0.9.0 · 5716 in / 1209 out tokens · 22000 ms · 2026-05-24T20:46:53.909653+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift

    cs.LG 2026-04 unverdicted novelty 7.0

    Under strict inductive protocols without temporal leakage, random forests on raw features achieve higher F1 scores than GNNs on Bitcoin fraud detection, and real graph structure can underperform random wiring.
