pith. sign in

arxiv: 2507.14785 · v2 · submitted 2025-07-20 · 💻 cs.LG · cs.AI

Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs

Pith reviewed 2026-05-19 03:52 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords large language modelsmoney laundering detectionfinancial graphsin-context learninganti-money launderinggraph serializationexplainable detectionsynthetic scenarios
0
0 comments X

The pith

Large language models can flag suspicious financial patterns by reasoning over text versions of graph neighborhoods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether LLMs can act as reasoning engines for money laundering detection by taking local neighborhoods from financial graphs, turning them into structured text, and prompting the model with a few examples to judge suspiciousness and explain why. It uses synthetic scenarios that mimic typical laundering behaviors such as layered transactions and unusual entity links. A sympathetic reader would care because this offers a path toward more explainable detection tools that speak in natural language rather than opaque scores. The approach is presented as an initial demonstration that LLMs can mimic analyst-style checks on graph data.

Core claim

Using a lightweight pipeline that extracts k-hop neighborhoods around entities of interest from a financial knowledge graph, serializes those neighborhoods into structured text, and applies few-shot in-context learning, large language models can assess the suspiciousness of activity and generate coherent justifications that highlight red flags in synthetic AML scenarios.

What carries the argument

The pipeline that retrieves k-hop neighborhoods around entities of interest, serializes them into structured text, and prompts an LLM via few-shot in-context learning to assess suspiciousness and generate justifications.

If this is right

  • LLMs can highlight specific red flags such as unusual transaction chains in financial networks.
  • The models produce coherent natural-language explanations alongside their suspiciousness assessments.
  • This method supports the development of explainable, language-driven tools for financial crime analytics.
  • It provides an initial demonstration that in-context learning can handle localized graph reasoning tasks in AML.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same serialization step could be combined with existing graph databases to support ongoing monitoring of transaction flows.
  • This style of prompting might be tested as a complement to rule-based systems by adding contextual judgment on edge cases.
  • Performance on synthetic data suggests experiments that measure how well the approach generalizes when the underlying graph contains noise or missing links.

Load-bearing premise

Synthetic AML scenarios capture the key structural and behavioral traits of real money laundering and that turning graph neighborhoods into linear text keeps the information needed for reliable detection.

What would settle it

Apply the same pipeline to a collection of real financial transactions labeled by experts as laundering or non-laundering and check whether the LLM outputs match the expert labels at rates clearly above random guessing.

Figures

Figures reproduced from arXiv: 2507.14785 by Erfan Pirmorad.

Figure 1
Figure 1. Figure 1: Pipeline Overview for LLM-based AML Analysis. We evaluate this pipeline using synthetic AML scenarios that re￾arXiv:2507.14785v1 [cs.LG] 20 Jul 2025 [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Canonical money laundering patterns used as reasoning demonstrations. Image adapted from the IBM AML dataset paper. To help the LLM learn to reason over financial subgraphs, our ap￾proach uses a few-shot prompting setup where each demonstration example showcases one of these laundering patterns. Each example consists of (i) a brief serialized subgraph in the textual format de￾scribed in Section 2.2, and (i… view at source ↗
Figure 2
Figure 2. Figure 2: Snapshot of a sample 2-hop subgraph centered around a transaction between acct_80FF89190 and acct_810147BB0. This extraction strategy frames the detection task as an edge￾centric classification problem over the transaction graph, where the objective is to predict whether the given transaction edge represents money laundering activity. 2.2 Textual Serialization The extracted subgraph is then serialized into… view at source ↗
read the original abstract

The complexity and interconnectivity of entities involved in money laundering demand investigative reasoning over graph-structured data. This paper explores the use of large language models (LLMs) as reasoning engines over localized subgraphs extracted from a financial knowledge graph. We propose a lightweight pipeline that retrieves k-hop neighborhoods around entities of interest, serializes them into structured text, and prompts an LLM via few-shot in-context learning to assess suspiciousness and generate justifications. Using synthetic anti-money laundering (AML) scenarios that reflect common laundering behaviors, we show that LLMs can emulate analyst-style logic, highlight red flags, and provide coherent explanations. While this study is exploratory, it illustrates the potential of LLM-based graph reasoning in AML and lays groundwork for explainable, language-driven financial crime analytics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript explores using large language models as reasoning engines over localized subgraphs in financial knowledge graphs for anti-money laundering detection. It proposes a pipeline that retrieves k-hop neighborhoods around entities of interest, serializes them into structured text, and applies few-shot in-context learning to prompt LLMs to assess suspiciousness and generate justifications. The approach is demonstrated qualitatively on synthetic AML scenarios that reflect common laundering behaviors, showing that LLMs can emulate analyst-style logic by highlighting red flags and providing coherent explanations. The work is framed as exploratory and illustrative rather than claiming real-world deployment or superiority over graph algorithms.

Significance. If the qualitative observations hold under more rigorous testing, the paper provides a lightweight, explainable alternative to traditional graph algorithms for financial crime analytics by leveraging natural language reasoning over serialized graph data. It offers a direct empirical demonstration on synthetic cases without fitted parameters or self-referential predictions, laying groundwork for language-driven AML tools. However, the reliance on synthetic data and absence of quantitative metrics limit its immediate significance to a proof-of-concept illustration.

major comments (2)
  1. [Experimental Results] The experimental results section presents only qualitative observations on synthetic data with no quantitative metrics (e.g., precision, recall, or agreement with expert labels), baseline comparisons against graph-based detectors, or error analysis. This leaves the central claim of LLMs emulating analyst-style detection without measurable support, making it difficult to assess reliability even within the synthetic setting.
  2. [Proposed Pipeline] The weakest assumption—that serializing k-hop neighborhoods into linear text preserves the structural and behavioral information needed for reliable detection—is not tested; no ablation on serialization methods or comparison to native graph representations is provided, which is load-bearing for the pipeline's validity.
minor comments (2)
  1. [Methods] Add concrete examples of the serialized text format and the few-shot prompt templates in the methods section to improve reproducibility.
  2. [Discussion] Expand the limitations paragraph to explicitly discuss the gap between synthetic AML scenarios and real-world financial graphs, including potential information loss in text serialization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our exploratory manuscript. We address the major comments below, maintaining the paper's framing as an initial demonstration of LLM-based reasoning on serialized financial graphs while incorporating clarifications and planned additions where appropriate.

read point-by-point responses
  1. Referee: [Experimental Results] The experimental results section presents only qualitative observations on synthetic data with no quantitative metrics (e.g., precision, recall, or agreement with expert labels), baseline comparisons against graph-based detectors, or error analysis. This leaves the central claim of LLMs emulating analyst-style detection without measurable support, making it difficult to assess reliability even within the synthetic setting.

    Authors: We agree that the current evaluation is strictly qualitative and does not include quantitative metrics or baselines. The manuscript explicitly positions the work as exploratory and illustrative, using synthetic AML scenarios to show that LLMs can highlight red flags and generate coherent justifications without fitted parameters or self-referential predictions. Because the scenarios are hand-crafted to reflect common laundering behaviors rather than drawn from a labeled corpus, we did not compute precision/recall or run graph-algorithm baselines. In revision we will add a dedicated error-analysis subsection that walks through representative failure cases (e.g., missed multi-hop cycles) and a forward-looking paragraph outlining how quantitative agreement studies could be performed once expert-labeled synthetic graphs become available. revision: partial

  2. Referee: [Proposed Pipeline] The weakest assumption—that serializing k-hop neighborhoods into linear text preserves the structural and behavioral information needed for reliable detection—is not tested; no ablation on serialization methods or comparison to native graph representations is provided, which is load-bearing for the pipeline's validity.

    Authors: We acknowledge that the serialization step is a core design choice whose information-preserving properties are not empirically validated in the current version. Our intent was to demonstrate a lightweight, text-only pipeline that directly exploits the in-context reasoning strengths of LLMs; therefore we selected a single structured serialization format and did not perform ablations or graph-native comparisons. We will revise the manuscript to include an explicit limitations paragraph that states this assumption, describes the serialization template used, and outlines concrete future experiments (different serialization orders, graph-to-text variants, and hybrid GNN-LLM architectures) that would test the assumption more rigorously. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an exploratory empirical demonstration on synthetic AML scenarios, serializing k-hop graph neighborhoods into text and prompting LLMs via few-shot in-context learning to identify red flags and generate explanations. No equations, derivations, fitted parameters, or predictions appear in the described pipeline. The central claim is scoped to illustrative behavior on synthetic data and does not rely on self-citations, uniqueness theorems, or any reduction of outputs to inputs by construction. The work is therefore self-contained as a direct empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical exploration that introduces no new mathematical axioms, free parameters, or postulated entities. It relies on standard LLM prompting practices and synthetic scenario construction.

pith-pipeline@v0.9.0 · 5655 in / 1107 out tokens · 44530 ms · 2026-05-19T03:52:39.700468+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

  1. [1]

    Altman, J

    E. Altman, J. Blanuša, L. von Niederhäusern, B. Egressy, A. Anghel, and K. Atasu. Realistic synthetic financial transactions for anti-money laun- dering models. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Datasets and Benchmarks Track, 2023

  2. [2]

    Z. Chen, L. D. Khoa, E. N. Teoh, A. Nazir, E. K. Karuppiah, and K. S. Lam. Machine learning techniques for anti-money laundering (aml) solutions in suspicious transaction detection: a review. Knowl. Inf. Syst., 57(2):245–285, Nov. 2018. ISSN 0219-1377. doi: 10.1007/ s10115-017-1144-z

  3. [3]

    Domashova and N

    J. Domashova and N. Mikhailina. Usage of machine learning methods for early detection of money laundering schemes. Procedia Computer Sci- ence, 190:184–192, 2021. ISSN 1877-0509. doi: https://doi.org/10.1016/ j.procs.2021.06.033. 2020 Annual International Conference on Brain- Inspired Cognitive Architectures for Artificial Intelligence: Eleventh An- nual ...

  4. [4]

    Fatemi, J

    B. Fatemi, J. J. Halcrow, and B. Perozzi. Talk like a graph: Encoding graphs for large language models. ArXiv, abs/2310.04560, 2023

  5. [5]

    X. V . Li and F. S. Passino. Findkg: Dynamic knowledge graphs with large language models for detecting global trends in financial markets. In International Conference on AI in Finance, 2024

  6. [6]

    A lightweight approach for network intrusion detection in industrial cyber-physical systems based on knowledge distillation and deep metric learning.Expert Syst

    S. Motie and B. Raahemi. Financial fraud detection using graph neu- ral networks: A systematic review. Expert Systems with Applications , 240:122156, 2024. ISSN 0957-4174. doi: https://doi.org/10.1016/j.eswa. 2023.122156

  7. [7]

    Chudnovsky and S

    B. Oztas, D. Cetinkaya, F. Adedoyin, M. Budka, G. Aksu, and H. Dogan. Transaction monitoring in anti-money laundering: A qualitative analysis and points of view from industry. Future Generation Computer Systems, 159:161–171, 2024. ISSN 0167-739X. doi: https://doi.org/10.1016/j. future.2024.05.027

  8. [8]

    S. Pan, L. Luo, Y . Wang, C. Chen, J. Wang, and X. Wu. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 36:3580–3599, 2023

  9. [9]

    Perozzi, B

    B. Perozzi, B. Fatemi, D. Zelle, A. Tsitsulin, M. Kazemi, R. Al-Rfou, and J. J. Halcrow. Let your graph do the talking: Encoding structured data for llms. ArXiv, abs/2402.05862, 2024