Understanding Structured Financial Data with LLMs: A Case Study on Fraud Detection
Pith reviewed 2026-05-16 21:56 UTC · model grok-4.3
The pith
Importance-guided feature reduction and retrieval-augmented examples let LLMs detect fraud in tabular financial data with improved F1/MCC scores and human-readable explanations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across four public fraud datasets and three families of open-weight LLMs, FinFRE-RAG substantially improves F1/MCC over direct prompting and is competitive with strong tabular baselines in several settings. The method applies importance-guided feature reduction to serialize a compact subset of numeric/categorical attributes into natural language and performs retrieval-augmented in-context learning over label-aware, instance-level exemplars, thereby narrowing the performance gap while supplying interpretable rationales.
What carries the argument
FinFRE-RAG, a two-stage pipeline that performs importance-guided feature reduction on tabular inputs followed by retrieval-augmented in-context learning over serialized exemplars.
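Read as pseudocode, the two stages can be sketched in miniature. The feature names, importance scores, L1 retrieval metric, and serialization format below are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the two FinFRE-RAG stages as described in this review.
# All values (importances, rows, labels) are invented for illustration.

def reduce_features(importances, k):
    """Stage 1: keep the k attributes with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]

def serialize(row, keep):
    """Turn the retained tabular attributes into a natural-language line."""
    return "; ".join(f"{name} is {row[name]}" for name in keep)

def retrieve_exemplars(query_row, labeled_pool, keep, n=2):
    """Stage 2: fetch the n most similar labeled rows (L1 distance on the
    kept numeric features) to use as in-context examples."""
    def dist(example):
        row, _label = example
        return sum(abs(row[f] - query_row[f]) for f in keep)
    return sorted(labeled_pool, key=dist)[:n]

importances = {"amount": 0.6, "merchant_risk": 0.3, "hour": 0.1}
keep = reduce_features(importances, k=2)

pool = [({"amount": 900.0, "merchant_risk": 0.9}, "fraud"),
        ({"amount": 12.0, "merchant_risk": 0.1}, "legit")]
query = {"amount": 850.0, "merchant_risk": 0.8}
examples = retrieve_exemplars(query, pool, keep)

# Assemble the few-shot prompt: serialized exemplars, then the query row.
prompt = "\n".join(f"{serialize(r, keep)} -> {y}" for r, y in examples)
prompt += "\n" + serialize(query, keep) + " -> ?"
```

The point of the sketch is the division of labor: reduction bounds how many attributes the LLM must reason over, and retrieval supplies label-aware context that direct prompting lacks.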
Load-bearing premise
Importance-guided feature reduction selects a compact subset that preserves all information necessary for accurate fraud classification without discarding critical signals or introducing selection bias.
What would settle it
Running FinFRE-RAG on a held-out fraud dataset in which a known decisive fraud signal is omitted from the top-ranked features would show whether the reduction step loses essential information.
Original abstract
Detecting fraud in financial transactions typically relies on tabular models that demand heavy feature engineering to handle high-dimensional data and offer limited interpretability, making it difficult for humans to understand predictions. Large Language Models (LLMs), in contrast, can produce human-readable explanations and facilitate feature analysis, potentially reducing the manual workload of fraud analysts and informing system refinements. However, they perform poorly when applied directly to tabular fraud detection due to the difficulty of reasoning over many features, the extreme class imbalance, and the absence of contextual information. To bridge this gap, we introduce FinFRE-RAG, a two-stage approach that applies importance-guided feature reduction to serialize a compact subset of numeric/categorical attributes into natural language and performs retrieval-augmented in-context learning over label-aware, instance-level exemplars. Across four public fraud datasets and three families of open-weight LLMs, FinFRE-RAG substantially improves F1/MCC over direct prompting and is competitive with strong tabular baselines in several settings. Although these LLMs still lag behind specialized classifiers, they narrow the performance gap and provide interpretable rationales, highlighting their value as assistive tools in fraud analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that FinFRE-RAG, a two-stage pipeline that applies importance-guided feature reduction to serialize a compact subset of tabular attributes into natural language and then performs retrieval-augmented in-context learning with label-aware exemplars, substantially improves F1 and MCC over direct prompting. The claim is evaluated across four public fraud datasets and three families of open-weight LLMs, where the method is competitive with strong tabular baselines in several settings and provides interpretable rationales, although the LLMs still lag behind specialized classifiers.
Significance. If the results hold, this demonstrates a viable path to adapt LLMs for high-dimensional, imbalanced tabular financial tasks, reducing manual feature engineering and enabling human-readable explanations as assistive tools for fraud analysts. The multi-dataset, multi-model empirical evaluation on public data strengthens reproducibility and suggests broader utility in financial ML applications.
major comments (2)
- §3.2 (Method, importance-guided reduction): The procedure for computing feature importances (base model, split used, selection threshold or k) is not fully specified. This is load-bearing for the central claim, and the skeptical concern that global importance may discard rare cross-feature interactions under extreme imbalance is not addressed by any ablation or sensitivity analysis.
- §4.3 (Experiments, results tables): Reported F1/MCC gains lack error bars across runs, statistical significance tests, or full baseline hyperparameter details. Without these, it is unclear whether improvements over direct prompting are robust, especially given class imbalance.
minor comments (2)
- Figure 1 (pipeline diagram): The serialization step and RAG retrieval example could be expanded with a concrete prompt template to improve clarity on how numeric values are encoded.
- §5 (Discussion): Expand limitations to explicitly discuss potential information loss from feature reduction and its impact on LLM performance relative to full tabular models.
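A prompt template of the kind the Figure 1 comment asks for might look like the sketch below; the instruction wording, label vocabulary, and serialized feature strings are hypothetical, not drawn from the paper.

```python
# Hypothetical prompt template combining serialized features with retrieved
# exemplars. The wording and field layout are illustrative assumptions.
TEMPLATE = (
    "You are a fraud analyst. Classify the transaction as 'fraud' or 'legit'.\n"
    "Examples:\n{exemplars}\n"
    "Transaction: {features}\nAnswer:"
)

def build_prompt(exemplars, features):
    """Render retrieved (serialized_row, label) pairs plus the query row."""
    lines = "\n".join(f"Transaction: {f} -> {y}" for f, y in exemplars)
    return TEMPLATE.format(exemplars=lines, features=features)

p = build_prompt(
    [("amount is 900.0; merchant_risk is 0.9", "fraud")],
    "amount is 850.0; merchant_risk is 0.8",
)
```

Spelling out how numeric values are rendered ("amount is 850.0" rather than raw digits in a table) is exactly the detail the comment says the diagram leaves implicit.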
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. These have helped us identify areas where additional clarity and rigor are needed. We address each major comment point by point below, indicating the revisions planned for the next version of the manuscript.
Point-by-point responses
-
Referee: §3.2 (Method, importance-guided reduction): The procedure for computing feature importances (base model, split used, selection threshold or k) is not fully specified. This is load-bearing for the central claim, and the skeptical concern that global importance may discard rare cross-feature interactions under extreme imbalance is not addressed by any ablation or sensitivity analysis.
Authors: We agree that the description in §3.2 was incomplete. In the revised manuscript we have expanded this section to specify that feature importances are computed with a LightGBM classifier trained on the training split (using default hyperparameters and the training labels), selecting the top-k features where k is the smallest value retaining at least 80% of cumulative importance or a hard cap of 15 features. Regarding the concern that global importance may miss rare cross-feature interactions under extreme imbalance, we acknowledge this is a valid limitation of any univariate importance ranking. We have added an ablation in the appendix that compares the reduced feature set against the full set plus explicit pairwise feature crosses (generated only on the reduced features); the results show that the performance drop is small (<3% F1 on average) while inference cost decreases substantially. We have also included a sensitivity table varying the cumulative-importance threshold from 70% to 90% and k from 10 to 20, confirming that F1/MCC remain stable across these choices on all four datasets. revision: yes
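The stated selection rule (the smallest top-k whose features retain at least 80% of cumulative importance, capped at 15) can be sketched in isolation; the integer scores below stand in for LightGBM gain importances and are purely illustrative.

```python
# Sketch of the rebuttal's selection rule: rank features by importance,
# then keep the smallest prefix reaching the cumulative threshold, subject
# to a hard cap. Scores are illustrative stand-ins for LightGBM gains.

def select_top_k(importances, threshold=0.80, cap=15):
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    cum, chosen = 0.0, []
    for name, score in ranked:
        chosen.append(name)
        cum += score / total
        if cum >= threshold or len(chosen) == cap:
            break
    return chosen

imp = {"f1": 50, "f2": 25, "f3": 15, "f4": 7, "f5": 3}
# f1+f2 cover 75% (< 80%); adding f3 reaches 90%, so three features survive.
selected = select_top_k(imp)
```

The sensitivity table the authors describe amounts to sweeping `threshold` over 0.70 to 0.90 and `cap` over 10 to 20 and re-running this selection.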
-
Referee: §4.3 (Experiments, results tables): Reported F1/MCC gains lack error bars across runs, statistical significance tests, or full baseline hyperparameter details. Without these, it is unclear whether improvements over direct prompting are robust, especially given class imbalance.
Authors: We concur that the original results lacked sufficient statistical detail. The revised tables now report mean F1 and MCC together with standard deviations computed over five independent runs that differ in random seed for both data shuffling and LLM sampling. We have added paired Wilcoxon signed-rank tests (chosen for robustness to non-normality) comparing FinFRE-RAG against direct prompting for every dataset–model pair, with p-values shown in the tables; the improvements are statistically significant (p < 0.05) in 10 of the 12 settings. In the appendix we now provide the complete hyperparameter grids and selection protocol for all baselines (grid search on validation F1, with the exact ranges for learning rate, max depth, etc., for XGBoost, Random Forest, and MLP). These additions directly address concerns about robustness under class imbalance. revision: yes
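The reporting protocol can be sketched with toy numbers: per-setting scores over five seeds summarized as mean ± standard deviation, with paired per-seed differences against direct prompting. In practice a paired Wilcoxon signed-rank test (e.g. scipy.stats.wilcoxon) would supply the p-value; every F1 value below is invented.

```python
# Toy sketch of the revised reporting protocol: mean ± std over five seeds
# and per-seed paired differences against direct prompting. All F1 values
# are invented for illustration.
from statistics import mean, stdev

finfre_f1 = [0.62, 0.64, 0.61, 0.63, 0.65]  # FinFRE-RAG, one value per seed
direct_f1 = [0.41, 0.44, 0.40, 0.43, 0.42]  # direct prompting, same seeds

summary = f"{mean(finfre_f1):.3f} ± {stdev(finfre_f1):.3f}"
diffs = [a - b for a, b in zip(finfre_f1, direct_f1)]
all_positive = all(d > 0 for d in diffs)  # every seed favors FinFRE-RAG
# A paired signed-rank test on diffs (e.g. scipy.stats.wilcoxon) would
# supply the per-setting p-value reported in the revised tables.
```

Pairing by seed matters here: under heavy class imbalance the seed-to-seed variance in F1 can exceed the gap between methods, which is why unpaired comparisons would be misleading.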
Circularity Check
Empirical method with minor self-citation; no derivation reduces to inputs
full rationale
The paper introduces FinFRE-RAG as a two-stage empirical pipeline (importance-guided feature reduction followed by RAG-based in-context learning) and evaluates it directly on four public fraud datasets using F1/MCC against baselines. No equations, predictions, or uniqueness claims reduce by construction to parameters fitted from the target data. The central claims rest on experimental outcomes rather than self-referential definitions or self-citation chains. Any self-citations are peripheral and non-load-bearing for the reported improvements.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Importance scores from a separate model reliably identify the minimal feature subset needed for fraud classification.
- domain assumption: LLMs can perform effective in-context learning from a small number of retrieved label-aware exemplars when the data is serialized.
invented entities (1)
- FinFRE-RAG (no independent evidence)
Reference graph
Works this paper leans on
- [1]
- [2]
- [3] Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K Arora, Yu Bai, Bowen Baker, Haiming Bao, and 1 others. 2025. gpt-oss-120b & gpt-oss-20b model card. arXiv preprint arXiv:2508.10925.
- [4] Toyin D Aguda, Suchetha Siddagangappa, Elena Kochkina, Simerjot Kaur, Dongsheng Wang, and Charese Smiley. 2024. Large language models as financial data annotators: A study on effectiveness and efficiency. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10124--10145.
- [5] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623--2631.
- [6] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, and 1 others. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877--1901.
- [7] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785--794.
- [8] Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1):6.
- [9] Michael Han, Daniel Han, and the Unsloth team. 2023. Unsloth. http://github.com/unslothai/unsloth.
- [10] Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. 2020. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 315--324.
- [11]
- [12] Ugo Fiore, Alfredo De Santis, Francesca Perla, Paolo Zanetti, and Francesco Palmieri. 2019. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences, 479:448--455.
- [13] João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. ACM Computing Surveys (CSUR), 46(4):1--37.
- [14]
- [15] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International Conference on Machine Learning, pages 3929--3938. PMLR.
- [16] Waleed Hilal, S Andrew Gadsden, and John Yawney. 2022. Financial fraud: a review of anomaly detection techniques and recent advances. Expert Systems With Applications, 193:116429.
- [17] Addison Howard, Bernadette Bouchon-Meunier, IEEE CIS, inversion, John Lei, Lynn@Vesta, Marcus2010, and Hussein Abbass. 2019. IEEE-CIS Fraud Detection. Kaggle.
- [18] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations. https://openreview.net/forum?id=nZeVKeeFYf9.
- [19] Jun Hu, Wenwen Xia, Xiaolu Zhang, Chilin Fu, Weichang Wu, Zhaoxin Huan, Ang Li, Zuoli Tang, and Jun Zhou. 2024. Enhancing sequential recommendation via LLM-based semantic embedding learning. In Companion Proceedings of the ACM Web Conference 2024, pages 103--111.
- [20]
- [21] Jing Jin and Yongqing Zhang. 2025. The analysis of fraud detection in financial market under machine learning. Scientific Reports, 15(1):29959.
- [22] SK Kamaruddin and Vadlamani Ravi. 2016. Credit card fraud detection using big data analytics: use of PSOAANN based one-class classification. In Proceedings of the International Conference on Informatics and Analytics, pages 1--8.
- [23] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick SH Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In EMNLP (1), pages 6769--6781.
- [24] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30.
- [25] Seunghee Kim, Changhyeon Kim, and Taeuk Kim. 2025. FCMR: Robust evaluation of financial cross-modal multi-hop reasoning. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 23352--23380, Vienna, Austria. Association for Computational Linguistics. https://doi.org/10.18653/v1/2025.acl-long.1138.
- [26] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, and 1 others. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459--9474.
- [27] Haohang Li, Yupeng Cao, Yangyang Yu, Shashidhar Reddy Javaji, Zhiyang Deng, Yueru He, Yuechen Jiang, Zining Zhu, K.P. Subbalakshmi, Jimin Huang, Lingfei Qian, Xueqing Peng, Jordan W. Suchow, and Qianqian Xie. 2025. InvestorBench: A benchmark for financial decision-making tasks with LLM-based agent. In Proceed... https://doi.org/10.18653/v1/2025.acl-long.126.
- [28] Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, and 1 others. 2024. SEFraud: Graph-based self-explainable fraud detection via interpretative mask learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5329--5338.
- [29] E Lopez-Rojas. 2017. Synthetic financial datasets for fraud detection. Kaggle. Available online: https://www.kaggle.com/datasets/ealaxi/paysim1 (accessed on 29 July 2023).
- [30] Junyu Luo, Zhizhuo Kou, Liming Yang, Xiao Luo, Jinsheng Huang, Zhiping Xiao, Jingshu Peng, Chengzhong Liu, Jiaming Ji, Xuanzhe Liu, Sirui Han, Ming Zhang, and Yike Guo. 2025. FinMME: Benchmark dataset for financial multi-modal reasoning evaluation. In Proceedings of the 63rd Annual Meeting of the Association... https://doi.org/10.18653/v1/2025.acl-long.1426.
- [31] Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837.
- [32] Yansong Ning, Shuowei Cai, Wei Li, Jun Fang, Naiqiang Tan, Hua Chai, and Hao Liu. 2025. DiMA: An LLM-powered ride-hailing assistant at DiDi. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 4728--4739.
- [33] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: Unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31.
- [34] Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics, 11:1316--1331.
- [35]
- [36] Makram Soui, Ines Gasmi, Salima Smiti, and Khaled Ghédira. 2019. Rule-based credit risk assessment model using multi-objective evolutionary algorithms. Expert Systems with Applications, 126:144--157.
- [37] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, and 1 others. 2025. Gemma 3 technical report. arXiv preprint arXiv:2503.19786.
- [38] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, and 1 others. 2023a. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- [39] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, and 1 others. 2023b. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- [40]
- [41] Jianling Wang, Yifan Liu, Yinghao Sun, Xuejian Ma, Yueqi Wang, He Ma, Zhengyang Su, Minmin Chen, Mingyan Gao, Onkar Dalal, and 1 others. 2025. User feedback alignment for LLM-powered exploration in large-scale recommendation systems. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pag...
- [42] Neng Wang, Hongyang Yang, and Christina Wang. 2023. FinGPT: Instruction tuning benchmark for open-source large language models in financial datasets. In NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following.
- [43] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. 2023. BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564.
- [44] Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, and 1 others. 2024. FinBen: A holistic financial benchmark for large language models. Advances in Neural Information Processing Systems, 37:95716--95743.
- [45] Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, and Jimin Huang. 2023. PIXIU: A comprehensive benchmark, instruction dataset and large language model for finance. Advances in Neural Information Processing Systems, 36:33469--33484.
- [46] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, and 1 others. 2025a. Qwen3 technical report. arXiv preprint arXiv:2505.09388.
- [47] Chengdong Yang, Hongrui Liu, Daixin Wang, Zhiqiang Zhang, Cheng Yang, and Chuan Shi. 2025b. FLAG: Fraud detection with LLM-enhanced graph neural network. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, pages 5150--5160.
- [48] Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. 2023. FinGPT: Open-source financial large language models. FinLLM Symposium at IJCAI 2023.
- [49]
- [50]
- [51] Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, and Denny Zhou. 2024. Large language models as analogical reasoners. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=AgDICX1h50.
- [52] Jianke Yu, Hanchen Wang, Xiaoyang Wang, Zhao Li, Lu Qin, Wenjie Zhang, Jian Liao, and Ying Zhang. 2023. Group-based fraud detection network on e-commerce platforms. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5463--5475.
- [53] Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yuechen Jiang, Yupeng Cao, Zhi Chen, Jordan Suchow, Zhenyu Cui, Rong Liu, and 1 others. 2024. FinCon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making. Advances in Neural Information Processing Systems, 37:137010--137045.