TREASURE: The Visa Payment Foundation Model for High-Volume Transaction Understanding

2025 · cs.LG · arXiv 2511.19693

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.

representative citing papers

Scaling Laws for Behavioral Foundation Models over User Event Sequences

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Across 600 runs from 10^15 to 10^19 FLOPs, behavioral models show that a 2% embedder is compute-optimal at all scales, that training is data-heavy at low compute, and that the optimal number of negatives increases with budget until memory-limited.

