PRAGMA: Revolut Foundation Model
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Modern financial systems generate vast quantities of transactional and event-level data that encode rich economic signals. This paper presents PRAGMA, a family of foundation models for multi-source banking event sequences. Our approach pre-trains a Transformer-based architecture with masked modelling on a large-scale, heterogeneous banking event corpus using a self-supervised objective tailored to the discrete, variable-length nature of financial records. The resulting model supports a wide range of downstream tasks such as credit scoring, fraud detection, and lifetime value prediction: strong performance can be achieved by training a simple linear model on top of the extracted embeddings and can be further improved with lightweight fine-tuning. Through extensive evaluation on downstream tasks, we demonstrate that PRAGMA achieves superior performance across multiple domains directly from raw event sequences, providing a general-purpose representation layer for financial applications.
fields
cs.LG 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
Scaling Laws for Behavioral Foundation Models over User Event Sequences
Across 600 runs from 10^15 to 10^19 FLOPs, behavioral models show a 2% embedder is compute-optimal at all scales, training is data-heavy at low compute, and optimal negatives increase with budget until memory-limited.