From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

2025 · cs.IR · arXiv 2511.12081

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, a stark contrast to the predictable scaling laws seen in large language models (LLMs). We identify the root cause as a fundamental structural misalignment: standard Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over heterogeneous fields. To restore alignment, we introduce the Field-Aware Transformer (FAT). By reconstructing the standard Transformer block with field-centric parameters, FAT achieves structured expressivity, fundamentally shifting the model complexity dependence from the total vocabulary size $n$ to the number of fields $F$ ($n \gg F$). Crucially, to decouple model capacity from field cardinality, FAT employs a Basis-Composed Hypernetwork to synthesize field-specific parameters from shared bases, further reducing parameter complexity. Theoretically, we ground this scaling behavior through a formal scaling law based on Rademacher complexity. Empirically, FAT outperforms existing state-of-the-art methods with up to +4.38% AUC improvement, and delivers +2.33% CTR and +0.66% RPM in live production. Our work establishes that scalable recommendation arises not from size alone, but from structured expressivity: architectural coherence with data semantics.
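The idea of synthesizing field-specific parameters from shared bases can be illustrated with a small NumPy sketch. This is not the paper's implementation: the abstract does not say how the bases are combined, so the linear mixing below, the function name `compose_field_weights`, and the sizes `F`, `K`, and `d` are all assumptions chosen for illustration.

```python
import numpy as np

def compose_field_weights(bases, coeffs):
    """Mix K shared basis matrices into one weight matrix per field.

    bases:  (K, d, d) shared basis matrices (hypothetical shape)
    coeffs: (F, K)    per-field mixing coefficients (hypothetical)
    returns (F, d, d) field-specific weight matrices
    """
    # Assumed composition rule: W[f] = sum_k coeffs[f, k] * bases[k]
    return np.einsum("fk,kde->fde", coeffs, bases)

rng = np.random.default_rng(0)
F, K, d = 6, 3, 4  # illustrative sizes: fields, bases, embedding width

bases = rng.normal(size=(K, d, d))
coeffs = rng.normal(size=(F, K))
W = compose_field_weights(bases, coeffs)

# Apply each field's own projection to that field's embedding.
x = rng.normal(size=(F, d))
projected = np.einsum("fd,fde->fe", x, W)

# Parameter counts: shared bases plus coefficients vs. one matrix per field.
params_composed = bases.size + coeffs.size   # K*d*d + F*K
params_independent = F * d * d
```

Under these assumptions the composed parameter count grows with `F` only through the `F*K` coefficient term, whereas independent per-field matrices grow as `F*d*d`.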

representative citing papers

On the Practice of Scaling Search Conversion Rate Prediction

cs.IR · 2026-05-28 · unverdicted · novelty 2.0

Empirical scaling of backbone, embeddings, and data shows largely independent additive gains, enabling a deployed model with 2.5x data and 8x compute that delivers +2.6% CVR improvement with minimal latency change.

citing papers explorer

Showing 1 of 1 citing paper.

On the Practice of Scaling Search Conversion Rate Prediction

cs.IR · 2026-05-28 · unverdicted · none · ref 30 · internal anchor
