Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

Bojian Hou; Chao Chen; Chunzhi Yang; Darren Liu; Ellie Wen; Han Xu; Huaqing Xiong; Huayu Li; Jade Nie; Jiaqi Xu

arxiv: 2602.10016 · v3 · pith:GM3CDGJUnew · submitted 2026-02-10 · 💻 cs.IR · cs.AI


Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen, Yue Dong, Yi Yang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Darren Liu, Jade Nie, Chunzhi Yang, Ellie Wen, Jiyan Yang, Huayu Li


classification: cs.IR, cs.AI

keywords: scaling, efficiency, kunlun, laws, model, recommendation, systems, allocation



Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from inefficient modules with low Model FLOPs Utilization (MFU) and suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations feature Computation Skip (CompSkip) and Event-level Personalization. These advances increase MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency over state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.
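The two quantities the abstract leans on, Model FLOPs Utilization and a power-law relationship between performance and compute, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `mfu` and `fit_power_law`, the generic form loss ≈ a · compute^(−b), and the synthetic numbers are hypothetical and are not taken from the paper, which may define its scaling law and metrics differently.

```python
import math


def mfu(model_flops_per_step, step_time_s, peak_flops_per_s):
    """Model FLOPs Utilization: achieved model FLOPs/s over hardware peak FLOPs/s.

    All three inputs are supplied by the caller; no hardware peak value is
    assumed here.
    """
    return model_flops_per_step / (step_time_s * peak_flops_per_s)


def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by ordinary least squares in log-log space.

    This is a generic power-law form chosen for illustration, not the
    paper's exact formulation. Returns the pair (a, b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from a known power law
# (hypothetical values, used only to check that the fit recovers them).
synthetic_compute = [1e18, 1e19, 1e20, 1e21]
synthetic_loss = [2.0 * c ** -0.05 for c in synthetic_compute]
a_hat, b_hat = fit_power_law(synthetic_compute, synthetic_loss)
```

Under this reading, a larger fitted exponent `b` means each additional unit of compute buys more improvement, and a higher `mfu` means more of the hardware's peak throughput is spent on model FLOPs.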



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
cs.IR 2026-04 unverdicted novelty 7.0

TokenFormer unifies multi-field and sequential recommendation modeling via bottom-full-top-sliding attention and non-linear interaction representations to avoid sequential collapse and deliver state-of-the-art performance.