MixFormer: Co-Scaling Up Dense and Sequence in Industrial Recommenders
As industrial recommender systems enter a scaling-driven regime, Transformer architectures have become increasingly attractive for scaling models towards larger capacity and longer sequences. However, existing Transformer-based recommendation models remain structurally fragmented: sequence modeling and feature interaction are implemented as separate modules with independent parameterization. Such designs introduce a fundamental co-scaling challenge, because under a limited computational budget, model capacity must be suboptimally allocated between dense feature interaction and sequence modeling. In this work, we propose MixFormer, a unified Transformer-style architecture tailored for recommender systems, which jointly models sequential behaviors and feature interactions within a single backbone. Through a unified parameterization, MixFormer enables effective co-scaling across both dense capacity and sequence length, mitigating the trade-off observed in decoupled designs. Moreover, the integrated architecture facilitates deep interaction between sequential and non-sequential representations, allowing high-order feature semantics to directly inform sequence aggregation and enhancing overall expressiveness. To ensure industrial practicality, we further introduce a user-item decoupling strategy for efficiency optimization that significantly reduces redundant computation and inference latency. Extensive experiments on large-scale industrial datasets demonstrate that MixFormer consistently achieves superior accuracy and efficiency. Furthermore, large-scale online A/B tests on two production recommender systems, Douyin and Douyin Lite, show consistent improvements in user engagement metrics, including active days and in-app usage duration.
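The abstract does not say how the user-item decoupling is implemented. One common way to obtain this kind of saving, offered here as an assumption and not as MixFormer's confirmed design, is to mask attention so that user-side tokens never attend to candidate-item tokens. User-side states then do not depend on the candidate, so they can be computed once per request, and each candidate only attends to cached user-side keys and values. The NumPy sketch below is a deliberately simplified toy (single head, no layer norm or FFN). It also assumes, for illustration, that non-sequential feature tokens and behavior-sequence tokens share one token stream and one set of weights, in the spirit of the unified backbone. All names (`encode_user`, `score_item`, `score_naive`) are hypothetical. It checks that the cached path reproduces the naive per-candidate path.

```python
import numpy as np

def attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention with an optional additive mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def score_naive(x_user, x_item, layers):
    """Hypothetical baseline: rerun the full token stream for every candidate."""
    n_u = x_user.shape[0]
    x = np.concatenate([x_user, x_item])
    n = x.shape[0]
    mask = np.zeros((n, n))
    mask[:n_u, n_u:] = -np.inf  # user tokens never see item tokens
    for wq, wk, wv in layers:
        x = x + attention(x @ wq, x @ wk, x @ wv, mask)  # residual update
    return float(x[n_u:].mean())  # toy score read from item tokens

def encode_user(x_user, layers):
    """Hypothetical: run user-side tokens once, caching per-layer keys/values."""
    cache = []
    for wq, wk, wv in layers:
        k, v = x_user @ wk, x_user @ wv
        cache.append((k, v))
        x_user = x_user + attention(x_user @ wq, k, v)
    return cache

def score_item(x_item, cache, layers):
    """Hypothetical: per-candidate pass that attends to the cached user K/V."""
    for (wq, wk, wv), (ku, vu) in zip(layers, cache):
        k = np.concatenate([ku, x_item @ wk])
        v = np.concatenate([vu, x_item @ wv])
        x_item = x_item + attention(x_item @ wq, k, v)
    return float(x_item.mean())

rng = np.random.default_rng(0)
d = 8
layers = [tuple(rng.normal(scale=0.3, size=(d, d)) for _ in range(3)) for _ in range(2)]
# Assumed layout: feature tokens and behavior-sequence tokens in one user-side stream.
x_user = rng.normal(size=(16, d))
candidates = [rng.normal(size=(2, d)) for _ in range(5)]

cache = encode_user(x_user, layers)  # user side computed once per request
fast = [score_item(c, cache, layers) for c in candidates]
slow = [score_naive(x_user, c, layers) for c in candidates]
print(np.max(np.abs(np.array(fast) - np.array(slow))))
```

With N candidates, the naive path processes the 16 user-side tokens N times, while the cached path processes them once. The per-candidate work shrinks to the few item tokens attending over cached keys and values. The equivalence holds only because of the mask that keeps user tokens from attending to item tokens.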
Forward citations
Cited by 9 Pith papers
-
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
-
Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
SIF encodes full historical raw samples as tokens via hierarchical quantization to preserve sample context and unify sequential/non-sequential features in large recommender models.
-
LENS: A Staged Design for Interaction Granularity in Sequential CTR Prediction
LENS restores target-specific control in latent-query CTR models via TCQG and TCPB modules plus QueryPos reference, reporting positive gains in all 12 backbone-dataset cells and a density-dependent conditioning rule.
-
RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
-
NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems
NOVA deploys a level-aware agent system with architecture gradient and verification cascade for recommender architecture evolution, reporting 54.5-60% effective pass rates, 13x faster cycles, and online GMV gains of 1...
-
NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems
NOVA introduces a level-aware agent harness with architecture gradient and verification cascade to automate recommender architecture evolution while reducing silent failures and human effort.
-
RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
RankUp enhances representation capacity in deep MetaFormer recommenders via permutation splitting and multi-embeddings, achieving GMV improvements of 2-5% in Weixin production systems.
-
Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
SIF encodes entire historical raw samples as tokens via hierarchical group-adaptive quantization and token/sample-level mixing to overcome partial encoding and feature heterogeneity limits in scaled recommender models.
-
UniFormer: Efficient and Unified Model-Centric Scaling for Industrial Recommendation
UniFormer introduces a unified model-centric scaling approach for recommender systems via feature-space and task-space modules, semantic tokenization, and multi-sequence attention, with reported gains in production A/...