MixFormer: Co-Scaling Up Dense and Sequence in Industrial Recommenders

Hao Zhang; Jinan Ni; Qiwei Chen; Xu Huang; Yuchao Zheng; Yunwen Huang; Zheng Chai; Zhifang Fan; Zhuoxing Wei

arxiv: 2602.14110 · v2 · pith:CJ7VKDR2new · submitted 2026-02-15 · 💻 cs.IR

classification 💻 cs.IR

keywords sequence · feature · industrial · mixformer · capacity · co-scaling · dense · interaction

As industrial recommender systems enter a scaling-driven regime, Transformer architectures have become increasingly attractive for scaling models toward larger capacity and longer sequences. However, existing Transformer-based recommendation models remain structurally fragmented: sequence modeling and feature interaction are implemented as separate modules with independent parameterization. Such designs introduce a fundamental co-scaling challenge, because model capacity ends up suboptimally allocated between dense feature interaction and sequence modeling under a limited computational budget. In this work, we propose MixFormer, a unified Transformer-style architecture tailored for recommender systems, which jointly models sequential behaviors and feature interactions within a single backbone. Through a unified parameterization, MixFormer enables effective co-scaling across both dense capacity and sequence length, mitigating the trade-off observed in decoupled designs. Moreover, the integrated architecture facilitates deep interaction between sequential and non-sequential representations, allowing high-order feature semantics to directly inform sequence aggregation and enhancing overall expressiveness. To ensure industrial practicality, we further introduce a user-item decoupling strategy that enables efficiency optimizations, significantly reducing redundant computation and inference latency. Extensive experiments on large-scale industrial datasets demonstrate that MixFormer consistently exhibits superior accuracy and efficiency. Furthermore, large-scale online A/B tests on two production recommender systems, Douyin and Douyin Lite, show consistent improvements in user engagement metrics, including active days and in-app usage duration.
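
The abstract does not say how the unified backbone or the user-item decoupling is implemented. As a rough illustration only, the Python sketch below assumes that (a) the non-sequential feature tokens form the query that aggregates the behavior sequence, and (b) the redundant computation being removed is the user-side encoding, which is computed once per request and reused for every candidate item. The names `aggregate_sequence`, `UserEncoder`, and `score_candidates` are hypothetical, and the toy attention has no learned weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aggregate_sequence(feature_tokens, behavior_seq):
    """Toy feature-informed sequence aggregation (assumption, not the paper's layer).

    The query is the mean of the non-sequential feature tokens, a stand-in
    for a learned feature-interaction output. It attends over the behavior
    sequence with scaled dot-product attention.
    """
    dim = len(behavior_seq[0])
    query = [
        sum(tok[d] for tok in feature_tokens) / len(feature_tokens)
        for d in range(dim)
    ]
    weights = softmax([dot(query, b) / math.sqrt(dim) for b in behavior_seq])
    return [sum(w * b[d] for w, b in zip(weights, behavior_seq)) for d in range(dim)]


class UserEncoder:
    """Hypothetical user-side encoder that counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def encode(self, feature_tokens, behavior_seq):
        self.calls += 1
        return aggregate_sequence(feature_tokens, behavior_seq)


def score_candidates(encoder, feature_tokens, behavior_seq, candidate_items):
    """Encode the user once per request, then score each candidate cheaply."""
    user_vec = encoder.encode(feature_tokens, behavior_seq)
    return [dot(user_vec, item) for item in candidate_items]


# Usage: one request, four candidate items, a single user-side encoding.
encoder = UserEncoder()
user_feats = [[1.0, 0.0], [0.0, 1.0]]
behaviors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 2.0]]
scores = score_candidates(encoder, user_feats, behaviors, candidates)
print(encoder.calls, scores)
```

Under these assumptions the user-side cost is paid once per request rather than once per candidate. The actual MixFormer design may differ.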

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
cs.IR 2026-04 unverdicted novelty 7.0

LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.

Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
cs.IR 2026-04 unverdicted novelty 7.0

SIF encodes full historical raw samples as tokens via hierarchical quantization to preserve sample context and unify sequential/non-sequential features in large recommender models.

LENS: A Staged Design for Interaction Granularity in Sequential CTR Prediction
cs.IR 2026-05 unverdicted novelty 6.0

LENS restores target-specific control in latent-query CTR models via TCQG and TCPB modules plus QueryPos reference, reporting positive gains in all 12 backbone-dataset cells and a density-dependent conditioning rule.

RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
cs.IR 2026-04 unverdicted novelty 6.0

RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.

NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems
cs.IR 2026-06 unverdicted novelty 5.0

NOVA deploys a level-aware agent system with architecture gradient and verification cascade for recommender architecture evolution, reporting 54.5-60% effective pass rates, 13x faster cycles, and online GMV gains of 1...

NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems
cs.IR 2026-06 unverdicted novelty 5.0

NOVA introduces a level-aware agent harness with architecture gradient and verification cascade to automate recommender architecture evolution while reducing silent failures and human effort.

RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
cs.IR 2026-04 unverdicted novelty 5.0

RankUp enhances representation capacity in deep MetaFormer recommenders via permutation splitting and multi-embeddings, achieving GMV improvements of 2-5% in Weixin production systems.

Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
cs.IR 2026-04 unverdicted novelty 5.0

SIF encodes entire historical raw samples as tokens via hierarchical group-adaptive quantization and token/sample-level mixing to overcome partial encoding and feature heterogeneity limits in scaled recommender models.

UniFormer: Efficient and Unified Model-Centric Scaling for Industrial Recommendation
cs.IR 2026-06 unverdicted novelty 4.0

UniFormer introduces a unified model-centric scaling approach for recommender systems via feature-space and task-space modules, semantic tokenization, and multi-sequence attention, with reported gains in production A/...