arxiv: 2602.14110 · v2 · pith:CJ7VKDR2new · submitted 2026-02-15 · 💻 cs.IR

MixFormer: Co-Scaling Up Dense and Sequence in Industrial Recommenders

classification 💻 cs.IR
keywords sequence, feature, industrial, mixformer, capacity, co-scaling, dense, interaction

As industrial recommender systems enter a scaling-driven regime, Transformer architectures have become increasingly attractive for scaling models towards larger capacity and longer sequences. However, existing Transformer-based recommendation models remain structurally fragmented: sequence modeling and feature interaction are implemented as separate modules with independent parameterization. Such designs introduce a fundamental co-scaling challenge, as model capacity must be suboptimally allocated between dense feature interaction and sequence modeling under a limited computational budget. In this work, we propose MixFormer, a unified Transformer-style architecture tailored for recommender systems, which jointly models sequential behaviors and feature interactions within a single backbone. Through a unified parameterization, MixFormer enables effective co-scaling across both dense capacity and sequence length, mitigating the trade-off observed in decoupled designs. Moreover, the integrated architecture facilitates deep interaction between sequential and non-sequential representations, allowing high-order feature semantics to directly inform sequence aggregation and enhancing overall expressiveness. To ensure industrial practicality, we further introduce a user-item decoupling strategy whose efficiency optimizations significantly reduce redundant computation and inference latency. Extensive experiments on large-scale industrial datasets demonstrate that MixFormer consistently exhibits superior accuracy and efficiency. Furthermore, large-scale online A/B tests on two production recommender systems, Douyin and Douyin Lite, show consistent improvements in user engagement metrics, including active days and in-app usage duration.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction

    cs.IR 2026-04 unverdicted novelty 7.0

    LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.

  2. Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models

    cs.IR 2026-04 unverdicted novelty 7.0

    SIF encodes full historical raw samples as tokens via hierarchical quantization to preserve sample context and unify sequential/non-sequential features in large recommender models.

  3. LENS: A Staged Design for Interaction Granularity in Sequential CTR Prediction

    cs.IR 2026-05 unverdicted novelty 6.0

    LENS restores target-specific control in latent-query CTR models via TCQG and TCPB modules plus QueryPos reference, reporting positive gains in all 12 backbone-dataset cells and a density-dependent conditioning rule.

  4. RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems

    cs.IR 2026-04 unverdicted novelty 6.0

    RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.

  5. NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems

    cs.IR 2026-06 unverdicted novelty 5.0

    NOVA deploys a level-aware agent system with architecture gradient and verification cascade for recommender architecture evolution, reporting 54.5-60% effective pass rates, 13x faster cycles, and online GMV gains of 1...

  6. NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems

    cs.IR 2026-06 unverdicted novelty 5.0

    NOVA introduces a level-aware agent harness with architecture gradient and verification cascade to automate recommender architecture evolution while reducing silent failures and human effort.

  7. RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems

    cs.IR 2026-04 unverdicted novelty 5.0

    RankUp enhances representation capacity in deep MetaFormer recommenders via permutation splitting and multi-embeddings, achieving GMV improvements of 2-5% in Weixin production systems.

  8. Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models

    cs.IR 2026-04 unverdicted novelty 5.0

    SIF encodes entire historical raw samples as tokens via hierarchical group-adaptive quantization and token/sample-level mixing to overcome partial encoding and feature heterogeneity limits in scaled recommender models.

  9. UniFormer: Efficient and Unified Model-Centric Scaling for Industrial Recommendation

    cs.IR 2026-06 unverdicted novelty 4.0

    UniFormer introduces a unified model-centric scaling approach for recommender systems via feature-space and task-space modules, semantic tokenization, and multi-sequence attention, with reported gains in production A/...