Make it long, keep it fast: End-to-end 10k-sequence modeling at billion scale on Douyin

Lin Guan, Jia-Qi Yang, Zhishan Zhao, Beichuan Zhang, Bo Sun, Xuanyuan Luo, Jinan Ni, Xiaowen Li, Yuhang Qi, Zhifang Fan, et al. · 2025 · arXiv 2511.06077

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Similar Users-Augmented Interest Network

cs.IR · 2026-04-26 · unverdicted · novelty 7.0

SUIN improves CTR prediction by augmenting target user sequences with similar users' behaviors via embedding-based retrieval, user-specific position encoding, and user-aware target attention.

IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems

cs.IR · 2026-04-10 · unverdicted · novelty 7.0

IAT compresses each historical interaction instance into a unified embedding token via temporal-order or user-order schemes, allowing standard sequence models to learn long-range preferences with better performance and transferability.

One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving

cs.DC · 2026-05-06 · unverdicted · novelty 6.0 · 2 refs

HELM adaptively partitions HBM between EMB and KV caches via a three-layer PPO controller and EMB-KV-aware scheduling, reducing P99 latency by 24-38% while achieving 93.5-99.6% SLO satisfaction on production workloads.

citing papers explorer

Showing 3 of 3 citing papers.

Similar Users-Augmented Interest Network cs.IR · 2026-04-26 · unverdicted · none · ref 20 · internal anchor
IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems cs.IR · 2026-04-10 · unverdicted · none · ref 6 · internal anchor
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving cs.DC · 2026-05-06 · unverdicted · none · ref 14 · 2 links · internal anchor
