Behavior Sequence Transformer for E-commerce Recommendation in Alibaba

Huan Zhao; Pipei Huang; Qiwei Chen; Wei Li; Wenwu Ou

arxiv: 1905.06874 · v1 · pith:I6P4FY2Gnew · submitted 2019-05-15 · 💻 cs.IR · cs.LG

Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, Wenwu Ou

classification 💻 cs.IR cs.LG

keywords recommendation, alibaba, behavior, features, model, online, sequential, then

Deep learning-based methods have been widely used in industrial recommendation systems (RSs). Previous works adopt an Embedding&MLP paradigm: raw features are embedded into low-dimensional vectors, which are then fed into an MLP for final recommendations. However, most of these works simply concatenate different features, ignoring the sequential nature of users' behaviors. In this paper, we propose to use the powerful Transformer model to capture the sequential signals underlying users' behavior sequences for recommendation in Alibaba. Experimental results demonstrate the superiority of the proposed model, which was then deployed online at Taobao, where it obtained significant improvements in online Click-Through Rate (CTR) compared to two baselines.
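
The idea in the abstract can be illustrated with a minimal sketch: embed the items in a user's behavior sequence, let a self-attention layer mix information across positions, and pass the result to an MLP that outputs a click probability. This is not the authors' implementation. The function names, the dimensions, the random weights, the choice to append the candidate item to the sequence, and the omission of positional features and other (non-sequence) features are all assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # n x n similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def init_params(n_items, dim, seq_len, hidden, seed=0):
    """Hypothetical parameter set for the sketch (all names are assumptions)."""
    rng = random.Random(seed)
    return {
        "emb": rand_matrix(n_items, dim, rng),        # item embedding table
        "wq": rand_matrix(dim, dim, rng),
        "wk": rand_matrix(dim, dim, rng),
        "wv": rand_matrix(dim, dim, rng),
        "w1": rand_matrix(seq_len * dim, hidden, rng),  # MLP hidden layer
        "w2": rand_matrix(hidden, 1, rng),              # MLP output layer
    }

def predict_ctr(behavior_ids, target_id, params):
    """Return (click probability, attention weights) for one candidate item."""
    emb = params["emb"]
    # Assumption: the candidate item is appended to the clicked-item sequence.
    seq = [emb[i] for i in behavior_ids + [target_id]]
    attended, weights = self_attention(seq, params["wq"], params["wk"], params["wv"])
    # Flatten the attended sequence and feed it to a one-hidden-layer MLP.
    flat = [v for row in attended for v in row]
    hidden = [max(0.0, h) for h in matmul([flat], params["w1"])[0]]  # ReLU
    logit = matmul([hidden], params["w2"])[0][0]
    return 1.0 / (1.0 + math.exp(-logit)), weights

# Three past clicks plus one candidate item gives a sequence of length 4.
params = init_params(n_items=50, dim=4, seq_len=4, hidden=8)
prob, attn = predict_ctr([3, 17, 42], target_id=7, params=params)
```

In contrast to plain concatenation, each row of `attn` shows how much one position in the sequence draws on every other position, which is where order-dependent signals can enter once positional information is added.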

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FEDIN: Frequency-Enhanced Deep Interest Network for Click-Through Rate Prediction
cs.IR 2026-05 unverdicted novelty 6.0

FEDIN improves CTR prediction by using target-aware frequency filtering to isolate low-entropy periodic interest signals from high-entropy noise in user attention patterns.

Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation
cs.LG 2025-11 conditional novelty 6.0

Douyin deploys stacked target-to-history cross attention and request-level batching to scale end-to-end recommendation modeling to 10k-length histories, observing scaling-law gains and live engagement improvements.

Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation
cs.LG 2025-11 unverdicted novelty 5.0

Introduces STCA for linear-complexity target-to-history attention, RLB for shared user encoding across targets, and length-extrapolative training to enable end-to-end 10K sequence modeling with observed scaling-law ga...