Pith · machine review for the scientific record

arxiv: 2604.20858 · v1 · submitted 2026-03-01 · 💻 cs.IR · cs.AI

Recognition: 2 Lean theorem links

Mixture of Sequence: Theme-Aware Mixture-of-Experts for Long-Sequence Recommendation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 17:26 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords: sequential recommendation · mixture of experts · long sequences · session hopping · theme-aware routing · multi-scale fusion · click-through rate

The pith

A theme-aware mixture-of-experts model splits long user sequences into coherent theme-specific subsequences to filter interest shifts and improve recommendations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Long user histories in sequential recommendation often mix stable short-term interests with abrupt shifts that add noise and hurt predictions. The paper shows this pattern, called session hopping, and introduces the Mixture of Sequence framework to handle it. MoS learns latent themes in the data, routes sessions into separate subsequences that stay within one theme, and then fuses outputs from experts that look at the full sequence, recent actions, and theme-specific details. If the approach works, models can maintain or raise accuracy on long sequences while using less computation than other mixture-of-experts methods.

Core claim

The Mixture of Sequence (MoS) framework is a model-agnostic MoE approach that extracts theme-specific and multi-scale subsequences from noisy raw user sequences. It employs a theme-aware routing mechanism to adaptively learn the latent themes of user sequences and organizes these sequences into multiple coherent subsequences. Each subsequence contains only sessions aligned with a specific theme. A multi-scale fusion mechanism then leverages three types of experts to capture global sequence characteristics, short-term user behaviors, and theme-specific semantic patterns.
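The routing step described above can be sketched concretely. This is a minimal illustration of codebook-based assignment, not the paper's implementation: the function names, the cosine-similarity measure, and hard top-1 routing are our assumptions.

```python
import math

def theme_route(session_embs, codebook):
    """Assign each session embedding to its most similar theme vector
    (hard top-1 routing), yielding theme-coherent subsequences."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    subsequences = {t: [] for t in range(len(codebook))}
    for i, s in enumerate(session_embs):
        # Route the whole session, not individual items, to one theme expert.
        best = max(range(len(codebook)), key=lambda t: cos(s, codebook[t]))
        subsequences[best].append(i)
    return subsequences

# Toy history: sessions 0-2 lie near theme 0, sessions 3-4 near theme 1.
sessions = [[5.1, 0.2, -0.3], [4.8, -0.1, 0.4], [5.3, 0.0, 0.1],
            [0.2, 5.0, 0.1], [-0.1, 4.7, 0.3]]
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # one theme vector per expert
print(theme_route(sessions, codebook))  # → {0: [0, 1, 2], 1: [3, 4]}
```

Each resulting subsequence contains only sessions aligned with one theme, which is the filtering property the core claim rests on.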

What carries the argument

Theme-aware routing that groups sessions into theme-coherent subsequences, paired with multi-scale expert fusion for global, short-term, and semantic views.

If this is right

  • Predictions rely only on sessions that match the active theme, reducing the impact of misleading shifts.
  • The model remains compatible with many existing sequential backbones because the routing and fusion layers sit on top.
  • Computational cost drops relative to standard MoE baselines because each expert processes shorter, cleaner subsequences.
  • State-of-the-art results hold across multiple datasets while requiring fewer total floating-point operations than MoE baselines.
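The three-expert fusion behind these points can be sketched in miniature. In MoS the experts are learned networks and the fusion weights are learned; the plain averages, scalar item scores, and fixed weights below are simplifying assumptions for illustration only.

```python
def multi_scale_fusion(seq, theme_subseq, recent_k=3, weights=(0.4, 0.3, 0.3)):
    """Fuse three expert views of one user history.

    seq:          full interaction sequence (scalar item scores for brevity)
    theme_subseq: indices of the theme-coherent subsequence for the active theme
    recent_k:     window size for the short-term expert
    """
    mean = lambda xs: sum(xs) / len(xs)
    global_view = mean(seq)                            # global expert: full sequence
    short_view = mean(seq[-recent_k:])                 # short-term expert: recent actions
    theme_view = mean([seq[i] for i in theme_subseq])  # theme expert: routed subsequence
    w_g, w_s, w_t = weights
    return w_g * global_view + w_s * short_view + w_t * theme_view

# History mixing two themes; the active theme covers items 0, 1, and 4.
score = multi_scale_fusion([1.0, 1.0, 9.0, 9.0, 1.0], theme_subseq=[0, 1, 4])
print(round(score, 2))  # → 3.88
```

The global view keeps whole-sequence signal, so theme-based filtering need not lose information; that is the stated role of the fusion mechanism.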

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same routing logic could extend to other sequential tasks that show periodic reappearance of patterns, such as next-basket prediction in retail.
  • Increasing the number of themes or adding a hierarchical scale might handle extremely long histories with more frequent shifts.
  • Combining the router with explicit user profile features could make theme discovery more stable on cold-start users.

Load-bearing premise

User interests remain stable inside short sessions and shift in patterns that a learned router can reliably separate into distinct latent themes without losing key signals.
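This premise has a simple operational reading: inside a session, consecutive items stay similar; at a session boundary, similarity drops sharply. A minimal sketch of that segmentation follows; the cosine measure and the threshold value are our assumptions, and the paper's own analysis (Figure 2) uses a full self-similarity matrix rather than this consecutive-pair shortcut.

```python
import math

def session_boundaries(item_embs, threshold=0.5):
    """Return indices where a new session begins, i.e. positions where
    similarity between consecutive item embeddings drops below threshold."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    return [i for i in range(1, len(item_embs))
            if cos(item_embs[i - 1], item_embs[i]) < threshold]

# Two stable interest runs separated by one abrupt shift.
history = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],   # session A
           [0.0, 1.0], [0.1, 0.95]]               # session B
print(session_boundaries(history))  # → [3]
```

If interests did not stay stable within sessions, no threshold would separate cleanly and the router's subsequences would fragment, which is exactly the failure mode this premise rules out.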

What would settle it

A controlled test on standard long-sequence benchmarks where replacing the theme-aware router with random session grouping produces equal or higher accuracy and lower FLOPs than the full MoS model.
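That control arm is straightforward to specify. A sketch of the random-grouping baseline (interface and names are ours, not the paper's): it preserves the partition's cost profile while destroying any theme signal, so a remaining accuracy gap isolates the router's contribution.

```python
import random

def random_route(num_sessions, num_themes, seed=0):
    """Control condition: assign sessions to groups uniformly at random,
    ignoring content. Matching downstream accuracy at equal FLOPs would
    mean the theme-aware router carries no usable signal."""
    rng = random.Random(seed)
    groups = {t: [] for t in range(num_themes)}
    for i in range(num_sessions):
        groups[rng.randrange(num_themes)].append(i)
    return groups

groups = random_route(num_sessions=6, num_themes=2)
# Every session is placed exactly once, mirroring the learned router's
# output shape (and therefore its per-expert FLOPs).
assert sorted(i for g in groups.values() for i in g) == list(range(6))
print(groups)
```

Running both arms through the same downstream model with identical expert capacity makes the comparison a test of the routing signal alone.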

Figures

Figures reproduced from arXiv: 2604.20858 by Hanghang Tong, Huayu Li, Hyunsik Yoo, Kai Wang, Mengyue Hang, Rong Jin, Ruizhong Qiu, Shuo Chang, Ting-Wei Li, Weilin Cong, Wen-yen Chen, Xiao Lin, Xuying Ning, Yajuan Wang, Zhicheng Tang, Zhichen Zeng, Zhining Liu.

Figure 1: An illustration of user interests. The user under …
Figure 2: Heatmap of the self-similarity matrix for a representative user transaction history. Red indicates high similarity, blue indicates low similarity, and green lines denote session boundaries. Interests remain highly consistent and stable within a short temporal span (a session), as indicated by the red area of …
Figure 3: The pipeline of MoS. The left panel illustrates theme-aware routing, which assigns inputs to experts according to theme vectors in the codebook. The right panel shows multi-scale fusion, which models user behaviors by extracting subsequences at different granularities from the full sequence. Here, the i-th row of the codebook describes the theme feature associated with the i-th expert. Given an item embedd…
Figure 4: Trade-off between utility and efficiency.
Figure 5: Dispatch behavior of routers for the most popular …
Figure 6: Impact of α_I and α_W on model utility. … the interior region denotes full fusion of all experts. It is evident that full fusion consistently achieves the best results, while pairwise fusion outperforms using a single expert. This observation suggests that although the global expert alone already provides strong representations for recommendations, the three types of experts capture complementary behavi…
Figure 7: Scaling study of MoS. MoS consistently enhances AUC/GAUC with increased sequence length and number of experts.
Figure 8: Examples of session hopping. (a) Router dispatch behavior for the most popular item. (b) Router dispatch behavior for the least popular item.
Figure 9: Comparison between dispatch behavior of different MoE routers.
Original abstract

Sequential recommendation has rapidly advanced in click-through rate prediction due to its ability to model dynamic user interests. A key challenge, however, lies in modeling long sequences: users often exhibit significant interest shifts, introducing substantial irrelevant or misleading information. Our empirical analysis corroborates this challenge and uncovers a recurring behavioral pattern in long sequences (session hopping): user interests remain stable within short temporal spans (sessions) but shift drastically across sessions and may reappear after multiple sessions. To address this challenge, we propose the Mixture of Sequence (MoS) framework, a model-agnostic MoE approach that achieves accurate predictions by extracting theme-specific and multi-scale subsequences from noisy raw user sequences. First, MoS employs a theme-aware routing mechanism to adaptively learn the latent themes of user sequences and organizes these sequences into multiple coherent subsequences. Each subsequence contains only sessions aligned with a specific theme, thereby effectively filtering out irrelevant or even misleading information introduced by user interest shifts in session hopping. In addition, to alleviate potential information loss, we introduce a multi-scale fusion mechanism, which leverages three types of experts to capture global sequence characteristics, short-term user behaviors, and theme-specific semantic patterns. Together, these two mechanisms endow MoS with the ability to deliver accurate recommendations from multi-faceted and multi-scale perspectives. Experimental results demonstrate that MoS consistently achieves the SOTA performance while introducing fewer FLOPs compared with other MoE counterparts, providing strong evidence of its excellent balance between utility and efficiency. The code is available at https://github.com/xiaolin-cs/MoS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the Mixture of Sequence (MoS) framework, a model-agnostic Mixture-of-Experts approach for long-sequence recommendation. It identifies a recurring 'session hopping' pattern in which user interests remain stable within short temporal sessions but shift drastically across sessions. MoS employs a theme-aware routing mechanism to adaptively learn latent themes and extract coherent theme-specific subsequences that filter irrelevant information, together with a multi-scale fusion mechanism that combines experts capturing global sequence characteristics, short-term behaviors, and theme-specific patterns. The paper claims that this yields state-of-the-art performance while using fewer FLOPs than other MoE baselines, with code released at https://github.com/xiaolin-cs/MoS.

Significance. If the empirical results hold, the work offers a practical, model-agnostic way to mitigate interest-shift noise in long user sequences by exploiting stable intra-session themes and multi-scale experts, potentially improving the accuracy-efficiency trade-off in sequential recommendation. The open-source code is a clear positive for reproducibility. The significance is limited, however, by the absence of concrete experimental support for the core assumptions and performance claims.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'MoS consistently achieves the SOTA performance while introducing fewer FLOPs compared with other MoE counterparts' is presented without any reference to datasets, baselines, metrics (AUC, NDCG, etc.), ablation tables, or statistical significance tests. This absence directly undermines verification of both the utility and efficiency assertions that constitute the paper's main contribution.
  2. [Abstract] Abstract: the theme-aware routing is asserted to 'adaptively learn the latent themes' and thereby filter misleading information from session hopping, yet no intra-/inter-session similarity statistics, theme coherence scores, or ablation isolating learned routing versus random routing is supplied. If the router fails to recover stable themes accurately, subsequence extraction becomes a source of information loss rather than a gain, rendering both the SOTA accuracy and the reported FLOPs reduction claims unsupported.
minor comments (1)
  1. [Abstract] The phrase 'session hopping' is introduced in italics without a formal definition or citation to prior session-based modeling literature; a brief operational definition would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the need for stronger empirical grounding of our claims. We address each comment below and have updated the manuscript to incorporate additional details and analyses.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'MoS consistently achieves the SOTA performance while introducing fewer FLOPs compared with other MoE counterparts' is presented without any reference to datasets, baselines, metrics (AUC, NDCG, etc.), ablation tables, or statistical significance tests. This absence directly undermines verification of both the utility and efficiency assertions that constitute the paper's main contribution.

    Authors: The abstract is intentionally concise as a high-level summary. The full manuscript provides the requested details in Section 5, with results across multiple public datasets, standard metrics including AUC and NDCG, comparisons to MoE and other baselines, ablation tables, FLOPs measurements, and statistical significance tests. To improve accessibility, we have revised the abstract to briefly reference the evaluation setting on public benchmarks with AUC/NDCG metrics and direct readers to the experimental section for full tables and analyses. revision: yes

  2. Referee: [Abstract] Abstract: the theme-aware routing is asserted to 'adaptively learn the latent themes' and thereby filter misleading information from session hopping, yet no intra-/inter-session similarity statistics, theme coherence scores, or ablation isolating learned routing versus random routing is supplied. If the router fails to recover stable themes accurately, subsequence extraction becomes a source of information loss rather than a gain, rendering both the SOTA accuracy and the reported FLOPs reduction claims unsupported.

    Authors: The manuscript already includes an empirical analysis of the session-hopping pattern and demonstrates the routing's value through end-to-end gains and component ablations. We agree that more targeted diagnostics would strengthen the argument. In the revision we have added intra-/inter-session similarity statistics, theme coherence scores, and a new ablation comparing the learned router against random routing; these results confirm that the router recovers coherent themes and that subsequence extraction improves rather than harms performance. revision: yes

Circularity Check

0 steps flagged

No significant circularity; architectural proposal is self-contained

full rationale

The paper proposes the Mixture of Sequence (MoS) framework as a model-agnostic MoE architecture featuring a theme-aware routing mechanism to organize user sequences into theme-specific subsequences and a multi-scale fusion mechanism with three expert types. These components are motivated by an empirical observation of session-hopping patterns rather than being defined in terms of target performance metrics or reduced to fitted parameters called predictions. No equations or derivations in the provided text reduce the central claims to self-citations, ansatzes smuggled via prior work, or renaming of known results; the SOTA and efficiency claims rest on external experimental comparisons. The derivation chain is therefore independent and self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 2 invented entities

The framework rests on the empirical validity of the session-hopping pattern and on the assumption that latent themes can be learned and used to partition sequences without external supervision.

free parameters (2)
  • number of latent themes
    The routing mechanism adaptively learns latent themes; the number of themes functions as a model hyperparameter or learned capacity.
  • expert scale configuration
    Choice of three expert types (global, short-term, theme-specific) and their fusion weights are design decisions that affect the multi-scale component.
axioms (1)
  • domain assumption: User interests remain stable within short temporal sessions but shift drastically across sessions (session hopping).
    Invoked as the basis for the theme-aware routing; stated as corroborated by empirical analysis in the abstract.
invented entities (2)
  • theme-aware routing mechanism (no independent evidence)
    purpose: To learn latent themes and reorganize raw sequences into coherent theme-specific subsequences that filter irrelevant information.
    Newly introduced component central to noise removal.
  • multi-scale fusion mechanism (no independent evidence)
    purpose: To combine outputs from global, short-term, and theme-specific experts and thereby mitigate information loss from subsequence extraction.
    Newly introduced component to preserve predictive signals.

pith-pipeline@v0.9.0 · 5648 in / 1588 out tokens · 81173 ms · 2026-05-15T17:26:12.434969+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

164 extracted references · 164 canonical work pages · 9 internal anchors
