arxiv: 1906.09217 · v1 · pith:PO6XRWCYnew · submitted 2019-06-21 · 💻 cs.IR

Hierarchical Gating Networks for Sequential Recommendation

Pith reviewed 2026-05-25 18:32 UTC · model grok-4.3

classification 💻 cs.IR
keywords: sequential recommendation · gating networks · user interests · implicit feedback · Bayesian Personalized Ranking · Top-N recommendation · recommender systems · hierarchical networks

The pith

A hierarchical gating network with feature and instance selection modules models both long-term and short-term user interests for sequential recommendation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes HGN, a hierarchical gating network for sequential recommendation, to address two challenges: modeling long-term user interests from sparse implicit feedback, and capturing short-term interests from the few items a user accessed most recently. It applies gating at the feature and instance levels to decide which item information reaches downstream layers, and an item-item product module explicitly models relations between past and future items. The model is trained with BPR and, according to the authors, outperforms several state-of-the-art baselines for Top-N recommendation on five real-world datasets. A sympathetic reader would care because better modeling of sequential behavior can lead to more accurate predictions of what users will interact with next.

Core claim

The hierarchical gating network (HGN), integrated with Bayesian Personalized Ranking, captures long-term and short-term user interests through a feature gating module that selects item features, an instance gating module that selects past instances, and an item-item product module that captures item relations.

What carries the argument

The hierarchical gating network itself: a feature gating module and an instance gating module that decide which item features and which past items pass to downstream layers, plus an item-item product module that models relations between past items and candidate future items.
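
A minimal pure-Python sketch of how these three modules could combine into one score for a user, a window of L recently accessed items, and one candidate item is shown below. It assumes sigmoid gates conditioned on the user embedding p_u (W1, W2, b at the feature level; w3, W4 at the instance level), mean pooling of the gated window, and a final score that adds a user-item term, a pooled-sequence term, and the item-item product term against the candidate embedding q_j. These shapes, the pooling, and all function names are hypothetical illustrations under those assumptions, not the paper's exact formulation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: m[i] is row i

# Hypothetical sketch: gate parameterization and pooling are assumptions, not the paper's equations.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def feature_gate(seq: List[Vector], p_u: Vector,
                 W1: Matrix, W2: Matrix, b: Vector) -> List[Vector]:
    """Scale each dimension of each item embedding by a sigmoid gate that sees the item and the user."""
    user_part = matvec(W2, p_u)  # identical for every item in the window
    gated = []
    for e in seq:
        pre = [x + y + z for x, y, z in zip(matvec(W1, e), user_part, b)]
        gated.append([ei * sigmoid(g) for ei, g in zip(e, pre)])
    return gated


def instance_gate(seq: List[Vector], p_u: Vector,
                  w3: Vector, W4: Matrix) -> List[float]:
    """One scalar gate per window position; W4 is d x L, so p_u^T W4 gives one user bias per position."""
    d, L = len(p_u), len(seq)
    user_bias = [sum(p_u[i] * W4[i][j] for i in range(d)) for j in range(L)]
    return [sigmoid(dot(w3, e) + user_bias[j]) for j, e in enumerate(seq)]


def hgn_score(seq: List[Vector], p_u: Vector, q_j: Vector,
              W1: Matrix, W2: Matrix, b: Vector, w3: Vector, W4: Matrix) -> float:
    """Hypothetical HGN-style score for the candidate item embedding q_j."""
    f = feature_gate(seq, p_u, W1, W2, b)   # feature-level selection
    g = instance_gate(f, p_u, w3, W4)       # instance-level selection
    d, L = len(p_u), len(f)
    # Mean-pool the instance-gated window into one short-term vector (assumed pooling).
    s = [sum(g[j] * f[j][i] for j in range(L)) / L for i in range(d)]
    long_term = dot(p_u, q_j)                  # user's general taste
    short_term = dot(s, q_j)                   # gated recent window
    item_item = sum(dot(e, q_j) for e in seq)  # item-item product on the raw embeddings
    return long_term + short_term + item_item
```

With every weight set to zero, each gate evaluates to 0.5, which makes the arithmetic easy to check by hand. Keeping the item-item term on the ungated embeddings is one possible design choice in this sketch; the paper may wire it differently.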

If this is right

  • Improved Top-N sequential recommendation performance compared to state-of-the-art methods.
  • Effective handling of sparse implicit feedback for modeling user interests.
  • Explicit capture of item relations between accessed and future items.
  • Integration with BPR loss for personalized ranking.
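
The BPR objective named in the last bullet is a pairwise ranking loss: for each user, an item the user interacted with should score higher than a sampled item the user did not interact with. The sketch below implements the standard form, the sum of -ln σ(r̂_ui − r̂_uj) plus an L2 penalty, for an arbitrary scoring function. The `score` callable, the (user, positive, negative) triple format, and `reg` are illustrative assumptions, and how negatives are sampled is left open.

```python
import math
from typing import Callable, List, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable ln(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bpr_loss(score: Callable[[int, int], float],
             triples: List[Tuple[int, int, int]],
             params: List[float],
             reg: float = 0.0) -> float:
    """Sum of -ln sigmoid(r_ui - r_uj) over (user, positive, negative) triples, plus an L2 penalty."""
    data_term = -sum(log_sigmoid(score(u, i) - score(u, j)) for u, i, j in triples)
    return data_term + reg * sum(p * p for p in params)
```

In an HGN-style model, `score` would be the model's prediction for a user (given that user's recent window) and an item. A score gap of zero costs ln 2 per triple, and the cost shrinks as the observed item pulls ahead of the sampled one.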

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Such gating mechanisms could be adapted to other sequential prediction tasks, such as next-basket recommendation or session-based recommendation.
  • If the gating proves reliable, it might reduce the need for more complex attention mechanisms in recommender systems.
  • Extending the model to incorporate side information or context could further improve it.
  • Testing on larger-scale datasets would help establish whether the approach scales.

Load-bearing premise

The feature gating and instance gating modules can select relevant item features and past instances from sparse implicit feedback without introducing selection bias or losing critical signals.

What would settle it

Re-running the model on the same five real-world datasets and finding that it does not outperform the state-of-the-art baselines on the reported metrics, such as Recall@k and NDCG@k.

Figures

Figures reproduced from arXiv: 1906.09217 by Chen Ma, Peng Kang, Xue Liu.

Figure 1. An illustrative example of the feature gating, instance gating, and item-item product modules. In Figure 1a, the gray …
Figure 2. The architecture of HGN. HGN consists of three major components: the embedding layer, the hierarchical gating …
Figure 3. The performance comparison on MovieLens-20M.
Figure 4. The performance comparison on Amazon-Books.
Figure 5. The performance comparison on Amazon-CDs.
Figure 6. The performance comparison on Children.
Figure 7. The performance comparison on Comics. Panels: (a) Recall@k on Comics and (b) NDCG@k on Comics, each plotted for k from 5 to 20 for BPRMF, GRU4Rec, GRU4Rec+, NextItNet, Caser, SASRec, and HGN. Excerpt from the accompanying discussion: … a hyper-parameter (the maximum sequence length) to reduce the computation burden, where only using part of the user data may lead to the insufficient understanding of long-term user interests; (3) SASRec does not explicitly model the item-item relations between two closely relevant items, which is captured by our item-item product module. Third, HGN outperforms Caser, on …
Figure 8. The dimension variations of embeddings. Footnote 5 from the accompanying text: Note that the time reported only includes the training time of models without including the negative sampling time.
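
The comparison figures plot Recall@k and NDCG@k for k from 5 to 20. A sketch of both metrics under the usual binary-relevance definitions follows. It assumes each user has a ranked list of recommended item IDs and a set of held-out future items; the paper's exact evaluation protocol (for example, how the held-out set is built) may differ.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of the held-out items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Binary-relevance NDCG: a hit at 0-based rank r contributes 1 / log2(r + 2)."""
    dcg = sum(1.0 / math.log2(r + 2) for r, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(r + 2) for r in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Recall@k credits every hit in the top k equally, while NDCG@k discounts hits by rank, so the two panels of a figure can order methods differently.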
Original abstract

The chronological order of user-item interactions is a key feature in many recommender systems, where the items that users will interact with may largely depend on those items that users just accessed recently. However, with the tremendous increase of users and items, sequential recommender systems still face several challenging problems: (1) the hardness of modeling the long-term user interests from sparse implicit feedback; (2) the difficulty of capturing the short-term user interests given several items the user just accessed. To cope with these challenges, we propose a hierarchical gating network (HGN), integrated with the Bayesian Personalized Ranking (BPR) to capture both the long-term and short-term user interests. Our HGN consists of a feature gating module, an instance gating module, and an item-item product module. In particular, our feature gating and instance gating modules select what item features can be passed to the downstream layers from the feature and instance levels, respectively. Our item-item product module explicitly captures the item relations between the items that users accessed in the past and those items users will access in the future. We extensively evaluate our model against several state-of-the-art methods using different validation metrics on five real-world datasets. The experimental results demonstrate the effectiveness of our model on Top-N sequential recommendation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a Hierarchical Gating Network (HGN) for sequential recommendation that integrates a feature gating module and an instance gating module to select relevant item features and past user instances from sparse implicit feedback, an item-item product module to explicitly model relations between past and future items, and Bayesian Personalized Ranking (BPR) loss to jointly capture long-term and short-term user interests. The model is evaluated on five real-world datasets against state-of-the-art baselines using standard Top-N metrics, with claims of superior effectiveness.

Significance. If the gating modules reliably perform non-trivial selection without bias or signal loss, the hierarchical design combined with explicit item-item products offers a structured approach to sparsity in sequential recommendation that could influence follow-up work on gated architectures. The use of standard BPR loss and real-world datasets is a strength, as is the absence of circular or self-referential derivations.

major comments (2)
  1. [§4 (Experiments)] The section and its tables report overall performance gains over baselines, but no ablation compares the full HGN against a non-gated variant of equal capacity (e.g., replacing gating with identity or simple averaging), and no analysis measures properties of the gated-out items (e.g., popularity or recency bias in the selections). This leaves unverified the central claim that the hierarchical gating modules are responsible for modeling long-term interests from sparse data.
  2. [§3.2–3.3 (Feature and Instance Gating Modules)] The modules are described as selecting 'what item features can be passed' and 'relevant past instances,' yet no analysis, visualization, or quantitative check (such as overlap with popularity heuristics or retention rates on held-out signals) confirms that the selection is reliable rather than near-identity or biased.
minor comments (1)
  1. [Abstract and §1] The abstract and §1 mention five datasets but do not name them until later; disclosing dataset characteristics early (e.g., sparsity levels, sequence lengths) would improve readability.
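
The gate checks requested in the major comments could be operationalized roughly as follows. Given instance-gate values for one user's window of recent items, the sketch reports the share of gates above a threshold (a crude near-identity signal) and the Jaccard overlap between the highest-gated positions and either the positions holding the most popular items or the most recent positions. The 0.9 threshold, the Jaccard statistic, and all function names are hypothetical choices, not taken from the paper or the report.

```python
from typing import Dict, Sequence, Set


def near_identity_fraction(gates: Sequence[float], threshold: float = 0.9) -> float:
    """Share of gate values above `threshold`; a share near 1.0 means the gate passes almost everything."""
    return sum(1 for g in gates if g > threshold) / len(gates)


def top_k_positions(values: Sequence[float], k: int) -> Set[int]:
    """Indices of the k largest values (stable sort, so ties go to the earlier position)."""
    return set(sorted(range(len(values)), key=lambda j: values[j], reverse=True)[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def popularity_overlap(seq_items: Sequence[int], gates: Sequence[float],
                       popularity: Dict[int, int], k: int) -> float:
    """Jaccard overlap between the k highest-gated positions and the k positions holding the most popular items."""
    pop_scores = [float(popularity.get(item, 0)) for item in seq_items]
    return jaccard(top_k_positions(gates, k), top_k_positions(pop_scores, k))


def recency_overlap(gates: Sequence[float], k: int) -> float:
    """Jaccard overlap between the k highest-gated positions and the k most recent ones (window ordered oldest first)."""
    n = len(gates)
    return jaccard(top_k_positions(gates, k), set(range(max(0, n - k), n)))
```

Averaged over users, an overlap close to 1.0 against popularity or recency would suggest that the gate largely reproduces a simple heuristic, which is the bias concern the referee raises. A low overlap does not by itself show that the selection is useful, so these statistics complement rather than replace the ablation.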

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the validation of the gating modules.

  1. Referee: [§4 (Experiments)] The section and its tables report overall performance gains over baselines, but no ablation compares the full HGN against a non-gated variant of equal capacity (e.g., replacing gating with identity or simple averaging), and no analysis measures properties of the gated-out items (e.g., popularity or recency bias in the selections). This leaves unverified the central claim that the hierarchical gating modules are responsible for modeling long-term interests from sparse data.

    Authors: We agree that the current experiments leave the specific contribution of the gating modules less directly verified than ideal. In revision we will add an ablation study comparing the full HGN to non-gated variants of comparable capacity (replacing gating layers with identity or averaging operations) and will report performance differences. We will also include an analysis of the gated-out items, measuring properties such as popularity and recency to check for systematic bias. revision: yes

  2. Referee: [§3.2–3.3 (Feature and Instance Gating Modules)] The modules are described as selecting 'what item features can be passed' and 'relevant past instances,' yet no analysis, visualization, or quantitative check (such as overlap with popularity heuristics or retention rates on held-out signals) confirms that the selection is reliable rather than near-identity or biased.

    Authors: We acknowledge the absence of direct inspection of the gating behavior. In the revised manuscript we will add quantitative checks (e.g., retention rates on held-out signals and overlap statistics with simple popularity or recency heuristics) together with illustrative visualizations where space permits, to demonstrate that the feature- and instance-level selections are non-trivial. revision: yes

Circularity Check

0 steps flagged

No significant circularity; model proposal rests on external empirical validation.

The paper proposes a hierarchical gating network architecture (feature gating module, instance gating module, item-item product module) trained end-to-end with standard BPR loss on real-world datasets. No equations define a quantity in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise reduces to a self-citation chain or ansatz smuggled from prior author work. Claims of capturing long- and short-term interests are evaluated via top-N metrics on five external datasets, rendering the derivation self-contained against benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the unstated premise that gating can extract useful signals from sparse data and that item-item products add non-redundant relational information; no explicit free parameters, axioms, or invented entities are named.

pith-pipeline@v0.9.0 · 5744 in / 1062 out tokens · 25736 ms · 2026-05-25T18:32:34.822350+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning

    cs.IR 2026-05 unverdicted novelty 5.0

    TwiSTAR learns to switch between fast SID retrieval and slow rationale-generating reasoning in generative recommendation, yielding better accuracy-latency trade-offs on three datasets.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Xu Chen, Hongteng Xu, Yongfeng Zhang, Jiaxi Tang, Yixin Cao, Zheng Qin, and Hongyuan Zha. 2018. Sequential Recommendation with User Memory Networks. In WSDM. ACM, 108–116

  2. [2]

    Chen Cheng, Haiqin Yang, Michael R. Lyu, and Irwin King. 2013. Where You Like to Go Next: Successive Point-of-Interest Recommendation. In IJCAI. IJCAI/AAAI, 2605–2611

  3. [3]

    Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP. ACL, 1724–1734

  4. [4]

    Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language Modeling with Gated Convolutional Networks. In ICML (Proceedings of Machine Learning Research), Vol. 70. PMLR, 933–941

  5. [5]

    Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. In IJCAI. ijcai.org, 1725–1731

  6. [6]

    F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. TiiS 5, 4 (2016), 19:1–19:19

  7. [7]

    Ruining He, Wang-Cheng Kang, and Julian McAuley. 2017. Translation-based Recommendation. In RecSys. ACM, 161–169

  8. [8]

    Ruining He and Julian McAuley. 2016. Fusing Similarity Models with Markov Chains for Sparse Sequential Recommendation. In ICDM. IEEE, 191–200

  9. [9]

    Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In WWW. ACM, 507–517

  10. [10]

    Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. ACM, 173–182

  11. [11]

    Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast Matrix Factorization for Online Recommendation with Implicit Feedback. In SIGIR. ACM, 549–558

  12. [12]

    Balázs Hidasi and Alexandros Karatzoglou. 2018. Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. In CIKM. ACM, 843–852

  13. [13]

    Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based Recommendations with Recurrent Neural Networks. CoRR abs/1511.06939 (2015)

  15. [15]

    Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge J. Belongie, and Deborah Estrin. 2017. Collaborative Metric Learning. In WWW. ACM, 193–201

  16. [16]

    Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In ICDM. IEEE Computer Society, 263–272

  17. [17]

    Jin Huang, Wayne Xin Zhao, Hong-Jian Dou, Ji-Rong Wen, and Edward Y. Chang. Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks. In SIGIR. ACM, 505–514

  19. [19]

    Santosh Kabbur, Xia Ning, and George Karypis. 2013. FISM: factored item similarity models for top-N recommender systems. In KDD. ACM, 659–667

  20. [20]

    Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. In ICDM. IEEE Computer Society, 197–206

  21. [21]

    Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014)

  22. [22]

    Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma. 2017. Neural Attentive Session-based Recommendation. In CIKM. ACM, 1419–1428

  23. [23]

    Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems. In KDD. ACM, 1754–1763

  24. [24]

    Dawen Liang, Laurent Charlin, James McInerney, and David M. Blei. 2016. Modeling User Exposure in Recommendation. In WWW. ACM, 951–961

  25. [25]

    Chen Ma, Peng Kang, Bin Wu, Qinglong Wang, and Xue Liu. 2019. Gated Attentive-Autoencoder for Content-Aware Recommendation. In WSDM. ACM, 519–527

  26. [26]

    Chen Ma, Yingxue Zhang, Qinglong Wang, and Xue Liu. 2018. Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence. In CIKM. ACM, 697–706

  27. [27]

    Xia Ning, Christian Desrosiers, and George Karypis. 2015. A Comprehensive Survey of Neighborhood-Based Recommendation Methods. In Recommender Systems Handbook. Springer, 37–76

  28. [28]

    Rong Pan, Yunhong Zhou, Bin Cao, Nathan Nan Liu, Rajan M. Lukose, Martin Scholz, and Qiang Yang. 2008. One-Class Collaborative Filtering. In ICDM. IEEE Computer Society, 502–511

  29. [29]

    Wenjie Pei, Jie Yang, Zhu Sun, Jie Zhang, Alessandro Bozzon, and David M. J. Tax. 2017. Interacting Attention-gated Recurrent Networks for Recommendation. In CIKM. ACM, 1459–1468

  30. [30]

    Massimo Quadrana, Alexandros Karatzoglou, Balázs Hidasi, and Paolo Cremonesi. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks. In RecSys. ACM, 130–137

  32. [32]

    Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI. AUAI Press, 452–461

  34. [34]

    Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW. ACM, 811–820

  35. [35]

    Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey E. Hinton. 2007. Restricted Boltzmann machines for collaborative filtering. In ICML (ACM International Conference Proceeding Series), Vol. 227. ACM, 791–798

  36. [36]

    Badrul Munir Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In WWW. ACM, 285–295

  37. [37]

    Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. 2015. End-To-End Memory Networks. In NIPS. 2440–2448

  38. [38]

    Jiaxi Tang and Ke Wang. 2018. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In WSDM. ACM, 565–573

  39. [39]

    Thanh Tran, Kyumin Lee, Yiming Liao, and Dongwon Lee. 2018. Regularizing Matrix Factorization with User and Item Embeddings for Recommendation. In CIKM. ACM, 687–696

  40. [40]

    Thanh Tran, Xinyue Liu, Kyumin Lee, and Xiangnan Kong. [n.d.]. Signed Distance-based Deep Memory Recommender

  41. [41]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In NIPS. 6000–6010

  42. [42]

    Mengting Wan and Julian McAuley. 2018. Item recommendation on monotonic behavior chains. In RecSys. ACM, 86–94

  43. [43]

    Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative Deep Learning for Recommender Systems. In KDD. ACM, 1235–1244

  44. [44]

    Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. 2016. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In WSDM. ACM, 153–162

  45. [45]

    Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. Deep Matrix Factorization Models for Recommender Systems. In IJCAI. ijcai.org, 3203–3209

  47. [47]

    Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose, and Xiangnan He. 2019. A Simple Convolutional Generative Network for Next Item Recommendation. In WSDM. ACM, 582–590