NPA: Neural News Recommendation with Personalized Attention
Pith reviewed 2026-05-24 22:40 UTC · model grok-4.3
The pith
A neural news recommendation model personalizes attention at word and news levels by generating query vectors from user ID embeddings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a neural news recommendation model with personalized attention (NPA). The core of our approach is a news representation model and a user representation model. In the news representation model we use a CNN network to learn hidden representations of news articles based on their titles. In the user representation model we learn the representations of users based on the representations of their clicked news articles. Since different words and different news articles may have different informativeness for representing news and users, we propose to apply both word- and news-level attention mechanism to help our model attend to important words and news articles. In addition, the same news article and the same word may have different informativeness for different users.
What carries the argument
personalized attention network which exploits the embedding of user ID to generate the query vector for the word- and news-level attentions
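The mechanism named above can be illustrated with a minimal sketch. It assumes a ReLU projection from the user-ID embedding to the query and a plain dot-product score between the query and each item vector; both are simplifications chosen for readability, not the paper's exact equations. The names `personalized_attention`, `w_query`, and `b_query` are hypothetical, and the vectors are hand-picked toy values.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def personalized_attention(user_embedding, items, w_query, b_query):
    """Pool item vectors with weights that depend on the user.

    user_embedding: vector looked up from the user ID
    items: word (or clicked-news) vectors, all of the same dimension
    w_query, b_query: hypothetical projection from user space to item space
    """
    # Query vector generated from the user-ID embedding (assumed ReLU projection).
    projected = matvec(w_query, user_embedding)
    query = [max(0.0, x + b) for x, b in zip(projected, b_query)]
    # Score every item against this user's query, then normalise to weights.
    weights = softmax([sum(q * x for q, x in zip(query, item)) for item in items])
    # The weighted sum of item vectors is the pooled representation.
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]
    return weights, pooled

# Toy values: an identity projection, two users, two items.
w_query = [[1.0, 0.0], [0.0, 1.0]]
b_query = [0.0, 0.0]
items = [[2.0, 0.0], [0.0, 2.0]]
user_a = [1.0, 0.0]
user_b = [0.0, 1.0]

weights_a, pooled_a = personalized_attention(user_a, items, w_query, b_query)
weights_b, pooled_b = personalized_attention(user_b, items, w_query, b_query)
# user_a puts more weight on the first item, user_b on the second.
```

The same function can serve both levels: pass word vectors of a title to get a news representation, or pass clicked-news vectors to get a user representation.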
If this is right
- The same news article receives different attention weights for different users based on their ID.
- User representations improve by weighting clicked articles according to personal informativeness.
- Word-level attention within titles also varies per user rather than applying a single global pattern.
- Overall news recommendation accuracy increases on real-world datasets when personalization is added to attention.
Where Pith is reading between the lines
- The technique might extend to other content domains such as product or video recommendation where user-specific attention on history items would be useful.
- Direct use of user ID embeddings could simplify deployment but might require careful handling of new users whose embeddings are not yet learned.
- Combining the personalized query generation with additional user features beyond ID could further refine attention without changing the core architecture.
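The new-user concern raised in the list above could be handled in several ways; the sketch below shows one common option, a shared fallback vector for IDs that have no learned embedding. The paper as summarized here does not describe this, so the class `UserEmbeddingTable` and its zero-vector default are purely illustrative assumptions.

```python
class UserEmbeddingTable:
    """Maps user IDs to embedding vectors, with a shared fallback for unseen IDs."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}
        # Assumption: unseen users share one default vector (zeros here).
        self.default = [0.0] * dim

    def set(self, user_id, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected a vector of length {self.dim}")
        self.table[user_id] = list(vector)

    def lookup(self, user_id):
        # Known users get their learned vector; everyone else gets the default.
        return self.table.get(user_id, self.default)

embeddings = UserEmbeddingTable(dim=3)
embeddings.set("u1", [0.1, 0.2, 0.3])
known = embeddings.lookup("u1")
unseen = embeddings.lookup("brand-new-user")
```

With a shared default, every unseen user produces the same query vector, so attention for those users degrades to a non-personalized pattern until an embedding is learned.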
Load-bearing premise
User ID embeddings can effectively capture and modulate individual differences in which words and past articles are informative for representing users.
What would settle it
An experiment on the MSN news dataset in which removing the user ID embedding from query vector generation produces no drop in recommendation metrics compared with the full model.
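A toy version of that comparison is sketched below. It assumes the query is either the user embedding itself (personalized) or one fixed vector shared by all users (ablated); the helper names and values are hypothetical, and the real experiment would compare recommendation metrics after training rather than raw attention weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, items):
    """Dot-product attention weights of one query over a list of item vectors."""
    return softmax([sum(q * x for q, x in zip(query, item)) for item in items])

def make_query(user_embedding, personalized, fixed_query):
    # Ablation switch: with personalized=False the user is ignored entirely.
    return list(user_embedding) if personalized else list(fixed_query)

# Hand-picked toy values, not data from the paper.
user_embeddings = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
clicked_news = [[2.0, 0.0], [0.0, 2.0]]
fixed_query = [0.5, 0.5]

def run(personalized):
    """Attention weights per user under the chosen variant."""
    return {
        user_id: attention_weights(
            make_query(embedding, personalized, fixed_query), clicked_news
        )
        for user_id, embedding in user_embeddings.items()
    }

full = run(personalized=True)      # weights differ between u1 and u2
ablated = run(personalized=False)  # weights are identical for every user
```

If the two variants score the same on held-out clicks after training, the user-ID query is not contributing, which is the outcome the settling experiment is meant to detect.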
Original abstract
News recommendation is very important to help users find interested news and alleviate information overload. Different users usually have different interests and the same user may have various interests. Thus, different users may click the same news article with attention on different aspects. In this paper, we propose a neural news recommendation model with personalized attention (NPA). The core of our approach is a news representation model and a user representation model. In the news representation model we use a CNN network to learn hidden representations of news articles based on their titles. In the user representation model we learn the representations of users based on the representations of their clicked news articles. Since different words and different news articles may have different informativeness for representing news and users, we propose to apply both word- and news-level attention mechanism to help our model attend to important words and news articles. In addition, the same news article and the same word may have different informativeness for different users. Thus, we propose a personalized attention network which exploits the embedding of user ID to generate the query vector for the word- and news-level attentions. Extensive experiments are conducted on a real-world news recommendation dataset collected from MSN news, and the results validate the effectiveness of our approach on news recommendation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes NPA, a neural news recommendation model whose core consists of a CNN-based news representation module operating on article titles and a user representation module built from sequences of clicked news. Word-level and news-level attention are applied to weight informative elements, and a personalized attention network is introduced that conditions both attention mechanisms on a learned user-ID embedding used to produce the query vector. Experiments on a real-world MSN news click dataset are reported to validate the approach's effectiveness for news recommendation.
Significance. If the reported gains hold after isolating the contribution of the user-ID-conditioned attention, the work would provide a concrete, easily implemented mechanism for injecting user-specific modulation into attention-based user modeling, which is relevant to the broader literature on personalized content recommendation. The architecture itself is modular and the use of a single user-ID embedding to drive both word- and news-level queries is a compact design choice.
major comments (2)
- [§3.3] §3.3 (Personalized Attention Network): the claim that the user-ID embedding generates a query vector that produces user-specific attention weights is load-bearing for the central contribution, yet the manuscript provides no ablation that replaces the user-ID query with a fixed or randomly initialized query while keeping all other components identical; without this comparison the empirical results cannot distinguish genuine personalization from standard attention.
- [§4.2] §4.2 (Experimental Results): the reported improvements on the MSN dataset are presented without a per-user history-length stratification or a breakdown of performance for users with fewer than k clicks; given that the majority of users in typical news logs have short histories, this omission leaves the weakest assumption (that user-ID embeddings reliably encode individual differences) untested and therefore weakens support for the personalization claim.
minor comments (2)
- [§3.3] Notation for the user-ID embedding and the attention query generation (Eq. (7) and surrounding text) is introduced without an explicit statement of the embedding dimension or initialization, making reproduction harder.
- [§4.1] The abstract states that 'extensive experiments' validate effectiveness, but the main text should include the exact train/validation/test split sizes and the number of unique users to allow readers to assess data sparsity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The points raised highlight opportunities to strengthen the empirical support for the personalized attention mechanism, and we will revise the manuscript accordingly.
Point-by-point responses
- Referee: [§3.3] §3.3 (Personalized Attention Network): the claim that the user-ID embedding generates a query vector that produces user-specific attention weights is load-bearing for the central contribution, yet the manuscript provides no ablation that replaces the user-ID query with a fixed or randomly initialized query while keeping all other components identical; without this comparison the empirical results cannot distinguish genuine personalization from standard attention.
  Authors: We agree that an explicit ablation isolating the user-ID embedding's role in query generation would more clearly separate the contribution of personalization from standard attention. In the revised manuscript we will add this comparison by training a variant that replaces the user-ID-derived query with a fixed (non-personalized) or randomly initialized query vector while holding all other architectural components constant.
  Revision: yes
- Referee: [§4.2] §4.2 (Experimental Results): the reported improvements on the MSN dataset are presented without a per-user history-length stratification or a breakdown of performance for users with fewer than k clicks; given that the majority of users in typical news logs have short histories, this omission leaves the weakest assumption (that user-ID embeddings reliably encode individual differences) untested and therefore weakens support for the personalization claim.
  Authors: We acknowledge that stratifying results by history length is necessary to evaluate whether user-ID embeddings remain effective for users with short click sequences. We will add this analysis to the experimental section, reporting performance as a function of the number of historical clicks per user.
  Revision: yes
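The history-length analysis discussed in these responses could be organized along the following lines. This is a sketch under stated assumptions: the bucket edges (5 and 20 clicks) are arbitrary, AUC is pooled over all impressions in a bucket rather than averaged per impression, and the impression tuples are synthetic values, not results from the paper.

```python
from collections import defaultdict

def auc(labels, scores):
    """Pairwise AUC: share of (clicked, non-clicked) pairs ranked correctly.

    Ties count as half. Returns None when a class is missing.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def bucket_for(history_length, edges=(5, 20)):
    """Name the history-length bucket; the edges are hypothetical."""
    low, high = edges
    if history_length < low:
        return f"<{low}"
    if history_length < high:
        return f"{low}-{high - 1}"
    return f">={high}"

def auc_by_history_length(impressions):
    """impressions: iterable of (history_length, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for history_length, label, score in impressions:
        labels, scores = grouped[bucket_for(history_length)]
        labels.append(label)
        scores.append(score)
    return {name: auc(labels, scores) for name, (labels, scores) in grouped.items()}

# Synthetic impressions: (clicks in user history, clicked?, model score).
impressions = [
    (2, 1, 0.4), (2, 0, 0.6), (3, 1, 0.7), (3, 0, 0.5),
    (30, 1, 0.9), (30, 0, 0.2), (25, 1, 0.8), (25, 0, 0.1),
]
report = auc_by_history_length(impressions)
```

Running the same report for the full model and for a non-personalized variant would show whether any gain from the user-ID query is concentrated among users with long histories.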
Circularity Check
No circularity: architecture explicitly proposed and empirically validated on held-out data
full rationale
The paper defines an explicit neural architecture (CNN news encoder + word/news-level attention modulated by learned user-ID embeddings) and reports its performance after training on a real-world click dataset. No equations or claims reduce a 'prediction' or 'first-principles result' to the model's own fitted parameters by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing justification. The central contribution is the model design itself, which is evaluated externally rather than derived from prior fitted quantities within the paper.
Axiom & Free-Parameter Ledger
free parameters (1)
- user ID embedding dimension and attention parameters
axioms (2)
- domain assumption: CNN layers can extract useful hidden representations from news titles
- domain assumption: User representations can be aggregated from clicked news articles
invented entities (1)
- personalized attention network: no independent evidence