
arxiv: 1907.05559 · v1 · pith:O4NW4XHCnew · submitted 2019-07-12 · 💻 cs.IR · cs.CL

NPA: Neural News Recommendation with Personalized Attention

Pith reviewed 2026-05-24 22:40 UTC · model grok-4.3

keywords news recommendation · personalized attention · neural networks · attention mechanism · user modeling · CNN · user representation

The pith

A neural news recommendation model personalizes attention at word and news levels by generating query vectors from user ID embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes NPA, a model that builds news representations from article titles with a CNN and user representations from each user's clicked news, applying attention at both the word and news levels. The central addition is a personalized attention network that turns a user ID embedding into the query vector guiding both attentions. The motivation is that users differ in which aspects of the same article matter to them, so a single shared attention query can miss individual patterns and yield less relevant recommendations. The approach is evaluated on a real-world MSN news dataset, where the authors report improved recommendation results.
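To make the news encoder concrete, here is a minimal NumPy sketch of a convolution over title word vectors. It is an illustration under stated assumptions, not the authors' implementation: the function name `encode_title_cnn`, the zero padding, the ReLU activation, and all shapes are hypothetical choices.

```python
import numpy as np

def encode_title_cnn(word_vecs, kernel, bias):
    """Slide a 1-D convolution over the word vectors of one title.

    word_vecs: (num_words, emb_dim) word embeddings of the title
    kernel:    (window, emb_dim, num_filters), window assumed odd
    bias:      (num_filters,)
    returns:   (num_words, num_filters) contextual word representations
    """
    window = kernel.shape[0]
    pad = window // 2
    # Zero-pad both ends so every word sees a full context window.
    padded = np.pad(word_vecs, ((pad, pad), (0, 0)))
    out = np.empty((word_vecs.shape[0], kernel.shape[2]))
    for i in range(word_vecs.shape[0]):
        context = padded[i:i + window]  # (window, emb_dim)
        # Contract the window and embedding axes against the kernel.
        out[i] = np.tensordot(context, kernel, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU (assumed activation)

# Toy usage: a 6-word title with 8-dimensional embeddings and 4 filters.
rng = np.random.default_rng(0)
title = rng.normal(size=(6, 8))
reps = encode_title_cnn(title, rng.normal(size=(3, 8, 4)), np.zeros(4))
```

The output keeps one vector per word, which is what a word-level attention layer would then pool into a single news vector.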

Core claim

We propose a neural news recommendation model with personalized attention (NPA). The core of our approach is a news representation model and a user representation model. In the news representation model we use a CNN network to learn hidden representations of news articles based on their titles. In the user representation model we learn the representations of users based on the representations of their clicked news articles. Since different words and different news articles may have different informativeness for representing news and users, we propose to apply both word- and news-level attention mechanism to help our model attend to important words and news articles. In addition, the same news article and the same word may have different informativeness for different users.

What carries the argument

personalized attention network which exploits the embedding of user ID to generate the query vector for the word- and news-level attentions
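The mechanism above can be sketched as follows. This is a minimal NumPy illustration, assuming the user ID embedding passes through a dense layer with ReLU to form the query, and that attention scores come from a tanh projection of that query. The parameter names `V`, `v`, `W`, `b` and the specific activations are assumptions for illustration, not details confirmed on this page.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()

def personalized_attention(items, user_emb, V, v, W, b):
    """Pool item vectors with a query derived from the user embedding.

    items:    (n, d) word vectors (word level) or news vectors (news level)
    user_emb: (e,)   embedding looked up from the user ID
    V, v:     (q, e), (q,)  dense layer producing the query (hypothetical names)
    W, b:     (d, q), (d,)  projection of the query into item space
    """
    query = np.maximum(V @ user_emb + v, 0.0)  # user-specific query
    key = np.tanh(W @ query + b)               # (d,)
    weights = softmax(items @ key)             # (n,) attention weights
    return weights @ items, weights

# Two toy users attending over the same two items.
items = np.eye(2)
V, v, W, b = np.eye(2), np.zeros(2), np.eye(2), np.zeros(2)
_, w_a = personalized_attention(items, np.array([1.0, 0.0]), V, v, W, b)
_, w_b = personalized_attention(items, np.array([0.0, 1.0]), V, v, W, b)
```

In this toy setup the two users weight the same items differently, because the only thing that changes between the calls is the user embedding that feeds the query.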

If this is right

  • The same news article receives different attention weights for different users based on their ID.
  • User representations improve by weighting clicked articles according to personal informativeness.
  • Word-level attention within titles also varies per user rather than applying a single global pattern.
  • Overall news recommendation accuracy increases on real-world datasets when personalization is added to attention.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique might extend to other content domains such as product or video recommendation where user-specific attention on history items would be useful.
  • Direct use of user ID embeddings could simplify deployment but might require careful handling of new users whose embeddings are not yet learned.
  • Combining the personalized query generation with additional user features beyond ID could further refine attention without changing the core architecture.
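The new-user concern raised above is often handled with a shared fallback embedding. The sketch below shows one such approach; it is a common practice rather than something the paper is stated to do, and the class `UserEmbeddingTable` and its reserved row 0 are hypothetical.

```python
import numpy as np

class UserEmbeddingTable:
    """Maps user IDs to embedding rows; unseen IDs share the row at index 0."""

    UNKNOWN = 0  # reserved row for users absent from training (assumption)

    def __init__(self, known_user_ids, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Known users start at row 1 so that row 0 stays reserved.
        self.index = {uid: i + 1 for i, uid in enumerate(known_user_ids)}
        self.table = rng.normal(scale=0.1, size=(len(self.index) + 1, dim))

    def lookup(self, user_id):
        return self.table[self.index.get(user_id, self.UNKNOWN)]

table = UserEmbeddingTable(["u1", "u2"], dim=4)
seen = table.lookup("u1")        # row learned for a known user
unseen = table.lookup("u-new")   # falls back to the shared row
```

With this design, every unseen user gets the same query vector, so attention for them reduces to a non-personalized pattern until an embedding has been trained.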

Load-bearing premise

User ID embeddings can effectively capture and modulate individual differences in which words and past articles are informative for representing users.

What would settle it

An experiment on the MSN news dataset in which removing the user ID embedding from query vector generation produces no drop in recommendation metrics compared with the full model.
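A sketch of how such an ablation switch might look is given below. It assumes the same dense-plus-ReLU query generation used for illustration elsewhere and a single shared query vector as the non-personalized baseline; `build_query`, `attention_weights`, and all shapes are hypothetical, and no training loop or metric is included.

```python
import numpy as np

def attention_weights(items, query, W, b):
    scores = items @ np.tanh(W @ query + b)
    z = np.exp(scores - scores.max())
    return z / z.sum()

def build_query(user_emb, V, v, shared_query, personalized):
    """Full model: query from the user embedding. Ablation: one shared vector."""
    if personalized:
        return np.maximum(V @ user_emb + v, 0.0)
    return shared_query

rng = np.random.default_rng(0)
items = rng.normal(size=(5, 6))                 # five items, 6-dim each
V, v = rng.normal(size=(4, 3)), np.zeros(4)
W, b = rng.normal(size=(6, 4)), np.zeros(6)
shared_query = rng.normal(size=4)
users = rng.normal(size=(2, 3))                 # two user embeddings

# In the ablated variant the user embedding is ignored entirely.
ablated = [attention_weights(items, build_query(u, V, v, shared_query, False), W, b)
           for u in users]
full = [attention_weights(items, build_query(u, V, v, shared_query, True), W, b)
        for u in users]
```

Under the ablated setting, every user receives identical attention weights by construction, so any metric gap between the two variants after training would be attributable to the user-ID-conditioned query.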

Figures

Figures reproduced from arXiv: 1907.05559 by Chuhan Wu, Fangzhao Wu, Jianqiang Huang, Mingxiao An, Xing Xie, Yongfeng Huang.

Figure 1: An illustrative example of two users and their …
Figure 2: The framework of our NPA approach for news recommendation.
Figure 3: The architecture of the personalized attention …
Figure 4: The performance of our …
Figure 7: The influence of negative sampling on the perfor…
Figure 6: The effectiveness of the word-level and news-level …
Figure 9: Visualization of the attention weights from the word- and news-level personalized attention network. The users and …
original abstract

News recommendation is very important to help users find interested news and alleviate information overload. Different users usually have different interests and the same user may have various interests. Thus, different users may click the same news article with attention on different aspects. In this paper, we propose a neural news recommendation model with personalized attention (NPA). The core of our approach is a news representation model and a user representation model. In the news representation model we use a CNN network to learn hidden representations of news articles based on their titles. In the user representation model we learn the representations of users based on the representations of their clicked news articles. Since different words and different news articles may have different informativeness for representing news and users, we propose to apply both word- and news-level attention mechanism to help our model attend to important words and news articles. In addition, the same news article and the same word may have different informativeness for different users. Thus, we propose a personalized attention network which exploits the embedding of user ID to generate the query vector for the word- and news-level attentions. Extensive experiments are conducted on a real-world news recommendation dataset collected from MSN news, and the results validate the effectiveness of our approach on news recommendation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NPA, a neural news recommendation model whose core consists of a CNN-based news representation module operating on article titles and a user representation module built from sequences of clicked news. Word-level and news-level attention are applied to weight informative elements, and a personalized attention network is introduced that conditions both attention mechanisms on a learned user-ID embedding used to produce the query vector. Experiments on a real-world MSN news click dataset are reported to validate the approach's effectiveness for news recommendation.

Significance. If the reported gains hold after isolating the contribution of the user-ID-conditioned attention, the work would provide a concrete, easily implemented mechanism for injecting user-specific modulation into attention-based user modeling, which is relevant to the broader literature on personalized content recommendation. The architecture itself is modular and the use of a single user-ID embedding to drive both word- and news-level queries is a compact design choice.

major comments (2)
  1. [§3.3] §3.3 (Personalized Attention Network): the claim that the user-ID embedding generates a query vector that produces user-specific attention weights is load-bearing for the central contribution, yet the manuscript provides no ablation that replaces the user-ID query with a fixed or randomly initialized query while keeping all other components identical; without this comparison the empirical results cannot distinguish genuine personalization from standard attention.
  2. [§4.2] §4.2 (Experimental Results): the reported improvements on the MSN dataset are presented without a per-user history-length stratification or a breakdown of performance for users with fewer than k clicks; given that the majority of users in typical news logs have short histories, this omission leaves the weakest assumption (that user-ID embeddings reliably encode individual differences) untested and therefore weakens support for the personalization claim.
minor comments (2)
  1. [§3.3] Notation for the user-ID embedding and the attention query generation (Eq. (7) and surrounding text) is introduced without an explicit statement of the embedding dimension or initialization, making reproduction harder.
  2. [§4.1] The abstract states that 'extensive experiments' validate effectiveness, but the main text should include the exact train/validation/test split sizes and the number of unique users to allow readers to assess data sparsity.
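The history-length stratification requested in major comment 2 could be computed along the following lines. This is a sketch under assumptions: AUC is used as an illustrative metric, the bucket edges are arbitrary, and the helper names `auc` and `auc_by_history_length` are hypothetical.

```python
import numpy as np

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined without both classes
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def auc_by_history_length(history_lens, labels, scores, edges):
    """Report AUC per bucket of user click-history length.

    edges: ascending boundaries, e.g. [5, 20] gives [0, 5), [5, 20), [20, inf)
    """
    history_lens = np.asarray(history_lens)
    labels, scores = np.asarray(labels), np.asarray(scores)
    buckets = np.digitize(history_lens, edges)
    return {int(k): auc(labels[buckets == k], scores[buckets == k])
            for k in np.unique(buckets)}

# Toy impressions: four from short-history users, four from long-history users.
result = auc_by_history_length(
    history_lens=[1, 2, 3, 4, 30, 40, 50, 60],
    labels=[1, 0, 1, 0, 1, 0, 1, 0],
    scores=[0.9, 0.1, 0.8, 0.2, 0.1, 0.9, 0.2, 0.8],
    edges=[5, 20],
)
```

Reporting the per-bucket values for both the full model and a non-personalized variant would show whether any gain concentrates among users with long histories.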

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The points raised highlight opportunities to strengthen the empirical support for the personalized attention mechanism, and we will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [§3.3] §3.3 (Personalized Attention Network): the claim that the user-ID embedding generates a query vector that produces user-specific attention weights is load-bearing for the central contribution, yet the manuscript provides no ablation that replaces the user-ID query with a fixed or randomly initialized query while keeping all other components identical; without this comparison the empirical results cannot distinguish genuine personalization from standard attention.

    Authors: We agree that an explicit ablation isolating the user-ID embedding's role in query generation would more clearly separate the contribution of personalization from standard attention. In the revised manuscript we will add this comparison by training a variant that replaces the user-ID-derived query with a fixed (non-personalized) or randomly initialized query vector while holding all other architectural components constant. revision: yes

  2. Referee: [§4.2] §4.2 (Experimental Results): the reported improvements on the MSN dataset are presented without a per-user history-length stratification or a breakdown of performance for users with fewer than k clicks; given that the majority of users in typical news logs have short histories, this omission leaves the weakest assumption (that user-ID embeddings reliably encode individual differences) untested and therefore weakens support for the personalization claim.

    Authors: We acknowledge that stratifying results by history length is necessary to evaluate whether user-ID embeddings remain effective for users with short click sequences. We will add this analysis to the experimental section, reporting performance as a function of the number of historical clicks per user. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture explicitly proposed and empirically validated on held-out data

full rationale

The paper defines an explicit neural architecture (CNN news encoder + word/news-level attention modulated by learned user-ID embeddings) and reports its performance after training on a real-world click dataset. No equations or claims reduce a 'prediction' or 'first-principles result' to the model's own fitted parameters by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing justification. The central contribution is the model design itself, which is evaluated externally rather than derived from prior fitted quantities within the paper.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The model rests on standard neural assumptions plus the novel personalized attention component; effectiveness is asserted via one-dataset empirical validation.

free parameters (1)
  • user ID embedding dimension and attention parameters
    Learned or chosen hyperparameters that control how user identity modulates attention weights.
axioms (2)
  • domain assumption: CNN layers can extract useful hidden representations from news titles
    Invoked in the news representation model section of the abstract.
  • domain assumption: User representations can be aggregated from clicked news articles
    Core premise of the user representation model.
invented entities (1)
  • personalized attention network (no independent evidence)
    purpose: Generates user-specific query vectors from user ID embeddings for word- and news-level attention
    New component introduced to make attention user-dependent; no independent evidence outside the model itself.

pith-pipeline@v0.9.0 · 5755 in / 1330 out tokens · 50549 ms · 2026-05-24T22:40:51.562679+00:00 · methodology

