Neural News Recommendation with Attentive Multi-View Learning

Chuhan Wu; Fangzhao Wu; Jianqiang Huang; Mingxiao An; Xing Xie; Yongfeng Huang

arXiv: 1907.05576 · v1 · pith:3FILI7AInew · submitted 2019-07-12 · 💻 cs.CL · cs.IR · cs.LG

Pith reviewed 2026-05-24 22:48 UTC · model grok-4.3

classification 💻 cs.CL · cs.IR · cs.LG

keywords news recommendation · multi-view learning · attention mechanism · neural networks · user modeling · personalized recommendation

The pith

A neural model learns unified news representations from titles, bodies and categories using word and view attention to improve recommendations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a neural news recommendation system that builds representations of news and users by drawing on several kinds of information about each article. Its news encoder treats the title, body text and topic category as distinct views of the same story and applies attention at the word level inside each view and at the view level across them to form a single informative vector. The user encoder then uses attention over a user's history of read articles to create a user profile. Experiments on a real-world dataset indicate that the resulting recommendations perform better than prior methods. A reader would care because more accurate suggestions can help users locate relevant stories amid large daily news volumes.

Core claim

The paper claims that an attentive multi-view learning model in the news encoder can produce unified representations from titles, bodies and topic categories while word-level and view-level attention select salient details, and that an attention-based user encoder operating on browsed news yields informative user representations, with the combined system improving news recommendation performance on a real-world dataset.

What carries the argument

The attentive multi-view learning model in the news encoder that treats titles, bodies and topic categories as different views and applies word-level and view-level attention.
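
The two attention levels described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `attention_pool`, `proj`, `bias` and `query` are hypothetical, the additive (tanh) scoring form is one common choice, and a single parameter set is shared across views only to keep the example short.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, proj, bias, query):
    """Additive attention: score_i = query . tanh(proj @ v_i + bias).

    Returns (weights, pooled) with pooled = sum_i weights[i] * vectors[i].
    """
    scores = []
    for v in vectors:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, v)) + b)
                  for row, b in zip(proj, bias)]
        scores.append(sum(q * h for q, h in zip(query, hidden)))
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled

# Toy 2-dimensional inputs (made-up numbers, for illustration only).
title_words = [[0.2, 0.1], [0.9, -0.3], [0.0, 0.4]]
body_words = [[0.5, 0.5], [-0.2, 0.7]]
category_vec = [0.3, -0.1]

# Assumption: one shared parameter set; a real model would learn one per view.
proj = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
query = [1.0, 0.5]

# Word-level attention inside each textual view.
_, title_vec = attention_pool(title_words, proj, bias, query)
_, body_vec = attention_pool(body_words, proj, bias, query)

# View-level attention across title, body and category.
view_weights, news_vec = attention_pool(
    [title_vec, body_vec, category_vec], proj, bias, query)
```

Here `view_weights` holds one weight per view and sums to one, so `news_vec` is a convex combination of the three view vectors.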

If this is right

Important words inside each view are emphasized during representation learning.
The contribution of each view (title, body, category) is weighted according to its usefulness.
User profiles focus on the most relevant articles from a user's reading history.
The overall system produces higher recommendation performance than single-view approaches on the tested data.
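
As a rough illustration of the user-profile point above, the sketch below pools a reading history with attention and ranks candidates. It rests on assumptions not stated on this page: the scoring function inside `attend` is simplified (no learned projection), and `click_score` uses a plain dot product between user and news vectors. All names and numbers are hypothetical.

```python
import math

def attend(vectors, query):
    """Softmax-weighted sum of vectors; score_i = query . tanh(v_i)."""
    scores = [sum(q * math.tanh(x) for q, x in zip(query, v)) for v in vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled

def click_score(user_vec, news_vec):
    """Assumed scoring rule: dot product of user and candidate news vectors."""
    return sum(u * n for u, n in zip(user_vec, news_vec))

# News vectors for three browsed articles (toy values).
browsed = [[0.9, 0.1], [0.8, 0.2], [-0.5, 0.7]]
query = [1.0, 0.0]  # hypothetical attention query vector
history_weights, user_vec = attend(browsed, query)

# Rank two hypothetical candidates by predicted click score.
candidates = {"sports_story": [1.0, 0.0], "finance_story": [-1.0, 0.5]}
ranking = sorted(candidates,
                 key=lambda name: click_score(user_vec, candidates[name]),
                 reverse=True)
```

With these toy values the first two browsed articles receive most of the attention weight, and the candidate aligned with them is ranked first.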

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same view-fusion pattern could be tested on other recommendation settings that combine textual metadata with longer content.
Attention scores might reveal which news attributes most influence different user groups.
Re-running the experiments on news data from additional platforms would test whether the gains depend on the original dataset's characteristics.

Load-bearing premise

Performance gains are produced by the multi-view attention design rather than by dataset-specific tuning or unstated differences in the baselines.

What would settle it

An ablation study on the same dataset that removes the view-level and word-level attention components and finds no drop in recommendation metrics would falsify the claim that the design drives the gains.
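
One way to set up such an ablation is to keep everything fixed except the pooling step. The sketch below is an assumption about how the variants could be wired, not a description of the paper's experiments; `pool` and its `scores` argument are hypothetical names.

```python
import math

def pool(vectors, scores=None):
    """Weighted average of vectors.

    scores=None gives uniform weights (the ablated variant);
    otherwise the weights are softmax(scores) (the attention variant).
    """
    n = len(vectors)
    if scores is None:
        weights = [1.0 / n] * n
    else:
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

# Three toy view vectors (title, body, category).
views = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

ablated = pool(views)                            # plain mean pooling
attentive = pool(views, scores=[2.0, 0.0, 0.0])  # attention favours the first view
```

Because equal scores reduce the attention variant to the mean, any metric difference between the two runs can be attributed to the learned weights rather than to the inputs.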

Figures

Figures reproduced from arXiv: 1907.05576 by Chuhan Wu, Fangzhao Wu, Jianqiang Huang, Mingxiao An, Xing Xie, Yongfeng Huang.

**Figure 2.** The framework of our NAML approach for news recommendation.

… representation of the i-th word as $c_i^t$, which is calculated by:

$$c_i^t = \mathrm{ReLU}\left(F_t \times e^t_{(i-K):(i+K)} + b_t\right), \tag{1}$$

where $e^t_{(i-K):(i+K)}$ is the concatenation of word embeddings from position $(i-K)$ to $(i+K)$. $F_t \in \mathbb{R}^{N_f \times (2K+1)D}$ and $b_t \in \mathbb{R}^{N_f}$ are the kernel and bias parameters of the CNN filters, $N_f$ is the number of CNN filters and $2K+1$ is their window…
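
Equation (1) in the text attached to Figure 2 can be illustrated with a small standard-library sketch. The function name `cnn_contexts` and the zero-padding at the sequence boundaries are assumptions made here; the excerpt does not say how edge positions are handled.

```python
def cnn_contexts(embeddings, kernel, bias, K):
    """Contextual word vectors c_i = ReLU(F @ concat(e[i-K..i+K]) + b).

    embeddings: M word vectors, each of size D
    kernel:     N_f rows, each of length (2K+1)*D
    bias:       N_f floats
    Positions outside the sequence are zero-padded (an assumption).
    """
    D = len(embeddings[0])
    zero = [0.0] * D
    padded = [zero] * K + list(embeddings) + [zero] * K
    contexts = []
    for i in range(len(embeddings)):
        # Concatenate the 2K+1 embeddings centred on word i.
        window = [x for vec in padded[i:i + 2 * K + 1] for x in vec]
        contexts.append([
            max(0.0, sum(w * x for w, x in zip(row, window)) + b)
            for row, b in zip(kernel, bias)
        ])
    return contexts

# Toy example: M=3 words, D=2, K=1, N_f=2 filters.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = [[1.0] * 6, [-1.0] * 6]
bias = [0.0, 0.5]
contexts = cnn_contexts(embeddings, kernel, bias, K=1)
# contexts == [[2.0, 0.0], [4.0, 0.0], [3.0, 0.0]]
```

The second filter has negative weights, so ReLU clips its output to zero at every position in this toy input.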

**Figure 3.** The effectiveness of the multi-view learning framework.

**Figure 5.** Visualization of the word- and news-level attention.

Original abstract

Personalized news recommendation is very important for online news platforms to help users find interested news and improve user experience. News and user representation learning is critical for news recommendation. Existing news recommendation methods usually learn these representations based on single news information, e.g., title, which may be insufficient. In this paper we propose a neural news recommendation approach which can learn informative representations of users and news by exploiting different kinds of news information. The core of our approach is a news encoder and a user encoder. In the news encoder we propose an attentive multi-view learning model to learn unified news representations from titles, bodies and topic categories by regarding them as different views of news. In addition, we apply both word-level and view-level attention mechanism to news encoder to select important words and views for learning informative news representations. In the user encoder we learn the representations of users based on their browsed news and apply attention mechanism to select informative news for user representation learning. Extensive experiments on a real-world dataset show our approach can effectively improve the performance of news recommendation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This is a standard incremental neural rec paper that adds word- and view-level attention over title/body/category inputs and reports gains on one dataset, but the experiments do not isolate whether the attention design drives the improvement.

The core idea is to encode news from three views (title, body, category) with both word-level and view-level attention, then attend over a user's history to build a user vector. That combination is presented as new for this task. The approach is a logical next step once you accept that single-view title-only models are too limited and that attention can help weigh parts of the input. The user encoder follows the usual pattern but fits the setting cleanly.

The paper does a decent job laying out why multi-view input matters for news and how the two attention layers operate inside the news encoder. That part reads as honest engineering rather than over-claiming.

The main weakness is the evaluation. The abstract says extensive experiments on a real-world dataset show improvement, yet there is no sign of component ablations, no confirmation that baselines received the same three-view inputs, and no statistical significance numbers. If the baselines were title-only while the proposed model got body and category as well, the lift could come from the extra text rather than the attention machinery. One dataset also leaves open questions about whether the gains hold on other news corpora or user populations.

The math is straightforward attention and pooling; nothing circular or invented. This paper is for practitioners who need a concrete multi-view news encoder they can re-implement or extend. It is not foundational, but the architecture is clear enough that a serious referee could usefully comment on the missing ablations and baseline details. I would send it to review rather than desk-reject.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a neural news recommendation model consisting of a news encoder that applies attentive multi-view learning to titles, bodies, and topic categories (with word-level and view-level attention) and a user encoder that aggregates representations of browsed news via attention. The central claim is that this architecture yields improved recommendation performance, as shown by extensive experiments on a real-world dataset.

Significance. If the reported gains are robust, statistically significant, and attributable to the multi-view attention components rather than richer inputs or tuning differences, the work would provide a practical advance in multi-source news representation learning for recommendation systems.

major comments (2)

[Experiments] Experiments section: no ablation studies or controlled re-implementations of baselines are described that would isolate the contribution of the word-level and view-level attention mechanisms from the simple effect of supplying title+body+category inputs to all models.
[Abstract] Abstract and Experiments section: the claim that the approach 'can effectively improve the performance of news recommendation' is presented without any reported metrics, baseline names, data-split details, or statistical significance tests, leaving the central empirical result unverifiable from the manuscript.

minor comments (1)

[News Encoder] The description of the view-level attention could benefit from an explicit equation showing how the view weights are computed and normalized.
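
For concreteness, one common way to write such a view-level attention is given below. This is a sketch under assumptions rather than a quotation from the manuscript: $r^t$, $r^b$ and $r^c$ denote the title, body and category vectors, and the parameter names $U$, $u$ and $q$ are hypothetical.

```latex
% Unnormalized score for each view v in {t, b, c}
a_v = q^{\top} \tanh\left(U r^{v} + u\right)

% Softmax normalization across the three views
\alpha_v = \frac{\exp(a_v)}{\sum_{v' \in \{t, b, c\}} \exp(a_{v'})}

% Unified news representation as the weighted sum of view vectors
r = \alpha_t r^{t} + \alpha_b r^{b} + \alpha_c r^{c}
```

Written this way, the weights are positive and sum to one, so the unified vector is a convex combination of the three view vectors.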

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and outline the revisions we will make.

Referee: [Experiments] Experiments section: no ablation studies or controlled re-implementations of baselines are described that would isolate the contribution of the word-level and view-level attention mechanisms from the simple effect of supplying title+body+category inputs to all models.

Authors: We acknowledge the lack of ablation studies in the current version. To isolate the contributions of the word-level and view-level attention, we will add ablation experiments in the revised manuscript. These will include model variants without each attention mechanism and controlled comparisons where all baselines receive the same title, body, and category inputs. This will help clarify that the performance gains are due to the attentive multi-view learning rather than input differences.

Revision: yes

Referee: [Abstract] Abstract and Experiments section: the claim that the approach 'can effectively improve the performance of news recommendation' is presented without any reported metrics, baseline names, data-split details, or statistical significance tests, leaving the central empirical result unverifiable from the manuscript.

Authors: While the abstract is a concise summary and does not typically include numerical results, we agree that including key metrics would make the claim more concrete. We will update the abstract to report specific performance improvements (such as AUC and MRR gains over baselines), name the main baselines, and reference the dataset split and significance testing. The experiments section already details the results, but we will ensure data-split information and statistical tests are explicitly stated or added if missing to enhance verifiability.

Revision: yes
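
The AUC and MRR figures mentioned in the rebuttal could be computed per impression along the following lines. This is a hedged sketch: the function names and toy impressions are hypothetical, and the reciprocal-rank variant used here (rank of the first clicked item) is an assumption, since this page does not state which MRR definition the paper uses.

```python
def auc(labels, scores):
    """Probability that a clicked item outscores a non-clicked one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one clicked and one non-clicked item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def reciprocal_rank(labels, scores):
    """1 / rank of the first clicked item when sorted by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            return 1.0 / rank
    return 0.0

# Two toy impressions: (click labels, model scores).
impressions = [
    ([1, 0, 0], [0.9, 0.2, 0.4]),
    ([0, 1, 0, 0], [0.8, 0.6, 0.1, 0.3]),
]
mean_auc = sum(auc(y, s) for y, s in impressions) / len(impressions)
mean_mrr = sum(reciprocal_rank(y, s) for y, s in impressions) / len(impressions)
```

Averaging per impression, rather than pooling all candidates together, keeps impressions with many candidates from dominating the reported number.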

Circularity Check

0 steps flagged

Empirical neural model with experimental results; no derivation chain present

The paper describes a neural news recommendation architecture consisting of a news encoder (attentive multi-view learning over title/body/category with word- and view-level attention) and a user encoder (attention over browsed news). The central claim is an empirical performance improvement on one real-world dataset. No mathematical derivation, uniqueness theorem, or predictive equation is offered that could reduce to its own inputs by construction. No self-citation is invoked as load-bearing justification for any core premise. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit mathematical axioms or invented entities; the work relies on standard assumptions of deep learning (gradient-based optimization converges to useful minima, attention weights are interpretable) that are not derived in the paper.

