An Item Recommendation Approach by Fusing Images based on Neural Networks

Lin Li; Weibin Lin

arXiv: 1907.02203 · v1 · pith:NAW5SO4Hnew · submitted 2019-07-04 · cs.IR · cs.LG

Pith reviewed 2026-05-25 09:34 UTC · model grok-4.3

keywords item recommendation · neural collaborative filtering · image fusion · convolutional neural network · matrix factorization · multi-layer perceptron · visual features · RMSE

The pith

Incorporating visual features from images into a neural recommendation model improves prediction accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce MF-VMLP, a model that extracts visual features from item images using a pre-trained convolutional neural network and fuses them with user and item latent factors using a multi-layer perceptron. This fusion is combined with matrix factorization to make preference predictions. The approach aims to account for how an item's appearance influences user choices beyond ratings alone. Experiments on an Amazon dataset show that this method reduces root-mean-square error compared to models without visual information. If correct, it means recommendation systems can leverage image data to make more accurate suggestions for items where looks matter.

Core claim

The paper presents MF-VMLP, which obtains visual representations via a pre-trained CNN, uses an MLP to learn nonlinear interactions between latent vectors and visual vectors, and combines MF and MLP for collaborative filtering. Experiments on Amazon's public dataset using RMSE demonstrate that the model boosts recommendation performance.

What carries the argument

MF-VMLP model that fuses CNN-extracted visual features with matrix factorization and multi-layer perceptron for nonlinear combination.
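
As a rough illustration of that fusion, the forward pass can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `mf_vmlp_forward`, `dense`, and `init_dense`, the ReLU activation, the separate embeddings for each branch, the final linear output layer, and every dimension below are hypothetical, and the visual vector is stood in for by random numbers rather than real CNN features.

```python
import math
import random


def relu(vector):
    """Element-wise ReLU (assumed activation)."""
    return [max(0.0, x) for x in vector]


def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_dense(n_in, n_out, rng):
    """Random weights and zero bias for a layer (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def mf_vmlp_forward(p_mf, q_mf, p_mlp, q_mlp, visual, hidden_layers, out_layer):
    """Predict one rating from latent factors and an item's visual vector."""
    # MF branch: element-wise product of user and item latent vectors.
    phi_mf = [a * b for a, b in zip(p_mf, q_mf)]
    # VMLP branch: concatenate user, item and visual vectors, then apply the MLP.
    z = p_mlp + q_mlp + visual
    for weights, bias in hidden_layers:
        z = relu(dense(z, weights, bias))
    # Fusion: concatenate both branch outputs and map them to a single score.
    fused = phi_mf + z
    out_weights, out_bias = out_layer
    return dense(fused, out_weights, out_bias)[0]


rng = random.Random(0)
k, d_vis = 4, 6  # hypothetical latent and visual dimensions
p_mf, q_mf, p_mlp, q_mlp = ([rng.gauss(0, 0.1) for _ in range(k)] for _ in range(4))
visual = [rng.random() for _ in range(d_vis)]  # stand-in for CNN features
hidden = [init_dense(2 * k + d_vis, 8, rng), init_dense(8, 4, rng)]
out = init_dense(k + 4, 1, rng)
score = mf_vmlp_forward(p_mf, q_mf, p_mlp, q_mlp, visual, hidden, out)
print(score)
```

With untrained random weights the printed score is meaningless; the point is only the data flow, in which the visual vector enters through the MLP branch and the MF branch stays a plain element-wise product.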

If this is right

Visual characteristics of items can be used to predict user preferences.
The combination of MF and MLP achieves collaborative filtering that incorporates images.
The model shows improved performance on real-world data as measured by lower RMSE.
Item images provide additional information not captured by ratings or text alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

For product categories like fashion or home goods, visual data may be particularly valuable for recommendations.
The method could be extended by training the CNN on domain-specific images rather than using a pre-trained general model.
Similar fusion techniques might apply to other data types such as video or audio for items.

Load-bearing premise

The visual features from the pre-trained CNN capture meaningful item characteristics that influence user preferences in a way that can be combined with latent factors.

What would settle it

The claim would be refuted if adding the visual features to the model on the Amazon dataset did not yield a lower RMSE than the version without images.
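
That comparison reduces to computing RMSE for two sets of predictions against the same held-out ratings. A minimal sketch follows; the ratings and predictions are made-up toy values, and `visual_features_help` and its `margin` parameter are hypothetical names introduced only for illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and observed ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


def visual_features_help(rmse_with_images, rmse_without_images, margin=0.0):
    """True when the image-aware model beats the image-free one by more than `margin`."""
    return rmse_with_images < rmse_without_images - margin


# Toy values, not results from the paper.
actual = [5.0, 3.0, 4.0, 1.0]
with_images = [4.5, 3.5, 4.0, 1.5]
without_images = [4.0, 4.0, 3.0, 2.0]

print(rmse(with_images, actual), rmse(without_images, actual))
print(visual_features_help(rmse(with_images, actual), rmse(without_images, actual)))
```

A non-zero `margin` is one way to avoid counting differences that are smaller than run-to-run noise as an improvement.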

Figures

Figures reproduced from arXiv: 1907.02203 by Lin Li, Weibin Lin.

**Figure 1.** VMLP model.

**Figure 2.** MF-VMLP model.

**Figure 3.** Experimental comparison.

Original abstract

There are rich formats of information in the network, such as rating, text, image, and so on, which represent different aspects of user preferences. In the field of recommendation, how to use those data effectively has become a difficult subject. With the rapid development of neural network, researching on multi-modal method for recommendation has become one of the major directions. In the existing recommender systems, numerical rating, item description and review are main information to be considered by researchers. However, the characteristics of the item may affect the user's preferences, which are rarely used for recommendation models. In this work, we propose a novel model to incorporate visual factors into predictors of people's preferences, namely MF-VMLP, based on the recent developments of neural collaborative filtering (NCF). Firstly, we get visual presentation via a pre-trained convolutional neural network (CNN) model. To obtain the nonlinearities interaction of latent vectors and visual vectors, we propose to leverage a multi-layer perceptron (MLP) to learn. Moreover, the combination of MF and MLP has achieved collaborative filtering recommendation between users and items. Our experiments conduct Amazon's public dataset for experimental validation and root-mean-square error (RMSE) as evaluation metrics. To some extent, experimental result on a real-world data set demonstrates that our model can boost the recommendation performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Standard NCF plus pre-trained CNN visuals via MLP, but the abstract supplies no baselines or protocol details to check the performance claim.

The main point is that this paper takes the existing NCF framework and adds item image features extracted by a pre-trained CNN, then fuses them through an MLP. That is the extent of the contribution. It follows the usual pattern of layering side information onto neural collaborative filtering without introducing new mechanisms or derivations. The architecture description itself is clear and follows directly from prior NCF work, which is a reasonable incremental step for anyone already working on multimodal recommendation. The motivation about item visuals affecting preferences is also stated plainly.

The soft spot is the evaluation. The abstract reports an RMSE improvement on Amazon data but names no baselines such as plain MF or NCF, gives no dataset statistics or split details, and includes no ablation that isolates the visual component. Without those, the claim that the fusion boosts performance cannot be verified or attributed to the images rather than the MLP structure alone. The stress-test concern holds on the information provided.

This kind of paper mainly interests specialists already tracking visual modality additions in collaborative filtering. A reader looking for a new method, reproducible evidence, or broader impact will not find it. I would not bring it to a reading group or cite it. It does not look ready for peer review because the central result rests on unverifiable experiments.

Referee Report

3 major / 2 minor

Summary. The paper proposes MF-VMLP, a neural collaborative filtering model extending NCF by extracting visual features from items via a pre-trained CNN and fusing them with user/item latent factors through an MLP to capture nonlinear interactions; it reports RMSE results on an Amazon public dataset and claims that incorporating these visual factors boosts recommendation performance.

Significance. If the performance gains are shown to hold under standard controls, the work would contribute evidence that pre-trained visual embeddings can be usefully combined with latent factors in multimodal recommendation, extending the NCF framework to image data.

major comments (3)

[§4 Experiments] The reported RMSE improvement is presented without any baseline comparisons (e.g., standard MF, NCF, or other multimodal models), dataset statistics, train/test split details, or negative sampling protocol, making it impossible to verify whether the claimed boost is attributable to visual fusion rather than architecture or evaluation choices.
[§4 Experiments] No ablation isolating the visual component (e.g., MF-VMLP vs. MF+MLP without images) is provided, so the central claim that visual features from the CNN meaningfully affect user preferences cannot be assessed.
[§3 Model] The exact dimensions of the visual vectors, MLP layer widths/depths, loss function, and optimization procedure for fusing visual and latent vectors are not specified, leaving the implementation of the claimed nonlinear interaction underspecified.
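
One lightweight way to track the model details requested in the last comment is a specification record whose unset fields are reported as missing. The sketch below is only an illustration: `ModelSpec` and its field names are hypothetical, and every field is deliberately left as `None` because the values are not given here.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class ModelSpec:
    """Details a reader would need to re-implement the fusion (hypothetical names)."""
    visual_vector_dim: Optional[int] = None
    mlp_layer_widths: Optional[Tuple[int, ...]] = None
    loss_function: Optional[str] = None
    optimizer: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the fields that have not been specified yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


spec = ModelSpec()  # nothing filled in, mirroring the information available here
print(spec.missing())
```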

minor comments (2)

[Abstract] The abstract contains grammatical issues (e.g., 'get visual presentation via' should read 'obtain visual representations using'; 'conduct Amazon's public dataset' should read 'conduct experiments on Amazon's public dataset').
[§3] Notation for latent factors and visual vectors is introduced without consistent symbols or a clear diagram of the fusion architecture.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the experimental section and model description require additional details for reproducibility and to substantiate the claims. We will revise the manuscript accordingly.

Referee: [§4 Experiments] The reported RMSE improvement is presented without any baseline comparisons (e.g., standard MF, NCF, or other multimodal models), dataset statistics, train/test split details, or negative sampling protocol, making it impossible to verify whether the claimed boost is attributable to visual fusion rather than architecture or evaluation choices.

Authors: We acknowledge that the current experimental reporting lacks these essential details. In the revised manuscript we will add baseline comparisons against standard MF, the original NCF, and at least one other multimodal model; report dataset statistics (users, items, ratings); specify the train/test split procedure; and describe the negative sampling protocol. These additions will allow readers to assess whether gains are due to visual fusion. revision: yes
Referee: [§4 Experiments] No ablation isolating the visual component (e.g., MF-VMLP vs. MF+MLP without images) is provided, so the central claim that visual features from the CNN meaningfully affect user preferences cannot be assessed.

Authors: We agree an ablation is required to isolate the visual component. We will add results comparing the full MF-VMLP model against an MF+MLP variant that omits the CNN-derived visual vectors, thereby demonstrating the contribution of the visual features. revision: yes
Referee: [§3 Model] The exact dimensions of the visual vectors, MLP layer widths/depths, loss function, and optimization procedure for fusing visual and latent vectors are not specified, leaving the implementation of the claimed nonlinear interaction underspecified.

Authors: We will expand Section 3 to specify the visual vector dimension produced by the pre-trained CNN, the exact widths and depths of the MLP layers, the loss function employed, and the optimization procedure used to fuse the visual and latent vectors. revision: yes
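
The dataset statistics and split procedure promised in the first response could be produced along the following lines. This is a hedged sketch rather than the authors' protocol: the `(user, item, rating)` tuple format, the helper names `dataset_statistics` and `train_test_split`, the 80/20 ratio, the fixed seed, and the toy records are all assumptions made for illustration.

```python
import random


def dataset_statistics(ratings):
    """Count users, items and ratings, and compute the rating-matrix density."""
    users = {user for user, _, _ in ratings}
    items = {item for _, item, _ in ratings}
    n_ratings = len(ratings)
    density = n_ratings / (len(users) * len(items)) if users and items else 0.0
    return {"users": len(users), "items": len(items),
            "ratings": n_ratings, "density": density}


def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed so the same split can be reproduced."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy records, not taken from the Amazon dataset.
ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0),
           ("u2", "i3", 2.0), ("u3", "i2", 1.0)]
stats = dataset_statistics(ratings)
train, test = train_test_split(ratings)
print(stats)
print(len(train), len(test))
```

Reporting the seed together with the statistics lets a reader rebuild the same split when checking an RMSE figure.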

Circularity Check

0 steps flagged

No circularity: empirical model proposal with independent validation

The paper proposes MF-VMLP by extracting visual features via a pre-trained CNN and then combining them with latent factors through an MLP on top of matrix factorization. The performance claim rests on RMSE measured in experiments on the Amazon dataset. No equations, derivations, or predictions are presented that reduce to fitted inputs by construction. No self-citations, uniqueness theorems, or ansatzes appear in the provided text. The result is an empirical demonstration on external real-world data and is therefore self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The claim rests on the domain assumption that visual item features are predictive of preference and on standard neural-network training assumptions; no new entities are postulated.

free parameters (2)

latent factor dimension
Embedding size for users and items, chosen during model design
MLP layer widths and depths
Architecture parameters selected to model interactions between latent and visual vectors

axioms (1)

domain assumption Visual characteristics of items influence user preferences independently of numerical ratings
Invoked in the abstract as the reason for adding image data

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

MF estimates an interaction $y_{ui}$ as the inner product of $p_u$ and $q_i$... VMLP model... $\phi_L(z_{L-1})$... MF-VMLP... $\phi^{MF} = p_u \odot q_i$, $\phi^{VMLP} =$ ... concatenate last hidden layers

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

We get visual presentation via a pre-trained convolutional neural network (CNN) model... experiments on Amazon Women/Men/Phones datasets, RMSE metric

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
