Joint Neural Collaborative Filtering for Recommender Systems
Pith reviewed 2026-05-25 01:05 UTC · model grok-4.3
The pith
Joint training lets deep feature learning and interaction modeling in neural recommenders refine each other end to end.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
J-NCF applies a joint neural network that couples deep feature learning and deep interaction modeling with a rating matrix. Deep feature learning extracts feature representations of users and items with a deep learning architecture based on a user-item rating matrix. Deep interaction modeling captures non-linear user-item interactions with a deep neural network using the feature representations generated by the deep feature learning process as input. J-NCF enables the deep feature learning and deep interaction modeling processes to optimize each other through joint training, which leads to improved recommendation performance. In addition, a new loss function takes both implicit and explicit feedback, as well as point-wise and pair-wise loss, into account.
What carries the argument
The joint neural network that couples deep feature learning on the rating matrix with deep interaction modeling that takes those features as input.
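As a rough illustration of this coupling, the sketch below wires a feature network into an interaction network in plain Python. Everything concrete here is an assumption made for illustration and is not taken from the paper: the use of a user's row and an item's column of the rating matrix as raw input, the layer sizes, the random weights, the concatenation of user and item features, and the helper names `dense`, `init_layer`, and `jncf_score`.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def jncf_score(ratings, user, item, params):
    # Feature learning: the user's row and the item's column of the rating
    # matrix are mapped to dense feature vectors (assumed input encoding).
    user_row = ratings[user]
    item_col = [row[item] for row in ratings]
    user_feat = dense(user_row, *params["user"], relu)
    item_feat = dense(item_col, *params["item"], relu)
    # Interaction modeling: the learned features are the input of a second
    # network that outputs a preference score in (0, 1).
    hidden = dense(user_feat + item_feat, *params["hidden"], relu)
    return dense(hidden, *params["out"], sigmoid)[0]

rng = random.Random(0)
ratings = [[5, 0, 3], [0, 4, 0], [1, 0, 2], [0, 0, 5]]  # 4 users x 3 items, 0 = unrated
params = {
    "user": init_layer(rng, 3, 4),    # a user row has one entry per item
    "item": init_layer(rng, 4, 4),    # an item column has one entry per user
    "hidden": init_layer(rng, 8, 4),  # concatenated user and item features
    "out": init_layer(rng, 4, 1),
}
score = jncf_score(ratings, user=0, item=1, params=params)
```

Because the score is a single function of all four parameter groups, a loss on `score` can in principle update the feature layers and the interaction layers together, which is the coupling the claim depends on.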
If this is right
- HR@10 rises by up to 8.24% on MovieLens 100K, 10.81% on MovieLens 1M, and 10.21% on Amazon Movies; NDCG@10 rises by up to 12.42%, 14.24%, and 15.06%, respectively.
- The model maintains competitive results when data is sparse or users have few ratings.
- The loss function allows the model to use both point-wise and pair-wise signals from implicit and explicit feedback at once.
- Scalability and sensitivity tests show the architecture remains practical across varying dataset sizes.
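For readers unfamiliar with the two metrics named above, the following sketch computes HR@K and NDCG@K. It assumes a leave-one-out protocol with a single held-out item per user, which is common for top-K evaluation but is not confirmed by this page; the function names and the toy rankings are hypothetical.

```python
import math

def hit_ratio_at_k(ranked_items, held_out_item, k=10):
    """1.0 if the held-out item appears in the top k, else 0.0."""
    return 1.0 if held_out_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, held_out_item, k=10):
    """With one relevant item, NDCG reduces to 1 / log2(position + 2)."""
    top_k = ranked_items[:k]
    if held_out_item not in top_k:
        return 0.0
    position = top_k.index(held_out_item)  # 0-based rank
    return 1.0 / math.log2(position + 2)

def evaluate(rankings, held_out, k=10):
    """Average both metrics over all users."""
    users = list(rankings)
    hr = sum(hit_ratio_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    ndcg = sum(ndcg_at_k(rankings[u], held_out[u], k) for u in users) / len(users)
    return hr, ndcg

# Toy data: each user has a ranked candidate list and one held-out item.
rankings = {"u1": ["a", "b", "c"], "u2": ["c", "a", "b"], "u3": ["b", "c", "a"]}
held_out = {"u1": "a", "u2": "b", "u3": "c"}
hr_at_2, ndcg_at_2 = evaluate(rankings, held_out, k=2)
```

HR only records whether the held-out item made the cut, while NDCG also rewards placing it nearer the top, which is why the two reported improvements can differ.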
Where Pith is reading between the lines
- The same joint-training pattern could be tested on sequential or session-based recommendation tasks where feature and interaction signals interact strongly.
- End-to-end coupling may reduce the need for separate validation of feature quality before interaction modeling begins.
- If the joint objective proves stable, similar coupling could be applied to other paired deep networks in ranking or retrieval settings.
Load-bearing premise
Joint end-to-end training of the two networks will produce mutual gains instead of instability, one component dominating, or overfitting that erases the benefit.
What would settle it
A controlled comparison on the same datasets where separately trained feature extraction and interaction models reach equal or higher HR@10 and NDCG@10 than the jointly trained J-NCF.
Original abstract
We propose a J-NCF method for recommender systems. The J-NCF model applies a joint neural network that couples deep feature learning and deep interaction modeling with a rating matrix. Deep feature learning extracts feature representations of users and items with a deep learning architecture based on a user-item rating matrix. Deep interaction modeling captures non-linear user-item interactions with a deep neural network using the feature representations generated by the deep feature learning process as input. J-NCF enables the deep feature learning and deep interaction modeling processes to optimize each other through joint training, which leads to improved recommendation performance. In addition, we design a new loss function for optimization, which takes both implicit and explicit feedback, point-wise and pair-wise loss into account. Experiments on several real-word datasets show significant improvements of J-NCF over state-of-the-art methods, with improvements of up to 8.24% on the MovieLens 100K dataset, 10.81% on the MovieLens 1M dataset, and 10.21% on the Amazon Movies dataset in terms of HR@10. NDCG@10 improvements are 12.42%, 14.24% and 15.06%, respectively. We also conduct experiments to evaluate the scalability and sensitivity of J-NCF. Our experiments show that the J-NCF model has a competitive recommendation performance with inactive users and different degrees of data sparsity when compared to state-of-the-art baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes J-NCF, which couples a deep feature learning network (extracting user/item representations from the rating matrix) with a deep interaction modeling network (capturing non-linear interactions from those features) via joint end-to-end training, plus a new loss combining implicit/explicit feedback and pointwise/pairwise terms. It claims this mutual optimization yields improved recommendation performance and reports gains of up to 10.81% HR@10 and 15.06% NDCG@10 over baselines on MovieLens 100K/1M and Amazon Movies, with additional tests on scalability and sparsity.
Significance. If the reported gains are attributable to the joint training mechanism rather than the new loss alone, the work would demonstrate a concrete benefit of end-to-end optimization between feature extraction and interaction modeling in neural recommenders, extending prior NCF approaches. The use of public datasets and standard top-K metrics (HR, NDCG) aids reproducibility, though the absence of isolating experiments weakens the evidential basis for the central mutual-optimization claim.
major comments (3)
- [Experiments] Experiments section: no ablation is reported that trains the deep feature learning and deep interaction modeling components separately, freezes the feature extractor during interaction training, or replaces the proposed multi-feedback loss with standard BPR/MSE while retaining the joint architecture; without these, the gains (e.g., 8.24% HR@10 on MovieLens 100K) cannot be attributed to the claimed mutual optimization rather than the loss function.
- [Experiments] Experiments section: baseline implementations, hyperparameter search ranges, random seeds, and statistical significance tests (e.g., paired t-tests or Wilcoxon on the reported HR@10/NDCG@10 deltas) are not described, preventing verification that the improvements over state-of-the-art methods are robust and due to the joint-training component.
- [Model description] Model description (joint training paragraph): the claim that 'J-NCF enables the deep feature learning and deep interaction modeling processes to optimize each other through joint training' is load-bearing for the contribution, yet the architecture description provides no analysis of gradient flow, potential dominance of one network, or instability that could negate the mutual benefit.
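The significance testing requested in the second major comment could look roughly like the sketch below. Note the substitution: instead of the paired t-test or Wilcoxon test the referee names, it uses an exact paired sign-flip permutation test, chosen only because it needs nothing beyond the standard library. The function name and all HR@10 values are hypothetical and are not results from the paper.

```python
from itertools import product

def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided paired permutation test on the mean difference.

    Under the null hypothesis each paired difference is equally likely to
    have either sign, so every sign assignment is enumerated.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        mean = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
        if abs(mean) >= observed - 1e-12:  # tolerance for float rounding
            as_extreme += 1
        total += 1
    return as_extreme / total

# Hypothetical HR@10 values over five random seeds (illustrative only).
jncf_hr = [0.71, 0.72, 0.70, 0.73, 0.71]
baseline_hr = [0.68, 0.69, 0.69, 0.70, 0.68]
p_value = paired_sign_flip_p_value(jncf_hr, baseline_hr)
```

With only five paired runs there are 2^5 = 32 sign assignments, so the smallest attainable two-sided p-value is 2/32 = 0.0625. A test of this kind therefore needs more seeds, or per-user paired scores, before it can support a claim at the 0.05 level.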
minor comments (2)
- [Abstract] Abstract: 'real-word datasets' should read 'real-world datasets'.
- [Experiments] The paper mentions sensitivity experiments with 'inactive users' and 'different degrees of data sparsity' but does not define the exact thresholds or user/activity bins used.
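To show what a definition of activity bins and sparsity levels might look like, here is a minimal sketch. The thresholds `low` and `high`, the bin names, and the toy interactions are all hypothetical; this page does not state the values the paper actually used.

```python
from collections import Counter

def bin_users_by_activity(interactions, low=5, high=20):
    """Group users by rating count: < low, [low, high), >= high."""
    counts = Counter(user for user, _item, _rating in interactions)
    bins = {"inactive": [], "moderate": [], "active": []}
    for user, n in sorted(counts.items()):
        if n < low:
            bins["inactive"].append(user)
        elif n < high:
            bins["moderate"].append(user)
        else:
            bins["active"].append(user)
    return bins

def sparsity(interactions):
    """Fraction of the user-item matrix that has no observed rating."""
    users = {u for u, _i, _r in interactions}
    items = {i for _u, i, _r in interactions}
    observed = {(u, i) for u, i, _r in interactions}
    return 1.0 - len(observed) / (len(users) * len(items))

# Toy (user, item, rating) triples.
interactions = [("u1", "i1", 5), ("u1", "i2", 3), ("u1", "i3", 4),
                ("u2", "i1", 2), ("u3", "i2", 4), ("u3", "i3", 1)]
bins = bin_users_by_activity(interactions, low=2, high=3)
data_sparsity = sparsity(interactions)
```

Reporting the thresholds alongside the per-bin results would let readers reproduce the inactive-user comparison exactly.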
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important aspects for strengthening the attribution of gains to joint training and improving reproducibility. We respond to each major comment below and indicate planned revisions.
Point-by-point responses
- Referee: [Experiments] Experiments section: no ablation is reported that trains the deep feature learning and deep interaction modeling components separately, freezes the feature extractor during interaction training, or replaces the proposed multi-feedback loss with standard BPR/MSE while retaining the joint architecture; without these, the gains (e.g., 8.24% HR@10 on MovieLens 100K) cannot be attributed to the claimed mutual optimization rather than the loss function.
  Authors: We agree that the absence of such ablations limits the ability to isolate the contribution of joint training from the new multi-feedback loss. In the revised manuscript we will add these ablation experiments, including separate training of the components, freezing the feature extractor, and replacing the loss with BPR/MSE while retaining the joint architecture, to better support the mutual-optimization claim.
  Revision: yes
- Referee: [Experiments] Experiments section: baseline implementations, hyperparameter search ranges, random seeds, and statistical significance tests (e.g., paired t-tests or Wilcoxon on the reported HR@10/NDCG@10 deltas) are not described, preventing verification that the improvements over state-of-the-art methods are robust and due to the joint-training component.
  Authors: We acknowledge the need for these details to ensure reproducibility and robustness. The revised version will include explicit descriptions of baseline implementations, hyperparameter search ranges, random seeds, and statistical significance tests on the performance improvements.
  Revision: yes
- Referee: [Model description] Model description (joint training paragraph): the claim that 'J-NCF enables the deep feature learning and deep interaction modeling processes to optimize each other through joint training' is load-bearing for the contribution, yet the architecture description provides no analysis of gradient flow, potential dominance of one network, or instability that could negate the mutual benefit.
  Authors: The end-to-end training allows gradients from the interaction modeling loss to update the feature learning network and vice versa. We will expand the model description section to discuss gradient flow through the coupled networks. A comprehensive empirical analysis of dominance or instability would require new experiments; we can provide a partial discussion based on observed training behavior but may not fully resolve all aspects within the current scope.
  Revision: partial
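The gradient-flow point in the last exchange can be made concrete with a deliberately tiny example. The sketch below is not the paper's model: it assumes one scalar "feature" weight and one scalar "interaction" weight with a squared-error loss, and the names `gradients` and `train` are hypothetical. It shows only that the loss gradient reaches the feature weight through the chain rule under joint training, and that freezing the feature extractor cuts that path.

```python
def gradients(w_feat, w_int, x, y):
    """Chain-rule gradients of (w_int * w_feat * x - y)^2."""
    feature = w_feat * x          # "feature learning" stage
    score = w_int * feature       # "interaction modeling" stage
    error = score - y
    grad_w_int = 2.0 * error * feature
    grad_w_feat = 2.0 * error * w_int * x  # flows back through w_int
    return grad_w_feat, grad_w_int

def train(x, y, steps, lr, freeze_feature):
    """Plain gradient descent; optionally keep the feature weight fixed."""
    w_feat, w_int = 0.5, 0.5
    for _ in range(steps):
        g_feat, g_int = gradients(w_feat, w_int, x, y)
        if not freeze_feature:
            w_feat -= lr * g_feat
        w_int -= lr * g_int
    loss = (w_int * w_feat * x - y) ** 2
    return w_feat, w_int, loss

joint = train(x=1.0, y=1.0, steps=50, lr=0.1, freeze_feature=False)
frozen = train(x=1.0, y=1.0, steps=50, lr=0.1, freeze_feature=True)
```

In the joint run both weights move, while in the frozen run only the interaction weight does. A toy of this size says nothing about dominance or instability in deep networks, which is why the referee's request for an empirical analysis still stands.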
Circularity Check
No circularity; empirical claims rest on external benchmarks
The paper introduces a joint neural architecture for feature learning and interaction modeling plus a composite loss, then reports HR@10 and NDCG@10 gains on public datasets (MovieLens, Amazon) against published external baselines. No equations, self-citations, or derivation steps are shown that reduce the claimed mutual optimization or performance numbers to quantities fitted or defined inside the model itself. The architecture and loss are presented as design choices whose value is tested empirically rather than derived by construction from the inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- network depth and width
- loss weighting coefficients
axioms (1)
- domain assumption: Deep neural networks can capture non-linear user-item interactions when given learned feature vectors as input.
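The "loss weighting coefficients" listed among the free parameters can be illustrated with a small sketch of a loss that mixes point-wise and pair-wise terms. The weighted-sum form, the coefficient `alpha`, and the choice of binary cross-entropy and a BPR-style term are assumptions for illustration, not the paper's formula. The sketch also covers only implicit 0/1 labels and does not model explicit rating values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pointwise_loss(score, label):
    """Binary cross-entropy between a score in (0, 1) and a 0/1 label."""
    return -(label * math.log(score) + (1 - label) * math.log(1.0 - score))

def pairwise_loss(pos_score, neg_score):
    """BPR-style loss: small when the positive item outscores the negative."""
    return -math.log(sigmoid(pos_score - neg_score))

def hybrid_loss(pos_score, neg_score, alpha=0.5):
    """alpha weights the pair-wise term, (1 - alpha) the point-wise term."""
    point = pointwise_loss(pos_score, 1) + pointwise_loss(neg_score, 0)
    pair = pairwise_loss(pos_score, neg_score)
    return alpha * pair + (1.0 - alpha) * point

good = hybrid_loss(pos_score=0.9, neg_score=0.1)  # correct ordering
bad = hybrid_loss(pos_score=0.1, neg_score=0.9)   # inverted ordering
```

Setting `alpha` to 1.0 or 0.0 recovers the pure pair-wise or pure point-wise objective, which is one way the ablation requested by the referee could separate the effect of the loss from the effect of joint training.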
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "J-NCF enables the deep feature learning and deep interaction modeling processes to optimize each other through joint training... new loss function... both implicit and explicit feedback, point-wise and pair-wise loss"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We design a Joint Neural Collaborative Filtering model (J-NCF) for recommendation, which enables deep feature learning and deep user-item interaction modeling to be coupled tightly and jointly optimized in a single neural network."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.