Click-Through Rate Prediction with the User Memory Network
Pith reviewed 2026-05-25 00:29 UTC · model grok-4.3
The pith
MA-DNN augments standard DNNs with two user memory vectors to incorporate historical behavior information for CTR prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By creating two external memory vectors for each user that memorize high-level abstractions of what the user possibly likes and dislikes, the MA-DNN model enables exploitation of useful information from users' historical behaviors while keeping the model as simple as a standard DNN.
What carries the argument
Two external memory vectors per user, one for positive preferences and one for negative, updated from historical impressions and clicks to capture user preference abstractions.
If this is right
- MA-DNN achieves improved prediction performance over plain DNNs.
- Unlike RNNs, the model remains simple in both offline training and online prediction.
- The memory component can be added to other models, such as Wide&Deep.
- Both offline and online experiments support its effectiveness for practical CTR services.
Where Pith is reading between the lines
- This memory augmentation might allow real-time updating of user preferences in production systems without full sequence processing.
- The approach could extend to other sequential recommendation tasks where full RNN modeling is too costly.
- Memory vectors might be combined with attention mechanisms in future variants for better abstraction.
Load-bearing premise
That the two memory vectors can extract useful high-level user preference information from historical data without explicit modeling of temporal sequences.
What would settle it
A direct comparison on a public CTR dataset where MA-DNN shows no statistically significant AUC improvement over a baseline DNN with the same features.
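One way such a comparison could be run is a paired bootstrap over test impressions. The sketch below is illustrative only and is not taken from the paper: the function names `auc` and `paired_bootstrap_auc_diff`, the number of resamples, and the 95% percentile interval are all assumptions. It scores the same resampled impressions with both models and reports an interval for the AUC difference; an interval that contains zero would be consistent with "no statistically significant improvement".

```python
import random

def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U), using average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Return (observed AUC_b - AUC_a, (low, high) 95% percentile interval)."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) in (0, n):
            continue  # skip resamples that contain only one class
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(b - a)
    diffs.sort()
    low = diffs[int(0.025 * n_boot)]
    high = diffs[int(0.975 * n_boot) - 1]
    observed = auc(labels, scores_b) - auc(labels, scores_a)
    return observed, (low, high)
```

Here `scores_a` would hold the baseline DNN's predictions and `scores_b` the MA-DNN predictions on the same test impressions. Impressions from the same user are correlated, so resampling whole users rather than individual impressions may be the more defensible design; the sketch resamples impressions only for brevity.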
Original abstract
Click-through rate (CTR) prediction is a critical task in online advertising systems. Models like Deep Neural Networks (DNNs) are simple but stateless. They consider each target ad independently and cannot directly extract useful information contained in users' historical ad impressions and clicks. In contrast, models like Recurrent Neural Networks (RNNs) are stateful but complex. They model temporal dependency between users' sequential behaviors and can achieve improved prediction performance than DNNs. However, both the offline training and online prediction process of RNNs are much more complex and time-consuming. In this paper, we propose Memory Augmented DNN (MA-DNN) for practical CTR prediction services. In particular, we create two external memory vectors for each user, memorizing high-level abstractions of what a user possibly likes and dislikes. The proposed MA-DNN achieves a good compromise between DNN and RNN. It is as simple as DNN, but has certain ability to exploit useful information contained in users' historical behaviors as RNN. Both offline and online experiments demonstrate the effectiveness of MA-DNN for practical CTR prediction services. Actually, the memory component can be augmented to other models as well (e.g., the Wide&Deep model).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Memory Augmented DNN (MA-DNN) for CTR prediction. It augments standard DNNs with two external memory vectors per user that memorize high-level abstractions of likes and dislikes from historical ad impressions and clicks. The model is positioned as a practical compromise: as simple as DNNs for training and inference yet able to exploit sequential user behavior like RNNs. The memory component is also claimed to be augmentable to other architectures such as Wide&Deep. Effectiveness is asserted via offline and online experiments.
Significance. If the empirical results hold under proper controls, MA-DNN would supply an efficient, low-complexity route to incorporate user history in production CTR systems, where RNN overhead is often prohibitive. The extensibility claim to other base models is a secondary strength that could broaden impact.
major comments (2)
- [Abstract] The central claim that 'both offline and online experiments demonstrate the effectiveness' is load-bearing yet unsupported by any mention of baselines, metrics (AUC, log-loss, etc.), statistical tests, or data-splitting rules. Without these, the practical-utility assertion cannot be evaluated.
- [Model] Model section (description of the two external memory vectors): the update rules and the precise mechanism by which the vectors extract 'high-level abstractions' from impressions/clicks are stated at a conceptual level only. This leaves open whether the construction actually captures sequential preference information without explicit temporal modeling, which is the key assumption underlying the DNN-RNN compromise claim.
minor comments (1)
- [Abstract] The final sentence of the abstract states that the memory component 'can be augmented to other models as well'; this extensibility claim should be supported by at least one concrete example or ablation in the experiments section.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment below and agree to revisions that strengthen the presentation without altering the core contributions.
- Referee: [Abstract] The central claim that 'both offline and online experiments demonstrate the effectiveness' is load-bearing yet unsupported by any mention of baselines, metrics (AUC, log-loss, etc.), statistical tests, or data-splitting rules. Without these, the practical-utility assertion cannot be evaluated.
  Authors: We agree that the abstract would be clearer with supporting details. The experiments section of the manuscript reports AUC and log-loss results against DNN and RNN baselines on standard CTR datasets using chronological train/validation/test splits. We will revise the abstract to briefly reference these elements (e.g., metrics and baseline comparisons) while keeping the length appropriate.
  Revision: yes
- Referee: [Model] Model section (description of the two external memory vectors): the update rules and the precise mechanism by which the vectors extract 'high-level abstractions' from impressions/clicks are stated at a conceptual level only. This leaves open whether the construction actually captures sequential preference information without explicit temporal modeling, which is the key assumption underlying the DNN-RNN compromise claim.
  Authors: The memory vectors are updated incrementally with each new impression/click to accumulate high-level like/dislike signals, which are then concatenated with the current ad features for the DNN. This provides a lightweight history summary without RNN-style recurrence. We acknowledge the description remains high-level and will add explicit update equations and a short discussion of how sequential information is retained in the revised model section.
  Revision: yes
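The mechanism described in this response (incremental per-user updates, then concatenation with the current ad features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the class name `UserMemoryStore`, the zero initialization, the step size `rate`, and the moving-average update toward the DNN's last hidden vector `z_last` are all hypothetical, since the explicit update equations are not given on this page.

```python
from collections import defaultdict

class UserMemoryStore:
    """Two fixed-size vectors per user: one for 'likes', one for 'dislikes'."""

    def __init__(self, dim, rate=0.1):
        self.dim = dim
        self.rate = rate  # hypothetical step size, not taken from the paper
        self.like = defaultdict(lambda: [0.0] * dim)      # assumed zero init
        self.dislike = defaultdict(lambda: [0.0] * dim)   # assumed zero init

    def build_input(self, user_id, ad_features):
        """Concatenate the current ad features with both memory vectors."""
        return list(ad_features) + self.like[user_id] + self.dislike[user_id]

    def update(self, user_id, z_last, clicked):
        """Move the label-selected memory a small step toward z_last.

        z_last stands for a high-level representation of the impression,
        e.g. the DNN's last hidden layer (an assumption of this sketch).
        """
        if len(z_last) != self.dim:
            raise ValueError("z_last must have the memory dimension")
        table = self.like if clicked else self.dislike
        old = table[user_id]
        table[user_id] = [m + self.rate * (z - m) for m, z in zip(old, z_last)]

# Usage: one impression for user "u1" that ended in a click.
store = UserMemoryStore(dim=2, rate=0.5)
x = store.build_input("u1", [1.0, 0.0, 3.0])   # 3 ad features + 2 + 2 memory slots
store.update("u1", z_last=[1.0, -1.0], clicked=True)
x_next = store.build_input("u1", [1.0, 0.0, 3.0])
```

Because each update is a constant-time table write and prediction is a single feed-forward pass over `x`, a design of this kind would keep serving cost close to that of a plain DNN. Whether a running summary like this retains enough sequential information is exactly the question the referee raises.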
Circularity Check
No significant circularity detected
The paper introduces MA-DNN by defining two external memory vectors that store high-level like/dislike abstractions from user history, then augments a standard DNN. No equations, derivations, or self-citations are shown that reduce the target CTR prediction to fitted inputs by construction, rename known results, or import uniqueness from prior author work. Performance claims rest on offline/online experiments rather than internal self-definition, so the construction remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
invented entities (1)
- two external memory vectors per user: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "we create two external memory vectors for each user, memorizing high-level abstractions of what a user possibly likes and dislikes... loss2 = 1/|Y| sum [y mu1 + (1-y) mu0 - zL]^2"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The proposed MA-DNN achieves a good compromise between DNN and RNN"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.