Click-Through Rate Prediction with the User Memory Network

Li Li; Shukui Ren; Wentao Ouyang; Xiuwu Zhang; Yanlong Du; Zhaojie Liu

arxiv: 1907.04667 · v2 · pith:B6P5B5HWnew · submitted 2019-07-09 · 💻 cs.IR · cs.LG

Pith reviewed 2026-05-25 00:29 UTC · model grok-4.3

keywords click-through rate prediction · memory augmented DNN · user behavior modeling · deep neural networks · online advertising · recurrent neural networks · historical behavior · CTR prediction

The pith

MA-DNN augments standard DNNs with two user memory vectors to incorporate historical behavior information for CTR prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a Memory Augmented DNN (MA-DNN) that maintains two external memory vectors per user to record high-level abstractions of likes and dislikes from past ad impressions and clicks. This design aims to give DNNs some of the historical exploitation power of RNNs while remaining as simple and efficient as DNNs for both training and prediction. The approach is positioned as a practical compromise for real-world CTR prediction in online advertising systems. It can also be added to other base models such as Wide&Deep. Experiments are said to show its effectiveness in both offline and online settings.

Core claim

By creating two external memory vectors for each user that memorize high-level abstractions of what the user possibly likes and dislikes, the MA-DNN model enables exploitation of useful information from users' historical behaviors while keeping the model as simple as a standard DNN.

What carries the argument

Two external memory vectors per user, one for positive preferences and one for negative, updated from historical impressions and clicks to capture user preference abstractions.

If this is right

MA-DNN achieves improved prediction performance over plain DNNs.
The model remains simple in both offline training and online prediction, unlike RNNs.
The memory component can be added to other models such as Wide&Deep.
Both offline and online experiments support its effectiveness for practical CTR services.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This memory augmentation might allow real-time updating of user preferences in production systems without full sequence processing.
The approach could extend to other sequential recommendation tasks where full RNN modeling is too costly.
Memory vectors might be combined with attention mechanisms in future variants for better abstraction.

Load-bearing premise

That the two memory vectors can extract useful high-level user preference information from historical data without explicit modeling of temporal sequences.

What would settle it

A direct comparison on a public CTR dataset where MA-DNN shows no statistically significant AUC improvement over a baseline DNN with the same features.
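
The comparison described above could be run along the following lines. This is a minimal sketch, not a procedure taken from the paper: the function names `auc` and `bootstrap_auc_gain` are hypothetical, the paired bootstrap is one of several reasonable significance checks, and the pairwise AUC is written for clarity rather than speed.

```python
import random

def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a win. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_gain(labels, base_scores, new_scores, n_boot=1000, seed=0):
    """Approximate 95% interval for AUC(new) - AUC(base), using a paired
    bootstrap: both models are scored on the same resampled impressions."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            new = auc(ys, [new_scores[i] for i in idx])
            base = auc(ys, [base_scores[i] for i in idx])
            diffs.append(new - base)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Toy data (made up): identical scores from both models give a zero gain.
labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2]
low, high = bootstrap_auc_gain(labels, scores, scores, n_boot=50)
```

Under these assumptions, an interval that contains zero corresponds to the "no statistically significant AUC improvement" outcome, while an interval entirely above zero would favour the memory-augmented model.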

Figures

Figures reproduced from arXiv: 1907.04667 by Li Li, Shukui Ren, Wentao Ouyang, Xiuwu Zhang, Yanlong Du, Zhaojie Liu.

**Figure 2.** Illustration of data preparation. Each GRU cell is defined as $z = \mathrm{sigmoid}(x_t U^z + s_{t-1} W^z)$, $r = \mathrm{sigmoid}(x_t U^r + s_{t-1} W^r)$, $h = \tanh(x_t U^h + (s_{t-1} \circ r) W^h)$, $s_t = (1 - z) \circ h + z \circ s_{t-1}$, where $\circ$ denotes element-wise product, $x_t$ and $s_{t-1}$ are the input, $s_t$ is the output and others are all model parameters. We can clearly observe that GRU is much more complex than DNN. The complexity exists for both offline trai…
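
To make the complexity point concrete, the GRU cell equations in the caption can be written out directly. The sketch below is plain Python with no external libraries; the helper names (`vec_mat`, `hadamard`, `zeros`) and the parameter dictionary keys are illustrative choices, and the parameter letters are read as superscripts on `U` and `W`.

```python
import math

def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def vec_mat(v, M):
    """Row vector (length n) times matrix (n x m), giving a length-m list."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def hadamard(a, b):
    """Element-wise product, the circle operator in the caption."""
    return [x * y for x, y in zip(a, b)]

def gru_cell(x_t, s_prev, p):
    """One GRU step following the four equations in the caption."""
    z = sigmoid(add(vec_mat(x_t, p["Uz"]), vec_mat(s_prev, p["Wz"])))
    r = sigmoid(add(vec_mat(x_t, p["Ur"]), vec_mat(s_prev, p["Wr"])))
    pre_h = add(vec_mat(x_t, p["Uh"]), vec_mat(hadamard(s_prev, r), p["Wh"]))
    h = [math.tanh(a) for a in pre_h]
    return add(hadamard([1.0 - a for a in z], h), hadamard(z, s_prev))

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Toy check with all-zero weights: z = 0.5 and h = 0, so s_t = 0.5 * s_prev.
params = {"Uz": zeros(3, 2), "Wz": zeros(2, 2),
          "Ur": zeros(3, 2), "Wr": zeros(2, 2),
          "Uh": zeros(3, 2), "Wh": zeros(2, 2)}
s_t = gru_cell([1.0, 2.0, 3.0], [1.0, -2.0], params)  # s_t is [0.5, -1.0]
```

Each step needs six weight matrices and must be applied once per element of the user's behavior sequence, whereas a feed-forward layer applies one matrix once per prediction.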

Original abstract

Click-through rate (CTR) prediction is a critical task in online advertising systems. Models like Deep Neural Networks (DNNs) are simple but stateless. They consider each target ad independently and cannot directly extract useful information contained in users' historical ad impressions and clicks. In contrast, models like Recurrent Neural Networks (RNNs) are stateful but complex. They model temporal dependency between users' sequential behaviors and can achieve improved prediction performance than DNNs. However, both the offline training and online prediction process of RNNs are much more complex and time-consuming. In this paper, we propose Memory Augmented DNN (MA-DNN) for practical CTR prediction services. In particular, we create two external memory vectors for each user, memorizing high-level abstractions of what a user possibly likes and dislikes. The proposed MA-DNN achieves a good compromise between DNN and RNN. It is as simple as DNN, but has certain ability to exploit useful information contained in users' historical behaviors as RNN. Both offline and online experiments demonstrate the effectiveness of MA-DNN for practical CTR prediction services. Actually, the memory component can be augmented to other models as well (e.g., the Wide&Deep model).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MA-DNN adds two fixed memory vectors per user to a DNN for CTR so it can use history without RNN cost, but the real test is whether the claimed gains hold up in the experiments.

The core move here is straightforward: keep one memory vector for what a user likes and one for what they dislike, both updated from past impressions and clicks, then feed them into an otherwise ordinary DNN. This is presented as a middle ground that stays as easy to train and serve as a plain DNN while picking up some signal from user history the way an RNN would. The authors also note the memory piece can be dropped into models like Wide&Deep, which is a useful practical point.

That design choice is the main thing that is new relative to standard memory-augmented networks applied elsewhere. It fits the CTR setting where positive and negative feedback are both abundant and asymmetric. The paper does a clean job of stating the motivation and the architecture without overclaiming theoretical novelty. The stress-test note is right that nothing in the construction itself creates an internal contradiction or obvious missing control. The weakest assumption is that the two vectors will actually extract useful high-level abstractions rather than just acting as extra feature slots, but that is an empirical question rather than a flaw in the setup.

The main soft spot is the evidence. The abstract asserts that offline and online experiments show effectiveness, yet the provided text gives no numbers, no baseline list, no significance tests, and no description of how the memory is updated during training versus serving. If the full paper supplies those details with reasonable controls, the practical claim lands; if not, the compromise story stays unverified.

This paper is for people who already run DNN-based CTR systems in production advertising or recommendation and want a low-overhead way to inject limited history. A practitioner could read the architecture section and try the idea quickly. It is worth sending to peer review because the application is real, the design is simple enough to reproduce, and the online experiment claim matters to the target audience even if the novelty is incremental.

Referee Report

2 major / 1 minor

Summary. The paper proposes Memory Augmented DNN (MA-DNN) for CTR prediction. It augments standard DNNs with two external memory vectors per user that memorize high-level abstractions of likes and dislikes from historical ad impressions and clicks. The model is positioned as a practical compromise: as simple as DNNs for training and inference yet able to exploit sequential user behavior like RNNs. The memory component is also claimed to be augmentable to other architectures such as Wide&Deep. Effectiveness is asserted via offline and online experiments.

Significance. If the empirical results hold under proper controls, MA-DNN would supply an efficient, low-complexity route to incorporate user history in production CTR systems, where RNN overhead is often prohibitive. The extensibility claim to other base models is a secondary strength that could broaden impact.

major comments (2)

[Abstract] Abstract: the central claim that 'both offline and online experiments demonstrate the effectiveness' is load-bearing yet unsupported by any mention of baselines, metrics (AUC, log-loss, etc.), statistical tests, or data-splitting rules. Without these, the practical-utility assertion cannot be evaluated.
[Model] Model section (description of the two external memory vectors): the update rules and the precise mechanism by which the vectors extract 'high-level abstractions' from impressions/clicks are stated at a conceptual level only. This leaves open whether the construction actually captures sequential preference information without explicit temporal modeling, which is the key assumption underlying the DNN-RNN compromise claim.

minor comments (1)

[Abstract] The final sentence of the abstract states that the memory component 'can be augmented to other models as well'; this extensibility claim should be supported by at least one concrete example or ablation in the experiments section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and agree to revisions that strengthen the presentation without altering the core contributions.

Referee: [Abstract] Abstract: the central claim that 'both offline and online experiments demonstrate the effectiveness' is load-bearing yet unsupported by any mention of baselines, metrics (AUC, log-loss, etc.), statistical tests, or data-splitting rules. Without these, the practical-utility assertion cannot be evaluated.

Authors: We agree that the abstract would be clearer with supporting details. The experiments section of the manuscript reports AUC and log-loss results against DNN and RNN baselines on standard CTR datasets using chronological train/validation/test splits. We will revise the abstract to briefly reference these elements (e.g., metrics and baseline comparisons) while keeping the length appropriate.
revision: yes

Referee: [Model] Model section (description of the two external memory vectors): the update rules and the precise mechanism by which the vectors extract 'high-level abstractions' from impressions/clicks are stated at a conceptual level only. This leaves open whether the construction actually captures sequential preference information without explicit temporal modeling, which is the key assumption underlying the DNN-RNN compromise claim.

Authors: The memory vectors are updated incrementally with each new impression/click to accumulate high-level like/dislike signals, which are then concatenated with the current ad features for the DNN. This provides a lightweight history summary without RNN-style recurrence. We acknowledge the description remains high-level and will add explicit update equations and a short discussion of how sequential information is retained in the revised model section.
revision: yes
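
The concatenation step described in this response can be illustrated with a small forward pass. This is a sketch under assumptions, not the authors' implementation: the function name `ma_dnn_forward`, the single ReLU hidden layer, and the toy weights are all hypothetical, and the real model may use more layers and embedded features.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def dense(v, W, b):
    """Affine layer: row vector v (length n) times W (n x m), plus bias b."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) + b[j]
            for j in range(len(b))]

def ma_dnn_forward(x, m_like, m_dislike, W1, b1, w_out, b_out):
    """Predict click probability from ad features plus two memory vectors.
    The memories are plain inputs, so serving is one feed-forward pass
    with no recurrence over the user's history."""
    inp = x + m_like + m_dislike            # list concatenation
    z_L = relu(dense(inp, W1, b1))          # last hidden layer
    logit = sum(a * w for a, w in zip(z_L, w_out)) + b_out
    p_click = 1.0 / (1.0 + math.exp(-logit))
    return p_click, z_L

# Toy example with made-up numbers: 1 ad feature, 1-dim memories, 2 hidden units.
W1 = [[1.0, 0.0],
      [0.0, 1.0],
      [0.0, -1.0]]
p_click, z_L = ma_dnn_forward([1.0], [2.0], [3.0], W1, [0.0, 0.0],
                              [math.log(3.0), 0.0], 0.0)
```

In this arrangement the per-user state is fetched by user ID and passed in like any other feature, which is why the serving path stays as simple as that of a plain DNN.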

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces MA-DNN by defining two external memory vectors that store high-level like/dislike abstractions from user history, then augments a standard DNN. No equations, derivations, or self-citations are shown that reduce the target CTR prediction to fitted inputs by construction, rename known results, or import uniqueness from prior author work. Performance claims rest on offline/online experiments rather than internal self-definition, so the construction remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the introduction of two new memory vectors whose update rules and integration into the DNN are not derived from first principles but postulated as effective abstractions.

invented entities (1)

two external memory vectors per user (no independent evidence)
purpose: to memorize high-level abstractions of what a user possibly likes and dislikes
These vectors are introduced to give the DNN limited access to historical behavior without RNN complexity.

pith-pipeline@v0.9.0 · 5752 in / 1152 out tokens · 21250 ms · 2026-05-25T00:29:20.788207+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

we create two external memory vectors for each user, memorizing high-level abstractions of what a user possibly likes and dislikes... $\mathrm{loss}_2 = \frac{1}{|Y|} \sum [y\, m_{u1} + (1-y)\, m_{u0} - z_L]^2$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

The proposed MA-DNN achieves a good compromise between DNN and RNN
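
The loss2 expression quoted in the first passage above suggests how the memory vectors might be trained. The sketch below rests on several assumptions that the page does not confirm: `mu1` and `mu0` are read as the like and dislike memory vectors of a user, `zL` as the last hidden layer of the DNN, the bracketed square as a sum of squared components, and the update as a plain gradient step with `zL` held fixed. The names `memory_loss` and `update_memory` are hypothetical.

```python
def memory_loss(batch):
    """Mean over examples of the squared gap between the label-selected
    memory vector and z_L. Each item is (y, m_u1, m_u0, z_L), y in {0, 1}."""
    total = 0.0
    for y, m_u1, m_u0, z_L in batch:
        total += sum((y * a + (1 - y) * b - c) ** 2
                     for a, b, c in zip(m_u1, m_u0, z_L))
    return total / len(batch)

def update_memory(y, m_u1, m_u0, z_L, lr):
    """One gradient step on a single example's term (assumed update rule).
    Only the vector selected by the label moves: a click (y = 1) pulls
    m_u1 toward z_L, a non-click (y = 0) pulls m_u0 toward z_L."""
    if y == 1:
        m_u1 = [a - lr * 2.0 * (a - c) for a, c in zip(m_u1, z_L)]
    else:
        m_u0 = [b - lr * 2.0 * (b - c) for b, c in zip(m_u0, z_L)]
    return m_u1, m_u0

# Toy batch with made-up numbers.
batch = [(1, [1.0, 2.0], [9.0, 9.0], [0.0, 0.0]),
         (0, [9.0, 9.0], [1.0, 1.0], [1.0, 3.0])]
loss2 = memory_loss(batch)  # (1 + 4 + 0 + 4) / 2 = 4.5
new_like, new_dislike = update_memory(1, [1.0, 2.0], [9.0, 9.0], [0.0, 0.0], lr=0.25)
```

Read this way, each memory vector becomes a running summary of the hidden representations of ads the user did or did not click, which is one plausible sense of "high-level abstractions" in the quoted passage.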

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

