Deep Personalized Re-targeting

Meisam Hejazinia; Pavlos Mitsoulis-Ntompos; Serena Zhang

arxiv: 1907.02822 · v2 · pith:LC3JX4ZCnew · submitted 2019-07-03 · 💻 cs.IR · cs.LG

Deep Personalized Re-targeting

Meisam Hejazinia , Pavlos Mitsoulis-Ntompos , Serena Zhang This is my paper

Pith reviewed 2026-05-25 09:56 UTC · model grok-4.3

classification 💻 cs.IR cs.LG

keywords personalized retargetinghybrid modelneural embeddingsgradient boosting treesbooking predictionsession logscomputational advertisingvacation rentals

0 comments

The pith

A hybrid model infusing deep and shallow neural embeddings into gradient boosting trees improves booking prediction performance by seven percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a hybrid architecture can predict individual travelers' booking probabilities and spending values more accurately than simpler baselines. It learns hidden preferences automatically from sparse session logs in vacation rental marketplaces by passing embeddings from both complex deep networks and simple shallow networks into a gradient boosting tree. The approach is presented with its production deployment details. A sympathetic reader would care because these predictions directly support personalized advertising in markets where users spend long periods in discovery. Offline tests indicate the hybrid reaches a useful balance that delivers a seven percent performance gain without the full cost of deep-only models.

Core claim

The central claim is that infusing embeddings from deep and shallow neural networks into a gradient boosting tree model automatically learns latent preferences of millions of travelers from sparse session logs, yielding a seven percent increase in prediction performance on booking probability and value, with the full architecture deployed in production.

What carries the argument

The hybrid model that infuses deep and shallow neural network embeddings into a gradient boosting tree to capture latent preferences.

If this is right

The model supports traveler-level prediction of booking probability and value for computational advertising.
It handles long shopping cycles and sparse data footprints without requiring a single fixed architecture.
The production architecture demonstrates that the hybrid can run at the scale of millions of users.
Performance gains arise specifically from the pragmatic mix of deep and shallow embeddings rather than either alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hybrids could reduce reliance on full deep-learning pipelines in other sparse-data personalization tasks.
The seven percent offline gain suggests testing whether the same lift appears in revenue metrics on new marketplaces.
Extending the approach to additional embedding sources might further improve capture of session-based preferences.

Load-bearing premise

Embeddings from the neural networks will reliably extract meaningful latent preferences from sparse session logs and generalize without major overfitting or extra tuning.

What would settle it

An online A/B test on live marketplace traffic that shows no statistically significant lift in booking rates or ad conversion when the hybrid embeddings are added versus a plain gradient boosting tree baseline.

Figures

Figures reproduced from arXiv: 1907.02822 by Meisam Hejazinia, Pavlos Mitsoulis-Ntompos, Serena Zhang.

**Figure 2.** Figure 2: Each traveler’s interaction (listing view, dated search, [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 2.** Figure 2: High-level overview of the Deep Personalized Re-targeting System. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

read the original abstract

Predicting booking probability and value at the traveler level plays a central role in computational advertising for massive two-sided vacation rental marketplaces. These marketplaces host millions of travelers with long shopping cycles, spending a lot of time in the discovery phase. The footprint of the travelers in their discovery is a useful data source to help these marketplaces to predict shopping probability and value. However, there is no one-size-fits-all solution for this purpose. In this paper, we propose a hybrid model that infuses deep and shallow neural network embeddings into a gradient boosting tree model. This approach allows the latent preferences of millions of travelers to be automatically learned from sparse session logs. In addition, we present the architecture that we deployed into our production system. We find that there is a pragmatic sweet spot between expensive complex deep neural networks and simple shallow neural networks that can increase the prediction performance of a model by seven percent, based on offline analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies a hybrid of deep/shallow NN embeddings into GBT for vacation rental booking prediction and reports a 7% offline lift, but supplies almost no experimental details to back the number.

read the letter

The core claim is that feeding embeddings from both deep and shallow neural nets into a gradient boosting tree improves booking probability prediction by 7% on offline session log data from a vacation rental platform. They also sketch the architecture they put into production. That is the whole contribution in the abstract. The hybrid pattern itself was not new in 2019, so the value is in showing it works on sparse, long-cycle traveler logs for this specific marketplace. The production deployment note is useful for practitioners who need to move from research to serving. The soft spot is obvious and load-bearing: the 7% figure comes with zero information on baselines, hold-out strategy, statistical tests, or error bars. There is also no online A/B result or temporal validation, so it is impossible to judge whether the embeddings actually generalize or just fit the particular log sample. The assumption that the model captures latent preferences without overfitting is stated but not checked. This is the kind of applied systems paper that can be helpful to teams already working on similar ad prediction problems, but it will not move the broader literature. A serious referee could ask for the missing experimental controls and online metrics; without those the central result stays provisional. I would send it to review only if the authors can supply a fuller evaluation section.

Referee Report

2 major / 1 minor

Summary. The paper proposes a hybrid model that infuses embeddings from deep and shallow neural networks into a gradient boosting tree (GBT) to predict traveler booking probability and value from sparse session logs in vacation rental marketplaces. It claims this yields a 7% performance increase based on offline analysis and describes the deployed production architecture.

Significance. If the central empirical claim holds under proper validation, the work identifies a practical hybrid architecture for sparse, long-cycle user data in computational advertising, bridging complex deep models and simpler alternatives. The explicit description of a deployed system is a strength that could inform similar IR applications.

major comments (2)

[Abstract] Abstract: the central claim of a seven percent offline improvement supplies no baseline model, evaluation metric (e.g., AUC or log-loss), statistical test, data exclusion rules, or error bars, leaving the performance gain impossible to assess from the given text.
[Abstract] Abstract: the paper reports only offline analysis and mentions deployment but provides no online A/B results, temporal hold-out details, or distribution-shift tests, so the claim that embeddings generalize from sparse logs to production rests on an unverified assumption.

minor comments (1)

[Abstract] The abstract could clarify whether the 7% figure is relative improvement on a specific metric or an aggregate across tasks.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments on our manuscript. We address the major comments point-by-point below and outline the revisions we plan to make.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim of a seven percent offline improvement supplies no baseline model, evaluation metric (e.g., AUC or log-loss), statistical test, data exclusion rules, or error bars, leaving the performance gain impossible to assess from the given text.

Authors: We concur that the abstract should be more informative regarding the experimental setup. The full paper provides these details in the experiments section, including the use of a standard GBT as baseline, AUC as the metric, and cross-validation with temporal splits. In the revision, we will expand the abstract to briefly mention the baseline, metric, and that the 7% lift is statistically significant. This will make the central claim more assessable without reading the full text. revision: yes
Referee: [Abstract] Abstract: the paper reports only offline analysis and mentions deployment but provides no online A/B results, temporal hold-out details, or distribution-shift tests, so the claim that embeddings generalize from sparse logs to production rests on an unverified assumption.

Authors: The paper's contribution centers on the hybrid model and its offline performance, with a description of the deployed architecture. We will revise to include more specifics on the temporal hold-out strategy used in offline tests and any steps taken to mitigate distribution shift. However, online A/B test results are not available in the current work as the focus was on model development and offline validation. We will add a note acknowledging this limitation. revision: partial

standing simulated objections not resolved

Online A/B test results to verify generalization to production

Circularity Check

0 steps flagged

No circularity: empirical offline performance claim is self-contained

full rationale

The paper reports an empirical 7% lift from a hybrid architecture (deep+shallow NN embeddings fed to GBT) on offline session-log data. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided text. The performance number is presented as a direct experimental observation rather than a quantity forced by construction from the model inputs or prior self-referential results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the hybrid architecture is described at a high level without equations or modeling choices that can be audited.

pith-pipeline@v0.9.0 · 5684 in / 1029 out tokens · 47341 ms · 2026-05-25T09:56:02.485345+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 5 internal anchors

[1]

Display advertising with real-time bidding (rtb) and behavioural targeting,

J. Wang, W. Zhang, S. Yuan et al. , “Display advertising with real-time bidding (rtb) and behavioural targeting,” F oundations and Trends® in Information Retrieval , vol. 11, no. 4-5, pp. 297–435, 2017

work page 2017
[2]

Counting your customers the easy way: An alternative to the pareto/nbd model,

P. S. Fader, B. G. Hardie, and K. L. Lee, “Counting your customers the easy way: An alternative to the pareto/nbd model,” Marketing science , vol. 24, no. 2, pp. 275–284, 2005

work page 2005
[3]

The pareto/nbd is not a lost-for-good model,

P. S. Fader and B. G. Hardie, “The pareto/nbd is not a lost-for-good model,” 2016

work page 2016
[4]

Probability models for customer-base analysis,

P. Fader and B. G. Hardie, “Probability models for customer-base analysis,” Journal of interactive marketing , vol. 23, no. 1, pp. 61–69, 2009

work page 2009
[5]

A modiﬁed pareto/nbd approach for predicting customer lifetime value,

N. Glady, B. Baesens, and C. Croux, “A modiﬁed pareto/nbd approach for predicting customer lifetime value,” Expert Systems with Applica- tions, vol. 36, no. 2, pp. 2062–2071, 2009

work page 2062
[6]

Measuring the lifetime value of customers acquired from google search advertising,

T. Y . Chan, C. Wu, and Y . Xie, “Measuring the lifetime value of customers acquired from google search advertising,” Marketing Science, vol. 30, no. 5, pp. 837–850, 2011

work page 2011
[7]

A hidden markov model of customer relationship dynamics,

O. Netzer, J. M. Lattin, and V . Srinivasan, “A hidden markov model of customer relationship dynamics,” Marketing science, vol. 27, no. 2, pp. 185–204, 2008

work page 2008
[8]

Field-aware factorization machines for ctr prediction,

Y . Juan, Y . Zhuang, W.-S. Chin, and C.-J. Lin, “Field-aware factorization machines for ctr prediction,” in Proceedings of the 10th ACM Conference on Recommender Systems . ACM, 2016, pp. 43–50

work page 2016
[9]

Xgboost: A scalable tree boosting system,

T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining . ACM, 2016, pp. 785–794

work page 2016
[10]

An engagement- based customer lifetime value system for e-commerce,

A. Vanderveld, A. Pandey, A. Han, and R. Parekh, “An engagement- based customer lifetime value system for e-commerce,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2016, pp. 293–302

work page 2016
[11]

Intent-aware audience targeting for ride-hailing service,

Y . Xia, J. Zhou, J. Cao, Y . Li, F. Gao, K. Liu, H. Wu, and H. Xiong, “Intent-aware audience targeting for ride-hailing service,” in Machine Learning and Knowledge Discovery in Databases , U. Brefeld, E. Curry, E. Daly, B. MacNamee, A. Marascu, F. Pinelli, M. Berlingerio, and N. Hurley, Eds. Cham: Springer International Publishing, 2019, pp. 136–151

work page 2019
[12]

Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks

H. Sheil, O. Rana, and R. G. Reilly, “Predicting purchasing intent: Automatic feature learning using recurrent neural networks,” CoRR, vol. abs/1807.08207, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

Deep & cross network for ad click predictions,

R. Wang, B. Fu, G. Fu, and M. Wang, “Deep & cross network for ad click predictions,” in Proceedings of the ADKDD’17 . ACM, 2017, p. 12

work page 2017
[14]

Deep Neural Net with Attention for Multi-channel Multi-touch Attribution

S. K. Arava, C. Dong, Z. Yan, A. Pani et al. , “Deep neural net with attention for multi-channel multi-touch attribution,” arXiv preprint arXiv:1809.02230, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[15]

Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

K. Gai, X. Zhu, H. Li, K. Liu, and Z. Wang, “Learning piece-wise linear models from large scale data for ad click prediction,” arXiv preprint arXiv:1704.05194, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[16]

Deep interest network for click-through rate prediction,

G. Zhou, X. Zhu, C. Song, Y . Fan, H. Zhu, X. Ma, Y . Yan, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 1059–1068

work page 2018
[17]

A hierarchical bayesian model for size recommendation in fashion,

R. Guigour `es, Y . K. Ho, E. Koriagin, A.-S. Sheikh, U. Bergmann, and R. Shirvany, “A hierarchical bayesian model for size recommendation in fashion,” in Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018, pp. 392–396

work page 2018
[18]

Next basket recommendation with neural networks

S. Wan, Y . Lan, P. Wang, J. Guo, J. Xu, and X. Cheng, “Next basket recommendation with neural networks.” in RecSys Posters, 2015

work page 2015
[19]

Content-aware hierarchical point-of-interest embedding model for successive poi rec- ommendation

B. Chang, Y . Park, D. Park, S. Kim, and J. Kang, “Content-aware hierarchical point-of-interest embedding model for successive poi rec- ommendation.” in IJCAI, 2018, pp. 3301–3307

work page 2018
[20]

Factorization meets the item embedding: Regularizing matrix factorization with item co- occurrence,

D. Liang, J. Altosaar, L. Charlin, and D. M. Blei, “Factorization meets the item embedding: Regularizing matrix factorization with item co- occurrence,” in Proceedings of the 10th ACM conference on recom- mender systems . ACM, 2016, pp. 59–66

work page 2016
[21]

Logistic matrix factorization for implicit feedback data,

C. C. Johnson, “Logistic matrix factorization for implicit feedback data,” Advances in Neural Information Processing Systems , vol. 27, 2014

work page 2014
[22]

Item2vec: neural item embedding for collaborative ﬁltering,

O. Barkan and N. Koenigstein, “Item2vec: neural item embedding for collaborative ﬁltering,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) . IEEE, 2016, pp. 1–6

work page 2016
[23]

A survey on session-based recommender systems,

S. Wang, L. Cao, and Y . Wang, “A survey on session-based recommender systems,” arXiv preprint arXiv:1902.04864 , 2019

work page arXiv 1902
[24]

Session- based recommendation with graph neural networks,

S. Wu, Y . Tang, Y . Zhu, L. Wang, X. Xie, and T. Tan, “Session- based recommendation with graph neural networks,” arXiv preprint arXiv:1811.00855, 2018

work page arXiv 2018
[25]

Distributed representations of words and phrases and their composi- tionality,

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their composi- tionality,” pp. 3111–3119, 2013

work page 2013
[26]

E-commerce in your inbox: Product recommendations at scale,

M. Grbovic, V . Radosavljevic, N. Djuric, N. Bhamidipati, J. Savla, V . Bhagwan, and D. Sharp, “E-commerce in your inbox: Product recommendations at scale,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2015, pp. 1809–1818

work page 2015
[27]

Real-time personalization using embeddings for search ranking at airbnb,

M. Grbovic and H. Cheng, “Real-time personalization using embeddings for search ranking at airbnb,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 311–320

work page 2018
[28]

Billion-scale commodity embedding for e-commerce recommendation in alibaba,

J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, and D. L. Lee, “Billion-scale commodity embedding for e-commerce recommendation in alibaba,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 839–848

work page 2018
[29]

A simple but tough-to-beat baseline for sentence embeddings,

S. Arora, Y . Liang, and T. Ma, “A simple but tough-to-beat baseline for sentence embeddings,” 2016

work page 2016
[30]

Meta-prod2vec: Product em- beddings using side-information for recommendation,

F. Vasile, E. Smirnova, and A. Conneau, “Meta-prod2vec: Product em- beddings using side-information for recommendation,” in Proceedings of the 10th ACM Conference on Recommender Systems . ACM, 2016, pp. 225–232

work page 2016
[31]

Word2vec applied to recommendation: Hyperparameters matter,

H. Caselles-Dupr ´e, F. Lesaint, and J. Royo-Letelier, “Word2vec applied to recommendation: Hyperparameters matter,” inProceedings of the 12th ACM Conference on Recommender Systems . ACM, 2018, pp. 352–356

work page 2018
[32]

Incorporating dwell time in session-based rec- ommendations with recurrent neural networks

V . Bogina and T. Kuﬂik, “Incorporating dwell time in session-based rec- ommendations with recurrent neural networks.” in RecTemp@ RecSys, 2017, pp. 57–59

work page 2017
[33]

Deep crossing: Web-scale modeling without manually crafted combinatorial features,

Y . Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. Mao, “Deep crossing: Web-scale modeling without manually crafted combinatorial features,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2016, pp. 255–262

work page 2016
[34]

Wide & deep learning for recommender systems,

H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al. , “Wide & deep learning for recommender systems,” in Proceedings of the 1st workshop on deep learning for recommender systems . ACM, 2016, pp. 7–10

work page 2016
[35]

Deep unordered composition rivals syntactic methods for text classiﬁcation,

M. Iyyer, V . Manjunatha, J. Boyd-Graber, and H. Daum ´e III, “Deep unordered composition rivals syntactic methods for text classiﬁcation,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (V olume 1: Long Papers), vol. 1, 2015, pp. 1681–1691

work page 2015
[36]

Autorec: Autoencoders meet collaborative ﬁltering,

S. Sedhain, A. K. Menon, S. Sanner, and L. Xie, “Autorec: Autoencoders meet collaborative ﬁltering,” in Proceedings of the 24th International Conference on World Wide Web . ACM, 2015, pp. 111–112

work page 2015
[37]

Learning tree-based deep model for recommender systems,

H. Zhu, X. Li, P. Zhang, G. Li, J. He, H. Li, and K. Gai, “Learning tree-based deep model for recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 1079–1088

work page 2018
[38]

Understanding consumer behavior with recurrent neural networks,

T. Lang and M. Rettenmeier, “Understanding consumer behavior with recurrent neural networks,” in Workshop on Machine Learning Methods for Recommender Systems , 2017

work page 2017
[39]

Close: C ontextualized lo cation se quence recommender,

R. Baral, S. Iyengar, T. Li, and N. Balakrishnan, “Close: C ontextualized lo cation se quence recommender,” in Proceedings of the 12th ACM conference on recommender systems . ACM, 2018, pp. 470–474

work page 2018
[40]

Deep neural network marketplace recommenders in online experiments,

S. Eide and N. Zhou, “Deep neural network marketplace recommenders in online experiments,” in Proceedings of the 12th ACM Conference on Recommender Systems . ACM, 2018, pp. 387–391

work page 2018
[41]

Deep inventory time translation to improve recommendations for real-world retail,

B. Pr ´evost, J. L. Janssen, J. R. Camacaro, and C. Bessega, “Deep inventory time translation to improve recommendations for real-world retail,” in Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018, pp. 195–199

work page 2018
[42]

Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction

B. Liu, R. Tang, Y . Chen, J. Yu, H. Guo, and Y . Zhang, “Feature gener- ation by convolutional neural network for click-through rate prediction,” arXiv preprint arXiv:1904.04447 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904
[43]

An attentive survey of attention models,

S. Chaudhari, G. Polatkan, R. Ramanath, and V . Mithal, “An attentive survey of attention models,” arXiv preprint arXiv:1904.02874 , 2019

work page arXiv 1904
[44]

Reference product search,

C. Wang, L. Tang, S. Bian, D. Zhang, Z. Zhang, and Y . Wu, “Reference product search,” 2019

work page 2019
[45]

Attention- based bidirectional long short-term memory networks for relation classi- ﬁcation,

P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, “Attention- based bidirectional long short-term memory networks for relation classi- ﬁcation,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (V olume 2: Short Papers) , vol. 2, 2016, pp. 207–212

work page 2016
[46]

Large-scale Collaborative Filtering with Product Embeddings

T. Lake, S. A. Williamson, A. T. Hawk, C. C. Johnson, and B. P. Wing, “Large-scale collaborative ﬁltering with product embeddings,” arXiv preprint arXiv:1901.04321 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901
[47]

Customer lifetime value prediction using embeddings,

B. P. Chamberlain, A. Cardoso, C. H. Liu, R. Pagliari, and M. P. Deisenroth, “Customer lifetime value prediction using embeddings,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2017, pp. 1753–1762

work page 2017
[48]

Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors),

J. Friedman, T. Hastie, R. Tibshirani et al., “Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors),” The annals of statistics , vol. 28, no. 2, pp. 337–407, 2000

work page 2000
[49]

Broderick and D

A. Broderick and D. Pickton, Integrated marketing communications . Pearson Education UK, 2005

work page 2005
[50]

A test of services marketing theory: consumer infor- mation acquisition activities,

K. B. Murray, “A test of services marketing theory: consumer infor- mation acquisition activities,” Journal of marketing , vol. 55, no. 1, pp. 10–25, 1991

work page 1991
[51]

Memory and attentional factors in consumer choice: Concepts and research methods,

J. G. Lynch Jr and T. K. Srull, “Memory and attentional factors in consumer choice: Concepts and research methods,” Journal of consumer research, vol. 9, no. 1, pp. 18–37, 1982

work page 1982
[52]

[Online]

H2O.ai, H2O, October 2019. [Online]. Available: https://github.com/ h2oai/h2o-3

work page 2019
[53]

TensorFlow: Large-scale machine learning on heterogeneous systems,

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y . Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Man ´e, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V . Vanho...

work page
[54]

Available: http://tensorﬂow.org/

[Online]. Available: http://tensorﬂow.org/

work page

[1] [1]

Display advertising with real-time bidding (rtb) and behavioural targeting,

J. Wang, W. Zhang, S. Yuan et al. , “Display advertising with real-time bidding (rtb) and behavioural targeting,” F oundations and Trends® in Information Retrieval , vol. 11, no. 4-5, pp. 297–435, 2017

work page 2017

[2] [2]

Counting your customers the easy way: An alternative to the pareto/nbd model,

P. S. Fader, B. G. Hardie, and K. L. Lee, “Counting your customers the easy way: An alternative to the pareto/nbd model,” Marketing science , vol. 24, no. 2, pp. 275–284, 2005

work page 2005

[3] [3]

The pareto/nbd is not a lost-for-good model,

P. S. Fader and B. G. Hardie, “The pareto/nbd is not a lost-for-good model,” 2016

work page 2016

[4] [4]

Probability models for customer-base analysis,

P. Fader and B. G. Hardie, “Probability models for customer-base analysis,” Journal of interactive marketing , vol. 23, no. 1, pp. 61–69, 2009

work page 2009

[5] [5]

A modiﬁed pareto/nbd approach for predicting customer lifetime value,

N. Glady, B. Baesens, and C. Croux, “A modiﬁed pareto/nbd approach for predicting customer lifetime value,” Expert Systems with Applica- tions, vol. 36, no. 2, pp. 2062–2071, 2009

work page 2062

[6] [6]

Measuring the lifetime value of customers acquired from google search advertising,

T. Y . Chan, C. Wu, and Y . Xie, “Measuring the lifetime value of customers acquired from google search advertising,” Marketing Science, vol. 30, no. 5, pp. 837–850, 2011

work page 2011

[7] [7]

A hidden markov model of customer relationship dynamics,

O. Netzer, J. M. Lattin, and V . Srinivasan, “A hidden markov model of customer relationship dynamics,” Marketing science, vol. 27, no. 2, pp. 185–204, 2008

work page 2008

[8] [8]

Field-aware factorization machines for ctr prediction,

Y . Juan, Y . Zhuang, W.-S. Chin, and C.-J. Lin, “Field-aware factorization machines for ctr prediction,” in Proceedings of the 10th ACM Conference on Recommender Systems . ACM, 2016, pp. 43–50

work page 2016

[9] [9]

Xgboost: A scalable tree boosting system,

T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining . ACM, 2016, pp. 785–794

work page 2016

[10] [10]

An engagement- based customer lifetime value system for e-commerce,

A. Vanderveld, A. Pandey, A. Han, and R. Parekh, “An engagement- based customer lifetime value system for e-commerce,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2016, pp. 293–302

work page 2016

[11] [11]

Intent-aware audience targeting for ride-hailing service,

Y . Xia, J. Zhou, J. Cao, Y . Li, F. Gao, K. Liu, H. Wu, and H. Xiong, “Intent-aware audience targeting for ride-hailing service,” in Machine Learning and Knowledge Discovery in Databases , U. Brefeld, E. Curry, E. Daly, B. MacNamee, A. Marascu, F. Pinelli, M. Berlingerio, and N. Hurley, Eds. Cham: Springer International Publishing, 2019, pp. 136–151

work page 2019

[12] [12]

Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks

H. Sheil, O. Rana, and R. G. Reilly, “Predicting purchasing intent: Automatic feature learning using recurrent neural networks,” CoRR, vol. abs/1807.08207, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Deep & cross network for ad click predictions,

R. Wang, B. Fu, G. Fu, and M. Wang, “Deep & cross network for ad click predictions,” in Proceedings of the ADKDD’17 . ACM, 2017, p. 12

work page 2017

[14] [14]

Deep Neural Net with Attention for Multi-channel Multi-touch Attribution

S. K. Arava, C. Dong, Z. Yan, A. Pani et al. , “Deep neural net with attention for multi-channel multi-touch attribution,” arXiv preprint arXiv:1809.02230, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[15] [15]

Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

K. Gai, X. Zhu, H. Li, K. Liu, and Z. Wang, “Learning piece-wise linear models from large scale data for ad click prediction,” arXiv preprint arXiv:1704.05194, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[16] [16]

Deep interest network for click-through rate prediction,

G. Zhou, X. Zhu, C. Song, Y . Fan, H. Zhu, X. Ma, Y . Yan, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 1059–1068

work page 2018

[17] [17]

A hierarchical bayesian model for size recommendation in fashion,

R. Guigour `es, Y . K. Ho, E. Koriagin, A.-S. Sheikh, U. Bergmann, and R. Shirvany, “A hierarchical bayesian model for size recommendation in fashion,” in Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018, pp. 392–396

work page 2018

[18] [18]

Next basket recommendation with neural networks

S. Wan, Y . Lan, P. Wang, J. Guo, J. Xu, and X. Cheng, “Next basket recommendation with neural networks.” in RecSys Posters, 2015

work page 2015

[19] [19]

Content-aware hierarchical point-of-interest embedding model for successive poi rec- ommendation

B. Chang, Y . Park, D. Park, S. Kim, and J. Kang, “Content-aware hierarchical point-of-interest embedding model for successive poi rec- ommendation.” in IJCAI, 2018, pp. 3301–3307

work page 2018

[20] [20]

Factorization meets the item embedding: Regularizing matrix factorization with item co- occurrence,

D. Liang, J. Altosaar, L. Charlin, and D. M. Blei, “Factorization meets the item embedding: Regularizing matrix factorization with item co- occurrence,” in Proceedings of the 10th ACM conference on recom- mender systems . ACM, 2016, pp. 59–66

work page 2016

[21] [21]

Logistic matrix factorization for implicit feedback data,

C. C. Johnson, “Logistic matrix factorization for implicit feedback data,” Advances in Neural Information Processing Systems , vol. 27, 2014

work page 2014

[22] [22]

Item2vec: neural item embedding for collaborative ﬁltering,

O. Barkan and N. Koenigstein, “Item2vec: neural item embedding for collaborative ﬁltering,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) . IEEE, 2016, pp. 1–6

work page 2016

[23] [23]

A survey on session-based recommender systems,

S. Wang, L. Cao, and Y . Wang, “A survey on session-based recommender systems,” arXiv preprint arXiv:1902.04864 , 2019

work page arXiv 1902

[24] [24]

Session- based recommendation with graph neural networks,

S. Wu, Y . Tang, Y . Zhu, L. Wang, X. Xie, and T. Tan, “Session- based recommendation with graph neural networks,” arXiv preprint arXiv:1811.00855, 2018

work page arXiv 2018

[25] [25]

Distributed representations of words and phrases and their composi- tionality,

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their composi- tionality,” pp. 3111–3119, 2013

work page 2013

[26] [26]

E-commerce in your inbox: Product recommendations at scale,

M. Grbovic, V . Radosavljevic, N. Djuric, N. Bhamidipati, J. Savla, V . Bhagwan, and D. Sharp, “E-commerce in your inbox: Product recommendations at scale,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2015, pp. 1809–1818

work page 2015

[27] [27]

Real-time personalization using embeddings for search ranking at airbnb,

M. Grbovic and H. Cheng, “Real-time personalization using embeddings for search ranking at airbnb,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 311–320

work page 2018

[28] [28]

Billion-scale commodity embedding for e-commerce recommendation in alibaba,

J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, and D. L. Lee, “Billion-scale commodity embedding for e-commerce recommendation in alibaba,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 839–848

work page 2018

[29] [29]

A simple but tough-to-beat baseline for sentence embeddings,

S. Arora, Y . Liang, and T. Ma, “A simple but tough-to-beat baseline for sentence embeddings,” 2016

work page 2016

[30] [30]

Meta-prod2vec: Product em- beddings using side-information for recommendation,

F. Vasile, E. Smirnova, and A. Conneau, “Meta-prod2vec: Product em- beddings using side-information for recommendation,” in Proceedings of the 10th ACM Conference on Recommender Systems . ACM, 2016, pp. 225–232

work page 2016

[31] [31]

Word2vec applied to recommendation: Hyperparameters matter,

H. Caselles-Dupr ´e, F. Lesaint, and J. Royo-Letelier, “Word2vec applied to recommendation: Hyperparameters matter,” inProceedings of the 12th ACM Conference on Recommender Systems . ACM, 2018, pp. 352–356

work page 2018

[32] [32]

Incorporating dwell time in session-based rec- ommendations with recurrent neural networks

V . Bogina and T. Kuﬂik, “Incorporating dwell time in session-based rec- ommendations with recurrent neural networks.” in RecTemp@ RecSys, 2017, pp. 57–59

work page 2017

[33] [33]

Deep crossing: Web-scale modeling without manually crafted combinatorial features,

Y . Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. Mao, “Deep crossing: Web-scale modeling without manually crafted combinatorial features,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2016, pp. 255–262

work page 2016

[34] [34]

Wide & deep learning for recommender systems,

H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al. , “Wide & deep learning for recommender systems,” in Proceedings of the 1st workshop on deep learning for recommender systems . ACM, 2016, pp. 7–10

work page 2016

[35] [35]

Deep unordered composition rivals syntactic methods for text classiﬁcation,

M. Iyyer, V . Manjunatha, J. Boyd-Graber, and H. Daum ´e III, “Deep unordered composition rivals syntactic methods for text classiﬁcation,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (V olume 1: Long Papers), vol. 1, 2015, pp. 1681–1691

work page 2015

[36] [36]

Autorec: Autoencoders meet collaborative ﬁltering,

S. Sedhain, A. K. Menon, S. Sanner, and L. Xie, “Autorec: Autoencoders meet collaborative ﬁltering,” in Proceedings of the 24th International Conference on World Wide Web . ACM, 2015, pp. 111–112

work page 2015

[37] [37]

Learning tree-based deep model for recommender systems,

H. Zhu, X. Li, P. Zhang, G. Li, J. He, H. Li, and K. Gai, “Learning tree-based deep model for recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining . ACM, 2018, pp. 1079–1088

work page 2018

[38] [38]

Understanding consumer behavior with recurrent neural networks,

T. Lang and M. Rettenmeier, “Understanding consumer behavior with recurrent neural networks,” in Workshop on Machine Learning Methods for Recommender Systems , 2017

work page 2017

[39] [39]

Close: C ontextualized lo cation se quence recommender,

R. Baral, S. Iyengar, T. Li, and N. Balakrishnan, “Close: C ontextualized lo cation se quence recommender,” in Proceedings of the 12th ACM conference on recommender systems . ACM, 2018, pp. 470–474

work page 2018

[40] [40]

Deep neural network marketplace recommenders in online experiments,

S. Eide and N. Zhou, “Deep neural network marketplace recommenders in online experiments,” in Proceedings of the 12th ACM Conference on Recommender Systems . ACM, 2018, pp. 387–391

work page 2018

[41] [41]

Deep inventory time translation to improve recommendations for real-world retail,

B. Pr ´evost, J. L. Janssen, J. R. Camacaro, and C. Bessega, “Deep inventory time translation to improve recommendations for real-world retail,” in Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018, pp. 195–199

work page 2018

[42] [42]

Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction

B. Liu, R. Tang, Y . Chen, J. Yu, H. Guo, and Y . Zhang, “Feature gener- ation by convolutional neural network for click-through rate prediction,” arXiv preprint arXiv:1904.04447 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1904

[43] [43]

An attentive survey of attention models,

S. Chaudhari, G. Polatkan, R. Ramanath, and V . Mithal, “An attentive survey of attention models,” arXiv preprint arXiv:1904.02874 , 2019

work page arXiv 1904

[44] [44]

Reference product search,

C. Wang, L. Tang, S. Bian, D. Zhang, Z. Zhang, and Y . Wu, “Reference product search,” 2019

work page 2019

[45] [45]

Attention- based bidirectional long short-term memory networks for relation classi- ﬁcation,

P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, “Attention- based bidirectional long short-term memory networks for relation classi- ﬁcation,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (V olume 2: Short Papers) , vol. 2, 2016, pp. 207–212

work page 2016

[46] [46]

Large-scale Collaborative Filtering with Product Embeddings

T. Lake, S. A. Williamson, A. T. Hawk, C. C. Johnson, and B. P. Wing, “Large-scale collaborative ﬁltering with product embeddings,” arXiv preprint arXiv:1901.04321 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1901

[47] [47]

Customer lifetime value prediction using embeddings,

B. P. Chamberlain, A. Cardoso, C. H. Liu, R. Pagliari, and M. P. Deisenroth, “Customer lifetime value prediction using embeddings,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 2017, pp. 1753–1762

work page 2017

[48] [48]

Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors),

J. Friedman, T. Hastie, R. Tibshirani et al., “Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors),” The annals of statistics , vol. 28, no. 2, pp. 337–407, 2000

work page 2000

[49] [49]

Broderick and D

A. Broderick and D. Pickton, Integrated marketing communications . Pearson Education UK, 2005

work page 2005

[50] [50]

A test of services marketing theory: consumer infor- mation acquisition activities,

K. B. Murray, “A test of services marketing theory: consumer infor- mation acquisition activities,” Journal of marketing , vol. 55, no. 1, pp. 10–25, 1991

work page 1991

[51] [51]

Memory and attentional factors in consumer choice: Concepts and research methods,

J. G. Lynch Jr and T. K. Srull, “Memory and attentional factors in consumer choice: Concepts and research methods,” Journal of consumer research, vol. 9, no. 1, pp. 18–37, 1982

work page 1982

[52] [52]

[Online]

H2O.ai, H2O, October 2019. [Online]. Available: https://github.com/ h2oai/h2o-3

work page 2019

[53] [53]

TensorFlow: Large-scale machine learning on heterogeneous systems,

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y . Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Man ´e, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V . Vanho...

work page

[54] [54]

Available: http://tensorﬂow.org/

[Online]. Available: http://tensorﬂow.org/

work page