Deep Personalized Re-targeting
Pith reviewed 2026-05-25 09:56 UTC · model grok-4.3
The pith
A hybrid model infusing deep and shallow neural embeddings into gradient boosting trees improves booking prediction performance by seven percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that infusing embeddings from deep and shallow neural networks into a gradient boosting tree model automatically learns latent preferences of millions of travelers from sparse session logs, yielding a seven percent increase in prediction performance on booking probability and value, with the full architecture deployed in production.
What carries the argument
The hybrid model that infuses deep and shallow neural network embeddings into a gradient boosting tree to capture latent preferences.
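The abstract does not say how the embeddings are "infused" into the tree model. One plausible reading, sketched below purely as an assumption, is feature concatenation: a session's item embeddings are mean-pooled into a shallow vector, and that vector is appended to the handcrafted features together with a vector from a deep network. The names `mean_pool` and `build_gbt_row`, the listing ids, and all numeric values are hypothetical and do not come from the paper.

```python
from statistics import fmean


def mean_pool(vectors, dim):
    """Average equal-length vectors; return a zero vector for an empty session."""
    if not vectors:
        return [0.0] * dim
    return [fmean(column) for column in zip(*vectors)]


def build_gbt_row(tabular, session_items, item_embeddings, deep_embedding, dim):
    """Assemble one input row for a tree model (assumed scheme: concatenation)."""
    # Shallow side (assumption): average the embeddings of items seen in the session,
    # skipping items that have no learned embedding.
    known = [item_embeddings[i] for i in session_items if i in item_embeddings]
    shallow = mean_pool(known, dim)
    # Row layout: handcrafted features, then shallow embedding, then deep embedding.
    return list(tabular) + shallow + list(deep_embedding)


# Hypothetical two-dimensional embeddings for two listings.
item_embeddings = {"listing_a": [1.0, 0.0], "listing_b": [0.0, 1.0]}
row = build_gbt_row(
    tabular=[3.0, 14.0],                # e.g. session count, days since last visit
    session_items=["listing_a", "listing_b", "unseen"],
    item_embeddings=item_embeddings,
    deep_embedding=[0.25, -0.5],        # stand-in for a deep network's output
    dim=2,
)
print(row)  # [3.0, 14.0, 0.5, 0.5, 0.25, -0.5]
```

Rows built this way could then be passed to any gradient boosting library; the tree model treats each embedding dimension as an ordinary numeric feature.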
If this is right
- The model supports traveler-level prediction of booking probability and value for computational advertising.
- It handles long shopping cycles and sparse data footprints without requiring a single fixed architecture.
- The production architecture demonstrates that the hybrid can run at the scale of millions of users.
- Performance gains arise specifically from the pragmatic mix of deep and shallow embeddings rather than either alone.
Where Pith is reading between the lines
- Similar hybrids could reduce reliance on full deep-learning pipelines in other sparse-data personalization tasks.
- The seven percent offline gain suggests testing whether the same lift appears in revenue metrics on new marketplaces.
- Extending the approach to additional embedding sources might further improve capture of session-based preferences.
Load-bearing premise
Embeddings from the neural networks will reliably extract meaningful latent preferences from sparse session logs and generalize without major overfitting or extra tuning.
What would settle it
An online A/B test on live marketplace traffic that shows no statistically significant lift in booking rates or ad conversion when the hybrid embeddings are added versus a plain gradient boosting tree baseline.
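A minimal sketch of how such an A/B comparison might be scored, assuming a pooled two-proportion z-test on booking rates. The function name `two_proportion_z_test` and the traffic counts are made up for illustration; a real experiment would also need a pre-registered sample size and a check for traveler-level dependence between sessions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between arms A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: arm A is the plain GBT baseline, arm B adds the embeddings.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=100_000, conv_b=1070, n_b=100_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these illustrative counts a seven percent relative lift in bookings does not reach the conventional 0.05 threshold, which shows why an offline gain of that size does not by itself guarantee a detectable online effect.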
Original abstract
Predicting booking probability and value at the traveler level plays a central role in computational advertising for massive two-sided vacation rental marketplaces. These marketplaces host millions of travelers with long shopping cycles, spending a lot of time in the discovery phase. The footprint of the travelers in their discovery is a useful data source to help these marketplaces to predict shopping probability and value. However, there is no one-size-fits-all solution for this purpose. In this paper, we propose a hybrid model that infuses deep and shallow neural network embeddings into a gradient boosting tree model. This approach allows the latent preferences of millions of travelers to be automatically learned from sparse session logs. In addition, we present the architecture that we deployed into our production system. We find that there is a pragmatic sweet spot between expensive complex deep neural networks and simple shallow neural networks that can increase the prediction performance of a model by seven percent, based on offline analysis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hybrid model that infuses embeddings from deep and shallow neural networks into a gradient boosting tree (GBT) to predict traveler booking probability and value from sparse session logs in vacation rental marketplaces. It claims this yields a 7% performance increase based on offline analysis and describes the deployed production architecture.
Significance. If the central empirical claim holds under proper validation, the work identifies a practical hybrid architecture for sparse, long-cycle user data in computational advertising, bridging complex deep models and simpler alternatives. The explicit description of a deployed system is a strength that could inform similar IR applications.
major comments (2)
- [Abstract] The central claim of a seven percent offline improvement supplies no baseline model, evaluation metric (e.g., AUC or log-loss), statistical test, data exclusion rules, or error bars, leaving the performance gain impossible to assess from the given text.
- [Abstract] The paper reports only offline analysis and mentions deployment but provides no online A/B results, temporal hold-out details, or distribution-shift tests, so the claim that embeddings generalize from sparse logs to production rests on an unverified assumption.
minor comments (1)
- [Abstract] The abstract could clarify whether the 7% figure is relative improvement on a specific metric or an aggregate across tasks.
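The ambiguity matters numerically. The sketch below contrasts the two readings of "seven percent" for a single metric, using a hypothetical baseline score of 0.70 that is not taken from the paper; `relative_lift` is an illustrative helper.

```python
def relative_lift(baseline, candidate):
    """Fractional change of candidate over baseline, e.g. 0.07 for +7%."""
    return (candidate - baseline) / baseline


baseline = 0.70  # hypothetical baseline score, not a figure from the paper

relative_reading = baseline * 1.07  # "7%" read as a relative improvement
absolute_reading = baseline + 0.07  # "7%" read as seven percentage points

print(round(relative_reading, 3))                           # 0.749
print(round(absolute_reading, 3))                           # 0.77
print(round(relative_lift(baseline, absolute_reading), 3))  # 0.1
```

Under this assumed baseline, the absolute reading corresponds to a ten percent relative lift, so the two interpretations describe noticeably different results.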
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address the major comments point-by-point below and outline the revisions we plan to make.
- Referee: [Abstract] The central claim of a seven percent offline improvement supplies no baseline model, evaluation metric (e.g., AUC or log-loss), statistical test, data exclusion rules, or error bars, leaving the performance gain impossible to assess from the given text.
  Authors: We concur that the abstract should be more informative regarding the experimental setup. The full paper provides these details in the experiments section, including the use of a standard GBT as baseline, AUC as the metric, and cross-validation with temporal splits. In the revision, we will expand the abstract to briefly mention the baseline, metric, and that the 7% lift is statistically significant. This will make the central claim more assessable without reading the full text.
  Revision: yes
- Referee: [Abstract] The paper reports only offline analysis and mentions deployment but provides no online A/B results, temporal hold-out details, or distribution-shift tests, so the claim that embeddings generalize from sparse logs to production rests on an unverified assumption.
  Authors: The paper's contribution centers on the hybrid model and its offline performance, with a description of the deployed architecture. We will revise to include more specifics on the temporal hold-out strategy used in offline tests and any steps taken to mitigate distribution shift. However, online A/B test results are not available in the current work as the focus was on model development and offline validation. We will add a note acknowledging this limitation.
  Revision: partial
- Online A/B test results to verify generalization to production
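The temporal hold-out discussed in the exchange above can be illustrated with a short sketch. It assumes each training row carries a session date and that everything on or after a cutoff is held out; the field names, the cutoff, and the rows themselves are hypothetical, and the paper's actual split is not described in the text available here.

```python
from datetime import date


def temporal_split(rows, cutoff, key=lambda r: r["session_date"]):
    """Train on rows strictly before the cutoff, evaluate on rows at or after it."""
    train = [r for r in rows if key(r) < cutoff]
    test = [r for r in rows if key(r) >= cutoff]
    return train, test


# Hypothetical session-level rows.
rows = [
    {"traveler": "t1", "session_date": date(2019, 1, 5), "booked": 0},
    {"traveler": "t2", "session_date": date(2019, 2, 11), "booked": 1},
    {"traveler": "t1", "session_date": date(2019, 3, 2), "booked": 1},
    {"traveler": "t3", "session_date": date(2019, 3, 20), "booked": 0},
]

train, test = temporal_split(rows, cutoff=date(2019, 3, 1))
print(len(train), len(test))  # 2 2
```

Splitting by time rather than at random keeps later behavior out of the training set, which is the leakage risk the referee's request for hold-out details is aimed at. Note that the same traveler (`t1` here) can still appear on both sides, so a stricter evaluation might also hold out travelers.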
Circularity Check
No circularity: empirical offline performance claim is self-contained
The paper reports an empirical 7% lift from a hybrid architecture (deep+shallow NN embeddings fed to GBT) on offline session-log data. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided text. The performance number is presented as a direct experimental observation rather than a quantity forced by construction from the model inputs or prior self-referential results.