arxiv: 2604.21675 · v2 · submitted 2026-04-23 · 💻 cs.IR

Counterfactual Multi-task Learning for Delayed Conversion Modeling in E-commerce Sales Pre-Promotion

Pith reviewed 2026-05-12 00:46 UTC · model grok-4.3

classification 💻 cs.IR
keywords delayed conversion modeling · multi-task learning · counterfactual causal inference · e-commerce pre-promotion · conversion rate prediction · user behavior gating · add-to-cart transitions

The pith

A counterfactual multi-task model predicts delayed conversions more accurately than baseline CVR models during e-commerce pre-promotion periods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a model to predict both immediate and delayed purchases in the period right before a sales promotion, when users add items to carts but wait for discounts. Traditional methods struggle because behavior shifts and data is sparse in these short windows. By combining multi-task learning with a user-specific gating mechanism and counterfactual reasoning about why carts turn into buys, the approach uses past pre-promotion records to improve forecasts. If successful, this leads to more accurate ad targeting and higher revenue during big events.

Core claim

The authors propose the Counterfactual Multi-task Delayed Conversion Model (CM-DCM) that leverages historical pre-promotion data to enhance CVR prediction for both delayed and direct conversions. It does so through a multi-task architecture that jointly models the two conversion types, a personalized user behavior gating module to handle sparsity in brief pre-promotion periods, and a counterfactual causal approach to estimate the transition probability from add-to-cart events to delayed conversions. Experiments show it outperforms baselines, and live A/B tests during major promotions confirm gains in advertising revenue, delayed conversion GMV, and overall GMV.

What carries the argument

The CM-DCM model, which jointly models direct and delayed conversions using a multi-task architecture, a personalized user behavior gating module, and counterfactual causal modeling of the add-to-cart to delayed-conversion transition probability.
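
To make the three pieces concrete, here is a minimal Python sketch of a shared representation, a per-user gate, and two output heads for direct and delayed conversion. It is an illustration under stated assumptions, not the paper's architecture: the class name `GatedTwoHeadModel`, the layer sizes, the random untrained weights, and the `2 * sigmoid` gate scaling are all hypothetical, and the counterfactual component is not modeled here.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


class GatedTwoHeadModel:
    """Toy two-task scorer (hypothetical): shared layer, user gate, two heads."""

    def __init__(self, n_features, n_user_features, n_hidden=4, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.shared = init(n_hidden, n_features)        # shared by both tasks
        self.gate = init(n_hidden, n_user_features)     # conditioned on the user only
        self.direct_head = init(1, n_hidden)[0]         # direct-conversion head
        self.delayed_head = init(1, n_hidden)[0]        # delayed-conversion head

    def predict(self, x, user):
        hidden = [math.tanh(dot(row, x)) for row in self.shared]
        # Each gate value lies in (0, 2), so a user can damp or amplify a hidden unit.
        gates = [2.0 * sigmoid(dot(row, user)) for row in self.gate]
        gated = [h * g for h, g in zip(hidden, gates)]
        return {
            "p_direct": sigmoid(dot(self.direct_head, gated)),
            "p_delayed": sigmoid(dot(self.delayed_head, gated)),
        }


# Untrained weights and made-up inputs: the outputs only show the data flow.
model = GatedTwoHeadModel(n_features=3, n_user_features=2)
scores = model.predict(x=[1.0, 0.5, -0.2], user=[0.3, 1.0])
```

Both heads read the same gated representation, which is how a sparse delayed-conversion signal could borrow strength from the direct-conversion task. In a real system the weights would be trained jointly on both labels.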

If this is right

  • Joint multi-task modeling of direct and delayed conversions improves CVR estimates specifically in pre-promotion windows.
  • The personalized gating module mitigates sparsity problems that arise in short pre-promotion intervals.
  • Counterfactual modeling of the add-to-cart transition probability reduces bias in delayed conversion forecasts.
  • Deployment yields measurable gains in advertising revenue and both delayed and total GMV during live promotions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same joint modeling plus causal transition structure could be adapted to other delayed-feedback settings such as subscription renewals or content engagement.
  • The gating component may generalize to reduce sparsity in any short-window prediction task where user signals are thin before an event.
  • If the counterfactual component holds, similar causal adjustments could debias intent shifts in non-promotional but time-sensitive recommendation problems.

Load-bearing premise

Historical pre-promotion data sufficiently captures the unique distribution shifts in conversion behavior before promotional events, and the counterfactual approach accurately models the add-to-cart to delayed-conversion transition without bias from unobserved factors.
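
One common way to make the second half of this premise precise is conditional ignorability. It is offered here as an assumption, not as the paper's stated formulation. The symbols are introduced only for illustration: $A \in \{0,1\}$ marks an add-to-cart event, $Y(a)$ is the potential delayed-conversion outcome under $A = a$, $Y = Y(A)$ is the observed outcome, and $X$ collects the observed user, behavior, and item covariates.

```latex
\begin{align}
  Y(a) \perp A \mid X, &\qquad 0 < P(A = 1 \mid X) < 1, \\
  \mathbb{E}[Y(1)] &= \mathbb{E}_{X}\big[\, \mathbb{E}[\, Y \mid A = 1, X \,] \,\big].
\end{align}
```

The second line follows from the first only when every factor that drives both add-to-cart and delayed conversion is contained in $X$. An unobserved driver such as price sensitivity breaks the first line, and the adjusted estimate is then biased.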

What would settle it

An online A/B test during a major promotional event in which CM-DCM produces no measurable lift in advertising revenue, delayed conversion GMV, or overall GMV relative to standard CVR baselines.

Figures

Figures reproduced from arXiv: 2604.21675 by Jinxin Hu, Kaiyuan Li, Xin Song.

Figure 1: User behavior trend during a Sales Promotion.
Figure 2: The architecture of the Counterfactual Multi-task
Figure 3: User action trends for the Taobao (a) and Tmall (b) datasets. "#CLK" "#ATC" and "#BUY" denote the number of clicks,

Abstract

Sales promotions, as short-term incentives to stimulate product purchases, play a pivotal role in modern e-commerce marketing strategies. During promotional events, user behavior patterns exhibit distinct characteristics compared to regular periods. In the pre-promotion phase, users typically engage in product search and browsing without immediate purchases, adding items to carts in anticipation of promotional discounts. This behavior leads to delayed conversions, resulting in significantly lower conversion rates (CVR) before the promotion day. Although existing research has made progress in CVR prediction for promotion days using historical data, it largely overlooks the critical pre-promotion period. Moreover, although delayed feedback modeling has been extensively studied, current approaches fail to account for the unique distribution shifts in conversion behavior before promotional events, where delayed conversions predominantly occur on the promotion day rather than over continuous time windows. To address these limitations, we propose the Counterfactual Multi-task Delayed Conversion Model (CM-DCM), which leverages historical pre-promotion data to enhance CVR prediction for both delayed and direct conversions. Our model incorporates three key innovations: (i) A multi-task architecture that jointly models direct and delayed conversions using historical pre-promotion data; (ii) A personalized user behavior gating module to mitigate data sparsity issues during brief pre-promotion periods; (iii) A counterfactual causal approach to model the transition probability from add-to-cart (ATC) to delayed conversion. Extensive experiments demonstrate that CM-DCM outperforms baselines in pre-promotion scenarios. Online A/B tests during major promotional events showed significant improvements in advertising revenue, delayed conversion GMV, and overall GMV, validating the effectiveness of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes the Counterfactual Multi-task Delayed Conversion Model (CM-DCM) for CVR prediction in the pre-promotion phase of e-commerce sales events. It jointly models direct and delayed conversions via a multi-task architecture on historical pre-promotion data, incorporates a personalized user behavior gating module to address sparsity, and uses a counterfactual causal method to estimate the transition probability from add-to-cart to delayed conversion. The authors report that CM-DCM outperforms baselines in offline experiments and yields gains in advertising revenue, delayed-conversion GMV, and overall GMV in online A/B tests during major promotions.

Significance. If the counterfactual transition estimator is shown to be unbiased under the distribution shifts characteristic of pre-promotion periods, the work would supply a practical, deployable approach to delayed-feedback modeling that combines multi-task learning with causal adjustment. This could improve advertising efficiency in high-stakes promotional settings where conversion rates drop sharply before the event and then spike on the promotion day.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (counterfactual component): the claim that the counterfactual approach recovers an unbiased ATC-to-delayed-conversion transition probability rests on an unstated identification strategy. Observational pre-promotion logs are subject to unobserved confounders (price sensitivity, intent shifts, external promotions) that affect both ATC and eventual conversion; without explicit propensity weighting, instrumental-variable details, or sensitivity bounds in the methods, the estimator can inherit selection bias and the reported gains may reflect spurious correlation rather than the claimed causal modeling.
  2. [Abstract and §4] Abstract and §4 (experiments): the statements that CM-DCM 'outperforms baselines' and that A/B tests showed 'significant improvements' supply no information on baseline definitions, dataset sizes, cross-validation scheme, statistical tests, or confidence intervals. Without these, the empirical support for the three claimed innovations cannot be assessed and the central performance claim remains unverifiable.
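
The propensity weighting named in the first major comment can be sketched in a few lines of Python. This is a generic inverse-propensity-weighted (IPW) estimate over a single discrete observed covariate, not the paper's estimator. The function name, the record layout, the stratum labels, and the toy data are all hypothetical. It corrects only for confounders that are observed, so it does nothing about the unobserved ones the comment raises.

```python
from collections import defaultdict


def ipw_conversion_if_atc(records):
    """IPW estimate of the delayed-conversion rate if every user had added to cart.

    records: iterable of (stratum, atc, converted) with atc, converted in {0, 1}.
    The propensity P(atc = 1 | stratum) is estimated by the stratum's ATC frequency.
    """
    records = list(records)
    totals = defaultdict(int)
    atc_counts = defaultdict(int)
    for stratum, atc, _ in records:
        totals[stratum] += 1
        atc_counts[stratum] += atc

    weighted_sum = 0.0
    for stratum, atc, converted in records:
        if atc_counts[stratum] == 0:
            # Positivity fails: no ATC users to learn from in this stratum.
            raise ValueError(f"no add-to-cart events in stratum {stratum!r}")
        propensity = atc_counts[stratum] / totals[stratum]
        weighted_sum += atc * converted / propensity
    return weighted_sum / len(records)


# Made-up data: "high" intent users add to cart more often AND convert more often.
toy = (
    [("high", 1, 1), ("high", 1, 1), ("high", 0, 0), ("high", 0, 0)]
    + [("low", 1, 1), ("low", 1, 0)]
    + [("low", 0, 0)] * 4
)
atc_rows = [r for r in toy if r[1] == 1]
naive = sum(r[2] for r in atc_rows) / len(atc_rows)
adjusted = ipw_conversion_if_atc(toy)
```

On this toy data the naive conversion rate among add-to-cart users is 0.75, while the weighted estimate is about 0.70, because high-intent users are over-represented among those who added to cart.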
minor comments (1)
  1. [Abstract] Abstract: the phrasing 'delayed conversions predominantly occur on the promotion day rather than over continuous time windows' would benefit from a brief quantitative illustration (e.g., fraction of conversions occurring on day 0 vs. later) to clarify the distribution shift being modeled.
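
The quantitative illustration requested here could be as simple as a histogram of purchase dates relative to the promotion day. The sketch below assumes a list of purchase dates for carted items is available. The function name, the promotion date, and the sample dates are hypothetical and are not taken from the paper.

```python
from collections import Counter
from datetime import date


def conversion_offset_shares(purchase_dates, promo_day):
    """Share of conversions at each day offset from the promotion day (0 = promo day)."""
    offsets = Counter((d - promo_day).days for d in purchase_dates)
    total = sum(offsets.values())
    if total == 0:
        raise ValueError("no purchase dates supplied")
    return {offset: count / total for offset, count in sorted(offsets.items())}


# Made-up example: five purchases of previously carted items.
promo_day = date(2025, 11, 11)
purchases = [
    date(2025, 11, 10),
    date(2025, 11, 11),
    date(2025, 11, 11),
    date(2025, 11, 11),
    date(2025, 11, 12),
]
shares = conversion_offset_shares(purchases, promo_day)
```

In this made-up sample, three of the five purchases fall on offset 0. A real table of this kind would show how strongly conversions concentrate on the promotion day compared with the smoother delay curves assumed by continuous-window delayed-feedback models.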

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where the comments correctly identify gaps in exposition or detail, we have revised the manuscript accordingly.

  1. Referee: [Abstract and §3] Abstract and §3 (counterfactual component): the claim that the counterfactual approach recovers an unbiased ATC-to-delayed-conversion transition probability rests on an unstated identification strategy. Observational pre-promotion logs are subject to unobserved confounders (price sensitivity, intent shifts, external promotions) that affect both ATC and eventual conversion; without explicit propensity weighting, instrumental-variable details, or sensitivity bounds in the methods, the estimator can inherit selection bias and the reported gains may reflect spurious correlation rather than the claimed causal modeling.

    Authors: We agree that the identification assumptions underlying the counterfactual transition probability estimator were not stated with sufficient precision in the original submission. In the revised manuscript we have expanded §3 to articulate the identification strategy explicitly: conditional on the observed user covariates, historical behavior sequences, and product features, add-to-cart is assumed independent of the potential delayed-conversion outcome under the no-promotion regime. We further acknowledge the possibility of unobserved confounding and have added a sensitivity-analysis section in the appendix that reports bias bounds under varying degrees of unobserved confounding. While these additions do not constitute a fully nonparametric identification result, they make the assumptions transparent and allow readers to assess robustness. The multi-task architecture and personalized gating are presented as complementary regularization mechanisms rather than substitutes for causal identification. revision: yes

  2. Referee: [Abstract and §4] Abstract and §4 (experiments): the statements that CM-DCM 'outperforms baselines' and that A/B tests showed 'significant improvements' supply no information on baseline definitions, dataset sizes, cross-validation scheme, statistical tests, or confidence intervals. Without these, the empirical support for the three claimed innovations cannot be assessed and the central performance claim remains unverifiable.

    Authors: We concur that the original manuscript omitted critical experimental details, rendering the performance claims difficult to verify. In the revised §4 we have added: (i) explicit definitions and citations for every baseline; (ii) dataset statistics (number of users, impressions, and pre-promotion windows) together with the exact train/validation/test splits; (iii) the time-ordered cross-validation protocol used to avoid temporal leakage; (iv) the statistical tests performed (paired t-tests) and associated p-values; and (v) 95% confidence intervals for all reported offline and online metrics. A new summary table consolidates these settings. These changes directly address the referee’s concern and allow independent assessment of the three modeling contributions. revision: yes
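
Two of the items in this response, the time-ordered validation protocol and the paired t-test, can be sketched with the Python standard library. This is one plausible reading, not the protocol the authors describe in detail: the expanding-window scheme, the period labels, and the score values are all hypothetical. The sketch returns only the t statistic and degrees of freedom, because turning them into a p-value needs a t-distribution CDF, which the standard library does not provide.

```python
import math
import statistics


def expanding_window_splits(periods, min_train=1):
    """Yield (train_periods, test_period); the test period always follows the training ones."""
    for i in range(min_train, len(periods)):
        yield periods[:i], periods[i]


def paired_t_statistic(model_scores, baseline_scores):
    """Return (mean difference, t statistic, degrees of freedom) for paired scores."""
    if len(model_scores) != len(baseline_scores) or len(model_scores) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    mean_diff = statistics.mean(diffs)
    return mean_diff, mean_diff / std_err, len(diffs) - 1


# Hypothetical pre-promotion windows, listed in chronological order.
periods = ["2022", "2023", "2024", "2025"]
splits = list(expanding_window_splits(periods, min_train=2))

# Made-up per-fold metric values for a model and a baseline on the same folds.
mean_diff, t_stat, dof = paired_t_statistic([0.71, 0.73, 0.72], [0.70, 0.71, 0.70])
```

Because each test window lies strictly after its training windows, no information from a later promotion can leak into an earlier evaluation. Pairing scores by fold removes the fold-to-fold variance that an unpaired comparison would treat as noise.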

Circularity Check

0 steps flagged

No circularity identified; derivation self-contained at described level

The provided manuscript text (abstract plus high-level description) outlines a multi-task architecture, personalized gating module, and counterfactual approach to model ATC-to-delayed-conversion transitions using historical pre-promotion data. No equations, parameter-fitting procedures, or derivation steps are exhibited that would allow inspection for self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations. The counterfactual component is presented as a methodological innovation rather than a quantity derived tautologically from the same pre-promotion outcomes. Absent any quoted formulas showing equivalence by construction, the central claims retain independent content and do not reduce to their inputs. This matches the default expectation for non-circular papers when no inspectable reduction is available.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the model presumably relies on standard supervised learning assumptions plus fitted parameters inside the gating and counterfactual modules, but none are enumerated.

pith-pipeline@v0.9.0 · 5592 in / 1403 out tokens · 74719 ms · 2026-05-12T00:46:00.864486+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    Our model incorporates three key innovations: (i) A multi-task architecture that jointly models direct and delayed conversions... (ii) A personalized user behavior gating module... (iii) A counterfactual causal approach to model the transition probability from add-to-cart (ATC) to delayed conversion.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
