Causal Multi-Task Demand Learning

Varun Gupta; Vijay Kamble

arxiv: 2602.09969 · v2 · pith:3DGR3CLZnew · submitted 2026-02-10 · 💻 cs.LG · econ.EM · stat.ML

Pith reviewed 2026-05-16 02:19 UTC · model grok-4.3

classification 💻 cs.LG · econ.EM · stat.ML

keywords causal inference · multi-task learning · demand estimation · meta-learning · endogeneity · price response · transfer learning · confounding


The pith

A meta-learning framework identifies causal demand parameters across tasks by conditioning on all prices while masking two outcomes for supervision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles estimating heterogeneous linear price-response functions in multiple retail contexts where each context has rich covariates but limited price variation and prices may correlate with unobserved demand factors. It proposes identifying the conditional mean of task-specific causal demand parameters given a subset of observables by using an information design that includes all prices to handle cross-task confounding yet masks two demand outcomes to supply randomized supervision for identifiability. This design is shown to be maximally uniformly valid because revealing the masked outcomes does not guarantee identification of the causal target. A sympathetic reader would care because the approach allows borrowing strength across tasks for more accurate pricing while preserving causality despite endogeneity. Validation on real and synthetic data shows improved recovery of demand responses over standard transfer-learning methods.

Core claim

We propose a new meta-learning framework that identifies the conditional mean of task-specific causal demand parameters given a subset of task-specific observables despite such confounding, assuming that each task contains at least two distinct locally exogenous price points. This subset is carefully designed to include all of the prices to address cross-task confounding, while masking two demand outcomes that provide randomized supervision to address identifiability issues arising from the inclusion of all prices. We show that this information design is maximally uniformly valid, in that any refinement of the conditioning set that reveals withheld-outcome information is not guaranteed to be …

What carries the argument

The information design that conditions on all prices while masking two demand outcomes to supply randomized supervision for identifying the conditional mean of causal demand parameters.
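As a rough illustration of this information design, the sketch below assembles one training example for a single task. It is not the paper's implementation: the function name `build_masked_example`, the dictionary layout, and the use of `None` to mark withheld outcomes are all assumptions made for the example. What it does take from the description above is that every price stays in the conditioning set, two outcomes are withheld, and a random draw decides which withheld outcome supplies the supervision.

```python
import random

def build_masked_example(covariates, prices, outcomes, masked_pair, rng):
    """Build (conditioning_set, query_price, target) for one task.

    masked_pair: indices of the two price points whose outcomes are withheld.
    All names and the data layout are hypothetical, for illustration only.
    """
    i, j = masked_pair
    if prices[i] == prices[j]:
        raise ValueError("masked price points must be distinct")
    # Every price is visible, including the two whose outcomes are masked.
    visible = [None if k in (i, j) else y for k, y in enumerate(outcomes)]
    conditioning = {
        "covariates": list(covariates),
        "prices": list(prices),
        "outcomes": visible,
    }
    # Randomize which of the two masked points is used as the training target.
    query = rng.choice((i, j))
    return conditioning, prices[query], outcomes[query]

rng = random.Random(0)
cond, query_price, target = build_masked_example(
    covariates=[0.3, 1.2],
    prices=[9.0, 10.0, 12.0],
    outcomes=[85.0, 80.0, 70.0],
    masked_pair=(1, 2),
    rng=rng,
)
```

The learner only ever sees `cond`, so neither withheld outcome can leak into the conditioning set, whichever one ends up as the target.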

If this is right

The framework recovers demand responses more accurately than standard transfer-learning baselines on both real and synthetic data.
Causal estimation becomes feasible in multi-task settings even when prices are arbitrarily endogenous across tasks.
Rich covariates can be leveraged for transfer while the masking step prevents bias from full outcome revelation.
The design remains valid under any refinement that withholds the masked outcomes but loses guarantees if those outcomes are revealed.
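A small simulation can make the endogeneity concern behind these points tangible. The sketch below is a toy setup, not the paper's experiment: the data-generating process, the parameter values, and the helper names `simulate` and `ols_slope` are assumptions. Each task has an unobserved demand shifter that raises both its demand intercept and its base price, and two price points placed symmetrically around that base.

```python
import random

def simulate(n_tasks=2000, beta=-2.0, seed=0):
    """Toy multi-task data with cross-task price endogeneity (hypothetical DGP)."""
    rng = random.Random(seed)
    rows, within = [], []
    for _ in range(n_tasks):
        u = rng.gauss(0.0, 1.0)      # unobserved task-level demand shifter
        alpha = 100.0 + 10.0 * u     # high-demand tasks ...
        base = 10.0 + 2.0 * u        # ... are also priced higher
        pts = []
        for p in (base - 1.0, base + 1.0):  # two price points within the task
            y = alpha + beta * p + rng.gauss(0.0, 0.5)
            pts.append((p, y))
        rows.extend(pts)
        (p1, y1), (p2, y2) = pts
        within.append((y1 - y2) / (p1 - p2))  # per-task two-point slope
    return rows, within

def ols_slope(rows):
    """Slope of a simple least-squares regression of demand on price."""
    n = len(rows)
    mp = sum(p for p, _ in rows) / n
    my = sum(y for _, y in rows) / n
    cov = sum((p - mp) * (y - my) for p, y in rows)
    var = sum((p - mp) ** 2 for p, _ in rows)
    return cov / var

rows, within = simulate()
pooled = ols_slope(rows)                 # ignores the task structure
within_mean = sum(within) / len(within)  # uses only within-task variation
```

Under these parameters the population pooled slope is +2, the wrong sign, because cross-task price differences track the unobserved shifter, while the within-task slopes average close to the true value of -2. The within-task average is only a naive reference point here, not the paper's estimator.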

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same masking strategy could be tested for identifying causal effects in other multi-task settings such as personalized pricing or heterogeneous treatment effects.
Relaxing the linear demand assumption while keeping the two-exogenous-points condition would be a direct next step to broaden applicability.
Empirical checks on datasets with varying numbers of exogenous prices per task would quantify how sensitive performance is to the minimal assumption.

Load-bearing premise

Each task contains at least two distinct locally exogenous price points and the proposed information design is maximally uniformly valid.
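To see why two distinct price points are the minimal requirement, consider a minimal sketch that assumes a noiseless linear demand curve `y = alpha + beta * p` within one task. The function name `two_point_demand` and the numbers are hypothetical, and this is an illustration of the premise rather than the paper's estimator.

```python
def two_point_demand(p1, y1, p2, y2):
    """Solve y = alpha + beta * p through two (price, demand) points."""
    if p1 == p2:
        # With a single price the slope is not identified.
        raise ValueError("prices must be distinct to identify the slope")
    beta = (y1 - y2) / (p1 - p2)
    alpha = y1 - beta * p1
    return alpha, beta

# Demand of 80 at price 10 and 70 at price 12.
alpha, beta = two_point_demand(10.0, 80.0, 12.0, 70.0)
```

Here `beta` is -5 and `alpha` is 130. With noisy outcomes the same formula yields a noisy per-task slope, which is one reason pooling information across tasks is attractive.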

What would settle it

Collect a dataset in which some tasks have only one locally exogenous price point and check whether the recovered conditional mean causal parameters match ground-truth values from a fully randomized experiment.
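A first step toward such a check is to separate tasks by how many distinct prices they contain. The helper below is a hypothetical sketch (the name `split_tasks_by_price_support` and the sample data are invented for illustration). It only tests distinctness; whether those price points are locally exogenous cannot be read off the prices alone and would need domain knowledge or an experiment.

```python
def split_tasks_by_price_support(task_prices, min_distinct=2):
    """Partition task ids by whether they have enough distinct price points."""
    enough, too_few = [], []
    for task_id, prices in task_prices.items():
        bucket = enough if len(set(prices)) >= min_distinct else too_few
        bucket.append(task_id)
    return enough, too_few

enough, too_few = split_tasks_by_price_support(
    {
        "sku_a": [9.99, 9.99, 11.49],  # two distinct prices
        "sku_b": [4.50, 4.50],         # a single price
        "sku_c": [2.0, 2.5, 3.0],      # three distinct prices
    }
)
```

Tasks in `too_few` are the ones on which the proposed falsification test would focus.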

Figures

Figures reproduced from arXiv: 2602.09969 by Varun Gupta, Vijay Kamble.

**Figure 2.** Estimation performance of the simple outcome-based meta-estimator

**Figure 3.** Estimation performance of the DCMOML meta-learner

**Figure 4.** Estimation error across confounding levels for …

**Figure 5.** Held-out RMSE with 95% confidence intervals. DCMOML is highlighted in red.

Original abstract

We study a canonical multi-task demand-learning problem motivated by retail pricing, where a firm seeks to estimate heterogeneous linear price-response functions across multiple decision contexts. Each context is described by rich covariates but exhibits limited price variation, motivating transfer learning across tasks. A central challenge in leveraging cross-task transfer is endogeneity: prices may be arbitrarily correlated with unobserved task-level demand determinants across tasks. We propose a new meta-learning framework that identifies the conditional mean of task-specific causal demand parameters given a subset of task-specific observables despite such confounding, assuming that each task contains at least two distinct locally exogenous price points. This subset is carefully designed to include all of the prices to address cross-task confounding, while masking two demand outcomes that provide randomized supervision to address identifiability issues arising from the inclusion of all prices. We show that this information design is maximally uniformly valid, in that any refinement of the conditioning set that reveals withheld-outcome information is not guaranteed to identify the conditional mean causal target. We validate our method on real and synthetic data, demonstrating improved recovery of demand responses relative to standard transfer-learning baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a meta-learning framework for identifying causal demand parameters across tasks by conditioning on all prices while masking two outcomes, under the assumption of local exogeneity in each task.

This work proposes a meta-learning approach to recover the conditional mean of task-specific causal price-response parameters in a multi-task demand setting. It feeds all prices into the conditioning set to block cross-task confounders while holding out two demand outcomes for supervision, and it claims this design is maximally uniformly valid. The retail-pricing motivation, with limited price variation per task, is clear, and the empirical checks on real and synthetic data reportedly beat standard transfer-learning baselines at recovering demand responses.

Referee Report

3 major / 2 minor

Summary. The paper proposes a meta-learning framework for identifying the conditional mean of task-specific causal demand parameters in a multi-task retail pricing setting with heterogeneous linear price-response functions. Despite endogeneity from prices correlated with unobserved task-level demand determinants, the method identifies the target conditional mean given a subset of observables, under the assumption that each task has at least two distinct locally exogenous price points. The information design conditions on all prices while masking two demand outcomes for randomized supervision; the abstract claims this design is maximally uniformly valid (any refinement revealing withheld outcomes is not guaranteed to identify the target). Empirical results on real and synthetic data show improved recovery of demand responses relative to standard transfer-learning baselines.

Significance. If the identification result and maximal uniform validity hold, the work would offer a principled way to perform causal transfer learning across tasks with limited price variation and cross-task confounding, which is relevant for applied demand estimation in operations and marketing. The explicit conditioning on the local exogeneity assumption and the uniform-validity guarantee are theoretically attractive strengths; the empirical improvements over baselines provide initial evidence of practical value. However, the significance is tempered by the absence of visible derivations supporting the central claims.

major comments (3)

[Abstract and §3] Identification result: the claim that the proposed information design (conditioning on all prices while masking two outcomes) is 'maximally uniformly valid' is stated without any derivation, proof sketch, or reference to a theorem establishing that any refinement revealing the withheld outcomes fails to identify the conditional mean causal target. This is load-bearing for the central contribution.
[§4] Empirical validation: the real-data application reports improved recovery relative to baselines, but provides no details on how the key assumption (each task contains at least two distinct locally exogenous price points) is verified or tested; without this, the empirical support for the identification result cannot be assessed.
[§2] Framework: the meta-learning procedure is described at a high level, but the manuscript does not show the explicit mapping from the masked-outcome supervision to the estimator of the conditional mean of the task-specific causal parameters, leaving the link between the information design and the identification result opaque.

minor comments (2)

[§2] Notation for the task-specific observables and the masked outcomes should be introduced with a single consistent symbol table or diagram to improve readability.
[Abstract] The abstract and introduction use 'maximally uniformly valid' without a forward reference to the precise definition or theorem number; add such a reference.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make to improve clarity and rigor.

Referee: [Abstract and §3] Identification result: the claim that the proposed information design (conditioning on all prices while masking two outcomes) is 'maximally uniformly valid' is stated without any derivation, proof sketch, or reference to a theorem establishing that any refinement revealing the withheld outcomes fails to identify the conditional mean causal target. This is load-bearing for the central contribution.

Authors: We acknowledge that while Section 3 presents the information design and states the maximal uniform validity property, a self-contained proof sketch is not included in the main text. The argument relies on showing that revealing the masked outcomes introduces additional dependencies that violate identification under arbitrary cross-task confounding. In the revision we will add a formal theorem statement and proof outline in Section 3 (with full details moved to the appendix) that explicitly demonstrates why any refinement revealing the withheld demand outcomes is not guaranteed to identify the target conditional mean.
Revision: yes

Referee: [§4] Empirical validation: the real-data application reports improved recovery relative to baselines, but provides no details on how the key assumption (each task contains at least two distinct locally exogenous price points) is verified or tested; without this, the empirical support for the identification result cannot be assessed.

Authors: We agree that explicit verification details are needed. The current manuscript treats the assumption as a maintained condition justified by the retail pricing domain (prices exhibit local randomness from promotions and inventory shocks). In the revision we will add a new subsection in Section 4 that reports per-task price variation statistics, confirms that every task satisfies the minimum of two distinct locally exogenous points, and includes a robustness check restricting the sample to tasks with the strongest evidence of local exogeneity.
Revision: yes

Referee: [§2] Framework: the meta-learning procedure is described at a high level, but the manuscript does not show the explicit mapping from the masked-outcome supervision to the estimator of the conditional mean of the task-specific causal parameters, leaving the link between the information design and the identification result opaque.

Authors: We accept that the link between the masked supervision and the estimator could be stated more explicitly. Section 2 currently describes the framework conceptually. In the revision we will expand Section 2 with an explicit algorithmic mapping (including pseudocode) that shows how the supervised loss on the two masked demand outcomes, conditioned on all prices, produces the estimator for the conditional mean of the task-specific causal parameters. This will directly connect the information design to the identification result.
Revision: yes
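The promised pseudocode is not reproduced on this page. Purely as an illustration of what a supervised loss on masked outcomes could look like, the sketch below assumes a squared-error loss on the linear prediction `alpha + beta * price` at a randomly chosen masked point. The loss form, the function names, and the numbers are assumptions, not the authors' algorithm.

```python
def masked_outcome_loss(params, query_price, target):
    """Squared error of a linear demand prediction at one masked query point."""
    alpha_hat, beta_hat = params
    return (target - (alpha_hat + beta_hat * query_price)) ** 2

def expected_loss(params, masked_points):
    """Average loss over a uniform random choice of the queried masked point."""
    total = sum(masked_outcome_loss(params, p, y) for p, y in masked_points)
    return total / len(masked_points)

masked_points = [(10.0, 80.0), (12.0, 70.0)]  # (price, withheld outcome)
true_params = (130.0, -5.0)                   # line through both points
wrong_slope = (110.0, -3.0)                   # fits the first point only

loss_true = expected_loss(true_params, masked_points)
loss_wrong = expected_loss(wrong_slope, masked_points)
loss_wrong_first_point = masked_outcome_loss(wrong_slope, 10.0, 80.0)
```

In this toy case the wrong-slope parameters achieve zero loss when only the first point is ever queried, but an expected loss of 8 once the query is randomized over both points. That is the intuition for why supervision at two distinct prices can pin down both intercept and slope.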

Circularity Check

0 steps flagged

No significant circularity in the meta-learning identification framework

The paper's derivation chain centers on a meta-learning identification result for the conditional mean of task-specific causal demand parameters, explicitly conditioned on the domain assumption that each task contains at least two distinct locally exogenous price points. The information design (conditioning on all prices while masking two outcomes) is shown to be maximally uniformly valid without reducing the target quantity to a fitted parameter by construction, a self-definition, or a load-bearing self-citation. No equations or steps in the abstract or description rename known results, smuggle ansatzes via prior work, or force predictions from inputs; the result remains independent of the fitted values and is supported by external validation on real and synthetic data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption of locally exogenous price points per task and the validity of the specific information design for identification.

axioms (1)

domain assumption: Each task contains at least two distinct locally exogenous price points.
Required for identifying the conditional mean of causal demand parameters despite cross-task confounding.

pith-pipeline@v0.9.0 · 5485 in / 1144 out tokens · 48879 ms · 2026-05-16T02:19:33.695599+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We propose a new meta-learning framework that identifies the conditional mean of task-specific causal demand parameters given a subset of task-specific observables despite such confounding, assuming that each task contains at least two distinct locally exogenous price points."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "This yields Decision-Conditioned Masked-Outcome Meta-Learning (DCMOML), which masks outcomes at two candidate query points and randomizes which one is used for training."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

