Estimation of Treatment Effects Under Nonstationarity via the Truncated Policy Gradient Estimator
Randomized experiments (or A/B tests) are widely used to evaluate interventions in dynamic systems such as recommendation platforms, marketplaces, and digital health. In these settings, interventions affect both current and future system states, so estimating the global average treatment effect (GATE) requires accounting for temporal dynamics, which is especially challenging in the presence of nonstationarity; existing approaches suffer from high bias, high variance, or both. In this paper, we address this challenge via the novel Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator admits a policy gradient interpretation: it is a truncation of the first-order approximation to the GATE, yielding provable reductions in bias and variance in nonstationary Markovian settings. We further establish a central limit theorem for the TPG estimator and develop a consistent variance estimator that remains valid under nonstationarity with single-trajectory data. We validate our theory with two real-world case studies. The results show that relative to existing approaches, a well-calibrated TPG estimator can achieve a favorable balance between bias and variance in nonstationary settings, highlighting the value of the policy-gradient perspective for designing effective estimators under complex dynamics.
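To make the idea of replacing instantaneous outcomes with short-horizon outcome trajectories concrete, here is a minimal illustrative sketch. It is not the paper's exact estimator. It assumes a single trajectory in which the treatment at each time step is drawn independently as Bernoulli(p), and it weights a truncated sum of the next k outcomes by the score of the assignment. The function name `truncated_pg_estimate`, the arguments `actions`, `outcomes`, `p`, and `k`, and the simple 1/T normalization are all assumptions introduced for illustration.

```python
def truncated_pg_estimate(actions, outcomes, p=0.5, k=0):
    """Illustrative truncated policy-gradient-style estimate of a treatment effect.

    Assumptions (not taken from the paper): actions[t] is 0 or 1 and was drawn
    independently as Bernoulli(p); outcomes[t] is the outcome observed at time t;
    k is the truncation horizon (number of future outcomes credited to each action).
    """
    if len(actions) != len(outcomes):
        raise ValueError("actions and outcomes must have the same length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")

    T = len(actions)
    total = 0.0
    for t in range(T):
        # Score of the Bernoulli(p) assignment: +1/p if treated, -1/(1-p) if control.
        score = actions[t] / p - (1 - actions[t]) / (1 - p)
        # Short-horizon outcome trajectory: outcomes at t, t+1, ..., t+k
        # (the slice is automatically shortened near the end of the trajectory).
        window = sum(outcomes[t:t + k + 1])
        total += score * window
    return total / T
```

With `k=0` the sketch reduces to an inverse-probability-weighted comparison of instantaneous outcomes, which ignores carryover effects on future states. Increasing `k` credits each assignment with more of its downstream outcomes, which in this sketch trades lower carryover bias for higher variance, the balance the abstract describes.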