Estimation of Treatment Effects Under Nonstationarity via the Truncated Policy Gradient Estimator

Ramesh Johari; Tianyi Peng; Wenqian Xing

arxiv: 2506.05308 · v3 · pith:SWZGYFVFnew · submitted 2025-06-05 · 📊 stat.ME

classification 📊 stat.ME

keywords estimator · variance · bias · gradient · nonstationarity · policy · settings · under

Randomized experiments (or A/B tests) are widely used to evaluate interventions in dynamic systems such as recommendation platforms, marketplaces, and digital health. In these settings, interventions affect both current and future system states, so estimating the global average treatment effect (GATE) requires accounting for temporal dynamics, which is especially challenging in the presence of nonstationarity; existing approaches suffer from high bias, high variance, or both. In this paper, we address this challenge via the novel Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator admits a policy gradient interpretation: it is a truncation of the first-order approximation to the GATE, yielding provable reductions in bias and variance in nonstationary Markovian settings. We further establish a central limit theorem for the TPG estimator and develop a consistent variance estimator that remains valid under nonstationarity with single-trajectory data. We validate our theory with two real-world case studies. The results show that relative to existing approaches, a well-calibrated TPG estimator can achieve a favorable balance between bias and variance in nonstationary settings, highlighting the value of the policy-gradient perspective for designing effective estimators under complex dynamics.
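The idea of replacing instantaneous outcomes with short-horizon outcome trajectories can be illustrated with a minimal sketch. The abstract does not give the estimator's formula, so everything below is an assumption made for illustration: the function name `truncated_trajectory_estimate`, the Bernoulli randomization probability `p`, the truncation length `k`, the inverse-propensity weighting, and the handling of windows that run past the end of the data are all hypothetical, and the paper's actual TPG definition may differ.

```python
import numpy as np


def truncated_trajectory_estimate(actions, outcomes, p=0.5, k=0):
    """Illustrative contrast of k-step-ahead outcome sums (hypothetical form).

    actions  : 0/1 treatment indicators from one trajectory, assumed to be
               drawn independently with treatment probability p.
    outcomes : observed outcomes, one per time step.
    k        : truncation length; k = 0 uses only the instantaneous outcome.
    """
    a = np.asarray(actions, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    T = len(y)

    # Prefix sums give each window sum y[t] + ... + y[min(t + k, T - 1)].
    # Windows near the end are simply cut short (an assumption of this sketch).
    csum = np.concatenate(([0.0], np.cumsum(y)))
    ends = np.minimum(np.arange(T) + k + 1, T)
    window = csum[ends] - csum[:T]

    # Inverse-propensity weights: +1/p for treated steps, -1/(1-p) for control.
    weights = a / p - (1.0 - a) / (1.0 - p)
    return float(np.mean(weights * window))


actions = [1, 0, 1, 0]
outcomes = [1.0, 2.0, 3.0, 4.0]
naive = truncated_trajectory_estimate(actions, outcomes, p=0.5, k=0)
short_horizon = truncated_trajectory_estimate(actions, outcomes, p=0.5, k=1)
```

With `k = 0` the sketch reduces to a weighted contrast of instantaneous outcomes; increasing `k` credits each treatment decision with more of the outcomes that follow it. In this simplified form, a longer window would be expected to capture more carryover at the cost of noisier estimates, which matches the bias-variance balance the abstract describes.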

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Estimating Dynamic Marginal Policy Effects under Sequential Unconfoundedness
stat.ME 2026-04 unverdicted novelty 7.0

Dynamic marginal policy effects can be identified through reduced-form expressions and estimated with a doubly robust method under sequential unconfoundedness, avoiding full state observation and curse of horizon.

Robust Sequential Experimental Design for A/B Testing
stat.ML 2026-05 unverdicted novelty 6.0

A unified robust framework for sequential A/B testing bounds the worst-case mean squared error of treatment effect estimates under model misspecification in both contextual bandit and dynamic regimes.

Estimating Dynamic Marginal Policy Effects under Sequential Unconfoundedness
stat.ME 2026-04 unverdicted novelty 6.0

Develops tractable reduced-form identification and a doubly robust estimator for dynamic marginal policy effects that avoids full state observation and exponential horizon curse.