Logging Policy Design for Off-Policy Evaluation

Connor Douglas; Foster Provost; Joel Persson

arXiv: 2605.15108 · v2 · pith:XCPSYI2Inew · submitted 2026-05-14 · stat.ML · cs.AI · cs.IR · cs.LG · stat.ME

Pith reviewed 2026-05-20 20:35 UTC · model grok-4.3

keywords: off-policy evaluation · logging policy · reward-coverage tradeoff · recommendation systems · policy evaluation · treatment selection · OPE error

The pith

Logging policies for off-policy evaluation balance concentrating on high-reward actions against covering actions the target policy may take.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the choice of logging policy when collecting data for off-policy evaluation creates a direct tradeoff between lowering variance by focusing on high-reward actions and ensuring coverage of actions that the target policy might select. It develops a single framework that produces optimal logging policies under three information settings at data-collection time: full knowledge of the target and rewards, no knowledge, and partial knowledge via priors or estimates. This matters because companies evaluating new recommendation or treatment policies can use the framework to reduce estimation error without running live tests.

Core claim

We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take. We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are known, unknown, or partially known through priors or noisy estimates at logging time.
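
The tradeoff can be made concrete with a standard importance-sampling calculation. This is a sketch under stated assumptions, not the paper's own derivation: a context-free setting with finitely many actions, a target policy $\pi_e$, a logging policy $\pi_0$, per-action reward mean $\mu_a$ and variance $\sigma_a^2$, and the inverse propensity scoring (IPS) estimator computed from $n$ independent logged samples. All symbols are introduced here for illustration.

```latex
\hat V_{\mathrm{IPS}} = \frac{1}{n}\sum_{i=1}^{n} \frac{\pi_e(a_i)}{\pi_0(a_i)}\, r_i ,
\qquad
V(\pi_e) = \sum_{a} \pi_e(a)\,\mu_a ,
\\[4pt]
n\,\mathrm{Var}\big(\hat V_{\mathrm{IPS}}\big)
  = \sum_{a} \frac{\pi_e(a)^2\,\big(\sigma_a^2 + \mu_a^2\big)}{\pi_0(a)} \;-\; V(\pi_e)^2 ,
\\[4pt]
\pi_0^{\star}(a) \;\propto\; \pi_e(a)\,\sqrt{\sigma_a^2 + \mu_a^2}
\quad \text{(minimizer over the probability simplex).}
```

Under these assumptions the variance-minimizing logging policy shifts mass toward actions that the target policy favors and whose rewards have a large second moment. The $1/\pi_0(a)$ term shows the coverage side: the variance grows without bound as $\pi_0(a) \to 0$ for any action with $\pi_e(a) > 0$, and IPS is no longer unbiased if such an action is never logged.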

What carries the argument

Unifying framework for logging policy design that produces optimal policies across three informational regimes defined by knowledge of the target policy and reward distribution at collection time.

If this is right

When target policy and rewards are known, the optimal logging policy concentrates mass on high-reward actions consistent with the target.
When target and rewards are unknown, the optimal logging policy spreads probability more evenly to preserve coverage.
When priors or noisy estimates are available, the optimal logging policy tilts mass according to those priors while retaining necessary coverage.
Firms gain concrete design rules for choosing among candidate systems when the goal is accurate off-policy evaluation.
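
The three regimes above can be sketched as a single design function. This is a hypothetical illustration, not the paper's closed forms: it assumes a context-free bandit, uses uniform logging as a stand-in for the unknown regime, and uses the standard IPS variance-minimizing weights `target[a] * sqrt(E[r^2 | a])` wherever reward information is available. The function name, the `floor` parameter, and the example numbers are all assumptions made for this sketch.

```python
import math

def design_logging_policy(n_actions, target=None, second_moment=None, floor=0.0):
    """Return a logging distribution over n_actions (illustrative only).

    target:        target-policy probabilities, or None if unknown.
    second_moment: E[r^2 | a] per action (true or estimated), or None if unknown.
    floor:         share of mass spread uniformly to protect coverage.
    """
    uniform = [1.0 / n_actions] * n_actions
    if target is None or second_moment is None:
        return uniform  # unknown regime: nothing to tilt toward
    # Standard IPS variance-minimizing weights (assumed stand-in for the optimum).
    weights = [p * math.sqrt(m) for p, m in zip(target, second_moment)]
    total = sum(weights)
    if total == 0.0:
        return uniform
    tilted = [w / total for w in weights]
    # Mix with uniform so no action drops below floor / n_actions.
    return [(1.0 - floor) * t + floor * u for t, u in zip(tilted, uniform)]

target = [0.1, 0.2, 0.7]  # hypothetical target policy
known = design_logging_policy(3, target, [0.09, 0.30, 1.05])
unknown = design_logging_policy(3)
partial = design_logging_policy(3, target, [0.20, 0.20, 0.90], floor=0.2)
```

In the partially known call, the second moments are treated as noisy estimates, and the `floor` keeps some probability on every action in case those estimates are wrong. How large that floor should be is exactly the kind of question the paper's framework is meant to answer; the value 0.2 here is arbitrary.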

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tradeoff framework could guide data collection in sequential or adaptive settings where logging policies update over time.
Operational limits on which actions can be logged may require adjusting the derived optima toward more conservative coverage.
Accurate estimation of the reward distribution at logging time becomes a critical upstream task for realizing the stated error reductions.

Load-bearing premise

The target policy and reward distribution, or priors over them, can be used or estimated when the logging policy is chosen.

What would settle it

Compare OPE error in a controlled recommender simulation when data is collected under the derived optimal logging policy for the known-regime case versus a uniform random logging policy, measuring the reduction in value-estimate variance.
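
A toy version of this comparison might look like the following. It is a minimal sketch, assuming a context-free bandit with three actions, Gaussian rewards, and the IPS estimator; the "tilted" policy uses the standard variance-minimizing weights `target[a] * sqrt(std[a]**2 + mean[a]**2)` as a stand-in for the paper's known-regime optimum, which may differ. The environment values and function names are hypothetical.

```python
import math
import random

def ips_estimate(target, logging, mean, std, n, rng):
    """One IPS value estimate from n samples logged under `logging`."""
    actions = list(range(len(target)))
    total = 0.0
    for _ in range(n):
        a = rng.choices(actions, weights=logging)[0]
        reward = rng.gauss(mean[a], std[a])
        total += target[a] / logging[a] * reward
    return total / n

def mse(target, logging, mean, std, n, reps, rng):
    """Monte Carlo mean squared error of IPS over `reps` independent logs."""
    true_value = sum(p * m for p, m in zip(target, mean))
    errors = [
        (ips_estimate(target, logging, mean, std, n, rng) - true_value) ** 2
        for _ in range(reps)
    ]
    return sum(errors) / reps

# Hypothetical three-action environment.
target = [0.1, 0.2, 0.7]
mean = [0.2, 0.5, 1.0]
std = [0.2, 0.2, 0.2]

uniform = [1.0 / 3] * 3
weights = [p * math.sqrt(s ** 2 + m ** 2) for p, m, s in zip(target, mean, std)]
tilted = [w / sum(weights) for w in weights]

rng = random.Random(0)
mse_uniform = mse(target, uniform, mean, std, n=500, reps=200, rng=rng)
mse_tilted = mse(target, tilted, mean, std, n=500, reps=200, rng=rng)
print(f"uniform logging MSE: {mse_uniform:.5f}")
print(f"tilted logging MSE:  {mse_tilted:.5f}")
```

Because IPS is unbiased whenever the logging policy covers the target's actions, the MSE here is essentially variance. A realistic test would replace this toy environment with a recommender simulation that includes contexts and many more actions.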

Figures

Figures reproduced from arXiv: 2605.15108 by Connor Douglas, Foster Provost, Joel Persson.

**Figure 2.** Informational settings for logging policy design. The two dimensions of information …

**Figure 3.** Illustration of error across logging policy choices in …

**Figure 4.** MSE as a function of the level of noise in the reward estimates μ̂

**Figure 5.** Effect of posterior shrinkage and reward prediction noise on MSE and policy value. A …

**Figure 6.** MSE of IPW estimator for soft-greedy logging policy classes

**Figure 7.** MSE of IPW estimator for soft-greedy logging policy classes

Original abstract

Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice accuracy depends heavily on the logging policy used to collect data for computing the estimate. We study how to design logging policies that minimize OPE error for given target policies. We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take. We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time. Our results provide actionable guidance for firms choosing among multiple candidate recommendation systems. We demonstrate the importance of treatment selection when gathering data for OPE, and describe theoretically optimal approaches when this is a firm's primary objective. We also distill practical design principles for selecting logging policies when operational constraints prevent implementing the theoretical optimum.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives optimal logging policies for OPE under three knowledge regimes by formalizing the reward-coverage tradeoff.

This paper's main contribution is a framework that derives optimal logging policies for minimizing OPE error under three different levels of prior knowledge about the target policy and reward distribution: fully known, unknown, and partially known via priors or noisy estimates at logging time. It frames the core issue as a tradeoff where focusing mass on high-reward actions cuts variance but can miss coverage for actions the target policy might select.

The derivations appear to come from straightforward optimization of a standard OPE error objective once those inputs are treated as given. That keeps the math grounded without obvious circularity. The practical section distilling design principles for operational constraints is a plus for applied settings like recommender systems. It gives firms concrete guidance on data collection when the goal is reliable off-policy estimates rather than live testing.

The soft spots are mostly around scope and validation. The whole analysis hinges on the ability to use or estimate the target policy and rewards when choosing the logging policy, which is a reasonable modeling choice but narrows how far the optima travel to messier real deployments. The abstract supplies no derivations or checks, so the full paper needs to show the closed forms hold and that the policies deliver measurable error reductions over baselines like uniform logging. If empirical results are thin or absent, that would be the clearest gap.

This is aimed at researchers and practitioners working on OPE, bandits, or causal evaluation in ML systems who need better data collection strategies. Someone already familiar with importance sampling estimators would pick up the regime-specific policies quickly and get usable takeaways.

I would send it for peer review. The topic hits a practical bottleneck with a coherent theoretical approach, so referees could usefully pressure-test the assumptions and any validation.

Referee Report

2 major / 3 minor

Summary. The paper studies the design of logging policies to minimize off-policy evaluation (OPE) error when estimating the value of a target policy from data collected under a different logging policy. It characterizes a reward-coverage tradeoff, introduces a unifying optimization framework for logging policy design, and derives closed-form optimal logging policies under three informational regimes: (i) target policy and reward distribution known, (ii) both unknown, and (iii) partially known via priors or noisy estimates. The work provides theoretical guidance for data collection in applications such as recommender systems and distills practical design principles under operational constraints.

Significance. If the derivations hold, the results supply actionable, theoretically optimal logging policies that directly address a practical bottleneck in OPE. The explicit treatment of the three informational regimes and the closed-form solutions constitute a clear strength, offering reproducible guidance that can be implemented or approximated in industrial settings where firms must choose among candidate policies before data collection.

major comments (2)

[§4.2] The derivation of the optimal logging policy under the known regime: the objective combines variance and bias terms from the OPE estimator, but the paper should explicitly state whether the resulting policy remains optimal when the OPE estimator is replaced by a different unbiased estimator (e.g., DR instead of IPS).
[§5.1] The partially-known regime: the optimal policy depends on the quality of the prior or noisy estimate supplied at logging time; the manuscript does not provide a sensitivity analysis showing how OPE error degrades when the prior mean or variance is misspecified by a fixed amount.
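
On the first major comment, a textbook calculation suggests why the answer is likely "no" in general. The sketch below is not from the paper. It assumes a context-free setting with finitely many actions, target policy $\pi_e$, logging policy $\pi_0$, per-action reward mean $\mu_a$ and variance $\sigma_a^2$, and a doubly robust (DR) estimator whose reward model $\hat\mu_a$ is fixed in advance rather than fit on the logged data. All symbols are introduced here for illustration.

```latex
\hat V_{\mathrm{DR}} = \sum_{a} \pi_e(a)\,\hat\mu_a
  + \frac{1}{n}\sum_{i=1}^{n} \frac{\pi_e(a_i)}{\pi_0(a_i)}\,\big(r_i - \hat\mu_{a_i}\big),
\\[4pt]
n\,\mathrm{Var}\big(\hat V_{\mathrm{DR}}\big)
  = \sum_{a} \frac{\pi_e(a)^2\,\big(\sigma_a^2 + (\mu_a - \hat\mu_a)^2\big)}{\pi_0(a)}
  \;-\; \Big(\sum_{a} \pi_e(a)\,(\mu_a - \hat\mu_a)\Big)^{2},
\\[4pt]
\pi_0^{\star}(a) \;\propto\; \pi_e(a)\,\sqrt{\sigma_a^2 + (\mu_a - \hat\mu_a)^2}.
```

Setting $\hat\mu_a = 0$ recovers the IPS case, where the weights are $\pi_e(a)\sqrt{\sigma_a^2 + \mu_a^2}$ and mass follows reward level. With an accurate reward model the $(\mu_a - \hat\mu_a)^2$ term shrinks, so under these assumptions the minimizer follows reward noise instead. The two optima coincide only in special cases.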

minor comments (3)

[Abstract / §2] The abstract and introduction use the phrase 'canonical informational regimes' without a brief justification for why exactly these three regimes are singled out; a short paragraph in §2 would improve readability.
[Throughout] Notation for the logging policy π_log and target policy π_target is introduced clearly, but the manuscript occasionally re-uses the symbol π without subscript in intermediate steps; consistent subscripting would prevent confusion.
[Figure 2] Figure 2 caption states that curves show 'OPE error versus coverage' but does not indicate whether the plotted quantity is mean squared error, variance only, or bias only; adding this detail would aid interpretation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive comments. We respond to each major comment below, indicating planned revisions where appropriate.

Referee: [§4.2] The derivation of the optimal logging policy under the known regime: the objective combines variance and bias terms from the OPE estimator, but the paper should explicitly state whether the resulting policy remains optimal when the OPE estimator is replaced by a different unbiased estimator (e.g., DR instead of IPS).

Authors: The objective function minimized in §4.2 is the mean squared error of the inverse propensity scoring (IPS) estimator, which incorporates the specific bias and variance expressions for IPS. The closed-form optimal logging policy is therefore derived with respect to IPS-based OPE error. For an alternative unbiased estimator such as the doubly robust (DR) estimator, the objective would involve different terms, and the resulting optimal policy could differ. We will revise §4.2 to state this explicitly and note that the optimality result is specific to the IPS estimator.

revision: yes

Referee: [§5.1] The partially-known regime: the optimal policy depends on the quality of the prior or noisy estimate supplied at logging time; the manuscript does not provide a sensitivity analysis showing how OPE error degrades when the prior mean or variance is misspecified by a fixed amount.

Authors: We agree that sensitivity to prior misspecification is relevant for practical use. The derivations in §5.1 assume the supplied prior or noisy estimate is taken as given. While a comprehensive numerical sensitivity study would require additional modeling choices and experiments, we will add a brief discussion in the revised §5.1 describing how OPE error can degrade under small perturbations to the prior mean or variance, along with a qualitative assessment of robustness.

revision: partial
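
The kind of sensitivity check discussed in this exchange can be prototyped in a few lines. The sketch below is hypothetical and not drawn from the manuscript: it assumes a context-free bandit, the IPS estimator, and a plug-in logging policy built from possibly wrong reward means using the standard weights `target[a] * sqrt(var[a] + mean_hat[a]**2)`. It then evaluates the exact per-sample IPS variance under the true rewards as the estimate for one action is shifted by a fixed amount.

```python
import math

def ips_variance(target, logging, mean, var):
    """Exact per-sample variance of IPS in a context-free bandit.

    Assumes logging[a] > 0 wherever target[a] > 0 (coverage).
    """
    value = sum(p * m for p, m in zip(target, mean))
    second = sum(
        p ** 2 * (v + m ** 2) / q
        for p, q, m, v in zip(target, logging, mean, var)
        if p > 0
    )
    return second - value ** 2

def plug_in_logging(target, mean_hat, var_hat):
    """Logging policy tilted by estimated reward second moments."""
    weights = [p * math.sqrt(v + m ** 2)
               for p, m, v in zip(target, mean_hat, var_hat)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-action environment.
target = [0.1, 0.2, 0.7]
mean = [0.2, 0.5, 1.0]
var = [0.04, 0.04, 0.04]

results = {}
for shift in [0.0, 0.2, 0.4, 0.6, 0.8]:
    # Assumed misspecification: the prior understates the best action's mean.
    mean_hat = [mean[0], mean[1], mean[2] - shift]
    logging = plug_in_logging(target, mean_hat, var)
    results[shift] = ips_variance(target, logging, mean, var)

for shift, variance in results.items():
    print(f"shift={shift:.1f}  per-sample IPS variance={variance:.4f}")
```

With `shift=0.0` the plug-in policy coincides with the variance minimizer for this toy setting, so every misspecified row can only match or exceed it. A fuller study would also perturb the variance estimates and the target policy, and would use the paper's own partially-known optimum rather than this stand-in.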

Circularity Check

0 steps flagged

No significant circularity; derivations self-contained from explicit optimization

The paper derives optimal logging policies by optimizing a standard OPE error objective (combining variance and bias terms) under three explicitly defined informational regimes, treating the target policy and reward distribution (or priors/noisy estimates) as given inputs at logging time. This structures the reward-coverage tradeoff analysis and closed-form policies without any reduction to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The central claims target external OPE performance benchmarks rather than internal consistency by construction, and the weakest assumption is stated openly as the modeling premise rather than a hidden flaw. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only abstract available; ledger is therefore minimal and provisional. The framework rests on standard OPE assumptions plus the ability to condition logging design on knowledge of target policy and rewards.

axioms (1)

domain assumption: Target policy and reward distribution (or priors) are available or estimable at logging time to define the three informational regimes.
The optimal policies are derived separately for known, unknown, and partially known cases; this premise structures the entire framework.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 2 internal anchors

[1]

Proceedings of the 39th International Conference on Machine Learning (ICML) , pages =

Safe Exploration for Efficient Policy Evaluation and Comparison , author =. Proceedings of the 39th International Conference on Machine Learning (ICML) , pages =. 2022 , publisher =

work page 2022
[2]

arXiv preprint arXiv:2402.08201 , year=

Off-policy evaluation in markov decision processes under weak distributional overlap , author=. arXiv preprint arXiv:2402.08201 , year=

work page arXiv
[3]

The Annals of Statistics , volume=

Off-policy evaluation in partially observed Markov decision processes under sequential ignorability , author=. The Annals of Statistics , volume=. 2023 , publisher=

work page 2023
[4]

Advances in neural information processing systems , volume=

Learning to optimize via information-directed sampling , author=. Advances in neural information processing systems , volume=

work page
[5]

Mathematics of Operations Research , volume=

Learning to optimize via posterior sampling , author=. Mathematics of Operations Research , volume=. 2014 , publisher=

work page 2014
[6]

arXiv preprint arXiv:2305.11812 , year=

Off-policy evaluation beyond overlap: partial identification through smoothness , author=. arXiv preprint arXiv:2305.11812 , year=

work page arXiv
[7]

Journal of Causal Inference , volume=

Adaptive normalization for IPW estimation , author=. Journal of Causal Inference , volume=. 2023 , publisher=

work page 2023
[8]

Advances in Neural Information Processing Systems , volume=

Counterfactual evaluation of peer-review assignment policies , author=. Advances in Neural Information Processing Systems , volume=

work page
[9]

Information Systems Research , year =

Carlos Fernández-Loría and Foster Provost and Jesse Anderton and Benjamin Carterette and Praveen Chandar , title =. Information Systems Research , year =

work page
[10]

Off-Policy Evaluation for Slate Recommendation , url =

Swaminathan, Adith and Krishnamurthy, Akshay and Agarwal, Alekh and Dudik, Miro and Langford, John and Jose, Damien and Zitouni, Imed , booktitle =. Off-Policy Evaluation for Slate Recommendation , url =

work page
[11]

Joel Persson , title =

work page
[12]

Imbens, Guido W and Rubin, Donald B , year=

work page
[13]

2020 , publisher=

Causal Inference: What If , author=. 2020 , publisher=

work page 2020
[14]

Clinical Kidney Journal , volume=

An Introduction to Inverse Probability of Treatment Weighting in Observational Research , author=. Clinical Kidney Journal , volume=. 2022 , publisher=

work page 2022
[15]

2020 , publisher=

Bandit Algorithms , author=. 2020 , publisher=

work page 2020
[16]

, title =

Ma, Cong and Zhu, Banghua and Jiao, Jiantao and Wainwright, Martin J. , title =. IEEE Transactions on Information Theory , volume =. 2022 , doi =

work page 2022
[17]

Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining , pages =

Chen, Minmin and Beutel, Alex and Covington, Paul and Jain, Sagar and Belletti, Francois and Chi, Ed H. , title =. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining , pages =. 2019 , isbn =. doi:10.1145/3289600.3290999 , abstract =

work page doi:10.1145/3289600.3290999 2019
[18]

Proceedings of the AAAI Conference on Artificial Intelligence , volume =

Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing , author =. Proceedings of the AAAI Conference on Artificial Intelligence , volume =. 2017 , month =. doi:10.1609/aaai.v31i2.19104 , url =

work page doi:10.1609/aaai.v31i2.19104 2017
[19]

Journal of the American Statistical Association , volume=

Marginal Mean Models for Dynamic Regimes , author=. Journal of the American Statistical Association , volume=. 2001 , publisher=

work page 2001
[20]

Advances in Neural Information Processing Systems 36 (NeurIPS) , year =

Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making , author =. Advances in Neural Information Processing Systems 36 (NeurIPS) , year =

work page
[21]

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =

Off-Policy Evaluation and Learning from Logged Bandit Feedback , author =. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages =

work page
[22]

Evaluating the Robustness of Off-Policy Evaluation , url =

Saito, Yuta and Udagawa, Takuma and Kiyohara, Haruka and Mogi, Kazuki and Narita, Yusuke and Tateno, Kei , urldate =. Evaluating the Robustness of Off-Policy Evaluation , url =. doi:10.48550/arXiv.2108.13703 , abstract =. 2108.13703 , keywords =

work page doi:10.48550/arxiv.2108.13703
[23]

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework , url =

Wan, Runzhe and Ge, Lin and Song, Rui , urldate =. Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework , url =. doi:10.48550/arXiv.2202.13227 , shorttitle =. 2202.13227 , keywords =

work page doi:10.48550/arxiv.2202.13227
[24]

Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring , url =

Wan, Runzhe and Liu, Yu and. Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring , url =. doi:10.48550/arXiv.2304.00420 , shorttitle =. 2304.00420 , keywords =

work page doi:10.48550/arxiv.2304.00420
[25]

Off-Policy Policy Evaluation for Sequential Decisions Under Unobserved Confounding , url =

Namkoong, Hongseok and Keramati, Ramtin and Yadlowsky, Steve and Brunskill, Emma , urldate =. Off-Policy Policy Evaluation for Sequential Decisions Under Unobserved Confounding , url =. doi:10.48550/arXiv.2003.05623 , abstract =. 2003.05623 , keywords =

work page doi:10.48550/arxiv.2003.05623 2003
[26]

Minimax-Regret Sample Selection in Randomized Experiments , url =

Hu, Yuchen and Zhu, Henry and Brunskill, Emma and Wager, Stefan , urldate =. Minimax-Regret Sample Selection in Randomized Experiments , url =. doi:10.48550/arXiv.2403.01386 , abstract =. 2403.01386 , keywords =

work page doi:10.48550/arxiv.2403.01386
[27]

Adaptive Instrument Design for Indirect Experiments , url =

Chandak, Yash and Shankar, Shiv and Syrgkanis, Vasilis and Brunskill, Emma , urldate =. Adaptive Instrument Design for Indirect Experiments , url =. doi:10.48550/arXiv.2312.02438 , abstract =. 2312.02438 , keywords =

work page doi:10.48550/arxiv.2312.02438
[28]

Bayesian experimental design: A review.Statistical Science, 10(3):273–304, 1995

Chaloner, Kathryn and Verdinelli, Isabella , urldate =. Bayesian Experimental Design: A Review , volume =. doi:10.1214/ss/1177009939 , shorttitle =

work page doi:10.1214/ss/1177009939
[29]

2022 , organization=

Safe Optimal Design with Applications in Off-Policy Learning , author=. 2022 , organization=

work page 2022
[30]

Proceedings of The Web Conference (WWW) , year =

Variance-Minimizing Augmentation Logging for Counterfactual Evaluation in Contextual Bandits , author =. Proceedings of The Web Conference (WWW) , year =

work page
[31]

Proceedings of the 34th International Conference on Machine Learning , series =

Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits , author =. Proceedings of the 34th International Conference on Machine Learning , series =

work page
[32]

Optimal Off-Policy Evaluation from Multiple Logging Policies , url =

Kallus, Nathan and Saito, Yuta and Uehara, Masatoshi , urldate =. Optimal Off-Policy Evaluation from Multiple Logging Policies , url =. Proceedings of the 38th International Conference on Machine Learning , publisher =

work page
[33]

arXiv preprint arXiv:2212.06355 , year=

A Review of Off-Policy Evaluation in Reinforcement Learning , author=. arXiv preprint arXiv:2212.06355 , year=

work page arXiv
[34]

, urldate =

Carlsson, Emil and Dubhashi, Devdatt and Johansson, Fredrik D. , urldate =. Thompson Sampling for Bandits with Clustered Arms , volume =. doi:10.24963/ijcai.2021/305 , abstract =

work page doi:10.24963/ijcai.2021/305 2021
[35]

and Dubhashi, Devdatt , urldate =

Carlsson, Emil and Basu, Debabrota and Johansson, Fredrik D. and Dubhashi, Devdatt , urldate =. Pure Exploration in Bandits with Linear Constraints , url =. doi:10.48550/arXiv.2306.12774 , abstract =. 2306.12774 , keywords =

work page doi:10.48550/arxiv.2306.12774
[36]

Power Constrained Bandits , url =

Yao, Jiayu and Brunskill, Emma and Pan, Weiwei and Murphy, Susan and Doshi-Velez, Finale , urldate =. Power Constrained Bandits , url =. doi:10.48550/arXiv.2004.06230 , abstract =. 2004.06230 , keywords =

work page doi:10.48550/arxiv.2004.06230 2004
[37]

Provably Good Batch Reinforcement Learning Without Great Exploration , url =

Liu, Yao and Swaminathan, Adith and Agarwal, Alekh and Brunskill, Emma , urldate =. Provably Good Batch Reinforcement Learning Without Great Exploration , url =. doi:10.48550/arXiv.2007.08202 , abstract =. 2007.08202 , keywords =

work page doi:10.48550/arxiv.2007.08202 2007
[38]

Design of Experiments for Stochastic Contextual Linear Bandits , url =

Zanette, Andrea and Dong, Kefan and Lee, Jonathan and Brunskill, Emma , urldate =. Design of Experiments for Stochastic Contextual Linear Bandits , url =. doi:10.48550/arXiv.2107.09912 , abstract =. 2107.09912 , keywords =

work page doi:10.48550/arxiv.2107.09912
[39]

Manski, Charles , urldate =

Dominitz, Jeff and F. Manski, Charles , urldate =. More Data or Better Data? A Statistical Decision Problem , volume =. doi:10.1093/restud/rdx005 , shorttitle =

work page doi:10.1093/restud/rdx005
[40]

Policy-Adaptive Estimator Selection for Off-Policy Evaluation , url =

Udagawa, Takuma and Kiyohara, Haruka and Narita, Yusuke and Saito, Yuta and Tateno, Kei , urldate =. Policy-Adaptive Estimator Selection for Off-Policy Evaluation , url =. doi:10.48550/arXiv.2211.13904 , abstract =. 2211.13904 , keywords =

work page doi:10.48550/arxiv.2211.13904
[41]

arXiv preprint arXiv:2402.10592 , year=

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification , author=. arXiv preprint arXiv:2402.10592 , year=

work page arXiv
[42]

Journal of the Royal Statistical Society , volume =

Neyman, Jerzy , title =. Journal of the Royal Statistical Society , volume =. 1934 , doi =

work page 1934
[43]

Federated Offline Policy Learning , url =

Carranza, Aldo Gael and Athey, Susan , urldate =. Federated Offline Policy Learning , url =. doi:10.48550/arXiv.2305.12407 , abstract =. 2305.12407 , keywords =

work page doi:10.48550/arxiv.2305.12407
[44]

1998 , publisher=

Reinforcement Learning: An Introduction , author=. 1998 , publisher=

work page 1998
[45]

Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits , url =

Cho, Brian and Meier, Dominik and Gan, Kyra and Kallus, Nathan , urldate =. Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits , url =. doi:10.48550/arXiv.2410.15564 , shorttitle =. 2410.15564 , keywords =

work page doi:10.48550/arxiv.2410.15564
[46]

Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments , url =

Sun, Ke and Kong, Linglong and Zhu, Hongtu and Shi, Chengchun , urldate =. Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments , url =. doi:10.48550/arXiv.2408.05342 , abstract =. 2408.05342 , keywords =

work page doi:10.48550/arxiv.2408.05342
[47]

Sequential Experimental Design for Transductive Linear Bandits

Fiez, Tanner and Jain, Lalit and Jamieson, Kevin and Ratliff, Lillian , urldate =. Sequential Experimental Design for Transductive Linear Bandits , url =. doi:10.48550/arXiv.1906.08399 , abstract =. 1906.08399 , keywords =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1906.08399 1906
[48]

Best-Arm Identification in Linear Bandits

Soare, Marta and Lazaric, Alessandro and Munos, Rémi , urldate =. Best-Arm Identification in Linear Bandits , url =. doi:10.48550/arXiv.1409.6110 , abstract =. 1409.6110 , keywords =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1409.6110
[49]

Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning , url =

Narang, Adhyyan and Wagenmaker, Andrew and Ratliff, Lillian and Jamieson, Kevin , urldate =. Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning , url =. doi:10.48550/arXiv.2406.06856 , abstract =. 2406.06856 , keywords =

work page doi:10.48550/arxiv.2406.06856
[50]

2025 , eprint =

Practical Improvements of A/B Testing with Off-Policy Estimation , author =. 2025 , eprint =. doi:10.48550/arXiv.2506.10677 , url =

work page doi:10.48550/arxiv.2506.10677 2025
[51]

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics , pages =

Toward Minimax Off-Policy Value Estimation , author =. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics , pages =. 2015 , editor =

work page 2015
[52]

International Conference on Machine Learning , pages=

Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits , author=. International Conference on Machine Learning , pages=. 2017 , organization=

work page 2017
[53]

Journal of the American Statistical Association , volume=

Statistical Inference for Online Decision Making: In a Contextual Bandit Setting , author=. Journal of the American Statistical Association , volume=. 2021 , publisher=

work page 2021
[54]

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages=

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits , author=. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , pages=

work page
[55]

arXiv preprint arXiv:2411.06329 , year=

Regret Minimization and Statistical Inference in Online Decision Making with High-Dimensional Covariates , author=. arXiv preprint arXiv:2411.06329 , year=

work page arXiv
[56]

International Conference on Artificial Intelligence and Statistics , pages=

Multi-Armed Bandit Experimental Design: Online Decision-Making and Adaptive Inference , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2023 , organization=

work page 2023
[57]

Journal of the American Statistical Association , volume=

Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning , author=. Journal of the American Statistical Association , volume=. 2024 , publisher=

work page 2024
[58]

Advances in Neural Information Processing Systems , volume=

Inference for Batched Bandits , author=. Advances in Neural Information Processing Systems , volume=

work page
[59]

Advances in Neural Information Processing Systems , volume=

Statistical Inference with M-Estimators on Adaptively Collected Data , author=. Advances in Neural Information Processing Systems , volume=

work page
[60]

Advances in Neural Information Processing Systems , volume=

Post-Contextual-Bandit Inference , author=. Advances in Neural Information Processing Systems , volume=

work page
[61]

Annual Review of Statistics and its Application , volume=

Demystifying Inference After Adaptive Experiments , author=. Annual Review of Statistics and its Application , volume=. 2025 , publisher=

work page 2025
[62]

Statistical Science , volume=

Doubly Robust Policy Evaluation and Optimization , author=. Statistical Science , volume=. 2014 , publisher=

work page 2014
[63]

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics , year =

Balanced Off-Policy Evaluation in General Action Spaces , author =. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics , year =

work page
[64]

Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages =

Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design , author =. Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing , pages =

work page
[65]

Proceedings of the 39th International Conference on Machine Learning , year =

Off-Policy Evaluation for Large Action Spaces via Embeddings , author =. Proceedings of the 39th International Conference on Machine Learning , year =

work page
[66]

Journal of the American Statistical Association , volume=

A Generalization of Sampling Without Replacement from a Finite Universe , author=. Journal of the American Statistical Association , volume=. 1952 , publisher=

work page 1952
[67]

2003 , publisher=

Model Assisted Survey Sampling , author=. 2003 , publisher=

work page 2003
[68]

Journal of Computational and Graphical Statistics , volume=

Truncated Importance Sampling , author=. Journal of Computational and Graphical Statistics , volume=. 2008 , publisher=

work page 2008
[69]

Estimation with Quadratic Loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961.
[70]

Efron, Bradley and Morris, Carl. 1973.
[71]

Optimum Allocation in Linear Regression Theory. The Annals of Mathematical Statistics, 1952.
[72]

Atkinson, Anthony and Donev, Alexander and Tobias, Randall. Optimum Experimental Designs, with. 2007.
[73]

Automated Experimental Design with Optimization from Historical Data Simulations. Available at SSRN 5126080.
[74]

Optimum Experimental Designs. Journal of the Royal Statistical Society: Series B (Methodological), 1959.
[75]

On the Efficient Design of Statistical Investigations. The Annals of Mathematical Statistics, 1943.
[76]

An Introduction to Optimal Designs for Social and Biomedical Research. 2009.
[77]

Experimental Design for Causal Inference Through an Optimization Lens. Tutorials in Operations Research: Smarter Decisions for a Better World, 2024.
[78]

Optimal Experimental Design for Staggered Rollouts. Management Science, 2024.
[79]

An Empirical Bayes Approach to Statistics. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, 1956.
[80]

Robbins, Herbert. The Empirical. 1964.

Showing first 80 references.
