
arXiv: 2605.15108 · v2 · pith:XCPSYI2Inew · submitted 2026-05-14 · stat.ML · cs.AI · cs.IR · cs.LG · stat.ME

Logging Policy Design for Off-Policy Evaluation

Pith reviewed 2026-05-20 20:35 UTC · model grok-4.3

classification: stat.ML · cs.AI · cs.IR · cs.LG · stat.ME
keywords: off-policy evaluation · logging policy · reward-coverage tradeoff · recommendation systems · policy evaluation · treatment selection · OPE error

The pith

Logging policies for off-policy evaluation balance concentrating on high-reward actions against covering actions the target policy may take.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the choice of logging policy when collecting data for off-policy evaluation creates a direct tradeoff between lowering variance by focusing on high-reward actions and ensuring coverage of actions that the target policy might select. It develops a single framework that produces optimal logging policies under three information settings at data-collection time: full knowledge of the target and rewards, no knowledge, and partial knowledge via priors or estimates. This matters because companies evaluating new recommendation or treatment policies can use the framework to reduce estimation error without running live tests.

Core claim

We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take. We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are known, unknown, or partially known through priors or noisy estimates at logging time.
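As a minimal illustration of the tradeoff, consider the context-free case with finitely many actions and the inverse-propensity (IPW/IPS) estimator. The symbols below ($\pi_{\mathrm{log}}$, $\pi_{\mathrm{target}}$, $m_2$, $V$) are notation assumed for this sketch. This is the textbook importance-sampling calculation and may differ from the paper's own derivation.

```latex
\begin{align*}
\hat V &= \frac{1}{n}\sum_{i=1}^{n}
  \frac{\pi_{\mathrm{target}}(a_i)}{\pi_{\mathrm{log}}(a_i)}\, r_i,
  \qquad a_i \sim \pi_{\mathrm{log}}, \\
\operatorname{Var}(\hat V) &= \frac{1}{n}\left[\sum_{a}
  \frac{\pi_{\mathrm{target}}(a)^2\, m_2(a)}{\pi_{\mathrm{log}}(a)} - V^2\right],
  \qquad m_2(a) = \mathbb{E}[r^2 \mid a],\quad
  V = \sum_a \pi_{\mathrm{target}}(a)\,\mathbb{E}[r \mid a], \\
\pi_{\mathrm{log}}^{\star}(a) &=
  \frac{\pi_{\mathrm{target}}(a)\sqrt{m_2(a)}}
       {\sum_{a'} \pi_{\mathrm{target}}(a')\sqrt{m_2(a')}}
  \qquad \text{(minimizer of the variance over the simplex)}.
\end{align*}
```

The $1/\pi_{\mathrm{log}}(a)$ term penalizes thin coverage of actions the target policy takes, while the $\sqrt{m_2(a)}$ factor pulls mass toward actions with large rewards. That is one way to read the reward-coverage tradeoff in this simplified setting.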

What carries the argument

Unifying framework for logging policy design that produces optimal policies across three informational regimes defined by knowledge of the target policy and reward distribution at collection time.

If this is right

  • When target policy and rewards are known, the optimal logging policy concentrates mass on high-reward actions consistent with the target.
  • When target and rewards are unknown, the optimal logging policy spreads probability more evenly to preserve coverage.
  • When priors or noisy estimates are available, the optimal logging policy tilts mass according to those priors while retaining necessary coverage.
  • Firms gain concrete design rules for choosing among candidate systems when the goal is accurate off-policy evaluation.
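The first two bullets can be made concrete with a small sketch. It assumes a context-free problem with three actions and Bernoulli rewards, uses the IPS estimator, and takes the textbook variance-minimizing rule (logging probability proportional to target probability times the square root of the reward second moment) as a stand-in for the known-regime optimum. The function names `ips_variance` and `known_regime_policy` and all numbers are hypothetical, not taken from the paper.

```python
import math


def ips_variance(target, logging, second_moment, mean, n=1):
    """Exact variance of the IPS value estimate from n i.i.d. logged samples.

    Context-free setting: every argument is a list indexed by action.
    Assumes logging[a] > 0 wherever target[a] > 0 (the coverage condition).
    """
    value = sum(t * m for t, m in zip(target, mean))
    # E[(w * r)^2] = sum_a target(a)^2 * E[r^2 | a] / logging(a)
    second = sum(
        t * t * m2 / l
        for t, l, m2 in zip(target, logging, second_moment)
        if t > 0  # actions the target never takes contribute nothing
    )
    return (second - value ** 2) / n


def known_regime_policy(target, second_moment):
    """Logging policy proportional to target(a) * sqrt(E[r^2 | a])."""
    weights = [t * math.sqrt(m2) for t, m2 in zip(target, second_moment)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    target = [0.1, 0.2, 0.7]   # illustrative target policy
    mean = [0.9, 0.5, 0.1]     # Bernoulli rewards, so E[r^2 | a] equals the mean
    candidates = {
        "uniform": [1 / 3, 1 / 3, 1 / 3],
        "on-policy": target,
        "known-regime": known_regime_policy(target, mean),
    }
    for name, logging in candidates.items():
        print(name, ips_variance(target, logging, mean, mean))
```

In this toy example the known-regime rule gives a lower per-sample variance than both uniform logging and logging with the target policy itself. The partially-known case is not covered here.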

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tradeoff framework could guide data collection in sequential or adaptive settings where logging policies update over time.
  • Operational limits on which actions can be logged may require adjusting the derived optima toward more conservative coverage.
  • Accurate estimation of the reward distribution at logging time becomes a critical upstream task for realizing the stated error reductions.

Load-bearing premise

The target policy and reward distribution, or priors over them, can be used or estimated when the logging policy is chosen.

What would settle it

Compare OPE error in a controlled recommender simulation when data is collected under the derived optimal logging policy for the known-regime case versus a uniform random logging policy, measuring the reduction in value-estimate variance.
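A heavily simplified version of such a check might look like the sketch below. It replaces the recommender simulation with a context-free bandit with five actions and Bernoulli rewards, and it uses the textbook variance-minimizing rule (proportional to target probability times the square root of the reward second moment) in place of the paper's derived optimum. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def known_regime_policy(target, second_moment):
    """Logging policy proportional to target(a) * sqrt(E[r^2 | a])."""
    weights = [t * math.sqrt(m2) for t, m2 in zip(target, second_moment)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_ips_estimate(target, logging, mean, n, rng):
    """Draw n logged (action, reward) pairs and return the IPS value estimate."""
    actions = rng.choices(range(len(logging)), weights=logging, k=n)
    total = 0.0
    for a in actions:
        reward = 1.0 if rng.random() < mean[a] else 0.0  # Bernoulli reward
        total += target[a] / logging[a] * reward          # importance-weighted
    return total / n


def empirical_mse(target, logging, mean, n, reps, seed):
    """Monte Carlo estimate of the MSE of the IPS estimator."""
    rng = random.Random(seed)
    true_value = sum(t * m for t, m in zip(target, mean))
    errors = [
        (sample_ips_estimate(target, logging, mean, n, rng) - true_value) ** 2
        for _ in range(reps)
    ]
    return statistics.fmean(errors)


if __name__ == "__main__":
    target = [0.8, 0.05, 0.05, 0.05, 0.05]  # illustrative target policy
    mean = [0.2, 0.9, 0.9, 0.5, 0.5]        # illustrative mean rewards
    policies = {
        "uniform": [0.2] * 5,
        "known-regime": known_regime_policy(target, mean),
    }
    for name, logging in policies.items():
        print(name, empirical_mse(target, logging, mean, n=100, reps=1000, seed=0))
```

Because IPS is unbiased when the logging policy covers the target, the MSE here is essentially the variance, so the comparison isolates the variance reduction that the test above asks about.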

Figures

Figures reproduced from arXiv: 2605.15108 by Connor Douglas, Foster Provost, Joel Persson.

Figure 1: Dependence of IPW estimates on logging policy. Histogram of …
Figure 2: Informational settings for logging policy design. The two dimensions of information …
Figure 3: Illustration of error across logging policy choices in …
Figure 4: MSE as a function of the level of noise in the reward estimates μ̂ …
Figure 5: Effect of posterior shrinkage and reward prediction noise on MSE and policy value. A …
Figure 6: MSE of IPW estimator for soft-greedy logging policy classes …
Figure 7: MSE of IPW estimator for soft-greedy logging policy classes …

Original abstract

Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice accuracy depends heavily on the logging policy used to collect data for computing the estimate. We study how to design logging policies that minimize OPE error for given target policies. We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take. We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time. Our results provide actionable guidance for firms choosing among multiple candidate recommendation systems. We demonstrate the importance of treatment selection when gathering data for OPE, and describe theoretically optimal approaches when this is a firm's primary objective. We also distill practical design principles for selecting logging policies when operational constraints prevent implementing the theoretical optimum.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies the design of logging policies to minimize off-policy evaluation (OPE) error when estimating the value of a target policy from data collected under a different logging policy. It characterizes a reward-coverage tradeoff, introduces a unifying optimization framework for logging policy design, and derives closed-form optimal logging policies under three informational regimes: (i) target policy and reward distribution known, (ii) both unknown, and (iii) partially known via priors or noisy estimates. The work provides theoretical guidance for data collection in applications such as recommender systems and distills practical design principles under operational constraints.

Significance. If the derivations hold, the results supply actionable, theoretically optimal logging policies that directly address a practical bottleneck in OPE. The explicit treatment of the three informational regimes and the closed-form solutions constitute a clear strength, offering reproducible guidance that can be implemented or approximated in industrial settings where firms must choose among candidate policies before data collection.

major comments (2)
  1. [§4.2] §4.2, the derivation of the optimal logging policy under the known regime: the objective combines variance and bias terms from the OPE estimator, but the paper should explicitly state whether the resulting policy remains optimal when the OPE estimator is replaced by a different unbiased estimator (e.g., DR instead of IPS).
  2. [§5.1] §5.1, the partially-known regime: the optimal policy depends on the quality of the prior or noisy estimate supplied at logging time; the manuscript does not provide a sensitivity analysis showing how OPE error degrades when the prior mean or variance is misspecified by a fixed amount.
minor comments (3)
  1. [Abstract / §2] The abstract and introduction use the phrase 'canonical informational regimes' without a brief justification for why exactly these three regimes are singled out; a short paragraph in §2 would improve readability.
  2. [Throughout] Notation for the logging policy π_log and target policy π_target is introduced clearly, but the manuscript occasionally re-uses the symbol π without subscript in intermediate steps; consistent subscripting would prevent confusion.
  3. [Figure 2] Figure 2 caption states that curves show 'OPE error versus coverage' but does not indicate whether the plotted quantity is mean squared error, variance only, or bias only; adding this detail would aid interpretation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive comments. We respond to each major comment below, indicating planned revisions where appropriate.

point-by-point responses
  1. Referee: [§4.2] §4.2, the derivation of the optimal logging policy under the known regime: the objective combines variance and bias terms from the OPE estimator, but the paper should explicitly state whether the resulting policy remains optimal when the OPE estimator is replaced by a different unbiased estimator (e.g., DR instead of IPS).

    Authors: The objective function minimized in §4.2 is the mean squared error of the inverse propensity scoring (IPS) estimator, which incorporates the specific bias and variance expressions for IPS. The closed-form optimal logging policy is therefore derived with respect to IPS-based OPE error. For an alternative unbiased estimator such as the doubly robust (DR) estimator, the objective would involve different terms, and the resulting optimal policy could differ. We will revise §4.2 to state this explicitly and note that the optimality result is specific to the IPS estimator.

    Revision: yes

  2. Referee: [§5.1] §5.1, the partially-known regime: the optimal policy depends on the quality of the prior or noisy estimate supplied at logging time; the manuscript does not provide a sensitivity analysis showing how OPE error degrades when the prior mean or variance is misspecified by a fixed amount.

    Authors: We agree that sensitivity to prior misspecification is relevant for practical use. The derivations in §5.1 assume the supplied prior or noisy estimate is taken as given. While a comprehensive numerical sensitivity study would require additional modeling choices and experiments, we will add a brief discussion in the revised §5.1 describing how OPE error can degrade under small perturbations to the prior mean or variance, along with a qualitative assessment of robustness.

    Revision: partial
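The kind of sensitivity check raised in the second exchange can be sketched in a toy setting. The code below assumes a context-free problem with Bernoulli rewards, designs the logging policy by plugging noisy reward estimates into the textbook variance-minimizing rule, and then evaluates the exact IPS variance under the true rewards. The Gaussian noise model, the `floor` parameter, and all function names are assumptions made for illustration, not the paper's analysis.

```python
import math
import random


def ips_variance(target, logging, second_moment, mean):
    """Exact per-sample variance of the IPS estimate (context-free setting)."""
    value = sum(t * m for t, m in zip(target, mean))
    second = sum(
        t * t * m2 / l
        for t, l, m2 in zip(target, logging, second_moment)
        if t > 0
    )
    return second - value ** 2


def plug_in_policy(target, estimated_second_moment, floor=1e-3):
    """Logging policy proportional to target(a) * sqrt(estimate of E[r^2 | a]).

    The floor keeps every action the target takes at positive probability
    even when its noisy estimate is clipped to zero.
    """
    weights = [
        t * math.sqrt(max(m2, floor))
        for t, m2 in zip(target, estimated_second_moment)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def average_variance_under_noise(target, mean, noise_sd, draws, seed):
    """Average true IPS variance when the design uses noisy reward estimates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Noisy estimates of the mean reward, clipped to [0, 1].
        noisy = [min(1.0, max(0.0, m + rng.gauss(0.0, noise_sd))) for m in mean]
        logging = plug_in_policy(target, noisy)
        # Bernoulli rewards: the true second moment equals the true mean.
        total += ips_variance(target, logging, mean, mean)
    return total / draws


if __name__ == "__main__":
    target = [0.8, 0.05, 0.05, 0.05, 0.05]  # illustrative target policy
    mean = [0.2, 0.9, 0.9, 0.5, 0.5]        # illustrative true mean rewards
    for noise_sd in (0.0, 0.1, 0.3):
        print(noise_sd, average_variance_under_noise(target, mean, noise_sd, 200, seed=1))
```

With zero noise the plug-in design coincides with the variance minimizer, so any misspecification can only leave the variance unchanged or raise it. How quickly it grows depends on the noise model, which is why the example reports the average over several noise levels.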

Circularity Check

0 steps flagged

No significant circularity; derivations self-contained from explicit optimization

full rationale

The paper derives optimal logging policies by optimizing a standard OPE error objective (combining variance and bias terms) under three explicitly defined informational regimes, treating the target policy and reward distribution (or priors/noisy estimates) as given inputs at logging time. This structures the reward-coverage tradeoff analysis and closed-form policies without any reduction to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The central claims target external OPE performance benchmarks rather than internal consistency by construction, and the weakest assumption is stated openly as the modeling premise rather than a hidden flaw. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only abstract available; ledger is therefore minimal and provisional. The framework rests on standard OPE assumptions plus the ability to condition logging design on knowledge of target policy and rewards.

axioms (1)
  • domain assumption: Target policy and reward distribution (or priors) are available or estimable at logging time to define the three informational regimes
    The optimal policies are derived separately for known, unknown, and partially known cases; this premise structures the entire framework.


