Mind the Sim-to-Real Gap & Think Like a Scientist

Alexander Volfovsky; Dominique Perrault-Joncas; Gabriel Levin-Konigsberg; Harsh Parikh

arXiv: 2605.21458 · v1 · pith:QHZ3URLRnew · submitted 2026-05-20 · cs.AI · cs.LG · stat.ME


Pith reviewed 2026-05-21 03:58 UTC · model grok-4.3

Classification: cs.AI · cs.LG · stat.ME

Keywords: sim-to-real gap · simulation lemma · sequential decision making · experimental design · policy evaluation · reinforcement learning · Fisher information


The pith

Randomization in real experiments identifies the calibration-deployment shift in simulator value error, while a reachability gap persists under passive learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies when a planner with a pre-trained but confounded simulator should run real experiments to improve sequential decisions. Using an extended simulation lemma, it decomposes simulator value error into a shift component, which randomization can identify and correct, and a parametric residual, which further data cannot reduce. The value gap between the simulator-optimal policy and the true optimum splits further into local and reachability parts; the latter stays bounded away from zero under purely passive learning. This analysis motivates Fisher-SEP, an experimental policy that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only variants. Case studies show when front-loaded pilots amortize their costs and when designed exploration is required to reach distant states.
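Fisher-SEP's exact objective is defined in the paper and is not reproduced here. As a hedged sketch of the underlying idea only, assume a reward-only, linear-Gaussian setting: the unknown mean rewards theta of the state-action pairs have a Gaussian prior with covariance Sigma, the target policy's value is the linear functional V = w · theta with w a known discounted occupancy (assumed to come from the simulator), and each real trial returns one noisy observation of a single theta entry. Under those assumptions a trial at index i cuts Var[V] by (Sigma w)_i^2 / (Sigma_ii + noise_var), and a greedy design spends each trial where that cut is largest. All function names are hypothetical.

```python
import numpy as np

def posterior_value_variance(Sigma, w):
    """Variance of the linear value functional V = w @ theta under covariance Sigma."""
    return float(w @ Sigma @ w)

def rank_one_update(Sigma, i, noise_var):
    """Gaussian-conjugate posterior covariance after one noisy observation of theta[i]."""
    s = Sigma[:, i]
    return Sigma - np.outer(s, s) / (Sigma[i, i] + noise_var)

def greedy_variance_design(Sigma, w, noise_var, budget):
    """Hypothetical greedy design: each trial goes to the index that most reduces Var[w @ theta]."""
    Sigma = Sigma.copy()
    chosen = []
    for _ in range(budget):
        Sw = Sigma @ w
        # Variance reduction from observing index i: (Sigma w)_i^2 / (Sigma_ii + noise_var)
        gains = Sw**2 / (np.diag(Sigma) + noise_var)
        i = int(np.argmax(gains))
        chosen.append(i)
        Sigma = rank_one_update(Sigma, i, noise_var)
    return chosen, Sigma

# Three state-action pairs; the target policy's (hypothetical) occupancy is concentrated on index 2.
w = np.array([0.1, 0.2, 0.7])
Sigma0 = np.eye(3)  # independent, unit prior uncertainty on each mean reward
chosen, Sigma_post = greedy_variance_design(Sigma0, w, noise_var=1.0, budget=3)
print(chosen, posterior_value_variance(Sigma0, w), posterior_value_variance(Sigma_post, w))
```

Because the prior here is independent and the occupancy weight sits mostly on index 2, every trial goes there; a correlated prior (for example, one derived from simulator structure) would let a single trial inform several entries at once.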

Core claim

An extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. The value gap between the simulator-optimal policy and the optimum splits into a local component on visited states and a reachability component on unvisited states that stays bounded away from zero at any horizon under purely passive learning. Fisher-SEP is proposed as a simulation-aided experimental policy that minimizes the posterior predictive variance of a target policy's value.

What carries the argument

The extended simulation lemma, which partitions simulator value error into a randomization-identifiable calibration-deployment shift and an irreducible parametric residual.
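The extended lemma itself is not reproduced on this page. For orientation, the classical simulation lemma it extends says that, for a fixed policy pi, the value difference between the real environment and the simulator equals the one-step reward and transition errors, evaluated with the simulator's value function, accumulated under the real environment's discounted visitation. Below is a minimal numerical check on a random tabular MDP, assuming a discounted setting and a known stochastic policy; both models are random stand-ins, not the paper's. The reward and transition terms are shown separately, which loosely parallels the paper's reward-only and transition-only specializations.

```python
import numpy as np

def policy_matrices(P, r, pi):
    """Collapse an (S, A, S) kernel and an (S, A) reward onto a stochastic policy pi of shape (S, A)."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, r)
    return P_pi, r_pi

def policy_value(P, r, pi, gamma):
    """Exact discounted value V = (I - gamma P_pi)^{-1} r_pi."""
    P_pi, r_pi = policy_matrices(P, r, pi)
    return np.linalg.solve(np.eye(P_pi.shape[0]) - gamma * P_pi, r_pi)

def random_mdp(rng, S, A):
    """Random row-stochastic kernel and uniform [0, 1) rewards."""
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    return P, rng.random((S, A))

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P, r = random_mdp(rng, S, A)          # stand-in for the real environment
P_hat, r_hat = random_mdp(rng, S, A)  # stand-in for the simulator
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)   # fixed stochastic policy being evaluated

V_true = policy_value(P, r, pi, gamma)
V_sim = policy_value(P_hat, r_hat, pi, gamma)

# Simulation lemma: V_true - V_sim = (I - gamma P_pi)^{-1} [(r_pi - r_hat_pi) + gamma (P_pi - P_hat_pi) V_sim]
P_pi, r_pi = policy_matrices(P, r, pi)
P_hat_pi, r_hat_pi = policy_matrices(P_hat, r_hat, pi)
occupancy = np.linalg.inv(np.eye(S) - gamma * P_pi)  # real-environment discounted visitation operator
reward_term = occupancy @ (r_pi - r_hat_pi)
transition_term = occupancy @ (gamma * (P_pi - P_hat_pi) @ V_sim)

print(np.max(np.abs((V_true - V_sim) - (reward_term + transition_term))))  # floating-point error only
```

The identity follows by subtracting the two Bellman equations and rearranging, so it holds exactly for any pair of models; the check above only confirms the algebra numerically.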

If this is right

In supply-chain problems with long horizons, front-loaded experimentation overtakes posterior updating once pilot costs are amortized.
In problems where a corridor separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled states.
Reward-only and transition-only specializations of the experimental policy allow tailoring data collection to what is observed.
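To make the amortization point concrete, here is a deliberately crude break-even calculation with hypothetical numbers (not the paper's vending-machine model). A front-loaded pilot spends n_pilot periods earning v_explore and then earns v_star per period, while the passive strategy is held at the simulator policy's value v_sim; ignoring the passive strategy's own gradual improvement, which would push break-even later, is a simplifying assumption. The pilot catches up once H >= n_pilot * (v_star - v_explore) / (v_star - v_sim).

```python
import numpy as np

def breakeven_horizon(n_pilot, v_explore, v_star, v_sim, max_horizon=10_000):
    """First horizon H at which the pilot's cumulative reward catches up with the passive strategy's.

    Returns None if the pilot never catches up within max_horizon.
    """
    t = np.arange(1, max_horizon + 1)
    pilot = np.where(t <= n_pilot, v_explore, v_star)  # explore first, then exploit
    passive = np.full(max_horizon, v_sim)              # simulator policy throughout
    gap = np.cumsum(pilot) - np.cumsum(passive)
    ahead = np.nonzero((gap >= 0) & (t > n_pilot))[0]
    return int(t[ahead[0]]) if ahead.size else None

def breakeven_closed_form(n_pilot, v_explore, v_star, v_sim):
    """Closed-form break-even horizon; valid only when v_star > v_sim."""
    return n_pilot * (v_star - v_explore) / (v_star - v_sim)

print(breakeven_horizon(20, 0.0, 1.0, 0.75), breakeven_closed_form(20, 0.0, 1.0, 0.75))
```

With these values both calculations give a horizon of 80: shorter horizons favor staying with the simulator policy, longer ones favor the pilot. If the post-pilot policy is no better than the simulator's (v_star <= v_sim), the pilot never pays off and the function returns None.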

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The decomposition suggests prioritizing early randomization experiments to calibrate simulators before committing to long deployment horizons.
Persistent reachability gaps imply that passive data collection alone will leave value estimates biased in problems with distant or low-probability states.
Variance-minimization objectives like Fisher-SEP could be adapted to set explicit budgets for real trials based on target precision.
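One way to read that suggestion, as a sketch under strong simplifying assumptions: treat the target value as a single scalar with Gaussian prior variance prior_var, and let each real trial give an independent observation with known noise variance noise_var. The posterior variance after n trials is 1 / (1/prior_var + n/noise_var), so the smallest budget meeting a target variance is ceil(noise_var * (1/target_var - 1/prior_var)), floored at zero. Fisher-SEP's multivariate objective would replace this scalar formula; the helper names are hypothetical.

```python
import math

def posterior_variance(prior_var, noise_var, n):
    """Gaussian-conjugate posterior variance of a scalar after n independent noisy observations."""
    return 1.0 / (1.0 / prior_var + n / noise_var)

def trials_for_target_variance(prior_var, noise_var, target_var):
    """Smallest n with posterior_variance(prior_var, noise_var, n) <= target_var (hypothetical helper)."""
    if target_var <= 0:
        raise ValueError("target_var must be positive")
    needed = noise_var * (1.0 / target_var - 1.0 / prior_var)
    return max(0, math.ceil(needed))

n = trials_for_target_variance(prior_var=4.0, noise_var=1.0, target_var=0.05)
print(n, posterior_variance(4.0, 1.0, n), posterior_variance(4.0, 1.0, n - 1))
```

Here 20 trials bring the posterior variance just under 0.05, while 19 do not; a target looser than the prior variance needs no trials at all.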

Load-bearing premise

That randomization in real experiments can identify and correct the calibration-deployment shift component of simulator error.
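A toy, one-step illustration of why randomization can identify a confounding-induced shift, using a hypothetical data-generating process rather than the paper's lemma: a hidden variable U raises both the chance of taking the action and the outcome in the calibration data, so a simulator calibrated on those data overstates the action's effect. Coin-flip assignment in real trials breaks the link between U and the action, and the difference between the two estimates recovers the shift. With the parameters below, the calibrated effect is 1 + 2 * (0.8 - 0.2) = 2.2 against a true effect of 1.0, so the shift is about 1.2.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_EFFECT, CONFOUNDER_EFFECT = 1.0, 2.0  # hypothetical ground truth

def outcomes(a, u, rng):
    """Outcome depends on the action a and on the hidden confounder u, plus unit Gaussian noise."""
    return TRUE_EFFECT * a + CONFOUNDER_EFFECT * u + rng.normal(size=a.shape)

def diff_in_means(y, a):
    return y[a == 1].mean() - y[a == 0].mean()

n = 200_000
# Calibration (observational) data: the hidden U drives both the action and the outcome.
u_obs = rng.binomial(1, 0.5, size=n)
a_obs = rng.binomial(1, np.where(u_obs == 1, 0.8, 0.2))
sim_effect = diff_in_means(outcomes(a_obs, u_obs, rng), a_obs)

# Real randomized trials: the action is assigned by coin flip, independently of U.
u_rct = rng.binomial(1, 0.5, size=n)
a_rct = rng.binomial(1, 0.5, size=n)
rct_effect = diff_in_means(outcomes(a_rct, u_rct, rng), a_rct)

shift_estimate = sim_effect - rct_effect
print(f"simulator {sim_effect:.2f}, randomized {rct_effect:.2f}, shift {shift_estimate:.2f}")
```

The example identifies the shift only because U acts additively and assignment is fully randomized; the referee's first major comment below concerns exactly what happens when such conditions fail.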

What would settle it

An experiment in which randomized real trials fail to reduce the identified shift component of value error, or one in which the reachability component of the value gap approaches zero under infinite passive observation.

Figures

Figures reproduced from arXiv: 2605.21458 by Alexander Volfovsky, Dominique Perrault-Joncas, Gabriel Levin-Konigsberg, Harsh Parikh.

Figure 2: HIV mobile-testing program (30 common-seed trials, [caption truncated in source; image not reproduced])

Original abstract

Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies when and how to supplement a pre-trained simulator (with inherited confounding and drift) with real experiments in sequential decision problems. It claims three results: (1) an extended simulation lemma decomposing simulator value error into a randomization-identifiable calibration-deployment shift and an irreducible parametric residual; (2) a decomposition of the value gap between simulator-optimal and optimal policies into local and reachability components, with the reachability component bounded away from zero under passive learning; (3) the Fisher-SEP policy that minimizes posterior predictive variance of a target policy's value (with reward-only and transition-only variants), illustrated in vending-machine supply-chain and HIV mobile-testing case studies.

Significance. If the decomposition in the extended simulation lemma holds, the work supplies a principled separation of simulator error sources that can guide the allocation of real experiments, with direct relevance to efficient policy learning under confounding. The reachability result and Fisher-SEP proposal highlight concrete regimes where passive learning fails and designed exploration or front-loaded pilots become necessary. The two case studies usefully illustrate the claimed regimes.

major comments (2)

[Abstract and §3] Extended simulation lemma: the decomposition of simulator value error into an identifiable calibration-deployment shift (via randomization) and an irreducible parametric residual is load-bearing for all subsequent claims about when to run real experiments. The argument implicitly requires that randomization in the real environment isolates the shift term without further modeling of state-dependent confounding or non-additive drift-policy interactions; if those conditions fail, the residual is no longer cleanly separable from what additional interaction can address.
[§4] Value-gap decomposition: the claim that the reachability component remains bounded away from zero at any horizon under purely passive learning is central to the argument for designed exploration. The bound appears to rely on the specific corridor structure of the HIV example; the general conditions under which passive sampling cannot reduce the reachability term should be stated explicitly, including any assumptions on the state space or transition structure.

minor comments (2)

[Abstract] The acronym Fisher-SEP is introduced without expansion on first use; a parenthetical definition (e.g., Fisher-information Simulation-aided Experimental Policy) would improve readability.
[§3] Notation for the calibration-deployment shift term is used before it is formally defined; a short notational table or inline definition at first appearance would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. These help us clarify the assumptions underlying our decompositions and strengthen the generalizability of the results. We address each major comment below, indicating planned revisions where appropriate.


Referee: [Abstract and §3] Extended simulation lemma (separability of the shift term; major comment 1 above).

Authors: The extended simulation lemma is derived under a model in which the simulator's inherited confounding and drift are captured as a calibration-deployment shift that can be isolated via randomization in the real environment, leaving an irreducible parametric residual. We agree that the clean separation assumes the absence of additional state-dependent confounding or non-additive drift-policy interactions beyond the modeled shift. Our framework targets regimes where this decomposition holds, consistent with standard sim-to-real assumptions. We will revise §3 to explicitly enumerate these modeling assumptions and discuss the conditions (including randomization requirements) under which the lemma applies, along with brief remarks on potential violations. (Revision: partial.)
Referee: [§4] Value-gap decomposition (general conditions for the reachability bound; major comment 2 above).

Authors: We appreciate this point. The reachability component is defined generally as the value difference arising from states not visited by the simulator-optimal policy. The result that this component is bounded away from zero under passive learning holds whenever the transition structure contains components that, with positive probability, passive sampling from the simulator policy never reaches. The HIV corridor illustrates such a structure, but the formal argument does not depend on it. We will revise §4 to state the general conditions explicitly, including assumptions on the state space (e.g., the presence of separated or low-probability transition components) and on the transition kernel, and to present the bound independently of the specific example. (Revision: yes.)
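A sketch of the strictest form of the condition described in this response, under illustrative assumptions (finite states, a known transition kernel, a fixed passive policy): states outside the support of the policy-induced chain started from the initial states have zero visitation probability at every horizon, so passive data can never inform them. A breadth-first search over that support finds them. The 6-state corridor below is hypothetical and only loosely mirrors the HIV example.

```python
from collections import deque

import numpy as np

def reachable_states(P, pi, start_states, tol=0.0):
    """States visited with positive probability from start_states under policy pi (BFS on the support of P_pi)."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # policy-induced state-to-state kernel
    seen = set(start_states)
    queue = deque(start_states)
    while queue:
        s = queue.popleft()
        for s_next in np.flatnonzero(P_pi[s] > tol):
            s_next = int(s_next)
            if s_next not in seen:
                seen.add(s_next)
                queue.append(s_next)
    return seen

# Hypothetical corridor: states 0-2 form the well-surveilled region, 3 is the corridor, 4-5 lie beyond it.
S, A = 6, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, max(s - 1, 0)] = 1.0      # action 0: step left
    P[s, 1, min(s + 1, S - 1)] = 1.0  # action 1: step right
passive = np.full((S, A), 0.5)
passive[2] = [1.0, 0.0]  # simulator-optimal policy never steps into the corridor
explore = np.full((S, A), 0.5)  # designed exploration keeps both actions in play everywhere

print(sorted(reachable_states(P, passive, [0])), sorted(reachable_states(P, explore, [0])))
```

Under the passive policy, states 3 to 5 are never visited however long the horizon, so any value error there is invisible to passive learning; the exploratory policy reaches every state.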

Circularity Check

0 steps flagged

Extended simulation lemma and policy proposals derive from problem setup without reduction to fitted inputs or self-citations

full rationale

The paper states three results, beginning with an extended simulation lemma that decomposes simulator value error into a calibration-deployment shift identifiable via randomization and an irreducible parametric residual. This decomposition is presented as following directly from the sequential decision problem with a confounded simulator and unbiased real experiments. The subsequent split of the value gap into local and reachability components is likewise derived from visitation properties under passive learning, and Fisher-SEP is defined by minimizing the posterior predictive variance of a target policy's value. No equation or step reduces these quantities to parameters already fitted inside the simulator or to unverified self-citations; the derivations rest on the stated MDP structure and identification assumptions rather than on the claims they are meant to establish.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the domain assumption that real experiments are unbiased and that randomization suffices to identify the calibration-deployment shift; no free parameters or new entities are introduced in the abstract.

axioms (2)

Domain assumption: Real experiments are unbiased, while the simulator inherits confounding and drift from its calibration data.
Stated directly in the opening problem setup of the abstract.
Domain assumption: Randomization in real experiments can identify the calibration-deployment shift component of simulator error.
Invoked as part of the extended simulation lemma result.

pith-pipeline@v0.9.0 · 5790 in / 1444 out tokens · 40577 ms · 2026-05-21T03:58:53.293524+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Paper passage: "extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift ... and a parametric residual"
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear
Paper passage: "value gap ... splits into a local component ... and a reachability component"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

166 extracted references · 166 canonical work pages · 3 internal anchors



work page 2020

[16] [16]

International Conference on Robotics and Automation (ICRA) , pages =

Chebotar, Yevgen and Handa, Ankur and Makoviychuk, Viktor and Macklin, Miles and Issac, Jan and Ratliff, Nathan and Fox, Dieter , title =. International Conference on Robotics and Automation (ICRA) , pages =. 2019 , organization =

work page 2019

[17] [17]

Conference on Robot Learning (CoRL) , pages =

Allevato, Adam and Short, Elaine Schaertl and Pryor, Mitch and Thomaz, Andrea , title =. Conference on Robot Learning (CoRL) , pages =. 2020 , organization =

work page 2020

[18] [18]

Frontiers in Robotics and AI , volume =

Muratore, Fabio and Ramos, Fabio and Turk, Greg and Yu, Wenhao and Gienger, Michael and Peters, Jan , title =. Frontiers in Robotics and AI , volume =. 2022 , publisher =

work page 2022

[19] [19]

IEEE Access , volume =

Salvato, Erica and Fenu, Gianfranco and Medvet, Eric and Pellegrino, Felice Andrea , title =. IEEE Access , volume =. 2021 , publisher =

work page 2021

[20] [20]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Kumar, Aviral and Zhou, Aurick and Tucker, George and Levine, Sergey , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[21] [21]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Yu, Tianhe and Thomas, Garrett and Yu, Lantao and Ermon, Stefano and Zou, James and Levine, Sergey and Finn, Chelsea and Ma, Tengyu , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[22] [22]

International Conference on Learning Representations (ICLR) , year =

Kostrikov, Ilya and Nair, Ashvin and Levine, Sergey , title =. International Conference on Learning Representations (ICLR) , year =

work page

[23] [23]

International Conference on Machine Learning (ICML) , pages =

Fujimoto, Scott and Meger, David and Precup, Doina , title =. International Conference on Machine Learning (ICML) , pages =. 2019 , organization =

work page 2019

[24] [24]

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Levine, Sergey and Kumar, Aviral and Tucker, George and Fu, Justin , title =. arXiv preprint arXiv:2005.01643 , year =

work page internal anchor Pith review Pith/arXiv arXiv 2005

[25] [25]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Kidambi, Rahul and Rajeswaran, Aravind and Netrapalli, Praneeth and Joachims, Thorsten , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[26] [26]

and Malik, Ilyas and Rainforth, Tom , title =

Foster, Adam and Ivanova, Desi R. and Malik, Ilyas and Rainforth, Tom , title =. International Conference on Machine Learning (ICML) , pages =. 2021 , organization =

work page 2021

[27] [27]

and Foster, Adam and Kleinegesse, Steven and Gutmann, Michael U

Ivanova, Desi R. and Foster, Adam and Kleinegesse, Steven and Gutmann, Michael U. and Rainforth, Tom , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[28] [28]

and Bickford Smith, Freddie , title =

Rainforth, Tom and Foster, Adam and Ivanova, Desi R. and Bickford Smith, Freddie , title =. Statistical Science , year =

work page

[29] [29]

Proceedings of the National Academy of Sciences , volume =

Bareinboim, Elias and Pearl, Judea , title =. Proceedings of the National Academy of Sciences , volume =. 2016 , publisher =

work page 2016

[30] [30]

Mastering Diverse Domains through World Models

Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy , title =. arXiv preprint arXiv:2301.04104 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[31] [31]

Mathematics of Operations Research , volume =

Russo, Daniel and Van Roy, Benjamin , title =. Mathematics of Operations Research , volume =. 2014 , publisher =

work page 2014

[32] [32]

Operations Research , volume =

Russo, Daniel , title =. Operations Research , volume =. 2020 , publisher =

work page 2020

[33] [33]

Bulletin of the American Mathematical Society , volume =

Robbins, Herbert , title =. Bulletin of the American Mathematical Society , volume =

work page

[34] [34]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Niu, Haoyi and Qiu, Yiwen and Li, Ming and Zhou, Guyue and HU, Jianming and Zhan, Xianyuan , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[35] [35]

and Smith, Laura and Kostrikov, Ilya and Levine, Sergey , title =

Ball, Philip J. and Smith, Laura and Kostrikov, Ilya and Levine, Sergey , title =. International Conference on Machine Learning (ICML) , year =

work page

[36] [36]

Conference on Robot Learning (CoRL) , pages =

Wu, Philipp and Escontrela, Alejandro and Hafner, Danijar and Abbeel, Pieter and Goldberg, Ken , title =. Conference on Robot Learning (CoRL) , pages =

work page

[37] [37]

International Conference on Learning Representations (ICLR) , year =

Hansen, Nicklas and Wang, Xiaolong and Su, Hao , title =. International Conference on Learning Representations (ICLR) , year =

work page

[38] [38]

, title =

DeGroot, Morris H. , title =. 1970 , address =

work page 1970

[39] [39]

Medical Decision Making , volume =

Jalal, Hawre and Alarid-Escudero, Fernando , title =. Medical Decision Making , volume =. 2018 , publisher =

work page 2018

[40] [40]

Ades, A. E. and Lu, Guobing and Claxton, Karl , title =. Medical Decision Making , volume =. 2004 , publisher =

work page 2004

[41] [41]

Medical Decision Making , volume =

Brennan, Alan and Kharroubi, Samer and O'Hagan, Anthony and Chilcott, Jim , title =. Medical Decision Making , volume =. 2007 , publisher =

work page 2007

[42] [42]

Journal of Health Economics , volume =

Claxton, Karl , title =. Journal of Health Economics , volume =. 1999 , publisher =

work page 1999

[43] [43]

The Lancet , volume =

Claxton, Karl and Sculpher, Mark and Drummond, Michael , title =. The Lancet , volume =. 2002 , publisher =

work page 2002

[44] [44]

2006 , address =

Briggs, Andrew and Claxton, Karl and Sculpher, Mark , title =. 2006 , address =

work page 2006

[45] [45]

Wilson, Ewan C. F. , title =. PharmacoEconomics , volume =. 2015 , publisher =

work page 2015

[46] [46]

and Inoue, Koichiro , title =

Chick, Stephen E. and Inoue, Koichiro , title =. Operations Research , volume =. 2001 , publisher =

work page 2001

[47] [47]

and Branke, J

Chick, Stephen E. and Branke, J. Sequential Sampling to Myopically Maximize the Expected Value of Information , journal =. 2010 , publisher =

work page 2010

[48] [48]

and Powell, Warren B

Frazier, Peter I. and Powell, Warren B. and Dayanik, Savas , title =. SIAM Journal on Control and Optimization , volume =. 2008 , publisher =

work page 2008

[49] [49]

, title =

Thompson, William R. , title =. Biometrika , volume =. 1933 , publisher =

work page 1933

[50] [50]

, title =

Gittins, John C. , title =. Journal of the Royal Statistical Society: Series B (Methodological) , volume =. 1979 , publisher =

work page 1979

[51] [51]

2011 , address =

Gittins, John and Glazebrook, Kevin and Weber, Richard , title =. 2011 , address =

work page 2011

[52] [52]

Advances in Applied Mathematics , volume =

Lai, Tze Leung and Robbins, Herbert , title =. Advances in Applied Mathematics , volume =. 1985 , publisher =

work page 1985

[53] [53]

Finite-Time Analysis of the Multiarmed Bandit Problem , journal =

Auer, Peter and Cesa-Bianchi, Nicol. Finite-Time Analysis of the Multiarmed Bandit Problem , journal =. 2002 , publisher =

work page 2002

[54] [54]

Conference on Learning Theory (COLT) , pages =

Agrawal, Shipra and Goyal, Navin , title =. Conference on Learning Theory (COLT) , pages =. 2012 , organization =

work page 2012

[55] [55]

and Van Roy, Benjamin and Kazerouni, Abbas and Osband, Ian and Wen, Zheng , title =

Russo, Daniel J. and Van Roy, Benjamin and Kazerouni, Abbas and Osband, Ian and Wen, Zheng , title =. Foundations and Trends in Machine Learning , volume =. 2018 , publisher =

work page 2018

[56] [56]

Best Arm Identification in Multi-Armed Bandits , booktitle =

Audibert, Jean-Yves and Bubeck, S. Best Arm Identification in Multi-Armed Bandits , booktitle =

work page

[57] [57]

On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , journal =

Kaufmann, Emilie and Capp. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , journal =

work page

[58] [58]

, title =

Li, Lihong and Chu, Wei and Langford, John and Schapire, Robert E. , title =. International Conference on World Wide Web (WWW) , pages =. 2010 , organization =

work page 2010

[59] [59]

International Conference on Machine Learning (ICML) , pages =

Agarwal, Alekh and Hsu, Daniel and Kale, Satyen and Langford, John and Li, Lihong and Schapire, Robert , title =. International Conference on Machine Learning (ICML) , pages =. 2014 , organization =

work page 2014

[60] [60]

and Agarwal, Alekh and Dud

Foster, Dylan J. and Agarwal, Alekh and Dud. Practical Contextual Bandits with Regression Oracles , booktitle =. 2018 , organization =

work page 2018

[61] [61]

and Rakhlin, Alexander , title =

Foster, Dylan J. and Rakhlin, Alexander , title =. International Conference on Machine Learning (ICML) , pages =. 2020 , organization =

work page 2020

[62] [62]

, title =

Berry, Donald A. , title =. Nature Reviews Drug Discovery , volume =. 2006 , publisher =

work page 2006

[63] [63]

and Lachin, John M

Rosenberger, William F. and Lachin, John M. , title =. 2012 , address =

work page 2012

[64] [64]

, title =

Hu, Feifang and Rosenberger, William F. , title =. 2006 , address =

work page 2006

[65] [65]

, title =

Berry, Donald A. , title =. Nature Reviews Clinical Oncology , volume =. 2012 , publisher =

work page 2012

[66] [66]

, title =

Pocock, Stuart J. , title =. Biometrika , volume =. 1977 , publisher =

work page 1977

[67] [67]

and Fleming, Thomas R

O'Brien, Peter C. and Fleming, Thomas R. , title =. Biometrics , volume =. 1979 , publisher =

work page 1979

[68] [68]

, title =

Jennison, Christopher and Turnbull, Bruce W. , title =. 1999 , address =

work page 1999

[69] [69]

and Connor, Jason T

Berry, Scott M. and Connor, Jason T. and Lewis, Roger J. , title =. JAMA , volume =. 2015 , publisher =

work page 2015

[70] [70]

, title =

Woodcock, Janet and LaVange, Lisa M. , title =. New England Journal of Medicine , volume =. 2017 , publisher =

work page 2017

[71] [71]

2019 , publisher =

Adaptive Platform Trials: Definition, Design, Conduct and Reporting Considerations , journal =. 2019 , publisher =

work page 2019

[72] [72]

and Sigman, Carrie C

Barker, Ann D. and Sigman, Carrie C. and Kelloff, Gary J. and Hylton, Nola M. and Berry, Donald A. and Esserman, Laura J. , title =. Clinical Pharmacology & Therapeutics , volume =. 2009 , publisher =

work page 2009

[73] [73]

Operations Research , volume =

Johari, Ramesh and Koomen, Pete and Pekelis, Leonid and Walsh, David , title =. Operations Research , volume =. 2022 , publisher =

work page 2022

[74] [74]

and Ramdas, Aaditya and McAuliffe, Jon and Sekhon, Jasjeet , title =

Howard, Steven R. and Ramdas, Aaditya and McAuliffe, Jon and Sekhon, Jasjeet , title =. The Annals of Statistics , volume =. 2021 , publisher =

work page 2021

[75] [75]

Game-Theoretic Statistics and Safe Anytime-Valid Inference , journal =

Ramdas, Aaditya and Gr. Game-Theoretic Statistics and Safe Anytime-Valid Inference , journal =. 2023 , publisher =

work page 2023

[76] [76]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Kallus, Nathan and Zhou, Angela , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[77] [77]

Rosenman, Evan T. R. and Basse, Guillaume and Owen, Art B. and Baiocchi, Michael , title =. Biometrics , volume =. 2023 , publisher =

work page 2023

[78] [78]

Journal of the American Statistical Association , volume =

Yang, Shu and Ding, Peng , title =. Journal of the American Statistical Association , volume =. 2020 , publisher =

work page 2020

[79] [79]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Kallus, Nathan and Puli, Aahlad Manas and Shalit, Uri , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[80] [80]

, title =

Kohavi, Ron and Longbotham, Roger and Sommerfield, Dan and Henne, Randal M. , title =. Data Mining and Knowledge Discovery , volume =. 2009 , publisher =

work page 2009