
arxiv: 2511.08717 · v4 · submitted 2025-11-11 · 📊 stat.ML · cs.LG

Optimal control of the future via prospective learning with control

Pith reviewed 2026-05-17 23:00 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords prospective learning · optimal control · empirical risk minimization · non-stationary environments · reset-free · foraging · Bayes optimal policy

The pith

In non-stationary reset-free environments, empirical risk minimization asymptotically reaches the Bayes optimal control policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Prospective Learning with Control to extend supervised learning methods to problems of optimal control in environments that change over time and lack episodic resets. It shows that under fairly general assumptions, the standard approach of empirical risk minimization can find the best possible policy in the long run. This approach is illustrated on a foraging task where agents must gather resources in a changing world. On a simple 1-D foraging benchmark, current reinforcement learning methods converge orders of magnitude more slowly than the proposed prospective agents, even when modified to be time-aware.

Core claim

We introduce Prospective Learning with Control (PLuC), a framework that applies empirical risk minimization to learn control policies in non-stationary, reset-free environments. Under certain fairly general assumptions, we prove that this method asymptotically achieves the Bayes optimal policy. In the specific case of foraging, prospective agents converge orders of magnitude faster than modern reinforcement learning algorithms.
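
The claim can be sketched in symbols. The notation below is an assumption made for illustration, loosely in the style of earlier prospective learning work, and is not quoted from the paper; the exact risk definition, hypothesis class, and mode of convergence are whatever the paper's theorem specifies.

```latex
% Hypothetical notation: z_{\le t} is the history observed up to time t,
% \mathcal{H} is a class of policies, and \ell is a bounded per-step loss.
\begin{align*}
  R_t(h) &= \mathbb{E}\!\left[\, \limsup_{\tau \to \infty} \frac{1}{\tau}
            \sum_{s=t+1}^{t+\tau} \ell(s, h, Z_s) \;\middle|\; z_{\le t} \right]
            && \text{(prospective risk: expected average future loss)} \\
  \hat{h}_t &\in \operatorname*{arg\,min}_{h \in \mathcal{H}}
            \frac{1}{t} \sum_{s=1}^{t} \ell(s, h, z_s)
            && \text{(ERM on the single observed trajectory)} \\
  R_t(\hat{h}_t) - \inf_{h} R_t(h) &\longrightarrow 0
            \quad \text{as } t \to \infty
            && \text{(asymptotic Bayes optimality, in a suitable sense)}
\end{align*}
```

In the control setting the future data Z_s depend on the actions the policy takes, which is what separates this problem from pure prediction.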

What carries the argument

Prospective Learning with Control (PLuC), which uses supervised learning techniques to optimize policies for future control in changing environments without resets.

If this is right

  • ERM asymptotically achieves the Bayes optimal policy in the PLuC framework.
  • Prospective foraging agents outperform RL algorithms in non-stationary reset-free settings.
  • The method applies to both natural and artificial agents in canonical tasks like foraging.
  • Even with time-aware modifications, RL algorithms converge more slowly than prospective methods.
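
"Time-aware" can be read as giving the learner the time step as an extra input. The following is a minimal sketch of that idea, not the featurization used by the paper or its baselines: the function name `time_aware_features`, the sinusoidal encoding, and the `periods` values are all illustrative assumptions.

```python
import math


def time_aware_features(state, t, periods=(10.0, 20.0)):
    """Append time information to a state vector.

    Assumed encoding (hypothetical, for illustration only): the raw time
    step plus one sine/cosine pair per period in `periods`.
    """
    features = list(state) + [float(t)]
    for period in periods:
        angle = 2.0 * math.pi * t / period
        features += [math.sin(angle), math.cos(angle)]
    return features


# A time-agnostic learner sees the same input at t=0 and t=5;
# a time-aware learner sees two different inputs.
same_state = [3]
print(time_aware_features(same_state, 0))
print(time_aware_features(same_state, 5))
```

A time-agnostic learner would be trained on `state` alone, so it cannot represent a policy that changes with the clock when the position is unchanged.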

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This framework may allow supervised learning successes to transfer directly to sequential decision making in realistic settings.
  • Future work could test the approach in higher-dimensional or more complex non-stationary tasks.
  • It suggests a path to more efficient learning in environments where resets are impossible, such as real-world robotics.

Load-bearing premise

The claim relies on certain fairly general but unspecified assumptions holding in the non-stationary reset-free environment.

What would settle it

Exhibiting a non-stationary reset-free environment that satisfies the paper's assumptions, yet in which empirical risk minimization fails to converge to the Bayes optimal policy, would falsify the asymptotic result.

Figures

Figures reproduced from arXiv: 2511.08717 by Aranyak Acharyya, Ashwin De Silva, James Hassett, Joshua T. Vogelstein, Yuxin Bai, Zeyu Shen.

Figure 1. 1-D foraging environment. An agent moves along a 1 × 7 linear track with two reward patches (A, B). Rewards alternate between the two patches over time, and the currently active patch’s reward decays exponentially.
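
The environment in the caption can be sketched as a small Python class. Only the 1 × 7 track, the two alternating patches, and the exponential decay come from the caption; the patch positions, switching interval, decay rate, start position, and action set are assumptions chosen for illustration, and the class name `Foraging1D` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Foraging1D:
    """Toy reset-free foraging track (a sketch, not the paper's benchmark)."""
    length: int = 7         # 1 x 7 track, cells 0..6 (from the caption)
    patch_a: int = 0        # assumed position of patch A
    patch_b: int = 6        # assumed position of patch B
    switch_every: int = 10  # assumed number of steps between patch switches
    decay: float = 0.3      # assumed exponential decay rate
    pos: int = 3            # assumed start position (mid-track)
    t: int = 0              # global clock; it is never reset

    def active_patch(self) -> int:
        # Patches take turns: A is active first, then B, and so on.
        if (self.t // self.switch_every) % 2 == 0:
            return self.patch_a
        return self.patch_b

    def reward_at(self, cell: int) -> float:
        # Only the active patch pays, and its reward decays with its age.
        if cell != self.active_patch():
            return 0.0
        age = self.t % self.switch_every
        return math.exp(-self.decay * age)

    def step(self, action: int):
        # action is -1 (left), 0 (stay), or +1 (right); walls clip movement.
        if action not in (-1, 0, 1):
            raise ValueError("action must be -1, 0, or +1")
        self.pos = min(self.length - 1, max(0, self.pos + action))
        reward = self.reward_at(self.pos)
        self.t += 1
        return self.pos, reward, self.t


env = Foraging1D()
for _ in range(3):
    pos, reward, t = env.step(-1)  # walk left toward patch A
print(pos, reward, t)
```

Because the clock `t` is part of the environment and never resets, the same position can yield different rewards at different times, which is the non-stationarity the benchmark is meant to exercise.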
Figure 2. ProForg efficiently achieves Bayes optimal regret. Normalized prospective regret of ProForg (red), time-aware Fitted Q-Iteration (FQI with time, blue-purple; our invention to improve FQI), time-agnostic Fitted Q-Iteration (FQI w/o time, light blue [33]), time-aware Soft Actor-Critic (SAC with time, purple-red), and time-agnostic Soft Actor-Critic (SAC w/o time, lavender [35]). While ProForg, time-aware FQI, …
Figure 3. ProForg online is several-fold more efficient than offline. Normalized prospective regret for ProForg online (red) and offline (pink). After warm-starting with 200 time steps, the online variant converges in 20 time steps, whereas the offline variant requires about 4× more data to converge.
Figure 4. Normalized prospective regret for ProForg (red), ProForg-I (orange), and ProForg-C (yellow). Removing either com…
Figure 5. ProForg with decision forests is 4× more efficient than with neural networks. Normalized prospective regret for ProForg with Gradient-Boosted Trees (red) and MLP Regressor (blue). While ProForg is 4× more efficient, ProForg-NN does converge as well.
Online or Offline? Building on the online formulation, we compare the online and offline ProForg, under the same environment settings and parameters for traini…
Original abstract

Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in reinforcement learning (RL). RL is mathematically distinct from supervised learning, which has been the main workhorse for the recent achievements in AI. Moreover, RL typically operates in a stationary environment with episodic resets, limiting its utility. Here, we extend supervised learning to address learning to control in non-stationary, reset-free environments. Using this framework, called "Prospective Learning with Control" (PLuC), we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective learning with control: foraging, a canonical task relevant to both natural and artificial agents. We illustrate that modern RL algorithms, which assume stationarity, struggle in these non-stationary reset-free environments. Even with time-aware modifications, they converge orders of magnitude slower than our prospective foraging agents on a simple 1-D foraging benchmark. Code is available at: https://github.com/neurodata/procontrol.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Prospective Learning with Control (PLuC), a framework extending supervised learning via empirical risk minimization (ERM) to optimal control in non-stationary, reset-free environments. It claims to prove that under certain fairly general assumptions, ERM asymptotically recovers the Bayes optimal policy. The framework is illustrated on a foraging task, where prospective agents are shown to converge orders of magnitude faster than standard and time-aware RL methods on a 1-D benchmark. Code is provided.

Significance. If the asymptotic result holds under well-specified assumptions that accommodate arbitrary non-stationarity without implicit access to future statistics, the work could offer a theoretically grounded supervised-learning route to control problems where RL's stationarity assumptions fail. The reproducibility via public code is a clear strength.

major comments (2)
  1. [Abstract] Abstract: The central claim that 'we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy' provides neither the assumptions nor any derivation outline or error analysis. Standard ERM convergence arguments require i.i.d. or stationary data; the non-stationary reset-free setting therefore needs explicit conditions (e.g., on total variation of the environment measure or existence of a limiting distribution) to remain valid. Without these, it is impossible to verify whether the result applies to the motivating class of problems or reduces to a fitted quantity by construction.
  2. [Foraging benchmark] Foraging benchmark section: The reported comparison states that RL algorithms 'converge orders of magnitude slower' than prospective agents, yet no variance across runs, confidence intervals, or statistical tests are provided. This weakens the empirical support for the claim that PLuC is practically superior in non-stationary reset-free settings.
minor comments (2)
  1. [Methods] The prospective loss function and its relation to the standard supervised loss could be stated more explicitly with a short example in the main text rather than deferred to the appendix.
  2. [Introduction] A brief discussion of how the framework reduces to standard supervised learning when the environment is stationary would help readers situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive feedback on our manuscript. We have carefully considered each of the major comments and provide point-by-point responses below. We believe these revisions will strengthen the presentation of our results on Prospective Learning with Control.

  1. Referee: [Abstract] Abstract: The central claim that 'we prove that under certain fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy' provides neither the assumptions nor any derivation outline or error analysis. Standard ERM convergence arguments require i.i.d. or stationary data; the non-stationary reset-free setting therefore needs explicit conditions (e.g., on total variation of the environment measure or existence of a limiting distribution) to remain valid. Without these, it is impossible to verify whether the result applies to the motivating class of problems or reduces to a fitted quantity by construction.

    Authors: We thank the referee for highlighting the need for greater clarity regarding the theoretical result. The assumptions—including conditions on the total variation of the environment measure and existence of limiting distributions that accommodate arbitrary non-stationarity without implicit access to future statistics—are explicitly stated in the theorem and proof in Section 3 of the manuscript, along with a derivation outline and error analysis that extends standard ERM arguments to the reset-free case. To address this comment directly, we will revise the abstract to include a concise statement of the key assumptions and a high-level sketch of the convergence argument. This change will make the scope of the result immediately verifiable from the abstract while preserving the full details in the main text. revision: yes

  2. Referee: [Foraging benchmark] Foraging benchmark section: The reported comparison states that RL algorithms 'converge orders of magnitude slower' than prospective agents, yet no variance across runs, confidence intervals, or statistical tests are provided. This weakens the empirical support for the claim that PLuC is practically superior in non-stationary reset-free settings.

    Authors: We agree that the empirical section would benefit from additional statistical rigor. In the revised manuscript, we will report results averaged over multiple independent runs, include confidence intervals or standard error bars, and add appropriate statistical tests (e.g., paired t-tests or Wilcoxon tests) to quantify the significance of the observed differences in convergence rates. These updates will provide stronger quantitative support for the practical superiority of prospective agents over time-aware RL baselines in the 1-D foraging benchmark. revision: yes
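
The kind of reporting promised in the second response can be sketched with the Python standard library alone. This is an illustrative substitute, not the authors' analysis: it uses a percentile bootstrap for the confidence interval and a paired sign-flip permutation test in place of the t-test or Wilcoxon test named above. The per-seed numbers are made-up placeholders, not results from the paper, and the function names are hypothetical.

```python
import random
import statistics


def bootstrap_ci(samples, n_boot=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper): a percentile bootstrap CI for the mean."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(samples), lower, upper


def sign_flip_p_value(diffs, n_perm=5000, seed=0):
    """Two-sided paired permutation test on the mean of paired differences."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to flip sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# PLACEHOLDER data: hypothetical steps-to-converge for two methods, one per seed.
method_a = [20, 22, 19, 25, 21, 23, 20, 24]
method_b = [410, 380, 450, 395, 420, 405, 440, 390]
diffs = [b - a for a, b in zip(method_a, method_b)]

print("mean difference and 95% CI:", bootstrap_ci(diffs))
print("paired sign-flip p-value:", sign_flip_p_value(diffs))
```

Pairing by seed only makes sense if both methods are run on the same environment realization for each seed; otherwise an unpaired test would be the appropriate choice.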

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper claims an asymptotic proof that ERM recovers the Bayes optimal policy under certain fairly general assumptions within the PLuC framework for non-stationary reset-free control. No load-bearing steps are exhibited that reduce by the paper's own equations or self-citations to fitted inputs, self-definitions, or ansatzes imported from prior author work. The result is presented as independent content resting on the stated assumptions and framework extension rather than tautological renaming or construction. This is the expected honest outcome when the derivation chain does not collapse to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on a set of unspecified assumptions that enable the ERM-to-Bayes-optimal reduction; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Certain fairly general assumptions allow ERM to asymptotically achieve the Bayes optimal policy in non-stationary reset-free control settings.
    Invoked in the abstract to support the main theoretical result but not enumerated or justified there.

pith-pipeline@v0.9.0 · 5501 in / 1190 out tokens · 31614 ms · 2026-05-17T23:00:04.703952+00:00 · methodology


Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors

  1. [1]

    Sulla determinazione empirica delle leggi di probabilita

    V Glivenko. Sulla determinazione empirica delle leggi di probabilita. Giorn. Ist. Ital. Attuari, 4:92–99, 1933. URL https://ci.nii.ac.jp/naid/10026792179/.

  2. [2]

    Sulla determinazione empirica delle leggi di probabilita

    Francesco Paolo Cantelli. Sulla determinazione empirica delle leggi di probabilita. Giorn. Ist. Ital. Attuari, 4,

  3. [3]

    On the uniform convergence of relative frequencies of events to their probabilities

    V Vapnik and A Chervonenkis. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability and its Applications, 16:264–280, 1971. ISSN 0040-585X. doi:10.1137/1116025. URL https://doi.org/10.1137/1116025.

  4. [4]

    A Theory of the Learnable

    L G Valiant. A Theory of the Learnable. Communications of the ACM, 27:1134–1142, 1984. ISSN 0001-

  5. [5]

    A theory of the learnable

    doi:10.1145/1968.1972. URL http://doi.acm.org/10.1145/1968.1972.

  6. [6]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.

  7. [7]

    A Bayesian approach to filtering junk E-mail

    M Sahami, S Dumais, D Heckerman, and E Horvitz. A Bayesian approach to filtering junk E-mail. AAAI Conference on Artificial Intelligence, 1998. URL https://cdn.aaai.org/Workshops/1998/WS-98-05/WS98-05-009.pdf.

  8. [8]

    Statistical Decision Functions

    Abraham Wald. Statistical Decision Functions. Annals of Mathematical Statistics, 20:165–205, 1949. ISSN 0003-4851, 2168-8990. doi:10.1214/aoms/1177730030. URL https://projecteuclid.org/euclid.aoms/1177730030.

  9. [9]

    Statistical inference for probabilistic functions of finite state Markov chains

    Leonard E Baum and Ted Petrie. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37:1554–1563, 1966. ISSN 0003-4851, 2168-8990. doi:10.1214/aoms/1177699147. URL http://dx.doi.org/10.1214/aoms/1177699147.

  10. [10]

    Dynamic programming and stochastic control processes

    Richard Bellman. Dynamic programming and stochastic control processes. Information and Control, 1:228–239, 1958. ISSN 0019-9958, 1878-2981. doi:10.1016/s0019-9958(58)80003-0. URL http://dx.doi.org/10.1016/S0019-9958(58)80003-0.

  11. [11]

    A new approach to linear filtering and prediction problems

    R E Kalman. A new approach to linear filtering and prediction problems. International Journal of Engineering, Transactions A: Basics, 82:35–45, 1960. ISSN 0021-9223. doi:10.1115/1.3662552. URL http://fluidsengineering.asmedigitalcollection.asme.org/article.aspx?articleid=1430402.

  12. [12]

    Adaptive control: The model reference approach

    Yoan D Landau. Adaptive control: The model reference approach. IEEE Transactions on Systems, Man, and Cybernetics, SMC-14:169–170, 1984. ISSN 0018-9472, 2168-2909. doi:10.1109/tsmc.1984.6313284. URL http://dx.doi.org/10.1109/TSMC.1984.6313284.

  13. [13]

    Reinforcement Learning: An Introduction

    Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

  14. [14]

    Mastering the game of Go with deep neural networks and tree search

    David Silver, Aja Huang, Chris J. Maddison, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.

  15. [15]

    Towards Continual Reinforcement Learning: A Review and Perspectives

    Khimya Khetarpal, Matthew Riemer, Irina Rish, and Doina Precup. Towards Continual Reinforcement Learning: A Review and Perspectives. Journal of Artificial Intelligence Research, 75:1401–1476, 2022. ISSN 1076-9757. doi:10.1613/jair.1.13673. URL https://www.jair.org/index.php/jair/article/view/13673.

  16. [16]

    A definition of continual reinforcement learning

    David Abel, André Barreto, Benjamin Van Roy, Doina Precup, H V Hasselt, and Satinder Singh. A definition of continual reinforcement learning. Neural Information Processing Systems, abs/2307.11046, 2023. doi:10.48550/arXiv.2307.11046. URL https://openreview.net/pdf?id=ZZS9WEWYbD.

  17. [17]

    Continual learning as computationally constrained reinforcement learning

    Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, and Benjamin Van Roy. Continual learning as computationally constrained reinforcement learning. Foundations and Trends® in Machine Learning, 18:913–1053, 2025. ISSN 1935-8237, 1935-8245. doi:10.1561/2200000116. URL http://dx.doi.org/10.1561/2200000116.

  18. [18]

    You only live once: Single-life reinforcement learning

    Annie S Chen, Archit Sharma, S Levine, and Chelsea Finn. You only live once: Single-life reinforcement learning. Advances in Neural Information Processing Systems, abs/2210.08863, 2022. ISSN 1049-5258. doi:10.48550/arXiv.2210.08863. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/5ec4e93f2cec19d47ef852a0e1fb2c48-Paper-Conference.pdf.

  19. [19]

    Reset-free lifelong learning with skill-space planning

    Kevin Lu, Aditya Grover, Pieter Abbeel, and Igor Mordatch. Reset-free lifelong learning with skill-space planning. arXiv [cs.LG], 2020. URL https://openreview.net/pdf?id=HIGSa_3kOx3.

  20. [20]

    Probably Approximately Correct: Nature’s Algorithms for Learning and Prospering in a Complex World

    Leslie Valiant. Probably Approximately Correct: Nature’s Algorithms for Learning and Prospering in a Complex World. Basic Books, 2013. ISBN 9780465032716.

  21. [21]

    Foundations of Machine Learning

    Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning (Adaptive Computation and Machine Learning series). The MIT Press, kindle edition, 2012.

  22. [22]

    Deep Learning

    Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, volume 1 of Adaptive Computation and Machine Learning series. MIT Press, 2016. ISBN 9780262337434. URL https://www.amazon.com/dp/B01MRVFGX4/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1.

  23. [23]

    Simple lifelong learning machines

    Joshua T Vogelstein, Jayanta Dey, Hayden S Helm, Will LeVine, Ronak D Mehta, Tyler M Tomita, Haoyin Xu, Ali Geisa, Qingyang Wang, Gido M van de Ven, Chenyu Gao, Weiwei Yang, Bryan Tower, Jonathan Larson, Christopher M White, and Carey E Priebe. Simple lifelong learning machines. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP:1–15, 2025...

  24. [24]

    Prospective Learning: Principled Extrapolation to the Future

    Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J How, Justus M Kebschull, John W Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M Shin, Soledad Villar, Ian Phillips, Carey E P...

  25. [25]

    Prospective learning: Learning for a dynamic future

    Ashwin De Silva, Rahul Ramesh, Rubing Yang, Siyu Yu, Joshua T Vogelstein, and Pratik Chaudhari. Prospective learning: Learning for a dynamic future. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

  26. [26]

    Prospective learning in retrospect

    Yuxin Bai, Cecelia Shuai, Ashwin De Silva, Siyu Yu, Pratik Chaudhari, and Joshua T Vogelstein. Prospective learning in retrospect, pages 17–29. Lecture Notes in Computer Science. Springer Nature Switzerland, 2026.

  27. [27]

    A Course in Reinforcement Learning

    Dimitri Bertsekas. A Course in Reinforcement Learning. Athena Scientific, 2023.

  28. [28]

    Monte Carlo Go

    Bernd Brügmann. Monte Carlo Go, 1993.

  29. [29]

    Bandit based monte-carlo planning

    Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Berlin, Germany, September 18–22, 2006, Proceedings, volume 4212 of Lecture Notes in Computer Science, pages 282–293. Springer, 2006.

  30. [30]

    Efficient selectivity and backup operators in monte-carlo tree search

    Rémi Coulom. Efficient selectivity and backup operators in monte-carlo tree search. In H. Jaap van den Herik, Paolo Ciancarini, and H. H. L. M. Donkers, editors, Computers and Games, CG 2006, Turin, Italy, May 29–31, 2006, Revised Papers, Lecture Notes in Computer Science, pages 72–83. Springer, 2007.

  31. [31]

    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

    David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm, 2017. URL https://arxiv.org/abs/1712.01815.

  32. [32]

    What is foraging?

    David L Barack. What is foraging? Biology & Philosophy, 39:3, 2024.

  33. [33]

    The Ecological Approach to Visual Perception: Classic Edition

    James J Gibson. The Ecological Approach to Visual Perception: Classic Edition (Psychology Press & Routledge Classic Editions). Psychology Press, 1 edition, 2014.

  34. [34]

    Tree-based batch mode reinforcement learning

    Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 2005.

  35. [35]

    Finite-time bounds for fitted value iteration

    Remi Munos and Csaba Szepesvari. Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9(5), 2008.

  36. [36]

    Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018.

  37. [37]

    Soft Actor-Critic Algorithms and Applications

    Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, et al. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905, 2018.

  38. [38]

    Understanding Machine Learning: From Theory to Algorithms

    Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

  39. [39]

    On the application of probability theory to agricultural experiments: Essay on principles, section 9

    J Neyman. On the application of probability theory to agricultural experiments: Essay on principles, section 9 (translated in 1990). Statistical Science, 5:465–480, 1923.

  40. [40]

    Estimating causal effects of treatments in randomized and nonrandomized studies

    D Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66:688–701, 1974.

    A Prospective Learning without control (PL-C) Here we briefly review the prior work on this topic, which is called "prospective learning" [23–25] (PL), modifying notation slightly for convenience. In retrospec...