Treatment, evidence, imitation, and chat
Pith reviewed 2026-05-19 08:18 UTC · model grok-4.3
The pith
Imitation from chat data cannot solve the core medical treatment problem that clinicians and patients must address together.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Solving the treatment problem demands integration of randomized experimental data and carefully interpreted observational data rather than imitation alone; an LLM-based system can participate in that process but only after the ethical and evidentiary challenges of obtaining suitable training signals are resolved.
What carries the argument
The contrast between the treatment problem (evidence-driven collaborative decision making) and the chat problem (imitation of conversational responses), with statins used to illustrate the evidentiary requirements.
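As a hedged illustration of this contrast (a toy sketch, not the paper's formulation), the Python snippet below compares a policy that imitates the most frequent logged action with one that maximizes a utility table. The contexts, actions, logged counts, and utility values are all hypothetical and carry no clinical meaning.

```python
from collections import Counter

# Hypothetical logged decisions: (patient context, action the clinician chose).
LOGGED = (
    [("low_risk", "treat")] * 6 + [("low_risk", "wait")] * 4
    + [("high_risk", "treat")] * 7 + [("high_risk", "wait")] * 3
)

# Hypothetical utilities U(context, action). In practice these would have to
# come from experimental or carefully interpreted observational evidence.
UTILITY = {
    ("low_risk", "treat"): 0.2, ("low_risk", "wait"): 0.5,
    ("high_risk", "treat"): 0.9, ("high_risk", "wait"): 0.3,
}

def imitation_policy(logs):
    """For each context, copy the most frequently logged action."""
    counts = {}
    for context, action in logs:
        counts.setdefault(context, Counter())[action] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def utility_policy(utility):
    """For each context, pick the action with the highest utility."""
    best = {}
    for (context, action), value in utility.items():
        if context not in best or value > utility[(context, best[context])]:
            best[context] = action
    return best

if __name__ == "__main__":
    print("imitation:", imitation_policy(LOGGED))
    print("utility:  ", utility_policy(UTILITY))
```

In this toy setup the two policies agree for `high_risk` and disagree for `low_risk`: imitation reproduces what was done most often, whether or not it was the better choice, because the utility table never enters its objective.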
If this is right
- Experimental data from randomized trials remains indispensable for validating treatment choices even when language models participate.
- Observational data can fill gaps but requires explicit handling of confounding and selection assumptions.
- Imitation-trained chat capabilities may improve communication around decisions without replacing the evidence base.
- Regulatory and ethical frameworks for medical AI will need to address how training data for treatment decisions is obtained.
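The first two points can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions, not an analysis of statin data: a single measured confounder (`high_risk`) drives both treatment assignment and outcome, and the true treatment effect is fixed at a 0.05 reduction in event probability. All function names and probabilities are hypothetical.

```python
import random

def simulate(n, randomized, seed=0):
    """Generate (high_risk, treated, event) rows from a toy model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_risk = rng.random() < 0.5
        if randomized:
            treated = rng.random() < 0.5  # coin flip, independent of risk
        else:
            # Assumed clinician behaviour: sicker patients are treated more often.
            treated = rng.random() < (0.8 if high_risk else 0.2)
        base = 0.30 if high_risk else 0.10
        p_event = base - (0.05 if treated else 0.0)  # true effect: -0.05
        rows.append((high_risk, treated, rng.random() < p_event))
    return rows

def rate(rows, treated, high_risk=None):
    """Event rate among rows with the given treatment (and optional stratum)."""
    sel = [e for h, t, e in rows
           if t == treated and (high_risk is None or h == high_risk)]
    return sum(sel) / len(sel)

def naive_effect(rows):
    """Unadjusted difference in event rates, treated minus untreated."""
    return rate(rows, True) - rate(rows, False)

def adjusted_effect(rows):
    """Stratify on the measured confounder and average by stratum size."""
    total = 0.0
    for stratum in (False, True):
        weight = sum(1 for r in rows if r[0] == stratum) / len(rows)
        total += weight * (rate(rows, True, stratum) - rate(rows, False, stratum))
    return total

if __name__ == "__main__":
    obs = simulate(100_000, randomized=False, seed=1)
    rct = simulate(100_000, randomized=True, seed=2)
    print("observational, naive:   ", round(naive_effect(obs), 3))
    print("observational, adjusted:", round(adjusted_effect(obs), 3))
    print("randomized, naive:      ", round(naive_effect(rct), 3))
```

Under this toy model the naive observational comparison makes the treatment look harmful (a population difference of about +0.07), while stratified adjustment and the randomized comparison both recover an effect near -0.05. The adjustment works here only because the confounder is measured; if `high_risk` were unobserved, the data alone would give no way to correct the naive estimate.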
Where Pith is reading between the lines
- Similar distinctions between imitation and evidence may apply to other high-stakes domains such as legal or financial advice.
- One testable extension is whether hybrid training that injects trial results into language-model fine-tuning measurably improves downstream patient outcomes.
- The argument implies that purely observational or chat-derived systems risk systematic bias unless paired with experimental benchmarks.
Load-bearing premise
Ethical experiments and defensible observational assumptions can be secured to generate the data needed for training systems on real treatment decisions.
What would settle it
A controlled study in which patients whose decisions are guided by an imitation-only model achieve the same or better health outcomes than those guided by current evidence-based protocols would undermine the central claim.
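One way such a comparison could be read is as a non-inferiority check on the difference in adverse-event rates between the two arms. The sketch below is an assumption about how the analysis might look, not something the paper specifies: the counts, the Wald interval, and the margins are all hypothetical.

```python
import math

def risk_difference_ci(events_new, n_new, events_ref, n_ref, z=1.96):
    """Risk difference (new minus reference) with a Wald confidence interval."""
    p_new = events_new / n_new
    p_ref = events_ref / n_ref
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    return diff, diff - z * se, diff + z * se

def is_non_inferior(upper_bound, margin):
    """The new arm is non-inferior if its plausible excess risk stays below the margin."""
    return upper_bound < margin

if __name__ == "__main__":
    # Hypothetical counts: adverse events per 1000 patients in each arm.
    diff, low, high = risk_difference_ci(100, 1000, 105, 1000)
    print("risk difference:", round(diff, 4))
    print("95% interval:   ", round(low, 4), round(high, 4))
    print("non-inferior at margin 0.03:", is_non_inferior(high, 0.03))
```

The choice of margin drives the conclusion: with these made-up counts the same interval passes a 0.03 margin and fails a 0.02 margin. Running such a trial would itself raise the ethical questions about experimentation that the paper highlights.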
Original abstract
Large language models are thought to have the potential to aid in medical decision making. This work investigates the degree to which this might be the case. We start with the treatment problem, the patient's core medical decision-making task, which is solved in collaboration with a clinician. We discuss different approaches to solving it, including, within evidence-based medicine, experimental and observational data. We then discuss the chat problem, and how this differs from the treatment problem -- in particular with respect to imitation (and how imitation alone cannot solve the true treatment problem, although this does not mean it is not useful). We then discuss how a large-language-model-based system might be trained to solve the treatment problem, highlighting that the major challenges relate to the ethics of experimentation and the assumptions associated with observation. We finally discuss how these challenges relate to evidence-based medicine and how this might inform the efforts of the medical research community to solve the treatment problem. Throughout, we illustrate our arguments with the cholesterol medications, statins.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that large language models have the potential to aid medical decision making, but that any assessment of this potential must distinguish the treatment problem, which requires causal evidence from experimental or observational studies under ethical constraints, from the chat problem, which is based on imitation. Using the statins example, it argues that imitation alone cannot solve the treatment problem, though it can be useful, and identifies the ethics of experimentation and the assumptions behind observational data as the major challenges for training such systems, relating both to evidence-based medicine.
Significance. If the argument holds, it is significant because it provides a clear conceptual separation between imitation-driven chat and evidence-based treatment in the context of AI for medicine. The concrete statins illustration deserves credit for making the ethical and observational issues tangible. This framework could help steer research away from over-reliance on pure imitation learning for high-stakes causal decisions.
minor comments (2)
- [Abstract] The abstract is concise but could include a sentence on the statins example to better prepare the reader for the full argument.
- [Discussion of training LLMs] The challenges are well-described qualitatively; adding a reference to specific methods in causal inference, such as those handling observational data biases, would enhance clarity without altering the conceptual nature.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our manuscript and the recommendation for minor revision. We are pleased that the conceptual framework distinguishing the treatment problem from the chat problem, and the use of the statins example to illustrate ethical and observational challenges, were viewed as significant. We respond to the referee's summary of the paper below.
Point-by-point responses
-
Referee: The manuscript argues that large language models have the potential to aid medical decision making, but that any assessment of this potential must distinguish the treatment problem, which requires causal evidence from experimental or observational studies under ethical constraints, from the chat problem, which is based on imitation. Using the statins example, it argues that imitation alone cannot solve the treatment problem, though it can be useful, and identifies the ethics of experimentation and the assumptions behind observational data as the major challenges for training such systems, relating both to evidence-based medicine.
Authors: We appreciate this concise summary of our work, which accurately reflects the main points we sought to make. We agree that the distinction is important for guiding research in AI for medicine away from over-reliance on imitation for causal decisions.
Revision: no
Circularity Check
No significant circularity identified
Full rationale
This is a conceptual discussion paper with no mathematical derivations, equations, or fitted parameters. It distinguishes the treatment problem (requiring causal evidence under ethical/observational constraints) from the chat/imitation problem using standard principles of evidence-based medicine and causal inference, illustrated via the statins example. All load-bearing claims rest on externally established distinctions rather than self-referential definitions, self-citations, or reductions to inputs by construction. The paper is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The treatment problem requires evidence from experimental or observational data rather than imitation alone.
- domain assumption: Ethics of experimentation and assumptions in observational data are the primary barriers to training LLM-based treatment systems.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "the optimization in (2) is the fundamental treatment problem... imitation objective in (7) does not take into account utility, U"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "observational data... no unmeasured confounders assumption cannot be verified with data alone"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.