Belief-Space Control for Personalized Cancer Treatment via Active Inference

C. Emre Koksal; Deniz Sargun; H. Bugra Tulay

arXiv: 2606.10376 · v3 · pith:FCM2TL46new · submitted 2026-06-09 · 💻 cs.AI · cs.IT · math.IT

Pith reviewed 2026-06-27 13:33 UTC · model grok-4.3

keywords active inference · belief space control · cancer treatment · personalized medicine · sequential decision making · expected free energy · partial observability

The pith

Belief-space active inference enables simultaneous patient categorization and high-efficacy cancer treatment under measurement constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames cancer treatment as a sequential decision-making problem with partial observability and latent patient heterogeneity. It models the problem in belief space using active inference and derives an expected free-energy objective that unifies goal-directed control with information acquisition while respecting explicit budgets on medical measurements. Unlike standard reinforcement learning, this accounts for treatments that permanently alter patient transition dynamics. The framework is implemented and tested on real clinical data from the AACR Project GENIE Biopharma Collaborative dataset. A sympathetic reader would care because the approach produces both categorization and effective treatment recommendations in one objective under realistic constraints.

Core claim

By modeling cancer treatment as a belief-space planning problem using active inference, an expected free-energy objective is derived that unifies goal-directed control and information acquisition under measurement budgets. When implemented on real clinical cancer data from the AACR Project GENIE Biopharma Collaborative dataset, this yields simultaneous patient categorization and high treatment efficacy under real measurement and treatment constraints.

What carries the argument

The expected free-energy objective in a belief-space active-inference model, which unifies goal-directed control and information acquisition under measurement budgets.
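
To make the idea of one objective covering both control and information acquisition concrete, here is a minimal sketch of the textbook risk-plus-ambiguity form of expected free energy for a discrete model, with an additive measurement cost. This is not the paper's derivation, which is not visible here: the function name `expected_free_energy`, the additive `measure_cost` term, and all numbers are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

def expected_free_energy(belief, A, log_pref, measure_cost=0.0):
    """Risk + ambiguity (+ assumed additive measurement cost) for one action.

    belief   : predicted state distribution q(s) under the action, shape (S,)
    A        : likelihood matrix p(o | s), shape (O, S), columns sum to 1
    log_pref : log of the preferred outcome distribution, shape (O,)
    """
    q_o = A @ belief                                    # predicted outcomes q(o)
    risk = float(np.sum(q_o * (np.log(q_o + EPS) - log_pref)))  # KL[q(o) || pref]
    cond_entropy = -np.sum(A * np.log(A + EPS), axis=0)  # H[p(o | s)] per state
    ambiguity = float(cond_entropy @ belief)             # expected sensor noise
    return risk + ambiguity + measure_cost

# Toy comparison (hypothetical numbers): an informative test versus no test.
belief = np.array([0.5, 0.5])
log_pref = np.log(np.array([0.8, 0.2]))
A_informative = np.array([[0.9, 0.1], [0.1, 0.9]])
A_flat = np.array([[0.5, 0.5], [0.5, 0.5]])

g_free_test = expected_free_energy(belief, A_informative, log_pref)
g_paid_test = expected_free_energy(belief, A_informative, log_pref, measure_cost=0.5)
g_no_test = expected_free_energy(belief, A_flat, log_pref)
```

In this toy setting the informative test scores lower (better) than skipping it when it is free, and higher (worse) once a large enough cost is attached, which is the kind of tradeoff a measurement budget is meant to govern.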

If this is right

Treatments that permanently modify patient transition dynamics can be planned by updating beliefs over time.
Measurement budgets are explicitly incorporated into the planning objective.
Patient heterogeneity is handled through latent states in the belief space.
The same objective produces both categorization and treatment recommendations.
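
The points above about latent heterogeneity and measurement limits can be sketched with a joint belief over a hidden patient category and a hidden disease state. This is a hedged illustration under assumed structure (per-category transition matrices `P[c]`, a shared sensor model `A`, and an optional observation), not the model fitted in the paper; every name and number is hypothetical.

```python
import numpy as np

def predict(belief, P):
    """Propagate the joint belief b[c, x] one step.

    P[c, x, y] is the row-stochastic transition matrix of category c.
    """
    return np.einsum("cx,cxy->cy", belief, P)

def correct(belief, A, obs):
    """Bayes update with observation index obs, where A[o, x] = p(o | x)."""
    post = belief * A[obs][None, :]
    return post / post.sum()

def step(belief, P, A, obs=None):
    """One time step; obs=None means no measurement was purchased."""
    b = predict(belief, P)
    return b if obs is None else correct(b, A, obs)

# Hypothetical two-category, two-state example.
P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # category 0: slowly changing state
    [[0.5, 0.5], [0.5, 0.5]],   # category 1: rapidly mixing state
])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
b0 = np.array([[0.5, 0.0], [0.5, 0.0]])   # state 0 known, category unknown

b_skip = step(b0, P, A)          # no measurement: category belief unchanged
b_meas = step(b0, P, A, obs=0)   # measurement: category belief sharpens
```

Skipping the measurement leaves the category marginal at its prior, while observing the state shifts mass toward the category whose dynamics better explain the observation. That is the sense in which categorization and state tracking can share one belief.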

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework could extend to other medical sequential decisions involving irreversible actions and partial observability.
Standard reinforcement learning approaches may require similar belief-space adaptations in domains where actions change dynamics permanently.
Validation on additional clinical datasets beyond GENIE would test broader applicability.

Load-bearing premise

Cancer treatment dynamics and measurement constraints can be faithfully captured by a belief-space active-inference model whose expected free-energy objective directly yields both categorization and high efficacy on the AACR GENIE dataset.

What would settle it

Applying the model to the AACR GENIE dataset and observing that it fails to produce both accurate patient categorization and higher treatment efficacy than standard methods under the same measurement budgets would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.10376 by C. Emre Koksal, Deniz Sargun, H. Bugra Tulay.

**Figure 2.** Empirical transition probabilities. Top: baseline (untreated) transitions P0|C for early-stage, middle-aged patients. Bottom: treatment-conditioned transitions during chemo. … windows and map each window to one of seven action classes (EGFR, VEGF, BRAF, HER2, IO, ChemoOnly, Investigational) using the following priority: check targeted agents first, then IO, and default to ChemoOnly when only cytotoxic agents…

**Figure 3.** True patient state (top) and agent belief distribution (bottom) over …

**Figure 7.** ℓ2 distance between current and desired steady-state distributions … patient trajectories (100 steps each) to quantify the tradeoff between information acquisition and control performance

**Figure 8.** Total measurement and total cost statistics for five measurement …

**Figure 9.** Figure 9 shows the instantaneous expected life expectancy v_i (Lemma 1) of the evolving transition matrix P_k for an advanced-stage young patient receiving EGFR-targeted therapy. As treatment blends into the baseline dynamics via P_{k+1} = (1 − α_k) P_k + α_k T, the expected remaining life from state A (Attenuation) increases from 237 to 327 time steps, with similar improvements from states B and C. The concave saturation sh…

**Figure 10.** Available longitudinal information for a representative patient in …
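
The Figure 9 caption describes treatment blending into the baseline dynamics via P_{k+1} = (1 − α_k) P_k + α_k T and reports an expected remaining life per state. The sketch below shows one way such a quantity can be computed, assuming it is the expected time to absorption of a finite Markov chain with a single absorbing state (the standard fundamental-matrix result in Kemeny and Snell). Lemma 1 is not visible here, so this reading is an assumption, and the matrices and the constant α are toy values, not the paper's.

```python
import numpy as np

def blend(P, T, alpha):
    """One blending step: P_next = (1 - alpha) * P + alpha * T."""
    return (1.0 - alpha) * P + alpha * T

def expected_time_to_absorption(P, transient):
    """Solve (I - Q) t = 1, where Q is P restricted to the transient states."""
    Q = P[np.ix_(transient, transient)]
    n = len(transient)
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

# Toy chain: states A, B, C are transient; the last state is absorbing.
P0 = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.05, 0.85, 0.07, 0.03],
    [0.02, 0.08, 0.85, 0.05],
    [0.00, 0.00, 0.00, 1.00],
])
T = np.array([
    [0.95, 0.03, 0.01, 0.01],
    [0.10, 0.85, 0.04, 0.01],
    [0.05, 0.10, 0.83, 0.02],
    [0.00, 0.00, 0.00, 1.00],
])

transient = [0, 1, 2]
P_k = P0.copy()
lifetimes = [expected_time_to_absorption(P_k, transient)]
for _ in range(30):
    P_k = blend(P_k, T, alpha=0.1)   # constant alpha is an assumption
    lifetimes.append(expected_time_to_absorption(P_k, transient))
```

Because each blended matrix is a convex combination of two row-stochastic matrices, it stays row-stochastic, so the absorption-time computation remains valid at every step. In this toy example the expected remaining time from state A grows as P_k approaches T.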

read the original abstract

Cancer treatment is at the core a sequential decision-making problem with partial observability, latent patient heterogeneity, and explicit constraints on the budget for medical measurements. Unlike standard Reinforcement Learning (RL) approaches that control state trajectories, cancer treatments permanently modify patients' transition dynamics, changing how states evolve over time. We model cancer treatment as a belief-space planning problem using active inference, deriving an expected free-energy objective that unifies goal-directed control and information acquisition under measurement budgets without. We implement this framework using real clinical cancer data from the AACR Project GENIE Biopharma Collaborative dataset. Results on clinical data demonstrate a simultaneous patient categorization and high treatment efficacy, under real measurement and treatment constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies active inference to cancer treatment planning with measurement budgets on GENIE data, but the abstract gives no way to check the model or results.

The paper frames cancer treatment as a belief-space planning task under active inference. It uses an expected free-energy objective to handle both treatment goals and the cost of measurements, while noting that treatments change patient dynamics permanently. The authors run it on the AACR Project GENIE clinical dataset and claim it produces patient categorization plus high efficacy under real constraints.

The application itself is the main new element. Standard active inference already combines control and information seeking, so the work mainly shows the framework can be pointed at oncology planning with explicit measurement limits. That framing is reasonable and addresses a practical issue that usual RL setups often ignore.

The obvious limitation is that only the abstract is visible here. There are no equations for the state transitions, no account of how the GENIE data is turned into a generative model, and no numbers or baselines for the claimed efficacy and categorization. Without those pieces it is impossible to judge whether the math actually supports the results or whether the model assumptions fit cancer biology.

The central claim therefore rests on an uninspectable step: that the expected free energy objective, once fitted to the data, simultaneously delivers both categorization and effective treatment. That step could be fine or could hide fitting artifacts; the abstract does not let us tell.

This is for readers already working on active inference or POMDPs in medical decision making. Someone looking for a worked example of information-constrained planning in oncology might pick up the problem setup. It is not yet strong enough to cite for a new method.

The paper should go to peer review. The idea is coherent on its own terms and uses real data, so referees can check the missing derivations and experiments.

Referee Report

2 major / 1 minor

Summary. The paper models cancer treatment as a sequential decision-making problem under partial observability, latent patient heterogeneity, and measurement-budget constraints. It proposes a belief-space active-inference controller whose expected free-energy objective is claimed to unify goal-directed control with information acquisition. The framework is applied to the AACR Project GENIE Biopharma Collaborative dataset, with the central claim that the resulting policy simultaneously achieves patient categorization and high treatment efficacy under realistic constraints.

Significance. If the derivation and empirical results hold, the work would demonstrate a concrete clinical use-case for active inference that respects measurement costs and dynamic modification of patient transition kernels. The use of real GENIE data rather than synthetic trajectories is a positive feature. However, the absence of any equations, model specifications, or quantitative tables in the supplied manuscript prevents evaluation of whether the expected-free-energy construction actually delivers the claimed unification or whether the reported efficacy is robust to baselines.

major comments (2)

[Methods / Derivation] No derivation, state-space definition, or expected-free-energy functional is supplied anywhere in the manuscript. Without these, it is impossible to verify whether the objective reduces to self-referential or fitted quantities (a known risk in active-inference formulations) or genuinely handles permanent changes to transition dynamics induced by treatment.
[Results] The results claim 'simultaneous patient categorization and high treatment efficacy' on the GENIE dataset, yet no metrics, confusion matrices, efficacy scores, measurement-budget curves, or baseline comparisons are provided. This renders the central empirical claim unverifiable.

minor comments (1)

[Abstract] One sentence of the abstract is truncated: it ends with 'under measurement budgets without.'

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and for highlighting the need for explicit technical content. We agree that the submitted manuscript omitted key derivations and quantitative results, which prevents verification of the claims. We will revise accordingly to address both major comments.

Referee: [Methods / Derivation] No derivation, state-space definition, or expected-free-energy functional is supplied anywhere in the manuscript. Without these, it is impossible to verify whether the objective reduces to self-referential or fitted quantities (a known risk in active-inference formulations) or genuinely handles permanent changes to transition dynamics induced by treatment.

Authors: We agree the manuscript as provided contained no equations, state-space definitions, or explicit expected-free-energy derivation. The revised version will add a dedicated Methods section with: (i) the belief-space POMDP formulation including treatment-modified transition kernels, (ii) the full derivation of the expected free energy that trades off goal-directed terms against information gain under explicit measurement budgets, and (iii) discussion of how the objective avoids self-referential collapse while capturing permanent dynamics changes. This will enable direct verification. revision: yes
Referee: [Results] The results claim 'simultaneous patient categorization and high treatment efficacy' on the GENIE dataset, yet no metrics, confusion matrices, efficacy scores, measurement-budget curves, or baseline comparisons are provided. This renders the central empirical claim unverifiable.

Authors: We acknowledge that the current manuscript supplies no numerical results, tables, or baseline comparisons. The revision will include a Results section with: categorization accuracy and confusion matrices on the AACR GENIE data, treatment efficacy scores, measurement-budget performance curves, and direct comparisons against standard RL and POMDP baselines. These additions will substantiate the dual claims of patient categorization and efficacy under realistic constraints. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The provided abstract describes modeling cancer treatment via active inference and deriving an expected free-energy objective from first principles of belief-space planning under partial observability and measurement budgets. No equations, parameter-fitting steps, or self-citations are supplied in the visible text that would allow reduction of any claimed prediction or categorization result to a fitted input or self-referential definition. The central claim of simultaneous categorization and efficacy on GENIE data is presented as an empirical outcome of the derived objective rather than a quantity forced by construction from the inputs. Without load-bearing self-citations or ansatzes that collapse the derivation, the framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; analysis is limited to the high-level claim that an expected free-energy objective unifies control and information acquisition.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Certified World Models as Sensing Clocks: Drift-Aware Deadlines for Active Perception
cs.LG 2026-07 unverdicted novelty 5.0

Derives a drift-aware sensing clock from certified world models that controls certificate violations on held-out data and outperforms expected-belief scheduling in a synthetic benchmark at matched sensing budget.

Reference graph

Works this paper leans on

24 extracted references · 2 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

The ecology and evolutionary biology of cancer: a review of mathematical models of necrosis and tumor cell diversity,

J. D. Nagy, “The ecology and evolutionary biology of cancer: a review of mathematical models of necrosis and tumor cell diversity,” Math. Biosci. Eng., vol. 2, no. 2, pp. 381–418, 2005

2005
[2]

A free energy principle for a particular physics

K. Friston, “A free energy principle for a particular physics,” arXiv preprint arXiv:1906.10184, 2019

[3]

The free energy principle made simpler but not too simple,

K. Friston, L. Da Costa, N. Sajid, C. Heins, K. Ueltzhöffer, G. A. Pavliotis, and T. Parr, “The free energy principle made simpler but not too simple,” Physics Reports, vol. 1024, pp. 1–29, 2023

2023
[4]

On bayesian mechanics: a physics of and by beliefs,

M. J. Ramstead, D. A. Sakthivadivel, C. Heins, M. Koudahl, B. Millidge, L. Da Costa, B. Klein, and K. J. Friston, “On Bayesian mechanics: a physics of and by beliefs,” Interface Focus, vol. 13, no. 3, p. 20220029, 2023

2023
[5]

AACR Project GENIE: Powering Precision Medicine Through An International Consortium,

The AACR Project GENIE Consortium, “AACR Project GENIE: Powering Precision Medicine Through An International Consortium,” Cancer Discovery, vol. 7, no. 8, pp. 818–831, August 2017, version v2.0

2017
[6]

A contextual-bandit approach to personalized news article recommendation,

L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextual-bandit approach to personalized news article recommendation,” in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 661–670

2010
[7]

Taming the monster: A fast and simple algorithm for contextual bandits,

A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li, and R. Schapire, “Taming the monster: A fast and simple algorithm for contextual bandits,” in Proceedings of the 31st International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, E. P. Xing and T. Jebara, Eds., vol. 32, no. 2. Beijing, China: PMLR, 22–24 Jun 2014, pp. 1638–1646

2014
[8]

Human-level control through deep reinforcement learning,

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015

2015
[9]

A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis,

X. Wu, R. Li, Z. He, T. Yu, and C. Cheng, “A value-based deep reinforcement learning model with human expertise in optimal treatment of sepsis,” NPJ Digital Medicine, vol. 6, no. 1, p. 15, 2023

2023
[10]

Reinforcement learning in healthcare: A survey,

C. Yu, J. Liu, S. Nemati, and G. Yin, “Reinforcement learning in healthcare: A survey,” ACM Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–36, 2021

2021
[11]

Doubly robust off-policy value evaluation for reinforcement learning,

N. Jiang and L. Li, “Doubly robust off-policy value evaluation for reinforcement learning,” in International Conference on Machine Learning. PMLR, 2016, pp. 652–661

2016
[12]

Markov decision processes with observation costs: framework and computation with a penalty scheme,

C. Reisinger and J. Tam, “Markov decision processes with observation costs: framework and computation with a penalty scheme,” Mathematics of Operations Research, vol. 50, no. 2, pp. 1305–1332, 2025

2025
[13]

A multi-objective constrained partially observable markov decision process model for breast cancer screening,

R. K. Helmeczi, C. Kavaklioglu, M. Cevik, and D. Pirayesh Neghab, “A multi-objective constrained partially observable Markov decision process model for breast cancer screening,” Operational Research, vol. 23, no. 2, p. 30, 2023

2023
[14]

Optimizing active surveillance for prostate cancer using partially observable markov decision processes,

W. Li, B. T. Denton, and T. M. Morgan, “Optimizing active surveillance for prostate cancer using partially observable Markov decision processes,” European Journal of Operational Research, vol. 305, no. 1, pp. 386–399, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0377221722004337

2023
[15]

Point-based value iteration: An anytime algorithm for pomdps,

J. Pineau, G. Gordon, S. Thrun et al., “Point-based value iteration: An anytime algorithm for POMDPs,” in IJCAI, vol. 3, 2003, pp. 1025–1032

2003
[16]

Sarsop: Efficient point-based pomdp planning by approximating optimally reachable belief spaces

H. Kurniawati, D. Hsu, W. S. Lee et al., “SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces,” in Robotics: Science and Systems, vol. 2008. Zurich, Switzerland, 2008

2008
[17]

Monte-carlo planning in large pomdps,

D. Silver and J. Veness, “Monte-carlo planning in large pomdps,” Advances in neural information processing systems, vol. 23, 2010

2010
[18]

Despot: Online pomdp planning with regularization,

A. Somani, N. Ye, D. Hsu, and W. S. Lee, “DESPOT: Online POMDP planning with regularization,” Advances in Neural Information Processing Systems, vol. 26, 2013

2013
[19]

Assessing multimodality breast cancer screening strategies for brca1/2 gene mutation carriers and other high-risk populations,

Ç. Çağlayan, T. Ayer, and D. U. Ekwueme, “Assessing multimodality breast cancer screening strategies for BRCA1/2 gene mutation carriers and other high-risk populations,” INFORMS Journal on Computing, 2025

2025
[20]

Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review

S. Levine, “Reinforcement learning and control as probabilistic inference: Tutorial and review,” arXiv preprint arXiv:1805.00909, 2018

[21]

Linearly-solvable markov decision problems,

E. Todorov, “Linearly-solvable Markov decision problems,” Advances in Neural Information Processing Systems, vol. 19, 2006

2006
[22]

Optimal control as a graphical model inference problem,

H. J. Kappen, V. Gómez, and M. Opper, “Optimal control as a graphical model inference problem,” Machine Learning, vol. 87, no. 2, pp. 159–182, 2012

2012
[23]

Computational nosology and precision psychiatry,

K. J. Friston, A. D. Redish, and J. A. Gordon, “Computational nosology and precision psychiatry,” Computational Psychiatry (Cambridge, Mass.), vol. 1, p. 2, 2017

2017
[24]

J. G. Kemeny and J. L. Snell, Finite Markov Chains, ser. Undergraduate Texts in Mathematics. New York: Springer-Verlag, 1976.

Appendix A: Notation. Table I summarizes the mathematical notation used throughout this paper. Table I, Notation for Variables (symbol: definition): A: action pair of (M, T); X: state; α: tuning limit; C: category; Y: ob...

1976
