Measurized Markov Decision Processes
Pith reviewed 2026-05-24 01:22 UTC · model grok-4.3
The pith
Lifting MDPs to probability-measure states yields a deterministic generalization that supports constraints and value function approximations and admits Borel-measurable optimal policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Measurized MDPs are deterministic MDPs whose states are probability measures on the original state space and whose actions are stochastic kernels; they generalize stochastic MDPs and, when the lifted processes satisfy semicontinuous-semicompact assumptions, admit optimal Borel-measurable value functions and policies under milder conditions than the universally measurable framework, for both discounted infinite-horizon and long-run average reward criteria. Any MDP can be algebraically lifted to such a process, and the setting permits constraints and approximations unavailable in standard MDPs.
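To make the lifted dynamics concrete, here is a minimal sketch on a finite MDP, where a probability measure is just a vector and a stochastic kernel is a row-stochastic table. The two-state model, the arrays `P` and `r`, and the helpers `lifted_step` and `lifted_reward` are illustrative assumptions, not objects taken from the paper.

```python
# Hypothetical two-state, two-action MDP (all numbers are illustrative).
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# r[s][a] = one-step reward for taking action a in state s.
r = [[1.0, 0.0], [0.0, 2.0]]


def lifted_step(nu, phi):
    """Deterministic lifted transition: push the measure nu through the
    kernel phi (phi[s][a] = probability of action a in state s) and P."""
    n = len(nu)
    nxt = [0.0] * n
    for s in range(n):
        for a in range(len(phi[s])):
            weight = nu[s] * phi[s][a]
            for t in range(n):
                nxt[t] += weight * P[s][a][t]
    return nxt


def lifted_reward(nu, phi):
    """Expected one-step reward when states follow nu and actions follow phi."""
    return sum(
        nu[s] * phi[s][a] * r[s][a]
        for s in range(len(nu))
        for a in range(len(phi[s]))
    )


nu0 = [0.5, 0.5]                 # lifted state: a distribution over {0, 1}
phi = [[1.0, 0.0], [0.0, 1.0]]   # lifted action: action 0 in state 0, action 1 in state 1
nu1 = lifted_step(nu0, phi)      # next lifted state, with no randomness involved
```

Given the same `nu0` and `phi`, `lifted_step` always returns the same `nu1`, which is the sense in which the lifted process is deterministic even though the underlying MDP is stochastic.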
What carries the argument
The algebraic lifting procedure, which maps any MDP impacted by external random shocks to a non-deterministic measure-valued MDP, with the lifted process analyzed under semicontinuous-semicompact assumptions.
If this is right
- Optimal policies remain Borel-measurable rather than requiring universal measurability.
- Constraints can be imposed directly on the measure-valued states.
- Value function approximations become available in the lifted space.
- The long-run average reward case is handled within the same framework with similar guarantees.
- Non-deterministic measure-valued MDPs arise naturally from standard MDPs with shocks.
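As an illustration of a constraint placed directly on the measure-valued state, the sketch below caps the probability mass that the next lifted state may put on one designated state. The cap, the model `P`, and the helpers `next_mass` and `satisfies_cap` are hypothetical and chosen only for illustration; the paper's own constraint formulations may differ.

```python
# Hypothetical two-state, two-action MDP; state 1 plays the role of a "risky" state.
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]


def next_mass(nu, phi, target):
    """Mass that the next lifted state assigns to `target` when the current
    lifted state is nu and the lifted action is the kernel phi."""
    return sum(
        nu[s] * phi[s][a] * P[s][a][target]
        for s in range(len(nu))
        for a in range(len(phi[s]))
    )


def satisfies_cap(nu, phi, target, cap, tol=1e-12):
    """Check the assumed constraint: next-state mass on `target` is at most `cap`."""
    return next_mass(nu, phi, target) <= cap + tol


nu = [1.0, 0.0]   # all mass currently on state 0
CAP = 0.3         # illustrative cap on the mass of risky state 1

phi_safe = [[1.0, 0.0], [1.0, 0.0]]     # always action 0
phi_risky = [[0.0, 1.0], [1.0, 0.0]]    # action 1 in state 0
phi_mixed = [[0.75, 0.25], [1.0, 0.0]]  # randomize in state 0

safe_ok = satisfies_cap(nu, phi_safe, 1, CAP)
risky_ok = satisfies_cap(nu, phi_risky, 1, CAP)
mixed_ok = satisfies_cap(nu, phi_mixed, 1, CAP)
```

The constraint is a statement about the whole distribution rather than about any single realized state, and in this toy example it is linear in the kernel, so a randomized kernel such as `phi_mixed` can be feasible where the deterministic choice `phi_risky` is not.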
Where Pith is reading between the lines
- This lifting could facilitate solving MDPs by embedding them into a space where deterministic optimization methods apply more readily.
- It may enable new connections to problems like distributionally robust control by treating measures explicitly as states.
- Applying the procedure to a simple MDP with additive shocks would test whether optimality is preserved exactly.
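A simpler sanity check than the shock experiment suggested above can be sketched on a finite MDP without external shocks. The code runs value iteration on the original model, rolls the greedy kernel forward in the lifted space, and compares the lifted discounted return with the expectation of the optimal value function under the initial measure. The model, the discount factor `GAMMA`, and all function names are assumptions made for illustration; the check only shows that the two numbers agree for this kernel and does not by itself prove optimality in the lifted space.

```python
GAMMA = 0.9  # illustrative discount factor

# Hypothetical two-state, two-action MDP (all numbers are illustrative).
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
r = [[1.0, 0.0], [0.0, 2.0]]
N = len(P)


def q_values(v, s):
    """One-step lookahead values of each action in state s under v."""
    return [
        r[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N))
        for a in range(len(r[s]))
    ]


def value_iteration(iters=2000):
    """Standard value iteration on the original (unlifted) MDP."""
    v = [0.0] * N
    for _ in range(iters):
        v = [max(q_values(v, s)) for s in range(N)]
    return v


def greedy_kernel(v):
    """Deterministic kernel that picks a maximizing action in each state."""
    phi = []
    for s in range(N):
        q = q_values(v, s)
        best = q.index(max(q))
        phi.append([1.0 if a == best else 0.0 for a in range(len(q))])
    return phi


def lifted_return(nu, phi, horizon=500):
    """Discounted return of the deterministic lifted trajectory started at nu
    (truncated at `horizon`; the tail is negligible for GAMMA = 0.9)."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * sum(
            nu[s] * phi[s][a] * r[s][a] for s in range(N) for a in range(len(phi[s]))
        )
        nu = [
            sum(nu[s] * phi[s][a] * P[s][a][t] for s in range(N) for a in range(len(phi[s])))
            for t in range(N)
        ]
        discount *= GAMMA
    return total


v_star = value_iteration()
phi_star = greedy_kernel(v_star)
nu0 = [0.3, 0.7]
lifted_value = lifted_return(nu0, phi_star)
original_value = sum(nu0[s] * v_star[s] for s in range(N))
```

If the lifting loses no fidelity, `lifted_value` and `original_value` should agree up to numerical error, since the lifted return from `nu0` is the average over `nu0` of the per-state returns.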
Load-bearing premise
The lifted MDPs satisfy the semicontinuous-semicompact assumptions of Hernández-Lerma and Lasserre.
What would settle it
A concrete MDP with external shocks whose algebraic lifting satisfies the semicontinuous-semicompact assumptions yet admits no Borel-measurable optimal policy would falsify the claims of accessibility and milder conditions.
Original abstract
In this paper, we explore lifting Markov Decision Processes (MDPs) to the space of probability measures and consider the so-called measurized MDPs: deterministic processes where states are probability measures on the original state space, and actions are stochastic kernels on the original action space. We show that measurized MDPs are a generalization of stochastic MDPs, thus the measurized framework can be deployed without loss of fidelity. Bertsekas and Shreve studied similar deterministic MDPs under the discounted infinite-horizon criterion in the context of universally measurable policies. Here, we also consider the long-run average reward case, but we cast lifted MDPs within the semicontinuous-semicompact framework of Hernández-Lerma and Lasserre. This makes the lifted framework more accessible as it entails (i) optimal Borel-measurable value functions and policies, (ii) reasonably mild assumptions that are easier to verify than those in the universally-measurable framework, and (iii) simpler proofs. In addition, we showcase the untapped potential of lifted MDPs by demonstrating how the measurized framework enables the incorporation of constraints and value function approximations that are not available from the standard MDP setting. Furthermore, we introduce a novel algebraic lifting procedure for any MDP, showing that non-deterministic measure-valued MDPs can emerge from lifting MDPs impacted by external random shocks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces measurized MDPs obtained by lifting standard MDPs to the space of probability measures, where states are probability measures on the original state space and actions are stochastic kernels. It claims that measurized MDPs generalize stochastic MDPs without loss of fidelity, enable incorporation of constraints and value-function approximations unavailable in the standard setting, and that an algebraic lifting procedure applied to MDPs subject to external random shocks produces non-deterministic measure-valued dynamics. By embedding the lifted processes in the semicontinuous-semicompact framework of Hernández-Lerma and Lasserre (rather than the universally measurable setting of Bertsekas-Shreve), the paper asserts that optimal Borel-measurable value functions and policies are obtained under milder, more verifiable assumptions for both discounted and average-reward criteria.
Significance. If the lifting procedure is shown to preserve the required semicontinuity and semicompactness properties, the framework would supply a technically accessible route to measure-valued MDPs that yields Borel-measurable optima while supporting new modeling features such as explicit constraints on the measure-valued state.
major comments (2)
- [Algebraic lifting procedure] The section introducing the algebraic lifting procedure: the claim that the lifted processes satisfy the semicontinuous-semicompact assumptions of Hernández-Lerma and Lasserre (upper/lower semicontinuity of the reward and weak continuity of the transition kernel) after the algebraic lifting is asserted but not accompanied by an explicit verification for general (non-compact) state spaces; the weak topology on the space of probability measures does not automatically inherit these properties from the original MDP, and this verification is load-bearing for the Borel-measurability and accessibility claims.
- [Framework choice and benefits] Paragraphs on framework choice and benefits: the central assertion that the measurized framework yields optimal Borel-measurable policies and value functions under milder conditions than Bertsekas-Shreve rests entirely on the lifted MDPs satisfying the semicontinuous-semicompact assumptions for both discounted and average-reward criteria; without a detailed argument or counterexample-free demonstration that the assumptions transfer, the comparison to the universally measurable framework cannot be substantiated.
minor comments (1)
- The abstract and introduction could more explicitly separate the deterministic measurized MDPs from the non-deterministic measure-valued processes that arise only after the algebraic lifting with external shocks.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will incorporate revisions to strengthen the presentation of the algebraic lifting and framework comparison.
Point-by-point responses
- Referee: [Algebraic lifting procedure] The section introducing the algebraic lifting procedure: the claim that the lifted processes satisfy the semicontinuous-semicompact assumptions of Hernández-Lerma and Lasserre (upper/lower semicontinuity of the reward and weak continuity of the transition kernel) after the algebraic lifting is asserted but not accompanied by an explicit verification for general (non-compact) state spaces; the weak topology on the space of probability measures does not automatically inherit these properties from the original MDP, and this verification is load-bearing for the Borel-measurability and accessibility claims.
  Authors: We agree that an explicit verification is required for general (non-compact) state spaces, as the weak topology on probability measures does not automatically preserve the properties. The manuscript asserts the preservation via the algebraic nature of the lift but does not supply the detailed argument. In revision we will add a dedicated lemma and proof showing that if the original MDP satisfies the Hernández-Lerma–Lasserre semicontinuity and weak-continuity conditions, then the measurized process does as well, including the requisite arguments for the weak topology.
  Revision: yes
- Referee: [Framework choice and benefits] Paragraphs on framework choice and benefits: the central assertion that the measurized framework yields optimal Borel-measurable policies and value functions under milder conditions than Bertsekas-Shreve rests entirely on the lifted MDPs satisfying the semicontinuous-semicompact assumptions for both discounted and average-reward criteria; without a detailed argument or counterexample-free demonstration that the assumptions transfer, the comparison to the universally measurable framework cannot be substantiated.
  Authors: The comparison to the Bertsekas–Shreve universally measurable setting indeed depends on the lifted processes satisfying the semicontinuous-semicompact assumptions. As indicated in our response to the first comment, the revised manuscript will contain the explicit verification for both criteria. This will substantiate that the assumptions are milder and more readily verifiable, thereby supporting the claimed advantages in accessibility and Borel measurability.
  Revision: yes
Circularity Check
No significant circularity detected; derivation applies external frameworks to a constructed lifting.
full rationale
The paper constructs measurized MDPs via an algebraic lifting of standard MDPs to probability-measure states and stochastic-kernel actions, then invokes the semicontinuous-semicompact assumptions of Hernández-Lerma and Lasserre (an external reference) to obtain Borel-measurable value functions and policies. The generalization claim follows directly from the lifting definition rather than any fitted parameter or self-referential loop. No load-bearing self-citations, self-definitional reductions, or renamings of known results appear; the central results rest on verifying the lifted processes satisfy the cited external conditions, which is presented as an independent step rather than a tautology.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Lifted MDPs satisfy the semicontinuous-semicompact conditions of Hernández-Lerma and Lasserre.
- domain assumption: The measure-valued lifting preserves the optimal policies and values of the original stochastic MDP without loss of fidelity.
invented entities (1)
- measurized MDP: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: we cast lifted MDPs within the semicontinuous-semicompact framework of Hernández-Lerma and Lasserre... optimal Borel-measurable value functions and policies
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: novel algebraic lifting procedure... non-deterministic measure-valued MDPs can emerge from lifting MDPs impacted by external random shocks
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Adelman, Daniel, 2007, Dynamic bid prices in revenue management, Operations Research 55, 647–661. Adelman, Daniel, Christiane Barz, and Alba V Olivares-Nadal, 2025, Dynamic basis function generation for network revenue management, INFORMS Journal on Computing. Adelman, Daniel, and Diego Klabjan, 2012, Computing near-optimal policies in generalized joint rep...
- [2] Bellman, Richard, 1966, Dynamic programming, Science 153, 34–37. Bertsekas, Dimitri, and Steven E Shreve, 1996, Stochastic optimal control: the discrete-time case, volume 5 (Athena Scientific). Billingsley, Patrick, 1999, Convergence of probability measures (John Wiley & Sons). Borkar, Vivek, and Rahul Jain, 2014, Risk-constrained Markov decision processes, IEE...
- [3] Powell, Warren B, 2007, Approximate Dynamic Programming: Solving the curses of dimensionality, volume 703 (John Wiley & Sons). Puterman, Martin L, 2014, Markov decision processes: discrete stochastic dynamic programming (John Wiley & Sons). Shreve, Steven E, and Dimitri P Bertsekas, 1979, Universally measurable policies in dynamic programming, Mathematics...