Continuous-Time Markov Decision Processes with Controlled Observations
Pith reviewed 2026-05-24 21:52 UTC · model grok-4.3
The pith
Decision makers can jointly optimize observation times and control actions in continuous-time discounted jump Markov decision processes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors provide a theoretical framework that a decision maker can use to find the optimal observation epochs and the optimal actions jointly. Two cases are investigated. The first is gated queueing systems, for which the optimal action and the optimal observation time are characterized explicitly and the optimal observation time is shown to be independent of the state. The second is an inventory control problem with a Poisson arrival process, for which the optimal action and observation time are obtained numerically. The results show that it is optimal to observe more frequently in regions of the state space where the optimal action adapts constantly.
What carries the argument
Dynamic programming equations that jointly optimize the timing of the next observation and the control trajectory between observations.
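A minimal sketch of how such a joint recursion could be iterated numerically. It is not the paper's algorithm. It assumes a finite state space, a control held constant between observations (the paper allows a general control trajectory), a finite grid of candidate observation intervals, and an Euler time discretization. The names `build_tables`, `joint_value_iteration`, `gens`, `reward`, and `g` are hypothetical.

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def build_tables(gens, reward, beta, h, max_steps):
    """Tabulate r_bar(x, a, T) and q(x, .; a, T) for T = h, 2h, ..., max_steps*h.

    gens[a] is the jump-rate (generator) matrix under constant action a and
    reward[a][x] is the reward rate. Each Euler step uses I + h*Q, so
    h * (largest exit rate) must be at most 1 (assumption of this sketch).
    """
    n = len(reward[0])
    disc = beta ** h
    tables = []
    for a, Q in enumerate(gens):
        step = [[float(i == j) + h * Q[i][j] for j in range(n)] for i in range(n)]
        P = [[float(i == j) for j in range(n)] for i in range(n)]
        r_bar = [0.0] * n
        for k in range(1, max_steps + 1):
            weight = disc ** (k - 1) * h  # discounted length of the k-th step
            r_bar = [r_bar[x] + weight * sum(P[x][y] * reward[a][y] for y in range(n))
                     for x in range(n)]
            P = matmul(P, step)
            tables.append((a, k * h, r_bar, P))
    return tables


def joint_value_iteration(gens, reward, g, beta, h, max_steps,
                          tol=1e-10, max_sweeps=10000):
    """Iterate v(x) = max over (a, T) of r_bar + beta^T * sum q * v + g(T)."""
    n = len(reward[0])
    tables = build_tables(gens, reward, beta, h, max_steps)
    v = [0.0] * n
    policy = [None] * n
    for _ in range(max_sweeps):
        new_v = []
        for x in range(n):
            best = None
            for a, T, r_bar, P in tables:
                cont = sum(P[x][y] * v[y] for y in range(n))
                total = r_bar[x] + beta ** T * cont + g(T)
                if best is None or total > best[0]:
                    best = (total, a, T)
            new_v.append(best[0])
            policy[x] = (best[1], best[2])
        gap = max(abs(p - q) for p, q in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    return v, policy


# Hypothetical two-state example: action 1 doubles the jump rate out of state 0.
gens = [[[-1.0, 1.0], [1.0, -1.0]], [[-2.0, 2.0], [1.0, -1.0]]]
reward = [[0.0, 1.0], [-0.2, 1.0]]
v, policy = joint_value_iteration(gens, reward, g=lambda T: -0.05,
                                  beta=0.5, h=0.1, max_steps=20)
```

Each `policy[x]` is a pair (action, time until the next observation). The grid over T makes the maximization a plain enumeration, which is simple but scales poorly; a finer grid or a larger state space would need something smarter.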
If this is right
- In gated queueing systems the optimal observation schedule can be chosen without reference to the current state.
- In inventory control with Poisson arrivals, observation frequency increases in state regions where the optimal action changes with the state.
- The value function for the joint problem is characterized by the dynamic programming equations derived from the model.
- Numerical computation of the joint optimum is feasible for concrete problems such as inventory control.
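To make the inventory case concrete, the sketch below computes how uncertain the stock level becomes as the time since the last observation grows. It relies on one standard fact: the number of arrivals of a Poisson process with rate λ over an interval of length T is Poisson-distributed with mean λT. Everything else is an assumption for illustration and is not taken from the paper: arrivals are unit demands, no replenishment happens between observations, and demand at zero stock is lost. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k arrivals when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def inventory_kernel(stock, rate, T):
    """Distribution of the stock level T time units after observing `stock`.

    Returns a list probs where probs[s] is the probability of holding s units.
    Assumes unit demands at Poisson rate `rate`, no replenishment, lost sales.
    """
    mean = rate * T
    probs = [0.0] * (stock + 1)
    for demand in range(stock):
        probs[stock - demand] = poisson_pmf(demand, mean)
    # All demand counts >= stock empty the inventory.
    probs[0] = 1.0 - sum(probs[1:])
    return probs


# Spread of the unobserved stock level for short and long observation gaps.
short_gap = inventory_kernel(stock=5, rate=2.0, T=0.1)
long_gap = inventory_kernel(stock=5, rate=2.0, T=2.0)
```

A kernel of this kind plays the role of the transition term over an observation interval: the longer the gap, the more spread out the distribution, and the less reliable an action chosen at the last observation becomes.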
Where Pith is reading between the lines
- The state-independence result could simplify real-time implementation in queueing applications by removing the need to track state for scheduling decisions.
- The same joint-optimization approach might be tested on other systems with costly observations, such as maintenance scheduling or sensor activation.
- Algorithms that solve the dynamic programming equations at scale would make the framework practical for larger state spaces.
Load-bearing premise
The underlying process is a continuous-time discounted jump Markov decision process and an optimal joint policy over actions and observation times exists and satisfies the dynamic programming equations.
What would settle it
An explicit gated queueing example in which the optimal next observation time varies with the current state would falsify the independence result.
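The check described here is easy to mechanize once a solver is available. The sketch below assumes a hypothetical output format, one pair (action, next observation time) per state, and reports the states whose observation time differs from that of the first state. It does not solve the queueing model itself.

```python
def state_independence_report(policy, tol=1e-9):
    """Check whether the next observation time is the same in every state.

    policy[x] is a pair (action, T) for state x (hypothetical format).
    Returns (is_independent, offending_states).
    """
    times = [T for _, T in policy]
    reference = times[0]
    offenders = [x for x, T in enumerate(times) if abs(T - reference) > tol]
    return (not offenders, offenders)
```

A non-empty list of offending states from an exact solution of a gated queueing instance would be the kind of counterexample the paragraph asks for. With a numerical solution, the tolerance has to be set against the solver's grid and convergence error.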
Original abstract
In this paper, we study a continuous-time discounted jump Markov decision process with both controlled actions and observations. The observation is only available for a discrete set of time instances. At each time of observation, one has to select an optimal timing for the next observation and a control trajectory for the time interval between two observation points. We provide a theoretical framework that the decision maker can utilize to find the optimal observation epochs and the optimal actions jointly. Two cases are investigated. One is gated queueing systems in which we explicitly characterize the optimal action and the optimal observation where the optimal observation is shown to be independent of the state. Another is the inventory control problem with Poisson arrival process in which we obtain numerically the optimal action and observation. The results show that it is optimal to observe more frequently at a region of states where the optimal action adapts constantly.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies continuous-time discounted jump Markov decision processes in which both actions and observation times are controlled. At each observation epoch the decision maker jointly selects the time until the next observation and a control trajectory to be followed until then. A dynamic-programming framework is developed for this joint optimization. Two applications are treated: gated queueing systems, for which an explicit characterization is given and the optimal observation time is proved to be state-independent, and an inventory-control problem with Poisson arrivals, for which numerical solutions are computed showing that observation frequency increases in regions where the optimal action changes rapidly.
Significance. If the derivations are correct, the work supplies a usable DP-based method for trading off observation cost against control performance in CTMDPs. The state-independence result for gated queues is a clean structural property that simplifies computation. The inventory example illustrates how the framework behaves on a standard applied problem. The explicit characterization in one case and the reproducible numerical procedure in the other are positive features.
Minor comments (3)
- [§3] §3 (or the section presenting the DP equations): the value-function recursion between observation epochs should be written out explicitly, including the integral form of the discounted cost under a fixed control trajectory, so that the optimality equations for the joint choice of next observation time and action are unambiguous.
- [Inventory numerical results] Inventory numerical section: the statement that observation is more frequent where the action adapts constantly should be accompanied by a table or plot that reports the computed inter-observation times for representative states, together with the corresponding optimal actions, so the claimed correlation can be verified directly.
- [Notation] Notation: the symbol used for the controlled jump rate should be defined once at first use and then employed consistently; currently the same letter appears to denote both the rate and the control in some passages.
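One way the recursion requested in the first comment might be written out, consistent with the Theorem 1 fragment quoted on this page. This is a sketch, not the paper's statement. The quoted fragment elides the definition of r̄ and the range of the supremum, so the reward rate r, the expectation form of r̄, and the range T > 0 over admissible trajectories a(·) are all assumptions.

```latex
\bar r\bigl(x, a(\cdot), T\bigr)
  = \mathbb{E}\!\left[\int_0^T \beta^{t}\, r\bigl(X_t, a(t)\bigr)\,dt \;\middle|\; X_0 = x\right],
\qquad
v(x) = \sup_{T > 0,\; a(\cdot)}
  \Bigl\{ \bar r\bigl(x, a(\cdot), T\bigr)
        + \beta^{T} \sum_{x'} q\bigl(x, x'; a, T\bigr)\, v(x')
        + g(T) \Bigr\}
```

Here X_t is the jump process started at the observed state x and driven open-loop by a(·) until the next observation at time T, q(x, x'; a, T) is the probability of being in x' at that time, and g(T) is the observation-related term as it appears in the quoted fragment.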
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report correctly identifies the core contributions: the joint DP framework for actions and observation times, the explicit state-independent observation policy for gated queues, and the numerical behavior in the inventory example. No major comments were raised in the report.
Circularity Check
No significant circularity
The paper develops a DP-based framework for jointly optimizing observation times and controls in a continuous-time discounted jump MDP, with an explicit structural result (state-independent optimal observation) for the gated queueing case. All load-bearing steps rest on standard existence assumptions for optimal policies via the Bellman equations of the model and on explicit characterization from the controlled jump rates and observation constraints; no fitted parameters are renamed as predictions, no self-citation chain is invoked to justify uniqueness, and no ansatz or renaming reduces the claimed results to their inputs by construction. The derivation is therefore self-contained against the model primitives.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Theorem 1 (Dynamic programming equation) ... v(x) = sup ... {r̄(x,a(·),T) + β^T ∑ q(x,x';a,T)v(x') + g(T)}
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: gated queueing ... optimal observation is shown to be independent of the state
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.