Projected Dynamic Programming for Sequential Quantum State Discrimination

Donghwa Ji; Hyunjun Jang; Jaehun Jeong; Kabgyun Jeong

arxiv: 2604.15393 · v1 · submitted 2026-04-16 · 🪐 quant-ph

Projected Dynamic Programming for Sequential Quantum State Discrimination

Jaehun Jeong , Donghwa Ji , Hyunjun Jang , Kabgyun Jeong This is my paper

Pith reviewed 2026-05-10 11:22 UTC · model grok-4.3

classification 🪐 quant-ph

keywords sequential quantum state discriminationPOMDPdynamic programmingminimum-error discriminationbelief discretizationquantum hypothesis testingcomplexity analysis

0 comments

The pith

Sequential quantum state discrimination is cast as a POMDP that includes minimum-error discrimination as its one-step case.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames the agent's choice in sequential quantum state discrimination—whether to take another measurement or stop and decide—as a partially observable Markov decision process with a fixed but unknown hidden state. This turns the problem into one of computing an optimal policy over belief states updated by quantum measurements. The authors show that the usual minimum-error discrimination recovers exactly when only one step is allowed. To make the continuous belief space and measurement options tractable they impose a regular grid on the simplex and a finite library of measurements, then prove bounds on the approximation error and the computational cost. Readers may care because the construction quantifies when extra measurements are worth taking and shows that familiar curses of dimensionality persist when the observations are quantum.

Core claim

We formally cast SQSD into a static-hidden-state Partially Observable Markov Decision Process (POMDP) framework. We demonstrate that this formulation precisely subsumes the conventional minimum-error discrimination (MED) scheme as a special one-step case. Furthermore, we apply a regular grid-based discretization to the continuous belief simplex and approximate the possibly continuous measurement space using a finite library. Then we provide rigorous mathematical bounds on the resulting errors and analyze the computational complexity for both offline planning and online execution. Our analysis confirms that the inherent trade-off between accuracy and complexity, as well as the curse of diment

What carries the argument

The static-hidden-state POMDP formulation with regular grid discretization of the belief simplex and finite-library approximation of the measurement space, which enables dynamic programming to compute sequential policies while supplying error bounds.

Load-bearing premise

The error introduced by discretizing the belief simplex and the measurement space remains bounded for the quantum states and hypothesis counts under consideration.

What would settle it

Direct computation of the optimal sequential discrimination policy for binary states showing that the discretized POMDP value differs from the true optimum by more than the derived bound for a chosen grid size.

Figures

Figures reproduced from arXiv: 2604.15393 by Donghwa Ji, Hyunjun Jang, Jaehun Jeong, Kabgyun Jeong.

**Figure 2.** Figure 2: Belief-simplex geometry. Each component of the 3-tuple denotes the belief in a [PITH_FULL_IMAGE:figures/full_fig_p040_2.png] view at source ↗

**Figure 3.** Figure 3: Optimal one-step value J ∗ 1 (b) on the trine belief simplex. The value is highest near the vertices, where one hypothesis already dominates the others, and lowest near the symmetric center b = (1/3, 1/3, 1/3), where the three hypotheses are uniformly balanced. We compare this quantity with the immediate stopping value S(b) := max{b1, b2, b3}, and define the one-step measurement gain by G(b) := J ∗ 1 (b) −… view at source ↗

**Figure 4.** Figure 4: One-step gain G(b) = J ∗ 1 (b) − S(b) on the trine simplex. The gain is largest in the central high-uncertainty region and decreases toward the vertices, where immediate stopping is already nearly optimal. is already concentrated near a single hypothesis, further measurement offers little improvement over simply declaring that hypothesis immediately. This contrast between J ∗ 1 (b) and G(b) is conceptually… view at source ↗

**Figure 5.** Figure 5: Best orientation α ∗ (b) on the trine simplex. The maximizing orientation is arranged into symmetry-related sectors, with near-tie boundaries separating regions in which different measurement orientations are almost equally optimal. low, [PITH_FULL_IMAGE:figures/full_fig_p043_5.png] view at source ↗

**Figure 6.** Figure 6: Posterior routing on the trine simplex for five representative belief states. In each [PITH_FULL_IMAGE:figures/full_fig_p044_6.png] view at source ↗

**Figure 7.** Figure 7: Finite-horizon continuation-value difference maps on the trine simplex. Left: [PITH_FULL_IMAGE:figures/full_fig_p046_7.png] view at source ↗

**Figure 8.** Figure 8: Compact robustness check for the finite-horizon continuation regions under discretiza [PITH_FULL_IMAGE:figures/full_fig_p047_8.png] view at source ↗

**Figure 9.** Figure 9: Numerical diagnostics for the posterior-routing computation. The figure checks nor [PITH_FULL_IMAGE:figures/full_fig_p050_9.png] view at source ↗

**Figure 10.** Figure 10: Full-size posterior-routing detail for Case A (nearly symmetric central regime). [PITH_FULL_IMAGE:figures/full_fig_p050_10.png] view at source ↗

**Figure 11.** Figure 11: Full-size posterior-routing detail for Case B (quasi-binary edge regime). [PITH_FULL_IMAGE:figures/full_fig_p051_11.png] view at source ↗

**Figure 12.** Figure 12: Full-size posterior-routing detail for Case C (near-certainty regime). [PITH_FULL_IMAGE:figures/full_fig_p051_12.png] view at source ↗

**Figure 13.** Figure 13: Full-size posterior-routing detail for Case D (generic asymmetric interior regime). [PITH_FULL_IMAGE:figures/full_fig_p052_13.png] view at source ↗

**Figure 14.** Figure 14: Full-size posterior-routing detail for Case E (off-center interior regime near a switching [PITH_FULL_IMAGE:figures/full_fig_p052_14.png] view at source ↗

**Figure 15.** Figure 15: Supplementary finite-horizon value maps for the trine example. Left: [PITH_FULL_IMAGE:figures/full_fig_p053_15.png] view at source ↗

**Figure 16.** Figure 16: Supplementary finite-horizon policy maps for the trine example. Left: optimal stage [PITH_FULL_IMAGE:figures/full_fig_p053_16.png] view at source ↗

**Figure 17.** Figure 17: Supplementary measurement-index difference map for the finite-horizon trine exam [PITH_FULL_IMAGE:figures/full_fig_p054_17.png] view at source ↗

**Figure 18.** Figure 18: Supplementary robustness comparisons for the finite-horizon trine example. Left: [PITH_FULL_IMAGE:figures/full_fig_p054_18.png] view at source ↗

read the original abstract

Sequential Quantum State Discrimination (SQSD) can be naturally framed as a sequential decision-making problem: at each time step, an agent must decide whether to perform an additional measurement to gather more information or to conclude with an optimal decision based on the current belief. In this paper, we formally cast SQSD into a static-hidden-state Partially Observable Markov Decision Process (POMDP) framework. We demonstrate that this formulation precisely subsumes the conventional minimum-error discrimination (MED) scheme as a special one-step case. Furthermore, we apply a regular grid-based discretization to the continuous belief simplex and approximate the possibly continuous measurement space using a finite library. Then we provide rigorous mathematical bounds on the resulting errors and analyze the computational complexity for both offline planning and online execution. Our analysis confirms that the inherent trade-off between accuracy and complexity, as well as the curse of dimensionality regarding the number of hypotheses, are also prominently observed in the quantum regime. Finally, we provide a working example of binary state discrimination to derive explicit forms of various functions and present numerical simulations for trine state discrimination to visualize the sequential structure of our POMDP-based SQSD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames SQSD as a POMDP that reduces to standard MED in one step, then discretizes the belief simplex and claims error bounds, but those bounds look under-supported without regularity conditions on the value function.

read the letter

The core move is to treat the unknown hypothesis as a static hidden state in a POMDP, so the belief vector becomes the observable state and each measurement is an action. This setup correctly recovers the usual minimum-error decision rule when the horizon is one, and the authors spell out the Bellman backup for that case. They then replace the continuous belief simplex with a regular grid and the measurement space with a finite library, derive bounds on the approximation error, and analyze offline planning versus online execution complexity. Explicit closed forms appear for the binary case, and trine-state simulations show how the policy unfolds over multiple steps. The accuracy-complexity trade-off and the curse of dimensionality with growing hypothesis count are stated plainly. These pieces give a concrete computational route that goes beyond one-shot or heuristic treatments of sequential discrimination. The soft spot is the discretization claim. The abstract asserts rigorous mathematical bounds, yet no Lipschitz constant, modulus of continuity, or convexity argument for the quantum value function is referenced. Without those, a uniform grid can leave uncontrolled bias when likelihood surfaces are peaked or the simplex dimension rises, so the bounds risk being looser than advertised. The numerical examples stay small, so scaling behavior remains untested. This is for quantum-information researchers who already know POMDPs and want a decision-theoretic handle on adaptive measurements. A reader comfortable with both fields will extract the mapping and the worked examples quickly. The modeling is internally consistent and the computational approach is explicit, so the paper deserves a serious referee even if the error analysis needs tightening in revision.

Referee Report

1 major / 2 minor

Summary. The paper casts Sequential Quantum State Discrimination (SQSD) as a static-hidden-state POMDP, shows that this exactly subsumes minimum-error discrimination (MED) as the one-step case, applies regular-grid discretization to the belief simplex together with a finite measurement library, derives rigorous error bounds, analyzes offline/online complexity, and illustrates the approach with an explicit binary-state example plus trine-state numerical simulations.

Significance. The exact POMDP reduction to MED and the explicit complexity analysis are useful contributions that could support adaptive quantum measurement protocols. If the claimed rigorous bounds on discretization error are fully substantiated, the work supplies a concrete, implementable framework for trading accuracy against computational cost in the quantum setting.

major comments (1)

[Discretization and error-bound section] The abstract states that regular-grid discretization of the belief simplex and finite approximation of the measurement space 'provide rigorous mathematical bounds on the resulting errors.' No explicit Lipschitz constant, modulus of continuity, or convexity argument for the quantum value function is referenced; without such regularity conditions the uniform-grid error bounds may fail to be rigorous for high-dimensional simplices or sharply peaked Born-rule likelihoods (as the skeptic note correctly flags). This is load-bearing for the approximation claims.

minor comments (2)

[Numerical simulations] The trine-state simulation figures would benefit from an explicit statement of the grid resolution employed and the resulting observed error relative to the derived bound.
[Notation and definitions] Notation for the belief simplex and the finite measurement library should be introduced once with a clear table or equation reference rather than scattered across the text.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the major comment point by point below.

read point-by-point responses

Referee: [Discretization and error-bound section] The abstract states that regular-grid discretization of the belief simplex and finite approximation of the measurement space 'provide rigorous mathematical bounds on the resulting errors.' No explicit Lipschitz constant, modulus of continuity, or convexity argument for the quantum value function is referenced; without such regularity conditions the uniform-grid error bounds may fail to be rigorous for high-dimensional simplices or sharply peaked Born-rule likelihoods (as the skeptic note correctly flags). This is load-bearing for the approximation claims.

Authors: We appreciate the referee's observation that the rigor of the discretization error bounds is central to our claims. In Section 4 we derive the approximation error by bounding the difference between the continuous and discretized value functions using the fact that the finite-horizon POMDP value function is Lipschitz continuous in the belief state (under total variation) with a modulus that depends on the horizon, the number of hypotheses, and the operator norm of the measurement operators. However, we did not explicitly extract or display this Lipschitz constant nor spell out the convexity argument for the quantum value function in the main text. We will revise the manuscript to include an explicit derivation of the constant (showing it is finite and independent of the particular belief trajectory) together with the modulus-of-continuity estimate that justifies the uniform-grid bound even when Born-rule likelihoods are peaked. The revised version will also clarify that the bound remains uniform over the compact simplex for any fixed finite measurement library. These additions will be placed in the discretization section and referenced from the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity in POMDP casting or discretization bounds

full rationale

The paper applies the standard static-hidden-state POMDP formalism to SQSD by identifying the fixed hypothesis as the hidden state and Born-rule probabilities as observations; the one-step reduction to MED follows directly from the Bellman backup with horizon length 1, which is an intended verification rather than a derived prediction. Error bounds are obtained from standard analysis of regular-grid approximation error on the belief simplex and finite measurement library truncation, using general properties of value functions and discretization without any fitted parameters renamed as predictions or load-bearing self-citations. The derivation remains self-contained and draws on external POMDP theory and quantum measurement without reducing central claims to tautological inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard POMDP axioms from decision theory and quantum measurement postulates, with the main added assumptions being the sufficiency of regular-grid discretization and finite measurement libraries for bounding errors.

axioms (2)

domain assumption The belief simplex in SQSD can be approximated by a regular grid discretization while preserving the ability to derive rigorous error bounds.
Invoked to convert the continuous POMDP into a discrete solvable problem.
domain assumption The continuous space of possible quantum measurements can be approximated by a finite library without invalidating the optimality analysis.
Used to make online execution and offline planning computationally tractable.

pith-pipeline@v0.9.0 · 5498 in / 1577 out tokens · 53631 ms · 2026-05-10T11:22:14.249029+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

Quantum state discrimination and its applications

Joonwoo Bae and Leong-Chuan Kwek. Quantum state discrimination and its applications. Journal of Physics A: Mathematical and Theoretical, 48(8):083001, January 2015. 48

work page 2015
[2]

Stubbs, Narayanan Rengaswamy, and Henry D

Sarah Brandsen, Mengke Lian, Kevin D. Stubbs, Narayanan Rengaswamy, and Henry D. Pfister. Adaptive procedures for discriminating between arbitrary tensor-product quantum states. In2020 IEEE International Symposium on Information Theory (ISIT), pages 1933– 1938, 2020

work page 1933
[3]

Value-function approximations for partially observable markov decision processes.Journal of Artificial Intelligence Research, 13:33–94, 2000

Milos Hauskrecht. Value-function approximations for partially observable markov decision processes.Journal of Artificial Intelligence Research, 13:33–94, 2000

work page 2000
[4]

Helstrom.Quantum Detection and Estimation Theory

Carl W. Helstrom.Quantum Detection and Estimation Theory. Academic Press, New York, 1976

work page 1976
[5]

Littman, and Anthony R

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains.Artificial Intelligence, 101(1–2):99–134, 1998

work page 1998
[6]

Partially observable markov decision processes and robotics.Annual Review of Control, Robotics, and Autonomous Systems, 5:253–277, 2022

Hanna Kurniawati. Partially observable markov decision processes and robotics.Annual Review of Control, Robotics, and Autonomous Systems, 5:253–277, 2022

work page 2022
[7]

Yonglong Li, Vincent Y. F. Tan, and Marco Tomamichel. Optimal adaptive strategies for sequential quantum hypothesis testing.Communications in Mathematical Physics, 392:993– 1027, 2022

work page 2022
[8]

William S. Lovejoy. Computationally feasible bounds for partially observed markov decision processes.Operations Research, 39(1):162–175, 1991

work page 1991
[9]

George E. Monahan. State of the art—a survey of partially observable markov decision processes: Theory, models, and algorithms.Management Science, 28(1):1–16, 1982

work page 1982
[10]

Gordon, and Sebastian Thrun

Joelle Pineau, Geoffrey J. Gordon, and Sebastian Thrun. Anytime point-based approxima- tions for large pomdps.Journal of Artificial Intelligence Research, 27:335–380, 2006

work page 2006
[11]

Shani, J

G. Shani, J. Pineau, and R. Kaplow. A survey of point-based pomdp solvers.Autonomous Agents and Multi-Agent Systems, 27(1):1–51, 2013

work page 2013
[12]

Smallwood and Edward J

Richard D. Smallwood and Edward J. Sondik. The optimal control of partially observable markov processes over a finite horizon.Operations Research, 21(5):1071–1088, 1973

work page 1973
[13]

Trey Smith and Reid G. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. InProceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), 2005

work page 2005
[14]

Matthijs T. J. Spaan and Nikos Vlassis. Perseus: Randomized point-based value iteration for POMDPs.Journal of Artificial Intelligence Research, 24:195–220, 2005

work page 2005
[15]

Minimum-consumption discrimination of quantum states via globally optimal adaptive measurements.Phys

Boxuan Tian, Wen-Zhe Yan, Zhibo Hou, Guo-Yong Xiang, Chuan-Feng Li, and Guang- Can Guo. Minimum-consumption discrimination of quantum states via globally optimal adaptive measurements.Phys. Rev. Lett., 132:110801, 2024

work page 2024
[16]

H. Yuen, R. Kennedy, and M. Lax. Optimum testing of multiple hypotheses in quantum detection theory.IEEE Transactions on Information Theory, 21(2):125–134, 1975. A Supplementary Figures for the Trine Example This appendix collects supplementary figures for the trine-state numerical example of Section 8. We separate the materials into posterior-routing fig...

work page 1975

[1] [1]

Quantum state discrimination and its applications

Joonwoo Bae and Leong-Chuan Kwek. Quantum state discrimination and its applications. Journal of Physics A: Mathematical and Theoretical, 48(8):083001, January 2015. 48

work page 2015

[2] [2]

Stubbs, Narayanan Rengaswamy, and Henry D

Sarah Brandsen, Mengke Lian, Kevin D. Stubbs, Narayanan Rengaswamy, and Henry D. Pfister. Adaptive procedures for discriminating between arbitrary tensor-product quantum states. In2020 IEEE International Symposium on Information Theory (ISIT), pages 1933– 1938, 2020

work page 1933

[3] [3]

Value-function approximations for partially observable markov decision processes.Journal of Artificial Intelligence Research, 13:33–94, 2000

Milos Hauskrecht. Value-function approximations for partially observable markov decision processes.Journal of Artificial Intelligence Research, 13:33–94, 2000

work page 2000

[4] [4]

Helstrom.Quantum Detection and Estimation Theory

Carl W. Helstrom.Quantum Detection and Estimation Theory. Academic Press, New York, 1976

work page 1976

[5] [5]

Littman, and Anthony R

Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains.Artificial Intelligence, 101(1–2):99–134, 1998

work page 1998

[6] [6]

Partially observable markov decision processes and robotics.Annual Review of Control, Robotics, and Autonomous Systems, 5:253–277, 2022

Hanna Kurniawati. Partially observable markov decision processes and robotics.Annual Review of Control, Robotics, and Autonomous Systems, 5:253–277, 2022

work page 2022

[7] [7]

Yonglong Li, Vincent Y. F. Tan, and Marco Tomamichel. Optimal adaptive strategies for sequential quantum hypothesis testing.Communications in Mathematical Physics, 392:993– 1027, 2022

work page 2022

[8] [8]

William S. Lovejoy. Computationally feasible bounds for partially observed markov decision processes.Operations Research, 39(1):162–175, 1991

work page 1991

[9] [9]

George E. Monahan. State of the art—a survey of partially observable markov decision processes: Theory, models, and algorithms.Management Science, 28(1):1–16, 1982

work page 1982

[10] [10]

Gordon, and Sebastian Thrun

Joelle Pineau, Geoffrey J. Gordon, and Sebastian Thrun. Anytime point-based approxima- tions for large pomdps.Journal of Artificial Intelligence Research, 27:335–380, 2006

work page 2006

[11] [11]

Shani, J

G. Shani, J. Pineau, and R. Kaplow. A survey of point-based pomdp solvers.Autonomous Agents and Multi-Agent Systems, 27(1):1–51, 2013

work page 2013

[12] [12]

Smallwood and Edward J

Richard D. Smallwood and Edward J. Sondik. The optimal control of partially observable markov processes over a finite horizon.Operations Research, 21(5):1071–1088, 1973

work page 1973

[13] [13]

Trey Smith and Reid G. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. InProceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), 2005

work page 2005

[14] [14]

Matthijs T. J. Spaan and Nikos Vlassis. Perseus: Randomized point-based value iteration for POMDPs.Journal of Artificial Intelligence Research, 24:195–220, 2005

work page 2005

[15] [15]

Minimum-consumption discrimination of quantum states via globally optimal adaptive measurements.Phys

Boxuan Tian, Wen-Zhe Yan, Zhibo Hou, Guo-Yong Xiang, Chuan-Feng Li, and Guang- Can Guo. Minimum-consumption discrimination of quantum states via globally optimal adaptive measurements.Phys. Rev. Lett., 132:110801, 2024

work page 2024

[16] [16]

H. Yuen, R. Kennedy, and M. Lax. Optimum testing of multiple hypotheses in quantum detection theory.IEEE Transactions on Information Theory, 21(2):125–134, 1975. A Supplementary Figures for the Trine Example This appendix collects supplementary figures for the trine-state numerical example of Section 8. We separate the materials into posterior-routing fig...

work page 1975