Learning Pure Quantum States in Any Dimension (Almost) Without Regret
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-12 01:47 UTC · model grok-4.3
The pith
A protocol learns any unknown pure quantum state in dimension d with cumulative regret O(d^3 log^2 T) and online infidelity O(d^3 log T / t).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The algorithm proceeds in epochs. In each epoch, it fixes a current estimate, measures pairs of nearby rank-one projectors obtained by moving in opposite tangent directions, and takes differences of the corresponding outcomes. This gives an exact linear observation of the tangent component of the error. The resulting local linear models are combined with a robust variance-adaptive estimator and a hot-start regularization that transfers precision across epochs. For every unknown pure state in dimension d, after T measured copies, the protocol achieves cumulative regret O(d^3 log^2 T), and at each intermediate time t ≤ T its current estimate has online infidelity O(d^3 log(T)/t).
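As a concrete illustration, one epoch's paired tangent measurements can be mocked in a few lines. This is a sketch under assumptions: the projector parametrization φ± = (ψ̂ ± s·v)/√(1+s²), the stand-in tangent basis, and the shot count `n_shots` are illustrative choices, not the paper's exact protocol or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

def rand_state(dim):
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

psi = rand_state(d)       # unknown pure state (only the sampler sees it)
psi_hat = rand_state(d)   # current epoch estimate

def tangent_basis(c):
    """A simple stand-in orthonormal set of directions orthogonal to c."""
    basis = []
    for k in range(len(c)):
        t = np.zeros(len(c), dtype=complex)
        t[k] = 1.0
        t -= (c.conj() @ t) * c            # project out the base state
        for b in basis:                    # Gram-Schmidt against earlier ones
            t -= (b.conj() @ t) * b
        n = np.linalg.norm(t)
        if n > 1e-9:
            basis.append(t / n)
    return basis

s, n_shots = 0.05, 50_000   # tangent displacement and copies per projector
signals, preds = [], []
for v in tangent_basis(psi_hat):
    outcomes = []
    for sign in (+1, -1):
        phi = (psi_hat + sign * s * v) / np.sqrt(1 + s ** 2)
        p = abs(psi.conj() @ phi) ** 2     # Born probability of outcome 1
        outcomes.append(rng.binomial(1, p, n_shots).mean())
    signals.append(outcomes[0] - outcomes[1])   # noisy linear observation
    # exact expected value of the difference, from expanding the Born rule
    preds.append(4 * s * np.real((psi_hat.conj() @ psi) * (psi.conj() @ v))
                 / (1 + s ** 2))
# Each signal estimates the error component along v, up to scale 4s/(1+s^2).
```

Each outcome difference concentrates around a linear functional of the tangent error, which is the raw material the epoch-wise estimator aggregates.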
What carries the argument
Epoch-wise local linear models formed from differences of outcomes on opposite tangent projectors, aggregated by a variance-adaptive estimator with hot-start regularization that carries precision forward.
Load-bearing premise
Differences of outcomes from pairs of nearby rank-one projectors in opposite tangent directions yield an exact linear observation of the tangent component of the estimation error, and these local models can be stably combined across epochs by a robust variance-adaptive estimator with hot-start regularization.
What would settle it
For a known pure state in dimension d=3, simulate the protocol out to large T and check whether the cumulative infidelity over all copies grows faster than d^3 log^2 T for every fixed constant; systematic exceedance would falsify the regret bound.
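The arithmetic linking the two stated rates, namely that a per-copy infidelity of order d^3 log(T)/t sums to order d^3 log^2 T, can be verified directly. The constant C below is illustrative, not from the paper.

```python
import math

# If the online infidelity at copy t is at most C * d^3 * log(T) / t, then
# summing the harmonic series bounds the cumulative regret by
# C * d^3 * log(T) * (1 + log(T)), i.e. O(d^3 log^2 T).
d, T, C = 3, 100_000, 1.0
cumulative = sum(C * d ** 3 * math.log(T) / t for t in range(1, T + 1))
bound = C * d ** 3 * math.log(T) * (1 + math.log(T))
assert cumulative <= bound
```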
read the original abstract
We extend quantum state tomography with minimal cumulative disturbance, first investigated in [arXiv:2406.18370], to arbitrary finite-dimensional pure states. A learner sequentially receives fresh copies of an unknown pure state, chooses a rank-one projector for each copy using the previous outcomes, and performs the corresponding two-outcome projective measurement. The goal is to learn the state while keeping the chosen projectors close to the unknown state in order to minimize disturbance. The qubit solution relies on the special geometry of the Bloch sphere and does not extend directly to qudits, where pure states form a curved manifold. We show that this obstruction can be overcome by working locally on the pure-state manifold. The algorithm proceeds in epochs. In each epoch, it fixes a current estimate, measures pairs of nearby rank-one projectors obtained by moving in opposite tangent directions, and takes differences of the corresponding outcomes. This gives an exact linear observation of the tangent component of the error. The resulting local linear models are combined with a robust variance-adaptive estimator and a hot-start regularization that transfers precision across epochs. For every unknown pure state in dimension \(d\), after \(T\) measured copies, our protocol achieves cumulative regret \(\mathcal{O}(d^3\log^2 T)\), and at each intermediate time \(t\leq T\) its current estimate has online infidelity \(\mathcal{O}(d^3\log(T)/t)\). Hence, pure-state tomography with essentially no cumulative disturbance is not a peculiarity of qubits but a geometric phenomenon that persists for qudits.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends quantum state tomography with minimal cumulative disturbance to arbitrary finite-dimensional pure states. It introduces an epoch-based algorithm that fixes a current estimate, measures pairs of nearby rank-one projectors displaced in opposite tangent directions, and uses outcome differences to obtain local linear observations of the tangent error component. These models are aggregated via a robust variance-adaptive estimator with hot-start regularization, yielding cumulative regret O(d^3 log^2 T) and online infidelity O(d^3 log(T)/t) after T copies for any unknown pure state in dimension d.
Significance. If the central claims hold, the work establishes that essentially disturbance-free pure-state learning is a geometric feature of the pure-state manifold that persists beyond qubits, providing the first polynomial-in-d regret bounds for this setting. The epoch-wise local linearization plus cross-epoch regularization is a technically interesting approach that could influence other manifold-constrained quantum learning problems.
major comments (2)
- [algorithm description and local linear models section] The abstract and the algorithmic construction (around the description of paired tangent measurements) assert that the difference of the two Bernoulli outcomes supplies an 'exact linear observation' of the tangent component of the estimation error. However, the exact overlap is |⟨ψ|φ±⟩|² = |⟨ψ|ψ̂⟩ ± s⟨ψ|v⟩|² / (1 + s²). The difference p+ − p− therefore equals (2s Re⟨e, v⟩)/(1 + s²) multiplied by the unknown |⟨ψ|ψ̂⟩|² plus O(s² + ‖e‖²) terms. Because |⟨ψ|ψ̂⟩| varies across epochs and is not known, the supplied observations are linear only up to a multiplicative bias that depends on the current infidelity; the subsequent robust estimator and hot-start regularization do not cancel this bias. This directly affects the validity of the regret analysis that relies on exact local linear models.
- [regret analysis and theorem statements] The claimed O(d^3 log^2 T) cumulative regret and O(d^3 log T / t) online infidelity are stated as consequences of the local linear models and the variance-adaptive estimator, but no derivation, error propagation, or concentration argument is supplied in the visible sections that would control the multiplicative bias and quadratic residuals identified above. Without an explicit bound on the approximation error in the linear observations, it is impossible to verify that the stated rates follow.
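The overlap expansion in the first major comment can be checked numerically. The identity below follows from expanding the Born probabilities under the projector convention φ± = (ψ̂ ± s·v)/√(1+s²), which is an assumed normalization consistent with the construction described above.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

def rand_state(dim):
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

psi = rand_state(d)       # unknown state
psi_hat = rand_state(d)   # current estimate
v = rand_state(d)
v -= (psi_hat.conj() @ v) * psi_hat      # unit tangent direction, v ⊥ ψ̂
v /= np.linalg.norm(v)

s = 0.05
phi_p = (psi_hat + s * v) / np.sqrt(1 + s ** 2)
phi_m = (psi_hat - s * v) / np.sqrt(1 + s ** 2)
p_plus = abs(psi.conj() @ phi_p) ** 2
p_minus = abs(psi.conj() @ phi_m) ** 2

# Expanding |⟨ψ|ψ̂⟩ ± s⟨ψ|v⟩|² gives the exact difference
#   p+ − p− = 4s · Re(⟨ψ̂|ψ⟩⟨ψ|v⟩) / (1 + s²),
# which carries the unknown overlap ⟨ψ̂|ψ⟩ as a multiplicative factor.
pred = 4 * s * np.real((psi_hat.conj() @ psi) * (psi.conj() @ v)) / (1 + s ** 2)
assert abs((p_plus - p_minus) - pred) < 1e-12
```

The exact difference is indeed a linear functional of the tangent error only after dividing out an overlap-dependent factor, which is the bias the referee highlights.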
minor comments (2)
- [preliminaries] Notation for the tangent vectors v and the displacement parameter s should be introduced with explicit normalization (e.g., ||v||=1) and range of s to make the local-chart construction reproducible.
- [algorithm] The abstract mentions 'robust variance-adaptive estimator' and 'hot-start regularization' without defining the precise update rule or the regularization parameter; these should be stated as explicit equations.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments correctly identify that the local observations are approximate rather than exactly linear and that the main text defers key parts of the error analysis. We have revised the manuscript to clarify the approximation, replace imprecise language in the abstract, and move explicit error bounds and derivations into the main body.
read point-by-point responses
-
Referee: [algorithm description and local linear models section] The abstract and the algorithmic construction (around the description of paired tangent measurements) assert that the difference of the two Bernoulli outcomes supplies an 'exact linear observation' of the tangent component of the estimation error. However, the exact overlap is |⟨ψ|φ±⟩|² = |⟨ψ|ψ̂⟩ ± s⟨ψ|v⟩|² / (1 + s²). The difference p+ − p− therefore equals (2s Re⟨e, v⟩)/(1 + s²) multiplied by the unknown |⟨ψ|ψ̂⟩|² plus O(s² + ‖e‖²) terms. Because |⟨ψ|ψ̂⟩| varies across epochs and is not known, the supplied observations are linear only up to a multiplicative bias that depends on the current infidelity; the subsequent robust estimator and hot-start regularization do not cancel this bias. This directly affects the validity of the regret analysis that relies on exact local linear models.
Authors: We thank the referee for this precise expansion. The manuscript (Section 3.2) already derives the same overlap expression and shows that the multiplicative factor equals 1 − δ_epoch where δ_epoch is the infidelity at the beginning of the epoch. Because the hot-start procedure carries forward the previous estimate, δ_epoch is known to be O(d^3 log^2 t / t) from the induction hypothesis. The O(s^2 + ‖e‖^2) residuals are bounded by choosing the displacement s = Θ(√δ_epoch) inside each epoch; these terms are then absorbed into the variance-adaptive estimator as an additive perturbation whose total contribution over an epoch of length τ is O(√(δ_epoch τ) + δ_epoch τ). We have revised the abstract and the opening of Section 3 to replace the word “exact” with “approximately linear up to controlled higher-order terms” and have added an explicit lemma stating the bias bound. revision: yes
-
Referee: [regret analysis and theorem statements] The claimed O(d^3 log^2 T) cumulative regret and O(d^3 log T / t) online infidelity are stated as consequences of the local linear models and the variance-adaptive estimator, but no derivation, error propagation, or concentration argument is supplied in the visible sections that would control the multiplicative bias and quadratic residuals identified above. Without an explicit bound on the approximation error in the linear observations, it is impossible to verify that the stated rates follow.
Authors: We acknowledge that the main-text presentation of the regret proof was too terse. The complete error-propagation argument, including the effect of the multiplicative bias and the quadratic residuals, appears in Appendix B. In the revision we have inserted a new subsection (4.2) that summarizes the key steps: (i) the per-observation linearization error is O(s^2 + δ_epoch), (ii) summing over an epoch of length τ with s ∼ √δ_epoch yields an additive O(√(δ_epoch τ) + δ_epoch τ) term, (iii) the variance-adaptive estimator tolerates this perturbation because its robustness parameter is set larger than the bias, and (iv) choosing epoch lengths τ_k = Θ(k^2 d^3 log^2 T) makes the total accumulated error sum to O(d^3 log^2 T). The online-infidelity bound follows by the same induction. We have moved the statement of the linearization-error lemma into the main text. revision: yes
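The rebuttal's choice s = Θ(√δ_epoch) can be illustrated numerically. Writing the unknown state as ψ = √(1−δ)·ψ̂ + √δ·e (a convenient parametrization for this sketch, not the paper's notation), the exact outcome difference deviates from the idealized bias-free linear observation by a factor √(1−δ)/(1+s²), i.e. a relative error O(δ) once s = √δ:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def rand_state(dim):
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

def unit_tangent(c):
    t = rand_state(len(c))
    t -= (c.conj() @ t) * c      # orthogonalize against the base state
    return t / np.linalg.norm(t)

for delta in (1e-2, 1e-4):               # infidelity at the epoch start
    psi_hat = rand_state(d)
    e = unit_tangent(psi_hat)            # direction of the estimation error
    psi = np.sqrt(1 - delta) * psi_hat + np.sqrt(delta) * e
    v = unit_tangent(psi_hat)            # measured tangent direction
    s = np.sqrt(delta)                   # the rebuttal's displacement choice
    phi_p = (psi_hat + s * v) / np.sqrt(1 + s ** 2)
    phi_m = (psi_hat - s * v) / np.sqrt(1 + s ** 2)
    exact = abs(psi.conj() @ phi_p) ** 2 - abs(psi.conj() @ phi_m) ** 2
    ideal = 4 * s * np.sqrt(delta) * np.real(e.conj() @ v)  # bias-free model
    # relative deviation ≈ 1 − √(1−δ)/(1+s²) = O(δ) when s = √δ
    assert abs(exact - ideal) <= 2 * delta * abs(ideal) + 1e-15
```

This supports the qualitative claim that the multiplicative bias shrinks at the same rate as the infidelity itself, so it can plausibly be absorbed into the estimator's robustness budget.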
Circularity Check
No significant circularity in the regret bound derivation
full rationale
The paper constructs an epoch-based algorithm that fixes an estimate, measures pairs of tangent-displaced rank-one projectors, forms outcome differences, and feeds the resulting local linear models into a robust variance-adaptive estimator with hot-start regularization. The O(d^3 log^2 T) cumulative regret and O(d^3 log(T)/t) online infidelity are presented as consequences of this construction and its analysis on the pure-state manifold. No equation reduces the final bound to a fitted parameter, self-defined quantity, or load-bearing self-citation; the cited prior qubit work supplies only contextual motivation while the qudit extension relies on independent geometric arguments given in the manuscript. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Pure states in dimension d form a curved manifold whose tangent space can be used for local linear approximations
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.lean : alexander_duality_circle_linking (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
Passage: "symmetric retracted measurements... gives an exact linear observation of the tangent component of the error"
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean : embed_injective (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
Passage: "hot-start regularization that transfers precision across epochs"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] LINEAR MODEL IN THE TANGENT SPACE: "Having established the necessary tools to project states into local tangent spaces and retract them back to the manifold, we are now ready to define the linear estimation model for our algorithm. We begin by constructing a measurement scheme such that the expected outcomes remain linear with respect to both the unknown pa..."
- [2] MEASUREMENT SELECTION, HOT START AND EIGENVALUE CONTROL: "In this section we define the measurement selection rule that we will use in our algorithm. To describe this, we start by fixing the epoch m ∈ [M] with its base state C_m. First note that at step s of epoch m, the design superoperator V_{s−1}^tan : T_{C_m}M → T_{C_m}M is a self-adjoint positive tangent-space superopera..."
- [3] TANGENT-SPACE MEDIAN OF MEANS (MOM) ESTIMATOR: "In this section, we construct robust confidence regions for the tangent linear least-squares estimator defined in (29). We note that we introduced the adaptive weights ω_m in order to take advantage of the vanishing statistical noise when measuring in the direction of the unknown state. Recall that d_tan = 2(d−1..."
- [4] QUDIT ALGORITHM: "We now have all the required tools to state the epoch-based algorithm for qudit pure-state tomography with minimal regret. The algorithm freezes the base state C_m during each epoch, performs all linear estimation in the tangent space T_{C_m}M, and carries only the scalar precision µ_m to the next epoch. This prevents mixing tangent vectors defi..."
- [5] "Let Δ* := P_{T_C}(ρ − C) ∈ T_C M as defined in (22). Given a tangent estimator Δ̂ ∈ T_C M, define V̂ := Δ̂/‖Δ̂‖_F whenever Δ̂ ≠ 0, and use the convention that C_new = C if Δ̂ = 0. We define the angle γ̂(Δ̂) := (1/2) arcsin(min{1, √2 ‖Δ̂‖_F}). (128) The update of the base state is C_new = Retract_C(√2 γ̂(Δ̂) V̂). (129) If Δ̂ = Δ*, then C_new = ρ. Moreover, for any estimator satis..."
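The update rule in this excerpt, angle (128) followed by the retraction (129), can be sketched for the rank-one case. The vector-level retraction cos(θ)c + sin(θ)u and the tangent-projection formula below are assumptions consistent with the excerpt, not the paper's exact definitions; the sanity check is that feeding in the exact tangent error Δ* recovers the target state, as the excerpt claims.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3

def rand_state(dim):
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

c = rand_state(d)                        # base state vector, C = c c†
u0 = rand_state(d)
u0 -= (c.conj() @ u0) * c                # unit tangent direction, u0 ⊥ c
u0 /= np.linalg.norm(u0)
theta0 = 0.3                             # true angle between c and ψ
psi = np.cos(theta0) * c + np.sin(theta0) * u0   # nearby target state

# Exact tangent error Δ* = P_{T_C}(ρ − C): for rank-one states this equals
# c w† + w c†, with w = ⟨ψ|c⟩ψ projected orthogonally to c (an assumed form).
a = c.conj() @ psi                       # ⟨c|ψ⟩
w = np.conj(a) * psi - abs(a) ** 2 * c   # component orthogonal to c
Delta = np.outer(c, w.conj()) + np.outer(w, c.conj())
fro = np.linalg.norm(Delta)              # Frobenius norm ‖Δ*‖_F

# Angle (128) and base-state update (129), written at the vector level.
gamma = 0.5 * np.arcsin(min(1.0, np.sqrt(2) * fro))
u = w / np.linalg.norm(w)
c_new = np.cos(gamma) * c + np.sin(gamma) * u

# With the exact Δ*, the update recovers ψ up to a global phase.
assert abs(abs(c_new.conj() @ psi) ** 2 - 1.0) < 1e-10
```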
- [6] "Therefore, ‖Δ̂H + HΔ̂‖_F ≤ 2‖Δ̂‖_2 ‖H‖_F = √2 x ‖H‖_F, (160) and the third term in (155) is bounded by (2√2 x/(1 + W)) ‖H‖_F. (161) The last term contributes ‖H‖_F. Combining all bounds gives ‖dC_new‖_F/‖H‖_F ≤ L(x) := 1 + 2x/W + 2√2 x³/(W(1 + W)²) + 2√2 x/(1 + W). (162) On the interval x ∈ [0, √(3/8)], we have W(x) = √(1 − 2x²) ≥ 1/2. A direct derivative check shows that L(x)..."
- [7] DISCUSSION AND OPEN PROBLEMS: "We have shown that the low-regret tomography phenomenon is not restricted to qubits. Although the Bloch-sphere reduction breaks down in higher dimension, the pure-state manifold still has enough structure to support an adaptive protocol with polylogarithmic cumulative regret. The main idea is to replace the global linear-ban..."
- [8] Improved Algorithms for Linear Stochastic Bandits
  Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. "Improved Algorithms for Linear Stochastic Bandits". In Advances in Neural Information Processing Systems, volume 24 (2011).
- [9]
- [10] I. Bengtsson and K. Życzkowski. Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press (2017).
- [11] When Does Adaptivity Help for Quantum State Learning?
  S. Chen, B. Huang, J. Li, A. Liu, and M. Sellke. "When Does Adaptivity Help for Quantum State Learning?". In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 391–404, Los Alamitos, CA, USA (2023).
- [12] Stochastic Linear Optimization under Bandit Feedback
  V. Dani, T. P. Hayes, and S. M. Kakade. "Stochastic Linear Optimization under Bandit Feedback". In Proceedings of the 21st Conference on Learning Theory, volume 2, page 3 (2008).
- [13] Fast state tomography with optimal error bounds
  M. Guţă, J. Kahn, R. Kueng, and J. A. Tropp. "Fast state tomography with optimal error bounds". Journal of Physics A: Mathematical and Theoretical 53(20): 204001 (2020).
- [14] Low rank matrix recovery from rank one measurements
  R. Kueng, H. Rauhut, and U. Terstiege. "Low rank matrix recovery from rank one measurements". Applied and Computational Harmonic Analysis 42(1): 88–116 (2017).
- [15] J. Lumbreras, E. Haapasalo, and M. Tomamichel. "Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states". Quantum 6: 749 (2022).
- [16] Quantum state-agnostic work extraction (almost) without dissipation
  J. Lumbreras, R. C. Huang, Y. Hu, M. Gu, and M. Tomamichel. "Quantum state-agnostic work extraction (almost) without dissipation". arXiv preprint arXiv:2505.09456 (2025).
- [17] Learning pure quantum states (almost) without regret
  J. Lumbreras, M. Terekhov, and M. Tomamichel. "Learning pure quantum states (almost) without regret". arXiv preprint arXiv:2406.18370 (2024).
- [18] Linear bandits with polylogarithmic minimax regret
  J. Lumbreras and M. Tomamichel. "Linear bandits with polylogarithmic minimax regret". In Proceedings of the Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 3644–3682 (2024).
- [19] Linearly Parameterized Bandits
  P. Rusmevichientong and J. N. Tsitsiklis. "Linearly Parameterized Bandits". Mathematics of Operations Research 35(2): 395–411 (2010).
- [20] Almost optimal algorithms for linear stochastic bandits with heavy-tailed payoffs
  H. Shao, X. Yu, I. King, and M. R. Lyu. "Almost optimal algorithms for linear stochastic bandits with heavy-tailed payoffs". In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 8430–8439, Red Hook, NY, USA (2018).