arxiv: 2511.14579 · v2 · submitted 2025-11-18 · 🪐 quant-ph

Gradient-descent methods for scalable quantum detector tomography

Amanuel Anteneh, Olivier Pfister

Pith reviewed 2026-05-17 20:46 UTC · model grok-4.3

classification 🪐 quant-ph

keywords quantum detector tomography · gradient descent · POVM reconstruction · phase-insensitive detectors · quantum optics · optimization · tomography scalability · Stiefel manifold

The pith

Gradient descent optimization reconstructs the POVM of phase-insensitive quantum detectors faster than constrained convex methods while matching or exceeding fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a gradient-descent technique for quantum detector tomography of phase-insensitive detectors, iteratively adjusting the POVM elements to fit observed measurement data. It numerically compares this approach to constrained convex optimization and finds that gradient descent reaches similar or better reconstruction fidelity in substantially less computation time, even when noise is present and only a small number of probe states are available. The paper also proposes a possible extension to phase-sensitive detectors, parametrizing the POVM on the complex Stiefel manifold so that gradient steps stay within the set of valid POVMs. A reader would care because quantum detectors must be characterized accurately in experiments, yet traditional convex solvers become slow as the detector dimension or the data set grows.

Core claim

Gradient descent can be applied directly to the parameters of a POVM to solve the quantum detector tomography problem for phase-insensitive detectors, yielding a reconstruction of the detector response that matches or surpasses the fidelity of constrained convex optimization while requiring far less runtime, as shown in numerical tests that include realistic noise levels and restricted probe resources; the same framework extends to the phase-sensitive case through a manifold-constrained parametrization.

What carries the argument

Gradient descent optimization of POVM matrix elements (with Stiefel-manifold parametrization for the phase-sensitive extension) to minimize the mismatch between predicted and observed detection statistics.
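
To make the mechanism concrete, here is a minimal sketch of gradient descent on a phase-insensitive POVM. It assumes the least-squares objective with a row-wise softmax that this page quotes from the paper (minimize ||P − FΠ||_F^2 with Π_ij ≥ 0 and rows summing to 1). Everything else is an illustrative assumption rather than the authors' implementation: the function names, the on/off toy detector with efficiency `eta`, the coherent-state probe means, the fixed learning rate, and the use of plain fixed-step gradient descent in place of whatever optimizer the paper uses.

```python
import math
import random


def softmax_rows(theta):
    """Map unconstrained rows to probability vectors (non-negative, sum to 1)."""
    out = []
    for row in theta:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        s = sum(e)
        out.append([x / s for x in e])
    return out


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def loss_and_grad(theta, F, P):
    """Squared Frobenius loss ||P - F Pi||^2 and its gradient w.r.t. theta."""
    Pi = softmax_rows(theta)
    R = matmul(F, Pi)  # predicted outcome probabilities per probe
    D = [[R[s][m] - P[s][m] for m in range(len(P[0]))] for s in range(len(P))]
    loss = sum(d * d for row in D for d in row)
    Ft = [list(col) for col in zip(*F)]
    G = [[2.0 * g for g in row] for row in matmul(Ft, D)]  # dL/dPi = 2 F^T D
    grad = []
    for n, row in enumerate(Pi):  # chain rule through the row-wise softmax
        dot = sum(G[n][k] * row[k] for k in range(len(row)))
        grad.append([row[m] * (G[n][m] - dot) for m in range(len(row))])
    return loss, grad


def fit_povm(F, P, n_outcomes, lr=0.5, steps=2000, seed=0):
    """Fixed-step gradient descent (a stand-in optimizer); returns (Pi, losses)."""
    rng = random.Random(seed)
    theta = [[rng.gauss(0.0, 0.1) for _ in range(n_outcomes)]
             for _ in range(len(F[0]))]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(theta, F, P)
        history.append(loss)
        theta = [[t - lr * g for t, g in zip(tr, gr)]
                 for tr, gr in zip(theta, grad)]
    return softmax_rows(theta), history


# Hypothetical toy problem: a lossy on/off detector with efficiency eta,
# probed by coherent states whose photon statistics are Poissonian.
N, eta = 8, 0.6
true_Pi = [[(1 - eta) ** n, 1 - (1 - eta) ** n] for n in range(N)]
F = []
for mean_n in [0.2, 0.5, 1.0, 1.5, 2.0, 3.0]:
    row = [math.exp(-mean_n) * mean_n ** n / math.factorial(n) for n in range(N)]
    total = sum(row)
    F.append([x / total for x in row])  # renormalise after Fock truncation
P = matmul(F, true_Pi)  # noiseless synthetic data

Pi_hat, history = fit_povm(F, P, n_outcomes=2)
```

Because the softmax keeps every row of Π on the probability simplex, no projection or explicit constraint is needed during the update, which is the design choice that lets an unconstrained first-order method stand in for a constrained solver here.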

If this is right

The method enables tomography of higher-dimensional or more complex detectors within practical laboratory time budgets.
Reconstruction remains reliable when measurement noise is present and only limited probe states can be prepared.
The Stiefel-manifold parametrization brings gradient-based optimization to phase-sensitive detectors without leaving the space of valid POVMs.
Reduced computation time makes repeated calibrations feasible during long experimental runs.
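
The Stiefel-manifold point above can be illustrated with a small sketch. It rests on a standard fact: if a tall matrix V has orthonormal columns (V†V = I) and is split into row blocks V_k, then the operators E_k = V_k†V_k are positive semi-definite and sum to the identity. Whether the paper uses exactly this block construction is an assumption, as are the function names, the dimensions, and the use of Gram-Schmidt re-orthonormalisation as a simple way to return to the manifold after a step.

```python
import random


def orthonormalize_columns(cols):
    """Modified Gram-Schmidt on a list of complex column vectors."""
    basis = []
    for v in cols:
        w = list(v)
        for b in basis:
            coeff = sum(bi.conjugate() * wi for bi, wi in zip(b, w))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sum(abs(wi) ** 2 for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis


def povm_from_stiefel(cols, n_outcomes, dim):
    """Split V (given by its columns) into row blocks V_k; return E_k = V_k^† V_k."""
    povm = []
    for k in range(n_outcomes):
        rows = range(k * dim, (k + 1) * dim)
        E = [[sum(cols[i][r].conjugate() * cols[j][r] for r in rows)
              for j in range(dim)] for i in range(dim)]
        povm.append(E)
    return povm


def sum_elements(povm, dim):
    """Entry-wise sum of all POVM elements."""
    return [[sum(E[i][j] for E in povm) for j in range(dim)] for i in range(dim)]


def deviation_from_identity(M):
    """Largest entry-wise distance between M and the identity matrix."""
    n = len(M)
    return max(abs(M[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


rng = random.Random(1)
dim, n_outcomes = 3, 4  # hypothetical sizes

# A random point on the complex Stiefel manifold: (n_outcomes*dim) x dim.
raw = [[complex(rng.gauss(0, 1), rng.gauss(0, 1))
        for _ in range(n_outcomes * dim)] for _ in range(dim)]
V = orthonormalize_columns(raw)
povm = povm_from_stiefel(V, n_outcomes, dim)
total = sum_elements(povm, dim)

# A Euclidean step generally leaves the manifold; re-orthonormalising the
# columns pulls the point back, so the result is again a valid POVM.
stepped = [[v + 0.05 * complex(rng.gauss(0, 1), rng.gauss(0, 1)) for v in col]
           for col in V]
V_new = orthonormalize_columns(stepped)
povm_new = povm_from_stiefel(V_new, n_outcomes, dim)
total_new = sum_elements(povm_new, dim)
```

A full Riemannian method would also project the gradient onto the tangent space before stepping; the sketch only shows why staying on the manifold is enough to keep completeness and positivity.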

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gradient machinery could be combined with adaptive probe-state selection to further reduce the total number of measurements needed.
Integration into real-time control software might allow detectors to be recalibrated on the fly as experimental conditions drift.
Because the approach is first-order and local, it may scale more gracefully than convex solvers when detector Hilbert spaces become large.

Load-bearing premise

Gradient descent will converge to a high-fidelity POVM without becoming trapped in poor local minima, and the added-noise simulations used for benchmarking will accurately reflect real experimental performance.

What would settle it

An experiment on a calibrated phase-insensitive detector where the gradient-descent reconstruction produces measurably lower fidelity than a constrained convex optimization run on the same data set, or where the gradient method fails to converge within the reported time advantage.

Figures

Figures reproduced from arXiv: 2511.14579 by Amanuel Anteneh, Olivier Pfister.

**Figure 2.** Average reconstruction fidelity of QDT

**Figure 4.** Average reconstruction fidelity of QDT

Original abstract

We present a technique for performing quantum detector tomography (QDT) of phase insensitive quantum detectors, a category under which many detectors of interest fall under, using gradient descent-based optimization to learn the positive operator-valued measure (POVM) that best describes the data collected using the detector under study. We numerically benchmark our method against constrained convex optimization (CCO) and show that it reaches higher or comparable reconstruction fidelity in much less time even in the presence of noise and limited probe state resources. We also present a possible extension of our approach to the phase sensitive case via a parametrization of POVMs on the complex Stiefel manifold which enables gradient based optimization restricted to this manifold.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Gradient descent gives a faster alternative to convex optimization for small quantum detector tomography cases, but the scaling and robustness claims lack supporting tests.

The bottom line is that this paper offers a gradient-descent approach to quantum detector tomography that appears faster than constrained convex optimization in small cases, with comparable fidelity even when noise is present. What stands out as new is the application of gradient descent to POVM reconstruction for phase-insensitive detectors, plus the Stiefel manifold parametrization to extend it to phase-sensitive ones. The benchmarks against CCO under limited probes and noise are a solid practical test. The paper does well at showing the method can be implemented and run quicker for the dimensions they chose. That speed advantage could matter for lab work where time is limited.

The soft spots are around the limited scope of the experiments. All results are for low-dimensional, phase-insensitive detectors. There is no exploration of larger Hilbert spaces or checks on initialization sensitivity to see if local minima cause problems. Without that, the claim of scalability is not fully backed by the data. The concern in the stress-test note holds up here.

This paper is aimed at people doing quantum detector characterization in experiments. A reader who needs a faster computational tool for tomography would find value in trying out the approach, especially if they can add their own scaling tests. It shows clear thinking by adapting known optimization ideas to this setting without obvious contradictions.

I would bring it to the next reading group as a discussion point on practical methods. It deserves peer review because the core idea is workable and the results invite further checks. Recommendation: Send it to referees with a note to expand the numerical section on scaling and robustness.

Referee Report

2 major / 3 minor

Summary. The paper introduces gradient descent methods for quantum detector tomography (QDT) of phase-insensitive detectors by optimizing the POVM parameters to fit experimental data. Numerical benchmarks against constrained convex optimization (CCO) demonstrate that the proposed method achieves higher or comparable reconstruction fidelity in significantly less time, even with noise and limited probe states. The work also proposes an extension to phase-sensitive detectors using a parametrization on the complex Stiefel manifold.

Significance. If the numerical advantages hold under broader conditions, this method could provide a more scalable alternative to convex optimization for characterizing quantum detectors, which is relevant for quantum information processing tasks involving higher-dimensional systems. The benchmarks offer initial support for practical efficiency gains, but the absence of scaling analysis limits the strength of the contribution.

major comments (2)

[Numerical benchmarks] Numerical benchmarks section: The comparisons are performed only on low-dimensional phase-insensitive detectors with small Fock-space truncation levels. Since the title and abstract emphasize scalability, the manuscript requires explicit scaling studies (e.g., wall-clock time and fidelity versus Hilbert-space dimension or number of probe states) to confirm that the reported speed-up persists beyond the tested regime; otherwise the central claim of practical advantage over CCO is not yet load-bearing.
[Method / Optimization details] Optimization and convergence: No empirical or theoretical analysis is provided on robustness to local minima, initialization dependence, or basin-hopping frequency for the gradient-descent procedure. Given that POVM fitting landscapes are generally non-convex, this omission directly affects the reliability of the fidelity claims under noise and limited probes.

minor comments (3)

[Abstract] The abstract states 'much less time' without reporting concrete timing metrics, hardware specifications, or iteration counts used in the CCO versus GD comparison.
[Method] Clarify the explicit parametrization chosen for the diagonal (phase-insensitive) POVM elements in the gradient-descent update rule.
[Extension to phase-sensitive case] The Stiefel-manifold extension is described only as 'possible'; if it is intended as a contribution, a minimal numerical demonstration or convergence guarantee should be added or the claim should be toned down.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thoughtful review and valuable suggestions. We have carefully considered the major comments and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.

Referee: [Numerical benchmarks] Numerical benchmarks section: The comparisons are performed only on low-dimensional phase-insensitive detectors with small Fock-space truncation levels. Since the title and abstract emphasize scalability, the manuscript requires explicit scaling studies (e.g., wall-clock time and fidelity versus Hilbert-space dimension or number of probe states) to confirm that the reported speed-up persists beyond the tested regime; otherwise the central claim of practical advantage over CCO is not yet load-bearing.

Authors: We agree with the referee that demonstrating scalability through explicit scaling studies is important to support the claims in the title and abstract. In the revised manuscript, we will include additional numerical results showing how the wall-clock time and reconstruction fidelity scale with increasing Hilbert-space dimension (e.g., larger Fock space truncations) and varying numbers of probe states. These studies will help confirm whether the observed computational advantages over constrained convex optimization persist in higher-dimensional regimes. revision: yes
Referee: [Method / Optimization details] Optimization and convergence: No empirical or theoretical analysis is provided on robustness to local minima, initialization dependence, or basin-hopping frequency for the gradient-descent procedure. Given that POVM fitting landscapes are generally non-convex, this omission directly affects the reliability of the fidelity claims under noise and limited probes.

Authors: We acknowledge that the non-convex nature of the optimization landscape warrants further investigation into robustness and initialization effects. While our current implementation employs random initializations and reports the best outcome across trials, we did not include a systematic study of convergence behavior or sensitivity to starting points. In the revised manuscript, we will add empirical analysis, such as statistics over multiple initializations and observations on convergence under noisy conditions, to better substantiate the reliability of the reported fidelities. revision: yes
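
The kind of initialization study promised in this response can be sketched as a small multi-start harness that reruns gradient descent from random starting points and summarizes the final losses. The one-dimensional double-well objective below is a deliberately simple stand-in with two minima of different depth, not the QDT loss, and the function names, step size, and number of restarts are all illustrative assumptions.

```python
import random
import statistics


def gradient_descent(grad, x0, lr=0.01, steps=500):
    """Fixed-step gradient descent on a scalar parameter."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def multistart(loss, grad, n_starts=20, seed=0, low=-2.0, high=2.0):
    """Run gradient descent from random starts and summarise the final losses."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_starts):
        x_final = gradient_descent(grad, rng.uniform(low, high))
        finals.append(loss(x_final))
    return {
        "finals": finals,
        "best": min(finals),
        "mean": statistics.mean(finals),
        "stdev": statistics.pstdev(finals),
    }


# Stand-in non-convex objective: a tilted double well.
def loss(x):
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


summary = multistart(loss, grad)
```

A large gap between `best` and `mean`, or a large `stdev`, would indicate that some runs settle in a worse basin, which is the failure mode the referee asks the authors to quantify.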

Circularity Check

0 steps flagged

No circularity: standard GD optimization applied to QDT data fitting with independent numerical benchmarks

The paper applies gradient descent to minimize a loss function for reconstructing phase-insensitive POVMs from tomography data and benchmarks runtime and fidelity against constrained convex optimization on simulated datasets with noise and limited probes. These are direct empirical comparisons of two standard optimization approaches on the same fitting problem; no derivation chain reduces the reported performance advantage to a fitted parameter, self-definition, or self-citation. The Stiefel-manifold extension is described as a possible future parametrization without any load-bearing uniqueness theorem or ansatz imported from prior self-work. The method is self-contained against external benchmarks and does not rename known results or smuggle assumptions via citation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work relies on standard quantum measurement theory and numerical optimization without introducing new physical postulates or fitted constants beyond routine hyperparameters.

free parameters (1)

gradient descent hyperparameters (learning rate, iterations)
Chosen to achieve convergence; not derived from first principles.

axioms (1)

standard math: Detector response is fully described by a POVM: positive semi-definite operators summing to the identity.
Invoked as the mathematical model for any quantum measurement.
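
For reference, the axiom and its phase-insensitive special case can be written out explicitly. The symbols below ($\Pi_m$ for the POVM element of outcome $m$, $\theta^{(m)}_n$ for its Fock-basis diagonal, $\rho$ for the probe state) are notation chosen for this note and may differ from the paper's.

```latex
% General POVM: positivity, completeness, and the Born rule.
\Pi_m \succeq 0 \quad \forall m, \qquad
\sum_{m} \Pi_m = \mathbb{1}, \qquad
p(m \mid \rho) = \operatorname{Tr}\!\left(\rho\, \Pi_m\right).

% Phase-insensitive detector: every element is diagonal in the Fock basis.
\Pi_m = \sum_{n} \theta^{(m)}_{n}\, |n\rangle\langle n|, \qquad
\theta^{(m)}_{n} \ge 0, \qquad
\sum_{m} \theta^{(m)}_{n} = 1 \quad \forall n.
```

In the diagonal case the constraints reduce to non-negative entries whose sum over outcomes is 1 for each photon number, which is the same condition as the "Π_ij ≥ 0, rows sum to 1" form quoted elsewhere on this page.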

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

min_Π ||P − FΠ||_F^2 subject to Π_ij ≥ 0, rows sum to 1; softmax applied to rows of Π

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
