pith. machine review for the scientific record.

arxiv: 2511.14579 · v2 · submitted 2025-11-18 · 🪐 quant-ph

Gradient-descent methods for scalable quantum detector tomography

Pith reviewed 2026-05-17 20:46 UTC · model grok-4.3

classification 🪐 quant-ph
keywords: quantum detector tomography, gradient descent, POVM reconstruction, phase-insensitive detectors, quantum optics, optimization, tomography scalability, Stiefel manifold

The pith

Gradient descent optimization reconstructs the POVM of phase-insensitive quantum detectors faster than constrained convex methods while matching or exceeding fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a gradient-descent technique to perform quantum detector tomography on phase-insensitive detectors by iteratively adjusting POVM elements to fit observed measurement data. It numerically compares this approach to constrained convex optimization and finds that gradient descent reaches similar or better reconstruction accuracy in substantially shorter computation time, even when noise is present and only a small number of probe states are available. The method is extended to phase-sensitive detectors by parametrizing the POVM on the complex Stiefel manifold so that gradient steps remain on the valid manifold. A reader would care because quantum detectors must be characterized accurately in experiments, yet traditional convex solvers become slow as the detector dimension grows or data sets enlarge.

Core claim

Gradient descent can be applied directly to the parameters of a POVM to solve the quantum detector tomography problem for phase-insensitive detectors, yielding a reconstruction of the detector response that matches or surpasses the fidelity of constrained convex optimization while requiring far less runtime, as shown in numerical tests that include realistic noise levels and restricted probe resources; the same framework extends to the phase-sensitive case through a manifold-constrained parametrization.

What carries the argument

Gradient descent optimization of POVM matrix elements (with Stiefel-manifold parametrization for the phase-sensitive extension) to minimize the mismatch between predicted and observed detection statistics.
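The fitting loop described above can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: coherent probe states (so the probe matrix F has Poisson rows), a squared-error loss, a softmax over outcomes to keep each Fock-level column of the diagonal POVM non-negative and normalized, plain fixed-step gradient descent, and a toy on/off detector with efficiency ETA as ground truth. The helper names (poisson_row, softmax_columns, fit_povm) are hypothetical.

```python
import math

def poisson_row(mean_n, dim):
    """Photon-number distribution of a coherent probe, truncated to `dim` Fock levels."""
    return [math.exp(-mean_n) * mean_n ** n / math.factorial(n) for n in range(dim)]

def softmax_columns(logits):
    """Map logits[k][n] to theta[k][n] >= 0 with sum_k theta[k][n] = 1 for every n."""
    K, D = len(logits), len(logits[0])
    theta = [[0.0] * D for _ in range(K)]
    for n in range(D):
        top = max(logits[k][n] for k in range(K))
        e = [math.exp(logits[k][n] - top) for k in range(K)]
        z = sum(e)
        for k in range(K):
            theta[k][n] = e[k] / z
    return theta

def predict(F, theta):
    """Predicted outcome probabilities Q[i][k] = sum_n F[i][n] * theta[k][n]."""
    return [[sum(f * t for f, t in zip(row, theta_k)) for theta_k in theta] for row in F]

def loss(F, theta, P):
    """Squared mismatch between predicted and observed outcome probabilities."""
    Q = predict(F, theta)
    return sum((q - p) ** 2 for qr, pr in zip(Q, P) for q, p in zip(qr, pr))

def fit_povm(F, P, n_outcomes, lr=0.1, steps=2000):
    """Fixed-step gradient descent on the logits of a diagonal POVM."""
    M, D = len(F), len(F[0])
    logits = [[0.0] * D for _ in range(n_outcomes)]
    for _ in range(steps):
        theta = softmax_columns(logits)
        Q = predict(F, theta)
        # dL/dtheta[k][n] = 2 * sum_i F[i][n] * (Q[i][k] - P[i][k])
        G = [[2.0 * sum(F[i][n] * (Q[i][k] - P[i][k]) for i in range(M))
              for n in range(D)] for k in range(n_outcomes)]
        for n in range(D):
            avg = sum(theta[j][n] * G[j][n] for j in range(n_outcomes))
            for k in range(n_outcomes):
                # Chain rule through the column-wise softmax.
                logits[k][n] -= lr * theta[k][n] * (G[k][n] - avg)
    return softmax_columns(logits)

# Toy ground truth (assumption): on/off detector with efficiency ETA.
DIM, ETA = 8, 0.6
F = [poisson_row(0.25 * i, DIM) for i in range(9)]       # nine coherent probes
theta_true = [[(1 - ETA) ** n for n in range(DIM)],      # "no click"
              [1 - (1 - ETA) ** n for n in range(DIM)]]  # "click"
P = predict(F, theta_true)                               # noiseless synthetic data

initial_loss = loss(F, softmax_columns([[0.0] * DIM for _ in range(2)]), P)
theta_fit = fit_povm(F, P, n_outcomes=2)
final_loss = loss(F, theta_fit, P)
```

Because the constraints are built into the parametrization, every iterate is a valid diagonal POVM and no projection or constrained solver is needed. A practical implementation would more likely use automatic differentiation and an adaptive optimizer than the hand-written gradient shown here.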

If this is right

  • The method enables tomography of higher-dimensional or more complex detectors within practical laboratory time budgets.
  • Reconstruction remains reliable when measurement noise is present and only limited probe states can be prepared.
  • The Stiefel-manifold parametrization brings gradient-based optimization to phase-sensitive detectors without leaving the space of valid POVMs.
  • Reduced computation time makes repeated calibrations feasible during long experimental runs.
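To make the Stiefel-manifold point concrete, the sketch below shows one common way such a parametrization can be set up; it is an assumption for illustration and not necessarily the construction used in the paper. A tall complex matrix A with orthonormal columns (a point on the complex Stiefel manifold) is split into blocks A_k, and each POVM element is taken as A_k†A_k. The elements are then positive semi-definite by construction and sum to A†A = I. The function names are hypothetical, and only the parametrization is shown, not the manifold-constrained gradient step.

```python
import random

def random_isometry(rows, cols, seed=0):
    """Random complex rows x cols matrix with orthonormal columns (Gram-Schmidt)."""
    rng = random.Random(seed)
    columns = []
    for _ in range(cols):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(rows)]
        for u in columns:
            # Remove the component of v along the already accepted column u.
            overlap = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - overlap * ui for ui, vi in zip(u, v)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        columns.append([vi / norm for vi in v])
    # Return the matrix as a list of rows.
    return [[columns[c][r] for c in range(cols)] for r in range(rows)]

def povm_from_isometry(A, n_outcomes):
    """Split A into row blocks A_k and return the elements Pi_k = A_k^dagger A_k."""
    rows, d = len(A), len(A[0])
    block = rows // n_outcomes
    povm = []
    for k in range(n_outcomes):
        Ak = A[k * block:(k + 1) * block]
        povm.append([[sum(Ak[r][a].conjugate() * Ak[r][b] for r in range(block))
                      for b in range(d)] for a in range(d)])
    return povm

# Three outcomes on a two-dimensional space: A is 6 x 2.
N_OUT, D = 3, 2
povm = povm_from_isometry(random_isometry(N_OUT * D, D), N_OUT)

# The elements sum to the identity because A^dagger A = I.
total = [[sum(p[a][b] for p in povm) for b in range(D)] for a in range(D)]
```

An optimizer that moves A along the manifold (for example by a step followed by re-orthonormalization) would keep the reconstructed operators valid at every iteration, which is the property the bullet above refers to.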

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same gradient machinery could be combined with adaptive probe-state selection to further reduce the total number of measurements needed.
  • Integration into real-time control software might allow detectors to be recalibrated on the fly as experimental conditions drift.
  • Because the approach is first-order and local, it may scale more gracefully than convex solvers when detector Hilbert spaces become large.

Load-bearing premise

Gradient descent will converge to a high-fidelity POVM without becoming trapped in poor local minima, and the added-noise simulations used for benchmarking will accurately reflect real experimental performance.

What would settle it

An experiment on a calibrated phase-insensitive detector where the gradient-descent reconstruction produces measurably lower fidelity than a constrained convex optimization run on the same data set, or where the gradient method fails to converge within the reported time advantage.
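A comparison of this kind needs a concrete fidelity between a reference POVM element and its reconstruction. This page does not state which definition the paper uses; the sketch below assumes the trace-normalized fidelity F = (Tr sqrt(sqrt(A) B sqrt(A)))^2 / (Tr A · Tr B), which for diagonal (phase-insensitive) elements reduces to a sum over Fock levels. The function name is hypothetical.

```python
import math

def diagonal_fidelity(a, b):
    """Normalized fidelity between two diagonal POVM elements.

    `a` and `b` hold the non-negative diagonal entries in the Fock basis.
    Both elements must have non-zero trace.
    """
    overlap = sum(math.sqrt(x * y) for x, y in zip(a, b))
    return overlap ** 2 / (sum(a) * sum(b))

reference = [1.0, 0.4, 0.16, 0.064]       # e.g. a "no click" element
reconstruction = [0.98, 0.42, 0.15, 0.07]  # a slightly perturbed estimate

same = diagonal_fidelity(reference, reference)
close = diagonal_fidelity(reference, reconstruction)
disjoint = diagonal_fidelity([1.0, 0.0], [0.0, 1.0])
```

Identical elements give a fidelity of one and elements with disjoint support give zero, so averaging this quantity over all outcomes yields a single score on which two reconstruction methods can be compared for the same data set.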

Figures

Figures reproduced from arXiv: 2511.14579 by Amanuel Anteneh, Olivier Pfister.

Figure 1. Wall clock time (in seconds) and wall clock …
Figure 2. Average reconstruction fidelity of QDT …
Figure 4. Average reconstruction fidelity of QDT …

Original abstract

We present a technique for performing quantum detector tomography (QDT) of phase-insensitive quantum detectors, a category under which many detectors of interest fall, using gradient-descent-based optimization to learn the positive operator-valued measure (POVM) that best describes the data collected using the detector under study. We numerically benchmark our method against constrained convex optimization (CCO) and show that it reaches higher or comparable reconstruction fidelity in much less time, even in the presence of noise and limited probe state resources. We also present a possible extension of our approach to the phase-sensitive case via a parametrization of POVMs on the complex Stiefel manifold, which enables gradient-based optimization restricted to this manifold.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces gradient descent methods for quantum detector tomography (QDT) of phase-insensitive detectors by optimizing the POVM parameters to fit experimental data. Numerical benchmarks against constrained convex optimization (CCO) demonstrate that the proposed method achieves higher or comparable reconstruction fidelity in significantly less time, even with noise and limited probe states. The work also proposes an extension to phase-sensitive detectors using a parametrization on the complex Stiefel manifold.

Significance. If the numerical advantages hold under broader conditions, this method could provide a more scalable alternative to convex optimization for characterizing quantum detectors, which is relevant for quantum information processing tasks involving higher-dimensional systems. The benchmarks offer initial support for practical efficiency gains, but the absence of scaling analysis limits the strength of the contribution.

major comments (2)
  1. [Numerical benchmarks] Numerical benchmarks section: The comparisons are performed only on low-dimensional phase-insensitive detectors with small Fock-space truncation levels. Since the title and abstract emphasize scalability, the manuscript requires explicit scaling studies (e.g., wall-clock time and fidelity versus Hilbert-space dimension or number of probe states) to confirm that the reported speed-up persists beyond the tested regime; otherwise the central claim of practical advantage over CCO is not yet load-bearing.
  2. [Method / Optimization details] Optimization and convergence: No empirical or theoretical analysis is provided on robustness to local minima, initialization dependence, or basin-hopping frequency for the gradient-descent procedure. Given that POVM fitting landscapes are generally non-convex, this omission directly affects the reliability of the fidelity claims under noise and limited probes.
minor comments (3)
  1. [Abstract] The abstract states 'much less time' without reporting concrete timing metrics, hardware specifications, or iteration counts used in the CCO versus GD comparison.
  2. [Method] Clarify the explicit parametrization chosen for the diagonal (phase-insensitive) POVM elements in the gradient-descent update rule.
  3. [Extension to phase-sensitive case] The Stiefel-manifold extension is described only as 'possible'; if it is intended as a contribution, a minimal numerical demonstration or convergence guarantee should be added or the claim should be toned down.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thoughtful review and valuable suggestions. We have carefully considered the major comments and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.

  1. Referee: [Numerical benchmarks] Numerical benchmarks section: The comparisons are performed only on low-dimensional phase-insensitive detectors with small Fock-space truncation levels. Since the title and abstract emphasize scalability, the manuscript requires explicit scaling studies (e.g., wall-clock time and fidelity versus Hilbert-space dimension or number of probe states) to confirm that the reported speed-up persists beyond the tested regime; otherwise the central claim of practical advantage over CCO is not yet load-bearing.

    Authors: We agree with the referee that demonstrating scalability through explicit scaling studies is important to support the claims in the title and abstract. In the revised manuscript, we will include additional numerical results showing how the wall-clock time and reconstruction fidelity scale with increasing Hilbert-space dimension (e.g., larger Fock space truncations) and varying numbers of probe states. These studies will help confirm whether the observed computational advantages over constrained convex optimization persist in higher-dimensional regimes. revision: yes

  2. Referee: [Method / Optimization details] Optimization and convergence: No empirical or theoretical analysis is provided on robustness to local minima, initialization dependence, or basin-hopping frequency for the gradient-descent procedure. Given that POVM fitting landscapes are generally non-convex, this omission directly affects the reliability of the fidelity claims under noise and limited probes.

    Authors: We acknowledge that the non-convex nature of the optimization landscape warrants further investigation into robustness and initialization effects. While our current implementation employs random initializations and reports the best outcome across trials, we did not include a systematic study of convergence behavior or sensitivity to starting points. In the revised manuscript, we will add empirical analysis, such as statistics over multiple initializations and observations on convergence under noisy conditions, to better substantiate the reliability of the reported fidelities. revision: yes

Circularity Check

0 steps flagged

No circularity: standard GD optimization applied to QDT data fitting with independent numerical benchmarks

The paper applies gradient descent to minimize a loss function for reconstructing phase-insensitive POVMs from tomography data and benchmarks runtime and fidelity against constrained convex optimization on simulated datasets with noise and limited probes. These are direct empirical comparisons of two standard optimization approaches on the same fitting problem; no derivation chain reduces the reported performance advantage to a fitted parameter, self-definition, or self-citation. The Stiefel-manifold extension is described as a possible future parametrization without any load-bearing uniqueness theorem or ansatz imported from prior self-work. The method is self-contained against external benchmarks and does not rename known results or smuggle assumptions via citation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work relies on standard quantum measurement theory and numerical optimization without introducing new physical postulates or fitted constants beyond routine hyperparameters.

free parameters (1)
  • gradient descent hyperparameters (learning rate, iterations)
    Chosen to achieve convergence; not derived from first principles.
axioms (1)
  • [standard math] Detector response is fully described by a POVM: positive semi-definite operators summing to the identity.
    Invoked as the mathematical model for any quantum measurement.
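
For reference, the axiom can be written out explicitly. The first line is the standard POVM definition together with the Born rule; the second line is the diagonal form that phase insensitivity implies in the Fock basis. The symbol \theta_{k,n} for the diagonal entries is notation assumed here, not taken from the paper.

```latex
\Pi_k \succeq 0, \qquad \sum_k \Pi_k = \mathbb{1}, \qquad
p(k \mid \rho) = \operatorname{Tr}\!\left(\rho\,\Pi_k\right)

\Pi_k = \sum_{n} \theta_{k,n}\, |n\rangle\langle n|, \qquad
\theta_{k,n} \ge 0, \qquad \sum_k \theta_{k,n} = 1 \ \text{ for every } n
```

In the diagonal case the operator constraints reduce to non-negativity and column normalization of the array \theta_{k,n}, which is why the phase-insensitive problem can be handled with simple unconstrained reparametrizations.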

pith-pipeline@v0.9.0 · 5401 in / 1199 out tokens · 57536 ms · 2026-05-17T20:46:48.499739+00:00 · methodology
