Local Inverse Geometry Can Be Amortized
Pith reviewed 2026-05-14 19:57 UTC · model grok-4.3
The pith
A learned reverse operator amortizes local inverse geometry so first-order methods match damped Gauss-Newton on nonlinear inverse problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
D-IPG is first-order equivalent to damped Gauss-Newton under local pseudoinverse consistency between the learned reverse Jacobian and the forward Jacobian, with the deviation controlled by composition error and conditioning; this consistency is enforced during training by the Jacobian Composition Penalty.
What carries the argument
The Deceptron bidirectional surrogate whose reverse Jacobian is trained via the Jacobian Composition Penalty to act as a local left inverse of the forward Jacobian, enabling the D-IPG iterative solver.
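For reference, the penalty can be written out as a worked equation. The first line is the JCP objective as quoted from the paper further down this page; the second line is a standard identity that holds under the assumption (not confirmed by this page) that the probe vectors satisfy E[ξ ξᵀ] = I, as Gaussian or Rademacher probes do. Under that assumption, driving JCP to zero at a point is the same as driving the Frobenius norm of J_g J_f − I to zero there.

```latex
% JCP objective as quoted from the paper
L_{\mathrm{JCP}} = \mathbb{E}_{x,\xi}\,\bigl\lVert J_g(f_W(x))\,J_f(x)\,\xi - \xi \bigr\rVert_2^2 .

% Assumption: probes are isotropic, E[\xi \xi^\top] = I. Write M = J_g J_f - I.
\mathbb{E}_{\xi}\,\lVert M \xi \rVert_2^2
  = \operatorname{tr}\!\bigl(M^\top M \,\mathbb{E}[\xi \xi^\top]\bigr)
  = \lVert M \rVert_F^2 .
```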
If this is right
- D-IPG achieves comparable or better recovery quality than standard baselines across seven PDE inverse-problem benchmarks.
- D-IPG reaches 94.8 percent mean success rate on the six-problem reliability suite.
- D-IPG delivers up to 77 times lower inference-time solve cost than curvature-aware methods on the main benchmarks.
- The equivalence holds with deviation controlled directly by the measured composition error and problem conditioning.
Where Pith is reading between the lines
- The same amortization could be applied to inverse problems outside PDEs, such as parameter estimation in robotics or imaging.
- If consistency generalizes, hybrid solvers could monitor composition error at runtime and fall back to exact linear solves only when needed.
- Testing on problems with rapidly changing conditioning would expose where the learned operator stops tracking true inverse geometry.
- Lower per-iteration cost opens the door to real-time inverse problems in control loops where Gauss-Newton is currently too slow.
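The runtime-monitoring idea in this list could look roughly like the sketch below. It is a hypothetical illustration on 2x2 Jacobians, not code from the paper or its repository: the names `hybrid_step`, `composition_error` and the threshold `tol` are invented here, the Frobenius norm stands in for whatever error measure a real solver would use, and the fallback is a plain exact solve of the square linearized system.

```python
# Hypothetical hybrid step: use the learned reverse Jacobian while the
# composition error is small, otherwise fall back to an exact linear solve.
# All names here are illustrative; 2x2 matrices are lists of rows.

def matmul(a, b):
    """Product of two small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def composition_error(j_g, j_f):
    """Frobenius norm of J_g J_f - I (an upper bound on the spectral norm)."""
    prod = matmul(j_g, j_f)
    n = len(prod)
    return sum((prod[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n)) ** 0.5

def solve_2x2(a, b):
    """Exact solve of the 2x2 system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def hybrid_step(j_f, j_g, residual, alpha, tol):
    """Return (step, source, error); source is 'learned' or 'exact'."""
    err = composition_error(j_g, j_f)
    if err <= tol:
        direction = matvec(j_g, residual)      # cheap learned pull-back
        source = "learned"
    else:
        direction = solve_2x2(j_f, residual)   # exact linearized solve
        source = "exact"
    return [-alpha * d for d in direction], source, err

j_f = [[2.0, 0.0], [0.0, 4.0]]
good = hybrid_step(j_f, [[0.5, 0.0], [0.0, 0.25]], [2.0, 4.0], 1.0, 0.1)
bad = hybrid_step(j_f, [[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], 1.0, 0.1)
```

In this toy case both calls produce the same step, but only the first uses the learned operator; the second detects a large composition error and pays for the exact solve.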
Load-bearing premise
The learned reverse Jacobian maintains local pseudoinverse consistency with the forward Jacobian along optimization trajectories.
What would settle it
Run D-IPG on a benchmark where the composition error stays large despite JCP training and measure whether the update directions diverge from those of damped Gauss-Newton by more than the predicted bound.
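A minimal sketch of that comparison at a single iterate follows. It assumes a square, invertible 2x2 forward Jacobian and first-order steps of the form −α J_g r (learned) and −α J_f⁻¹ r (damped Gauss-Newton); these step forms, the function names, and the example matrices are assumptions made for illustration, not the paper's implementation. The bound used is the one quoted from the paper's theorem elsewhere on this page.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(a, v):
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def singular_values_2x2(m):
    """Closed-form (sigma_max, sigma_min) from trace and determinant of M^T M."""
    t = sum(m[i][j] ** 2 for i in range(2) for j in range(2))
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    return math.sqrt((t + disc) / 2.0), math.sqrt(max((t - disc) / 2.0, 0.0))

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def deviation_and_bound(j_f, j_g, r, alpha):
    """Measured step deviation and the predicted upper bound at one iterate."""
    learned = [-alpha * d for d in matvec(j_g, r)]
    gauss_newton = [-alpha * d for d in matvec(inverse_2x2(j_f), r)]
    deviation = norm2([learned[i] - gauss_newton[i] for i in range(2)])
    prod = matmul(j_g, j_f)
    e = [[prod[i][j] - (1.0 if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    comp_err = singular_values_2x2(e)[0]      # ||J_g J_f - I||_2
    sigma_min = singular_values_2x2(j_f)[1]   # sigma_min(J_f)
    return deviation, alpha * comp_err / sigma_min * norm2(r)

j_f = [[2.0, 1.0], [0.0, 3.0]]
j_g = [[0.55, -0.15], [0.02, 0.30]]  # slightly perturbed inverse of j_f
deviation, bound = deviation_and_bound(j_f, j_g, [1.0, -2.0], 0.5)
```

A falsification run in the spirit of the text would log `deviation` and `bound` at every iterate of a real benchmark and look for iterates where the first exceeds the second.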
Original abstract
Nonlinear inverse problems often trade inexpensive but fragile first-order updates against curvature-aware methods such as Gauss-Newton and Levenberg-Marquardt, which obtain stronger directions by repeatedly solving Jacobian-based linearized systems. We propose a learned alternative: amortize local inverse geometry into a reusable reverse operator. Our framework learns a bidirectional surrogate, Deceptron, and deploys it through D-IPG (Deceptron Inverse-Preconditioned Gradient), an iterative solver that pulls residual-corrected measurement-space proposals back to latent space. The key mechanism is a Jacobian Composition Penalty (JCP), which trains the reverse Jacobian to act as a local left inverse of the forward Jacobian; its runtime counterpart, RJCP, measures the same inverse-consistency error along optimization trajectories. We prove that D-IPG is first-order equivalent to damped Gauss-Newton under local pseudoinverse consistency, with deviation controlled by composition error and conditioning. Across seven PDE inverse-problem benchmarks, D-IPG outperforms standard baselines, achieves 94.8% mean success across the six-problem reliability suite, and reaches comparable or better recovery quality at up to 77x lower inference-time solve cost on the main benchmarks.
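One way to read "pulls residual-corrected measurement-space proposals back to latent space" is sketched below on a scalar toy problem. This is an interpretation of the abstract's wording, not the paper's algorithm: the update rule, the function names, and the step size are assumptions, and an exact inverse (`math.log`) stands in for the learned reverse operator so that the example stays self-contained.

```python
import math

def forward(x):
    """Toy forward map f (stands in for the learned forward surrogate)."""
    return math.exp(x)

def reverse(y):
    """Exact inverse of f, used here in place of a learned reverse operator."""
    return math.log(y)

def toy_solve(y_obs, x0, alpha, steps):
    """Iterate: form a residual-corrected proposal in measurement space,
    then pull it back to latent space through the reverse map."""
    x = x0
    residuals = []
    for _ in range(steps):
        y_pred = forward(x)
        r = y_pred - y_obs              # measurement-space residual
        residuals.append(abs(r))
        y_prop = y_pred - alpha * r     # residual-corrected proposal
        x = reverse(y_prop)             # pull back to latent space
    return x, residuals

x_true = 1.5
x_hat, residuals = toy_solve(forward(x_true), x0=0.0, alpha=0.5, steps=30)
```

With an exact reverse map the measurement-space residual shrinks by the factor (1 − alpha) per iteration in this toy; a learned reverse operator would only approximate that behaviour, which is where the composition error enters.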
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes amortizing local inverse geometry for nonlinear inverse problems via a learned bidirectional surrogate (Deceptron) trained with a Jacobian Composition Penalty (JCP). It introduces the D-IPG iterative solver that uses the learned reverse operator for residual-corrected proposals, proves first-order equivalence to damped Gauss-Newton under a local pseudoinverse consistency condition (with deviation controlled by composition error and conditioning), and reports strong empirical results on seven PDE inverse-problem benchmarks, including 94.8% mean success rate and up to 77x lower inference-time cost.
Significance. If the local consistency transfers to D-IPG trajectories and the equivalence holds in practice, the work provides a practical learned alternative to curvature-aware solvers that retains first-order efficiency while achieving stronger directions, supported by reproducible benchmarks across multiple inverse problems. The combination of a parameter-free-style derivation under the consistency assumption and extensive empirical validation strengthens the contribution.
major comments (2)
- [Proof of first-order equivalence] Proof of first-order equivalence (likely Theorem 3 or §4): the claimed equivalence to damped Gauss-Newton holds only under the assumption that the learned reverse Jacobian maintains local left-inverse consistency (small RJCP) along the specific sequences of latent points visited by D-IPG iterations. The manuscript provides no reported measurements or bounds on runtime RJCP for the optimization trajectories in the seven benchmarks, leaving the deviation term (controlled by composition error and conditioning) unverified and potentially unbounded if trajectories exit the training support.
- [Experimental validation] Experimental validation (§5 or Table 1-3): while JCP is used during training on a fixed distribution, the paper does not include ablation or diagnostic plots showing that the runtime RJCP metric remains small on the actual D-IPG paths for the reported PDE problems. This directly affects whether the strong benchmark results (94.8% success, 77x speedup) can be attributed to the proven equivalence rather than to the learned model behaving as a generic preconditioner.
minor comments (2)
- [Notation] Notation for Deceptron and D-IPG could be clarified with a single summary table of symbols and their roles to aid readability.
- [Abstract] The abstract mentions 'parameter-free' aspects of the equivalence but the deviation bound depends on conditioning; a brief remark on this dependence would prevent misinterpretation.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and for highlighting the importance of verifying the local consistency assumption in practice. We address the major comments below and will revise the manuscript accordingly to strengthen the connection between theory and experiments.
- Referee: [Proof of first-order equivalence] Proof of first-order equivalence (likely Theorem 3 or §4): the claimed equivalence to damped Gauss-Newton holds only under the assumption that the learned reverse Jacobian maintains local left-inverse consistency (small RJCP) along the specific sequences of latent points visited by D-IPG iterations. The manuscript provides no reported measurements or bounds on runtime RJCP for the optimization trajectories in the seven benchmarks, leaving the deviation term (controlled by composition error and conditioning) unverified and potentially unbounded if trajectories exit the training support.
Authors: We agree that Theorem 3 establishes first-order equivalence only under the local pseudoinverse consistency condition (small RJCP). Although JCP training encourages this property on the data distribution, explicit runtime verification along D-IPG trajectories is a valuable addition. In the revised manuscript we will report mean and maximum RJCP values observed during optimization on all seven benchmarks, together with a short analysis confirming that trajectories remain within the support where the deviation term stays controlled by the reported conditioning bounds. revision: yes
- Referee: [Experimental validation] Experimental validation (§5 or Table 1-3): while JCP is used during training on a fixed distribution, the paper does not include ablation or diagnostic plots showing that the runtime RJCP metric remains small on the actual D-IPG paths for the reported PDE problems. This directly affects whether the strong benchmark results (94.8% success, 77x speedup) can be attributed to the proven equivalence rather than to the learned model behaving as a generic preconditioner.
Authors: We concur that diagnostic evidence is needed to link the empirical gains directly to the equivalence result rather than generic preconditioning. The revised version will include new ablation and diagnostic plots that track RJCP along the full D-IPG trajectories for the PDE benchmarks. These plots will show that RJCP remains small (consistent with training values) throughout the iterations, thereby supporting attribution of the 94.8% success rate and up to 77x speedup to the first-order equivalence. revision: yes
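The kind of trajectory diagnostic promised in these responses can be sketched on a scalar toy problem. Everything below is illustrative: the page does not give the paper's exact RJCP formula, so the squared scalar form (g'(f(x)) f'(x) − 1)² is an assumption modelled on the quoted JCP objective, and the imperfect reverse map with mismatch parameter `c` is invented for the example.

```python
import math

def forward_derivative(x):
    """f'(x) for the toy forward map f(x) = exp(x)."""
    return math.exp(x)

def reverse_derivative(y, c):
    """g'(y) for a hypothetical imperfect reverse map
    g(y) = log(y) + c * (y - 1)**2; c = 0 gives the exact inverse."""
    return 1.0 / y + 2.0 * c * (y - 1.0)

def rjcp(x, c):
    """Assumed scalar RJCP: squared deviation of g'(f(x)) * f'(x) from 1."""
    y = math.exp(x)
    return (reverse_derivative(y, c) * forward_derivative(x) - 1.0) ** 2

def trajectory_report(points, c):
    """Mean, maximum, and worst-iterate index of RJCP along a trajectory."""
    values = [rjcp(x, c) for x in points]
    worst = max(values)
    return {"mean": sum(values) / len(values),
            "max": worst,
            "argmax": values.index(worst)}

trajectory = [0.0, 0.4, 0.8, 1.2]   # latent iterates visited by a solver
report = trajectory_report(trajectory, c=0.01)
```

In this toy the reverse map is exactly consistent at x = 0 and degrades as the iterates move away from it, so the maximum sits at the last iterate; reporting the maximum alongside the mean is what exposes that kind of drift.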
Circularity Check
No significant circularity; equivalence proof is conditional on an independently enforceable property
The paper's central derivation is a mathematical proof that D-IPG matches the first-order behavior of damped Gauss-Newton whenever local pseudoinverse consistency holds, with explicit deviation bounds in terms of composition error and conditioning. This statement is self-contained and does not reduce to the training procedure or fitted parameters by construction; it is a standard first-order analysis that applies to any reverse operator satisfying the consistency condition. The Jacobian Composition Penalty (JCP) is merely one mechanism for attempting to enforce that condition during training, but the proof itself makes no reference to how consistency is obtained and remains valid (or invalid) independently of the training data, loss terms, or learned weights. No self-citation chain, ansatz smuggling, or renaming of known results is present in the provided derivation steps. The practical question of whether consistency transfers to test trajectories is a separate empirical concern, not a circularity in the claimed derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- Deceptron network weights
axioms (1)
- domain assumption: Local pseudoinverse consistency between forward and reverse Jacobians can be achieved and maintained by the Jacobian Composition Penalty.
invented entities (3)
- Deceptron: no independent evidence
- D-IPG solver: no independent evidence
- Jacobian Composition Penalty (JCP): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (J uniqueness): echoes
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  We prove that D-IPG is first-order equivalent to damped Gauss-Newton under local pseudoinverse consistency, with deviation controlled by composition error and conditioning. ... Jacobian Composition Penalty (JCP) ... $L_{\mathrm{JCP}} = \mathbb{E}_{x,\xi} \lVert J_g(f_W(x))\, J_f(x)\, \xi - \xi \rVert_2^2$
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (coupling combiner forces bilinear J branch): echoes
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Theorem 4.2 ... $\lVert \Delta x_{\mathrm{DIPG}} - \Delta x_{\mathrm{GN},\alpha} \rVert_2 \le \alpha_t \, \dfrac{\lVert J_g(f_W(x_t))\, J_f(x_t) - I \rVert_2}{\sigma_{\min}(J_f(x_t))} \, \lVert r_t \rVert_2$
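The shape of the Theorem 4.2 bound can be recovered in a simplified setting. The derivation below is a sketch under assumptions that this page does not confirm: the forward Jacobian is square and invertible, and the two first-order steps are −α_t J_g r_t and −α_t J_f⁻¹ r_t. The paper's own proof presumably covers the more general pseudoinverse case.

```latex
% Assumptions: J_f = J_f(x_t) square and invertible, J_g = J_g(f_W(x_t)),
% \Delta x_{\mathrm{DIPG}} = -\alpha_t J_g r_t, \quad
% \Delta x_{\mathrm{GN},\alpha} = -\alpha_t J_f^{-1} r_t .
\begin{aligned}
\Delta x_{\mathrm{DIPG}} - \Delta x_{\mathrm{GN},\alpha}
  &= -\alpha_t \,(J_g - J_f^{-1})\, r_t
   = -\alpha_t \,(J_g J_f - I)\, J_f^{-1} r_t, \\
\lVert \Delta x_{\mathrm{DIPG}} - \Delta x_{\mathrm{GN},\alpha} \rVert_2
  &\le \alpha_t \,\lVert J_g J_f - I \rVert_2 \,\lVert J_f^{-1} \rVert_2 \,\lVert r_t \rVert_2
   = \alpha_t \,\frac{\lVert J_g J_f - I \rVert_2}{\sigma_{\min}(J_f)} \,\lVert r_t \rVert_2 .
\end{aligned}
```

The last equality uses the fact that the spectral norm of J_f⁻¹ is 1/σ_min(J_f), which is why poor conditioning amplifies any residual composition error.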
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Code availability
A code repository is available at https://github.com/AadityaKachhadiya/deceptron. The repository contains an installable PyTorch implementation of Deceptron/D-IPG, including the learned forward–reverse modul...