Local Inverse Geometry Can Be Amortized

Aaditya L. Kachhadiya

arxiv: 2605.13068 · v1 · pith:6FGT2XE2new · submitted 2026-05-13 · 💻 cs.LG

Pith reviewed 2026-05-14 19:57 UTC · model grok-4.3

classification 💻 cs.LG

keywords amortized inverse geometry · Deceptron · D-IPG · Jacobian Composition Penalty · Gauss-Newton equivalence · PDE inverse problems · first-order solvers · pseudoinverse consistency

The pith

A learned reverse operator amortizes local inverse geometry so first-order methods match damped Gauss-Newton on nonlinear inverse problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Deceptron as a bidirectional neural surrogate that learns forward and reverse mappings for inverse problems. Training uses the Jacobian Composition Penalty to make the reverse Jacobian a local left inverse of the forward Jacobian. This setup powers D-IPG, an iterative solver shown to be first-order equivalent to damped Gauss-Newton whenever local pseudoinverse consistency holds, with deviation bounded by composition error and conditioning. On seven PDE inverse-problem benchmarks the method delivers comparable recovery quality, 94.8 percent mean success on the reliability suite, and up to 77 times lower inference-time solve cost than standard baselines. A reader would care because the approach removes the need for repeated Jacobian linear solves while preserving the directional strength of curvature-aware methods.
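
The composition penalty described above can be illustrated with a small NumPy sketch. It estimates the expected squared norm of J_g J_f ξ − ξ with random probes for fixed Jacobian matrices. The helper name `jcp_estimate`, the Gaussian probe distribution, and the use of explicit matrices in place of network Jacobian-vector products are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def jcp_estimate(Jg, Jf, num_probes=64, seed=0):
    """Monte Carlo estimate of E_xi ||Jg Jf xi - xi||^2 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = Jf.shape[1]                               # latent dimension
    xi = rng.standard_normal((n, num_probes))     # assumed Gaussian probes
    resid = Jg @ (Jf @ xi) - xi                   # composition residual per probe
    return float(np.mean(np.sum(resid ** 2, axis=0)))

rng = np.random.default_rng(1)
Jf = rng.standard_normal((6, 4))                  # toy forward Jacobian (tall)
exact = jcp_estimate(np.linalg.pinv(Jf), Jf)          # exact left inverse
rough = jcp_estimate(0.8 * np.linalg.pinv(Jf), Jf)    # here Jg Jf = 0.8 I
```

In this toy the exact pseudoinverse drives the estimate to roundoff level, while the scaled reverse Jacobian leaves a residual of 0.2 ξ per probe and therefore a clearly nonzero penalty.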

Core claim

D-IPG is first-order equivalent to damped Gauss-Newton under local pseudoinverse consistency between the learned reverse Jacobian and the forward Jacobian, with the deviation controlled by composition error and conditioning; this consistency is enforced during training by the Jacobian Composition Penalty.
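
One way to read the core claim is through a first-order expansion. The sketch below assumes a particular update form, a smooth reverse map, a full-column-rank forward Jacobian, and a residual lying in the range of that Jacobian. These are illustrative assumptions and are not quoted from the paper, whose exact definitions may differ.

```latex
% Assumed setup (illustrative, not quoted from the paper):
%   residual  r_t = f_W(x_t) - y,
%   update    x_{t+1} = g(f_W(x_t) - \alpha_t r_t),  with  g(f_W(x_t)) = x_t,
%   J_f(x_t) has full column rank and r_t = J_f(x_t) v for some v.
\begin{align*}
\Delta x_{\mathrm{DIPG}}
  &= g\bigl(f_W(x_t) - \alpha_t r_t\bigr) - g\bigl(f_W(x_t)\bigr)
   = -\alpha_t\, J_g\bigl(f_W(x_t)\bigr)\, r_t + O\bigl(\alpha_t^2 \|r_t\|_2^2\bigr), \\
\Delta x_{\mathrm{GN},\alpha}
  &= -\alpha_t\, J_f(x_t)^{+}\, r_t = -\alpha_t\, v, \\
\Delta x_{\mathrm{DIPG}} - \Delta x_{\mathrm{GN},\alpha}
  &= -\alpha_t \bigl( J_g(f_W(x_t))\, J_f(x_t) - I \bigr) v + O\bigl(\alpha_t^2 \|r_t\|_2^2\bigr), \\
\|v\|_2 &\le \frac{\|r_t\|_2}{\sigma_{\min}(J_f(x_t))}
\;\Longrightarrow\;
\bigl\| \Delta x_{\mathrm{DIPG}} - \Delta x_{\mathrm{GN},\alpha} \bigr\|_2
  \le \alpha_t\, \frac{\| J_g(f_W(x_t))\, J_f(x_t) - I \|_2}{\sigma_{\min}(J_f(x_t))}\, \|r_t\|_2
  + O\bigl(\alpha_t^2 \|r_t\|_2^2\bigr).
\end{align*}
```

Under these assumptions the two steps coincide to first order when the composition error vanishes, and the gap grows with the composition error and with poor conditioning (small σ_min).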

What carries the argument

The Deceptron bidirectional surrogate whose reverse Jacobian is trained via the Jacobian Composition Penalty to act as a local left inverse of the forward Jacobian, enabling the D-IPG iterative solver.

If this is right

D-IPG achieves comparable or better recovery quality than standard baselines across seven PDE inverse-problem benchmarks.
D-IPG reaches 94.8 percent mean success rate on the six-problem reliability suite.
D-IPG delivers up to 77 times lower inference-time solve cost than curvature-aware methods on the main benchmarks.
The equivalence holds with deviation controlled directly by the measured composition error and problem conditioning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same amortization could be applied to inverse problems outside PDEs, such as parameter estimation in robotics or imaging.
If consistency generalizes, hybrid solvers could monitor composition error at runtime and fall back to exact linear solves only when needed.
Testing on problems with rapidly changing conditioning would expose where the learned operator stops tracking true inverse geometry.
Lower per-iteration cost opens the door to real-time inverse problems in control loops where Gauss-Newton is currently too slow.
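
The runtime-monitoring idea in the list above can be sketched as a step function that checks the composition error and falls back to an exact least-squares solve when it is too large. The function name `hybrid_step`, the threshold `tol`, and the dense formation of J_g J_f are hypothetical choices for a toy setting and do not come from the paper.

```python
import numpy as np

def hybrid_step(Jf, Jg, r, alpha=1.0, tol=0.1):
    """Return (step, mode, comp_err); names and threshold are hypothetical."""
    n = Jf.shape[1]
    comp_err = np.linalg.norm(Jg @ Jf - np.eye(n), 2)   # spectral norm of Jg Jf - I
    if comp_err <= tol:
        return -alpha * (Jg @ r), "learned", comp_err    # no linear solve needed
    dx, *_ = np.linalg.lstsq(Jf, r, rcond=None)          # exact least-squares solve
    return -alpha * dx, "exact", comp_err

rng = np.random.default_rng(0)
Jf = rng.standard_normal((5, 3))
r = rng.standard_normal(5)
good = np.linalg.pinv(Jf)                                # consistent reverse Jacobian
step_a, mode_a, err_a = hybrid_step(Jf, good, r)
step_b, mode_b, err_b = hybrid_step(Jf, 2.0 * good, r)   # Jg Jf = 2 I, error 1
```

Forming J_g J_f densely is only reasonable for a small example. At scale, a probe-based estimate of the composition error would presumably be the cheaper monitor.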

Load-bearing premise

The learned reverse Jacobian maintains local pseudoinverse consistency with the forward Jacobian along optimization trajectories.

What would settle it

Run D-IPG on a benchmark where the composition error stays large despite JCP training and measure whether the update directions diverge from those of damped Gauss-Newton by more than the predicted bound.
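
A minimal harness for this kind of check might compare the gap between the learned step and the damped Gauss-Newton step against the bound α ‖J_g J_f − I‖₂ ‖r‖₂ / σ_min(J_f). The sketch below assumes a square, invertible forward Jacobian and a linearized step of the form −α J_g r. The function name and the perturbation model are hypothetical.

```python
import numpy as np

def direction_gap_and_bound(Jf, Jg, r, alpha=1.0):
    """Compare learned and damped Gauss-Newton steps (square, invertible Jf assumed)."""
    n = Jf.shape[1]
    dx_learned = -alpha * (Jg @ r)                 # linearized learned step
    dx_gn = -alpha * np.linalg.solve(Jf, r)        # damped Gauss-Newton step
    gap = np.linalg.norm(dx_learned - dx_gn)
    comp_err = np.linalg.norm(Jg @ Jf - np.eye(n), 2)
    sigma_min = np.linalg.svd(Jf, compute_uv=False)[-1]
    bound = alpha * comp_err / sigma_min * np.linalg.norm(r)
    return gap, bound

rng = np.random.default_rng(0)
Jf = rng.standard_normal((4, 4)) + 8.0 * np.eye(4)   # well-conditioned toy Jacobian
r = rng.standard_normal(4)
results = []
for eps in (0.0, 1e-2, 1e-1, 1.0):
    # Perturb the exact inverse to mimic increasing composition error.
    Jg = np.linalg.inv(Jf) + eps * rng.standard_normal((4, 4))
    results.append(direction_gap_and_bound(Jf, Jg, r, alpha=0.5))
```

In the square invertible case the gap cannot exceed the bound, because J_g − J_f⁻¹ = (J_g J_f − I) J_f⁻¹. A violation on a real benchmark would therefore point to the linearization or the rank assumptions breaking, not to arithmetic.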

Figures

Figures reproduced from arXiv: 2605.13068 by Aaditya L. Kachhadiya.

**Figure 2.** Mechanism study on Allen–Cahn-2D. Panel (a) shows the iteration-wise separation ...

**Figure 3.** Reliability matrix across six PDE inverse problems. Columns are ordered by the D ...

**Figure 4.** Pooled solver profiles across the full benchmark. (a) wall-clock Dolan–Moré profile. ...

**Figure 5.** Cosine spectrum of the learned pullback across problems with available cosine diagnostics. ...

**Figure 6.** Reproducibility effects of the Jacobian Composition Penalty. Left: cross-problem associa ...

**Figure 7.** Hyperparameter analysis on Allen–Cahn-2D. Left: probe-count sensitivity. Top: success ...

**Figure 8.** Representative qualitative reconstructions on three benchmark problems. Left block: ...

Original abstract

Nonlinear inverse problems often trade inexpensive but fragile first-order updates against curvature-aware methods such as Gauss-Newton and Levenberg-Marquardt, which obtain stronger directions by repeatedly solving Jacobian-based linearized systems. We propose a learned alternative: amortize local inverse geometry into a reusable reverse operator. Our framework learns a bidirectional surrogate, Deceptron, and deploys it through D-IPG (Deceptron Inverse-Preconditioned Gradient), an iterative solver that pulls residual-corrected measurement-space proposals back to latent space. The key mechanism is a Jacobian Composition Penalty (JCP), which trains the reverse Jacobian to act as a local left inverse of the forward Jacobian; its runtime counterpart, RJCP, measures the same inverse-consistency error along optimization trajectories. We prove that D-IPG is first-order equivalent to damped Gauss-Newton under local pseudoinverse consistency, with deviation controlled by composition error and conditioning. Across seven PDE inverse-problem benchmarks, D-IPG outperforms standard baselines, achieves 94.8% mean success across the six-problem reliability suite, and reaches comparable or better recovery quality at up to 77x lower inference-time solve cost on the main benchmarks.
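
The phrase "pulls residual-corrected measurement-space proposals back to latent space" suggests an iteration of roughly the following shape. This is a sketch under stated assumptions: the update form x ← g(f(x) − α r), the elementwise exponential forward map, and its exact logarithmic inverse are all stand-ins chosen so the example is self-contained, and none of them are taken from the paper.

```python
import numpy as np

def f(x):
    return np.exp(x)              # toy forward map (assumption)

def g(y):
    return np.log(y)              # toy reverse map: exact inverse of f for y > 0

def dipg_like_solve(y_obs, x0, alpha=0.5, steps=60):
    """Assumed update: x <- g(f(x) - alpha * (f(x) - y_obs))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        r = f(x) - y_obs          # measurement-space residual
        y_prop = f(x) - alpha * r # residual-corrected proposal
        x = g(y_prop)             # pull the proposal back to latent space
    return x

x_true = np.array([0.3, -1.2, 2.0])
y_obs = f(x_true)
x_hat = dipg_like_solve(y_obs, x0=np.zeros(3))
```

Because the reverse map is exact here, the measurement-space residual shrinks by a factor of (1 − α) per step, and the latent update agrees with a damped Gauss-Newton step to first order in the residual. No linear system is solved at any point in the loop.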

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper amortizes local inverse geometry with a Deceptron network and JCP penalty to build D-IPG, claiming first-order equivalence to damped Gauss-Newton, but the key consistency is not shown to hold on actual solver trajectories.

The main takeaway is that this work trains a bidirectional network to serve as a reusable local left inverse for the forward Jacobian in nonlinear inverse problems, then plugs it into an iterative solver called D-IPG. The training uses a Jacobian Composition Penalty to push the composition error low, and they sketch a proof that the resulting updates match damped Gauss-Newton to first order when that error stays small, with deviation bounded by conditioning and the remaining composition term. On seven PDE benchmarks they report strong reliability numbers and up to 77x lower solve cost than standard methods while matching or beating recovery quality.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes amortizing local inverse geometry for nonlinear inverse problems via a learned bidirectional surrogate (Deceptron) trained with a Jacobian Composition Penalty (JCP). It introduces the D-IPG iterative solver that uses the learned reverse operator for residual-corrected proposals, proves first-order equivalence to damped Gauss-Newton under a local pseudoinverse consistency condition (with deviation controlled by composition error and conditioning), and reports strong empirical results on seven PDE inverse-problem benchmarks, including 94.8% mean success rate and up to 77x lower inference-time cost.

Significance. If the local consistency transfers to D-IPG trajectories and the equivalence holds in practice, the work provides a practical learned alternative to curvature-aware solvers that retains first-order efficiency while achieving stronger directions, supported by reproducible benchmarks across multiple inverse problems. The combination of a parameter-free-style derivation under the consistency assumption and extensive empirical validation strengthens the contribution.

major comments (2)

[Proof of first-order equivalence] (likely Theorem 3 or §4): the claimed equivalence to damped Gauss-Newton holds only under the assumption that the learned reverse Jacobian maintains local left-inverse consistency (small RJCP) along the specific sequences of latent points visited by D-IPG iterations. The manuscript reports no measurements or bounds on runtime RJCP for the optimization trajectories in the seven benchmarks, leaving the deviation term (controlled by composition error and conditioning) unverified and potentially unbounded if trajectories exit the training support.
[Experimental validation] (§5 or Tables 1–3): while JCP is used during training on a fixed distribution, the paper does not include ablation or diagnostic plots showing that the runtime RJCP metric remains small on the actual D-IPG paths for the reported PDE problems. This directly affects whether the strong benchmark results (94.8% success, 77x speedup) can be attributed to the proven equivalence rather than to the learned model behaving as a generic preconditioner.

minor comments (2)

[Notation] Notation for Deceptron and D-IPG could be clarified with a single summary table of symbols and their roles to aid readability.
[Abstract] The abstract mentions 'parameter-free' aspects of the equivalence but the deviation bound depends on conditioning; a brief remark on this dependence would prevent misinterpretation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and for highlighting the importance of verifying the local consistency assumption in practice. We address the major comments below and will revise the manuscript accordingly to strengthen the connection between theory and experiments.

Referee: [Proof of first-order equivalence] (likely Theorem 3 or §4): the claimed equivalence to damped Gauss-Newton holds only under the assumption that the learned reverse Jacobian maintains local left-inverse consistency (small RJCP) along the specific sequences of latent points visited by D-IPG iterations. The manuscript reports no measurements or bounds on runtime RJCP for the optimization trajectories in the seven benchmarks, leaving the deviation term (controlled by composition error and conditioning) unverified and potentially unbounded if trajectories exit the training support.

Authors: We agree that Theorem 3 establishes first-order equivalence only under the local pseudoinverse consistency condition (small RJCP). Although JCP training encourages this property on the data distribution, explicit runtime verification along D-IPG trajectories is a valuable addition. In the revised manuscript we will report mean and maximum RJCP values observed during optimization on all seven benchmarks, together with a short analysis confirming that trajectories remain within the support where the deviation term stays controlled by the reported conditioning bounds. revision: yes
Referee: [Experimental validation] (§5 or Tables 1–3): while JCP is used during training on a fixed distribution, the paper does not include ablation or diagnostic plots showing that the runtime RJCP metric remains small on the actual D-IPG paths for the reported PDE problems. This directly affects whether the strong benchmark results (94.8% success, 77x speedup) can be attributed to the proven equivalence rather than to the learned model behaving as a generic preconditioner.

Authors: We concur that diagnostic evidence is needed to link the empirical gains directly to the equivalence result rather than generic preconditioning. The revised version will include new ablation and diagnostic plots that track RJCP along the full D-IPG trajectories for the PDE benchmarks. These plots will show that RJCP remains small (consistent with training values) throughout the iterations, thereby supporting attribution of the 94.8% success rate and up to 77x speedup to the first-order equivalence. revision: yes
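
The diagnostic discussed in this exchange, tracking RJCP along solver trajectories and reporting its mean and maximum, could look roughly like the sketch below. Everything here is a toy assumption: the exponential forward map, the deliberately imperfect reverse map with error size `EPS`, the analytic Jacobians, the probe-based `rjcp` helper, and the update form are illustrative stand-ins, not the authors' code.

```python
import numpy as np

EPS = 0.05  # size of the deliberate reverse-model error (assumption)

def f(x):
    return np.exp(x)                        # toy forward map

def g(y):
    return np.log(y) + EPS * (y - 1.0)      # imperfect toy reverse map

def jf(x):
    return np.diag(np.exp(x))               # analytic forward Jacobian

def jg(y):
    return np.diag(1.0 / y + EPS)           # analytic reverse Jacobian

def rjcp(x, num_probes=16, seed=0):
    """Probe estimate of E ||Jg(f(x)) Jf(x) xi - xi||^2 at the current iterate."""
    xi = np.random.default_rng(seed).standard_normal((x.size, num_probes))
    resid = jg(f(x)) @ (jf(x) @ xi) - xi
    return float(np.mean(np.sum(resid ** 2, axis=0)))

def run_with_rjcp_log(y_obs, x0, alpha=0.5, steps=20):
    x, history = np.asarray(x0, dtype=float), []
    for _ in range(steps):
        history.append(rjcp(x))                  # log before each update
        x = g(f(x) - alpha * (f(x) - y_obs))     # assumed D-IPG-style update
    return x, np.array(history)

y_obs = f(np.array([0.2, -0.5, 1.0]))
x_final, history = run_with_rjcp_log(y_obs, x0=np.zeros(3))
summary = {"mean": float(history.mean()), "max": float(history.max())}
```

Logging the metric at every iterate, instead of only on training samples, is what distinguishes this from the training-time penalty: it shows whether consistency survives on the points the solver actually visits.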

Circularity Check

0 steps flagged

No significant circularity; equivalence proof is conditional on an independently enforceable property

The paper's central derivation is a mathematical proof that D-IPG matches the first-order behavior of damped Gauss-Newton whenever local pseudoinverse consistency holds, with explicit deviation bounds in terms of composition error and conditioning. This statement is self-contained and does not reduce to the training procedure or fitted parameters by construction; it is a standard first-order analysis that applies to any reverse operator satisfying the consistency condition. The Jacobian Composition Penalty (JCP) is merely one mechanism for attempting to enforce that condition during training, but the proof itself makes no reference to how consistency is obtained and remains valid (or invalid) independently of the training data, loss terms, or learned weights. No self-citation chain, ansatz smuggling, or renaming of known results is present in the provided derivation steps. The practical question of whether consistency transfers to test trajectories is a separate empirical concern, not a circularity in the claimed derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 3 invented entities

The central claim rests on the existence of a learnable local left-inverse relationship that can be enforced by a composition penalty and that transfers to optimization trajectories. The Deceptron parameters are fitted to data; the consistency property is an assumption rather than a derived guarantee.

free parameters (1)

Deceptron network weights
Neural network parameters fitted during training to approximate the reverse operator and satisfy the JCP.

axioms (1)

domain assumption: Local pseudoinverse consistency between forward and reverse Jacobians can be achieved and maintained by the Jacobian Composition Penalty.
Invoked to prove first-order equivalence to damped Gauss-Newton.

invented entities (3)

Deceptron (no independent evidence)
purpose: Bidirectional neural surrogate that learns both forward and reverse mappings
New architecture introduced to amortize inverse geometry.

D-IPG solver (no independent evidence)
purpose: Iterative optimization procedure that uses the learned reverse operator for preconditioning
New solver framework built on the surrogate.

Jacobian Composition Penalty (JCP) (no independent evidence)
purpose: Training loss that enforces local left-inverse consistency
New penalty term introduced for training.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J uniqueness) · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

We prove that D-IPG is first-order equivalent to damped Gauss-Newton under local pseudoinverse consistency, with deviation controlled by composition error and conditioning. ... Jacobian Composition Penalty (JCP) ... $\mathcal{L}_{\mathrm{JCP}} = \mathbb{E}_{x,\xi}\, \| J_g(f_W(x))\, J_f(x)\, \xi - \xi \|_2^2$

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection (coupling combiner forces bilinear J branch) · echoes

Theorem 4.2 ... $\| \Delta x_{\mathrm{DIPG}} - \Delta x_{\mathrm{GN},\alpha} \|_2 \le \alpha_t\, \dfrac{\| J_g(f_W(x_t))\, J_f(x_t) - I \|_2}{\sigma_{\min}(J_f(x_t))}\, \| r_t \|_2$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1] Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, 1944.
[2] Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.
[3] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, New York, NY, 2nd edition, 2006.
[4] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning, pages 399–406, 2010.
[5] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando de Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, volume 29, 2016.
[6] Vishal Monga, Yuelong Li, and Yonina C. Eldar. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine, 38(2):18–44, 2021.
[7] Jonas Adler and Ozan Öktem. Learned primal-dual reconstruction. IEEE Transactions on Medical Imaging, 37(6):1322–1332, 2018.
[8] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.
[9] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021.
[10] Larry Armijo. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16(1):1–3, 1966.
[11] Michael F. Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 19(2):433–450, 1990.
[12] Roger Penrose. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society, 51(3):406–413, 1955.
[13] Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213, 2002.
[14] Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, Hillsdale, NJ, 2nd edition, 1988.

Code availability: A code repository is available at https://github.com/AadityaKachhadiya/deceptron. The repository contains an installable PyTorch implementation of Deceptron/D-IPG, including the learned forward–reverse modul...
