Recognition: 2 theorem links
A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning
Pith reviewed 2026-05-12 03:17 UTC · model grok-4.3
The pith
Behavior cloning failure probability factorizes into a gain-dependent amplification index and the training loss, so controller gains affect closed-loop performance independently of validation error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Independent sub-Gaussian action errors propagate through gain-dependent closed-loop dynamics to yield sub-Gaussian position errors whose proxy matrix X_∞(K) governs the failure tail. The probability of horizon-T task failure therefore factorizes into a gain-dependent amplification index Γ_T(K) and the validation loss plus generalization slack. Under shape-preserving upper-bound structural assumptions, the proxy admits the scalar bound X_∞(K) ≼ Ψ(K) X̄, with Ψ(K) decomposed into label difficulty, injection strength, and contraction; this ranks the four canonical regimes, with compliant-overdamped tightest and stiff-underdamped loosest. For the canonical scalar second-order PD system, the closed-form continuous-time stationary variance X_∞^c(α, β) = σ²α/(2β) is strictly monotone in stiffness and damping over the entire stable orthant, and the exact zero-order-hold discretization inherits this monotonicity.
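The two central inequalities can be rendered in display form. The loss and slack symbols below are our notation; the Ψ(K) decomposition is quoted from the paper's structural-assumption passage:

```latex
% Failure-probability factorization (our symbols for loss and slack):
\Pr\bigl[\text{failure by horizon } T\bigr]
  \;\le\; \Gamma_T(K)\,\bigl(\mathcal{L}_{\mathrm{val}} + \varepsilon_{\mathrm{gen}}\bigr)
% Scalar proxy bound and its decomposition under the structural assumptions:
\qquad
X_\infty(K) \;\preceq\; \Psi(K)\,\bar X,
\qquad
\Psi(K) \;=\; \frac{b(K)\,\ell(K)}{1-\rho_*(K)^2}
```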
What carries the argument
The proxy matrix X_∞(K) that bounds the tail of position errors arising from sub-Gaussian action errors propagating through the gain-dependent closed-loop dynamics of the PD controller.
If this is right
- Training loss alone cannot predict closed-loop task performance.
- Compliant-overdamped gains produce the smallest amplification of error and therefore the tightest failure bounds.
- Stiff-underdamped gains produce the largest amplification and therefore the loosest failure bounds.
- The stationary variance of the scalar second-order system is strictly monotone in both stiffness and damping over the stable orthant.
- The exact zero-order-hold discretization inherits the same monotonicity in the gains.
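The last two bullets can be checked numerically against the closed form X_∞^c(α, β) = σ²α/(2β). A minimal sketch, assuming the closed loop ẍ = −αx − βẋ + αw with white action noise w of intensity σ² injected through the stiffness gain (our reading of the noise model; function names are ours):

```python
import numpy as np

def stationary_position_variance(alpha, beta, sigma=1.0):
    """Stationary Var(x) for the closed loop x'' = -alpha*x - beta*x' + alpha*w,
    with w white noise of intensity sigma^2 (the policy error is a position
    command, so it enters through the stiffness gain). Solves the Lyapunov
    equation A P + P A^T + W = 0 by vectorization."""
    A = np.array([[0.0, 1.0], [-alpha, -beta]])
    W = np.array([[0.0, 0.0], [0.0, (alpha * sigma) ** 2]])
    M = np.kron(np.eye(2), A) + np.kron(A, np.eye(2))
    P = np.linalg.solve(M, -W.reshape(-1)).reshape(2, 2)
    return P[0, 0]

# Matches the closed form X_inf^c(alpha, beta) = sigma^2 * alpha / (2 * beta):
for alpha, beta in [(1.0, 2.0), (4.0, 0.5), (9.0, 6.0)]:
    assert np.isclose(stationary_position_variance(alpha, beta), alpha / (2 * beta))

# Strictly increasing in stiffness, strictly decreasing in damping:
in_alpha = [stationary_position_variance(a, 1.0) for a in (0.5, 1.0, 2.0, 4.0)]
assert all(x < y for x, y in zip(in_alpha, in_alpha[1:]))
in_beta = [stationary_position_variance(1.0, b) for b in (0.5, 1.0, 2.0, 4.0)]
assert all(x > y for x, y in zip(in_beta, in_beta[1:]))
```

The Lyapunov route generalizes to any linear closed loop, which is why the proxy-matrix formulation is not limited to the scalar case.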
Where Pith is reading between the lines
- Robot designers could select gains by minimizing the amplification index rather than optimizing policy loss in isolation.
- Closed-loop rollout validation under the target gains becomes necessary to certify performance.
- The regime ranking supplies a concrete ordering that can be checked on physical hardware for any given task.
- The factorization suggests that future imitation algorithms might jointly optimize policy parameters and controller gains.
Load-bearing premise
The shape-preserving upper-bound structural assumptions that permit bounding the proxy matrix by a scalar multiple of a fixed matrix.
What would settle it
Train policies to the same validation loss, deploy them under different PD gains on the same task, and check whether observed horizon-T failure rates scale exactly with the predicted amplification index Γ_T(K); mismatch would falsify the factorization.
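This protocol can be prototyped in simulation before touching hardware. A minimal sketch, assuming a scalar double integrator under PD position control where i.i.d. Gaussian action errors of fixed variance stand in for "trained to the same validation loss"; the gains, threshold, and horizon are illustrative choices, not the paper's:

```python
import numpy as np

def failure_rate(kp, kd, T=200, dt=0.01, noise_std=0.5,
                 threshold=0.2, trials=1000, seed=0):
    """Empirical P(max_t |x_t| > threshold) over T steps for the closed loop
    x'' = kp*(u - x) - kd*x', where the commanded position u is the target (0)
    plus i.i.d. Gaussian policy error (semi-implicit Euler integration)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(trials)
    v = np.zeros(trials)
    failed = np.zeros(trials, dtype=bool)
    for _ in range(T):
        u = rng.normal(0.0, noise_std, size=trials)  # same per-action error for all gains
        a = kp * (u - x) - kd * v
        v = v + dt * a
        x = x + dt * v
        failed |= np.abs(x) > threshold
    return failed.mean()

# Identical action-error variance, different gains; the predicted ranking is
# that the stiff-underdamped rollout fails far more often.
co = failure_rate(kp=4.0, kd=6.0)    # compliant-overdamped
su = failure_rate(kp=100.0, kd=2.0)  # stiff-underdamped
assert su > co
```

Checking whether the empirical rates scale with the predicted Γ_T(K), rather than merely preserving the ordering, is the sharper falsification test the passage describes.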
Original abstract
Behavior cloning (BC) policies on position-controlled robots inherit the closed-loop response of the underlying PD controller, yet the nonasymptotic finite-horizon consequences of controller gains for BC failure remain open. We show that independent sub-Gaussian action errors propagate through the gain-dependent closed-loop dynamics to yield sub-Gaussian position errors whose proxy matrix $X_\infty(K)$ governs the failure tail. The probability of horizon-$T$ task failure factorizes into a gain-dependent amplification index $\Gamma_T(K)$ and the validation loss plus a generalization slack, so training loss alone cannot predict closed-loop performance. Under shape-preserving upper-bound structural assumptions, the proxy admits the scalar bound $X_\infty(K)\preceq\Psi(K)\bar X$, with $\Psi(K)$ decomposed into label difficulty, injection strength, and contraction. This ranks the four canonical regimes with compliant-overdamped (CO) tightest, stiff-underdamped (SU) loosest, and the stiff-overdamped versus compliant-underdamped ordering system-dependent. For the canonical scalar second-order PD system, the closed-form continuous-time stationary variance $X_\infty^{\mathrm{c}}(\alpha,\beta)=\sigma^2\alpha/(2\beta)$ is strictly monotone in stiffness and damping over the entire stable orthant, covering both underdamped and overdamped regimes, and the exact zero-order-hold (ZOH) discretization inherits this monotonicity. The analysis gives a nonasymptotic finite-horizon extension of the gain-dependent error-attenuation explanation of Bronars et al.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a nonasymptotic finite-horizon theory of how PD controller gains affect error propagation in behavior cloning. Independent sub-Gaussian action errors are shown to propagate through the closed-loop dynamics, yielding sub-Gaussian position errors whose tail is governed by a gain-dependent proxy matrix X_∞(K). This produces a factorization of the probability of horizon-T task failure into an amplification index Γ_T(K) and the validation loss plus generalization slack. Under additional shape-preserving upper-bound structural assumptions the proxy admits the scalar bound X_∞(K) ≼ Ψ(K) X-bar, with Ψ(K) decomposed into label difficulty, injection strength and contraction; the resulting regime ranking places compliant-overdamped (CO) as tightest and stiff-underdamped (SU) as loosest. Explicit closed-form monotonicity is derived for the scalar second-order PD system in both continuous time and zero-order-hold discretization.
Significance. If the factorization and scalar results hold, the work supplies a concrete theoretical link between controller gains and closed-loop failure that is absent from standard imitation-learning analyses. The nonasymptotic treatment and the explicit monotonicity result for the canonical scalar PD system are clear strengths; they furnish falsifiable predictions and a direct extension of the gain-dependent attenuation argument of Bronars et al. The factorization itself already implies that training loss alone is insufficient to predict closed-loop performance, independent of the scalar bound.
major comments (1)
- [Shape-preserving upper-bound structural assumptions (prior to scalar bound)] The shape-preserving upper-bound structural assumptions (invoked immediately before the scalar bound X_∞(K) ≼ Ψ(K) X-bar) are load-bearing for the decomposition of Ψ(K) and the ranking of the four regimes. The manuscript states these assumptions as structural without deriving explicit conditions on the closed-loop map or error covariance under which they hold, nor does it verify tightness or necessity outside the scalar second-order case. If the assumptions fail when gain changes alter the shape of the error distribution beyond a fixed-matrix multiple, the regime ordering and the associated practical guidance do not follow.
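The referee's worry is visible already in the scalar second-order model. A small sketch, assuming the closed loop ẍ = −αx − βẋ + αw with noise injected through the stiffness gain (our reading of the noise model): the full state covariance changes shape, not just scale, as the stiffness varies, so equality X_∞(K) = Ψ(K)X̄ cannot hold across gains and only the upper-bound reading survives.

```python
import numpy as np

def stationary_cov(alpha, beta, sigma=1.0):
    """Full (position, velocity) stationary covariance of
    x'' = -alpha*x - beta*x' + alpha*w, solved from A P + P A^T + W = 0."""
    A = np.array([[0.0, 1.0], [-alpha, -beta]])
    W = np.array([[0.0, 0.0], [0.0, (alpha * sigma) ** 2]])
    M = np.kron(np.eye(2), A) + np.kron(A, np.eye(2))
    return np.linalg.solve(M, -W.reshape(-1)).reshape(2, 2)

P_soft, P_stiff = stationary_cov(1.0, 2.0), stationary_cov(9.0, 2.0)
# Trace-normalized "shapes": for this model Var(v)/Var(x) = alpha, so the
# covariance shape moves with the gains and no scalar psi gives
# P_stiff == psi * P_soft; a fixed matrix Xbar can only upper-bound both.
S_soft = P_soft / np.trace(P_soft)
S_stiff = P_stiff / np.trace(P_stiff)
assert not np.allclose(S_soft, S_stiff)
```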
minor comments (1)
- [Abstract] The abstract refers to 'the gain-dependent error-attenuation explanation of Bronars et al.' without a citation key; the reference list should be checked for completeness and consistency with the in-text mention.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of the nonasymptotic factorization and the explicit monotonicity results for the scalar PD system. We address the single major comment below.
Point-by-point responses
- Referee: The shape-preserving upper-bound structural assumptions (invoked immediately before the scalar bound X_∞(K) ≼ Ψ(K) X-bar) are load-bearing for the decomposition of Ψ(K) and the ranking of the four regimes. The manuscript states these assumptions as structural without deriving explicit conditions on the closed-loop map or error covariance under which they hold, nor does it verify tightness or necessity outside the scalar second-order case. If the assumptions fail when gain changes alter the shape of the error distribution beyond a fixed-matrix multiple, the regime ordering and the associated practical guidance do not follow.
Authors: We agree that the shape-preserving upper-bound assumptions are invoked without a general derivation of the conditions on the closed-loop map or covariance under which the error distribution remains a fixed-matrix multiple. These assumptions are introduced as sufficient structural hypotheses that permit the proxy matrix to be bounded by a scalar multiple, enabling the decomposition of Ψ(K) into label difficulty, injection strength, and contraction. The core factorization of horizon-T failure probability into Γ_T(K) and the validation loss plus generalization slack holds independently of these assumptions. The scalar second-order PD system provides an exact verification where the assumptions hold with equality in the stationary variance, and the closed-form expressions confirm monotonicity over the stable orthant. In the revision we will add a clarifying paragraph stating that the assumptions are sufficient (not necessary) for the scalar regime ranking, note that they may fail for general systems when gain changes alter covariance shape, and emphasize that the nonasymptotic propagation and factorization results remain valid without them.
Revision: yes
Circularity Check
No circularity; derivation self-contained from external dynamics and explicit assumptions
Full rationale
The factorization of horizon-T failure probability into Γ_T(K) and (validation loss + slack) follows directly from sub-Gaussian action-error propagation through the closed-loop linear dynamics, with X_∞(K) defined as the resulting proxy matrix from those dynamics. The scalar bound X_∞(K) ≼ Ψ(K) X-bar and regime ranking are obtained under explicitly stated shape-preserving upper-bound structural assumptions that are not fitted to data, not self-defined, and not justified solely by self-citation. The monotonicity result for the scalar second-order PD system is derived in closed form from the stationary variance expression. No step reduces by construction to its inputs, and the central claim that training loss alone cannot predict closed-loop performance is a direct consequence of the gain-dependent amplification factor rather than a renaming or tautology.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Action errors are independent and sub-Gaussian.
- ad hoc to paper: Shape-preserving upper-bound structural assumptions on the error dynamics.
invented entities (2)
- Proxy matrix X_∞(K): no independent evidence
- Amplification index Γ_T(K): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel, tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "X_∞^c(α, β) = σ²α/(2β) is strictly monotone in stiffness and damping over the entire stable orthant"
- IndisputableMonolith/Foundation/BranchSelection.lean, theorem branch_selection, tag: unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "shape-preserving upper-bound structural assumptions … Ψ(K) = b(K)l(K)/(1−ρ∗(K)²)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] A. Bronars, Y. Park, and P. Agrawal, "Tune to Learn: How Controller Gains Shape Robot Policy Learning," arXiv preprint arXiv:2604.02523, 2026.
- [2] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion," The International Journal of Robotics Research, vol. 44, no. 10–11, pp. 1684–1704, 2025.
- [3] C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, "Universal Manipulation Interface: In-the-Wild Robot Teaching Without In-the-Wild Robots," Robotics: Science and Systems, 2024.
- [4] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware," Robotics: Science and Systems, 2023.
- [5] R. Kelly, "PD Control with Desired Gravity Compensation of Robotic Manipulators: A Review," The International Journal of Robotics Research, vol. 16, no. 5, pp. 660–672, 1997.
- [6] M. Simchowitz, D. Pfrommer, and A. Jadbabaie, "The Pitfalls of Imitation Learning When Actions Are Continuous," in Proceedings of the Thirty-Eighth Conference on Learning Theory, 2025.
- [7] A. Block, D. J. Foster, A. Krishnamurthy, M. Simchowitz, and C. Zhang, "Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression," in The International Conference on Learning Representations, 2024.
- [8] S. Ross and J. A. Bagnell, "Efficient Reductions for Imitation Learning," Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 661–668, 2010.
- [9] S. Ross, G. J. Gordon, and J. A. Bagnell, "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 627–635.
- [10] J. Spencer, S. Choudhury, A. Venkatraman, B. Ziebart, and J. A. Bagnell, "Feedback in Imitation Learning: The Three Regimes of Covariate Shift," arXiv preprint arXiv:2102.02872, 2021.
- [11] B. Recht, "A Tour of Reinforcement Learning: The View from Continuous Control," Annual Review of Control, Robotics, and Autonomous Systems, vol. 2, pp. 253–279, 2019.
- [12] N. M. Boffi, S. Tu, and J.-J. E. Slotine, "Regret Bounds for Adaptive Nonlinear Control," in Learning for Dynamics and Control, PMLR, 2021, pp. 471–483.
- [13] S. Tu and B. Recht, "The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint," in Conference on Learning Theory, 2019, pp. 3036–3083.
- [14] D. A. Pomerleau, "ALVINN: An Autonomous Land Vehicle in a Neural Network," Advances in Neural Information Processing Systems, vol. 1, 1989.
- [15] M. Bain and C. Sammut, "A Framework for Behavioural Cloning," in Machine Intelligence, 1999.
- [16] P. Florence, C. Lynch, A. Zeng, O. A. Ramirez, A. Wahid, L. Downs, A. Wong, J. Lee, I. Mordatch, and J. Tompson, "Implicit Behavioral Cloning," in Conference on Robot Learning, 2022, pp. 158–168.
- [17] D. J. Foster, A. Block, and D. Misra, "Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning," Advances in Neural Information Processing Systems, vol. 37, 2024.
- [18] A. Block, A. Jadbabaie, D. Pfrommer, M. Simchowitz, and R. Tedrake, "Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior," Advances in Neural Information Processing Systems, vol. 36, pp. 48534–48547, 2023.
- [19] N. Hogan, "Impedance Control: An Approach to Manipulation," in 1984 American Control Conference, IEEE, 1984, pp. 304–313.
- [20] N. Hogan, "Impedance Control: An Approach to Manipulation, Part II: Implementation," Journal of Dynamic Systems, Measurement, and Control, vol. 107, no. 1, pp. 8–16, 1985.
- [21] O. Khatib, "A Unified Approach for Motion and Force Control of Robot Manipulators: The Operational Space Formulation," IEEE Journal on Robotics and Automation, vol. 3, no. 1, pp. 43–53, 1987.
- [22] M. Bogdanovic, M. Khadiv, and L. Righetti, "Learning Variable Impedance Control for Contact Sensitive Tasks," IEEE Robotics and Automation Letters, 2020.
- [23] Y. Wu, F. Zhao, T. Tao, and A. Ajoudani, "A Framework for Autonomous Impedance Regulation of Robots Based on Imitation Learning and Optimal Control," IEEE Robotics and Automation Letters, vol. 6, no. 1, pp. 127–134, 2021.
- [24] K. Kronander and A. Billard, "Learning Compliant Manipulation Through Kinesthetic and Tactile Human-Robot Interaction," IEEE Transactions on Haptics, vol. 7, no. 3, pp. 367–380, 2013.
- [25] E. Aljalbout, F. Frank, M. Karl, and P. van der Smagt, "On the Role of the Action Space in Robot Manipulation Learning and Sim-to-Real Transfer," IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5895–5902, 2024.
- [26] J. Eßer, G. B. Margolis, O. Urbann, S. Kerner, and P. Agrawal, "Action Space Design in Reinforcement Learning for Robot Motor Skills," in Conference on Robot Learning, 2024.
- [27] A. Mandlekar, D. Xu, J. Wong, S. Nasiriany, C. Wang, R. Kulkarni, L. Fei-Fei, S. Savarese, Y. Zhu, and R. Martín-Martín, "What Matters in Learning from Offline Human Demonstrations for Robot Manipulation," in Conference on Robot Learning, 2021.
- [28] D. Kim, G. Berseth, M. Schwartz, and J. Park, "Torque-Based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer," IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6251–6258, 2023.