Federated Nonlinear System Identification

Lav R. Varshney; Max Hartman; Omkar Tupe; Saurav Prakash

arxiv: 2508.15025 · v5 · submitted 2025-08-20 · 💻 cs.LG · cs.SY · eess.SY

Pith reviewed 2026-05-18 21:36 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY

keywords federated learning · nonlinear system identification · convergence rates · linearly-parameterized systems · feature maps · dynamical systems · distributed estimation

The pith

Federated nonlinear system identification converges faster as clients are added, with guarantees stated relative to centralized approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes theoretical guarantees that federated learning for linearly-parameterized nonlinear systems yields better convergence rates than a centralized approach, with the rate improving as the number of clients grows. A reader would care because this implies that data collected across many devices can produce accurate system models more quickly, without pooling raw trajectories in one place. The nonlinear rate differs from the linear one only by a constant factor that depends on the chosen feature map, and this map can be selected to increase excitation. Experiments on pendulum and quadrotor dynamics with i.i.d. inputs and random perturbations confirm that, under non-active exploration, adding clients consistently accelerates convergence for each participant.

Core claim

We establish theoretical guarantees on the effectiveness of federated nonlinear system identification compared to centralized approaches, demonstrating that the convergence rate improves as the number of clients increases. Although the convergence rates in the linear and nonlinear cases differ only by a constant, this constant depends on the feature map φ, which can be carefully chosen in the nonlinear setting to increase excitation and improve performance. We experimentally validate our theory in physical settings where client devices are driven by i.i.d. control inputs and control policies exhibiting i.i.d. random perturbations, ensuring non-active exploration.

What carries the argument

The scaling of convergence bounds with client count via federated averaging of local parameter estimates on linearly-parameterized nonlinear dynamics.
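The mechanism can be sketched in a few lines. This is a minimal illustration, assuming one-shot unweighted averaging of local least-squares estimates; the function names, the stand-in Gaussian features, and the values in `theta_star` are hypothetical and not taken from the paper.

```python
import numpy as np

def local_least_squares(X_plus, Phi):
    """Local estimate: argmin over theta of ||X_plus - theta @ Phi||_F^2.

    X_plus: (n, T) next states. Phi: (d, T) features. Returns theta with shape (n, d).
    """
    # Solve Phi.T @ theta.T ~= X_plus.T in the least-squares sense.
    theta_T, *_ = np.linalg.lstsq(Phi.T, X_plus.T, rcond=None)
    return theta_T.T

def federated_average(local_estimates):
    """Server step (assumed): unweighted mean of the clients' estimates."""
    return np.mean(np.stack(local_estimates), axis=0)

# Toy check on a hypothetical 1-state system x+ = theta_star @ phi + w.
rng = np.random.default_rng(0)
theta_star = np.array([[0.8, -0.3]])      # illustrative values, shape (n=1, d=2)
estimates = []
for _ in range(5):                        # 5 clients
    Phi = rng.normal(size=(2, 200))       # stand-in i.i.d. features
    X_plus = theta_star @ Phi + 0.1 * rng.normal(size=(1, 200))
    estimates.append(local_least_squares(X_plus, Phi))
theta_bar = federated_average(estimates)
```

Only parameter estimates reach the server in this sketch; the trajectories behind `Phi` and `X_plus` stay on each client.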

If this is right

The error bound in federated nonlinear identification tightens as more clients contribute local estimates.
A well-chosen feature map φ reduces the constant factor separating nonlinear from linear convergence rates.
Each individual client achieves faster convergence to the true parameters when federation includes additional devices.
The guarantees hold for physical systems like pendulums and quadrotors represented by polynomial and trigonometric features.
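To make the last point concrete, here is one way a pendulum can be written as a model that is linear in its parameters. It is a sketch under stated assumptions: an Euler discretization, an undamped pendulum, and hypothetical constants `DT` and `G_OVER_L`. The paper's actual feature choice may differ.

```python
import numpy as np

DT, G_OVER_L = 0.02, 9.81   # hypothetical step size and g/l

def phi(x, u):
    """Features [angle, velocity, sin(angle), input]: polynomial and trigonometric terms."""
    angle, vel = x
    return np.array([angle, vel, np.sin(angle), u])

# Euler-discretized pendulum written as x+ = THETA_STAR @ phi(x, u) + noise.
THETA_STAR = np.array([
    [1.0, DT, 0.0, 0.0],              # angle+ = angle + DT * vel
    [0.0, 1.0, -DT * G_OVER_L, DT],   # vel+   = vel - DT*(g/l)*sin(angle) + DT*u
])

def step(x, u, noise):
    """One transition of the linearly-parameterized model."""
    return THETA_STAR @ phi(x, u) + noise
```

The dynamics are nonlinear in the state through `sin(angle)`, yet the unknown matrix enters linearly, which is what lets least squares apply.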

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Collaborative modeling of dynamical systems could proceed across many edge devices while keeping trajectory data local.
The same client-count scaling may appear in other distributed parameter estimation problems in control applications.
New analysis would be needed if inputs become correlated across clients or if active exploration is introduced.

Load-bearing premise

Client devices receive i.i.d. control inputs and random perturbations that ensure non-active exploration, with trajectories generated from nonlinear systems having real-analytic feature functions.
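The two input regimes named in this premise can be illustrated as follows. This is a sketch only: the Gaussian distributions, the noise scales, the gain `K`, and all function names are assumptions made for illustration. The point is that nothing here adapts to the current parameter estimate, which is what non-active exploration means.

```python
import numpy as np

def iid_inputs(rng, T, sigma_u=1.0):
    """Open-loop case: u_t drawn i.i.d. from N(0, sigma_u^2), independent of the state."""
    return sigma_u * rng.normal(size=T)

def perturbed_policy_input(rng, x, K, sigma_eta=0.5):
    """Closed-loop case: fixed feedback K @ x plus an i.i.d. perturbation."""
    return float(K @ x) + sigma_eta * rng.normal()

def client_rngs(seed, num_clients):
    """One independent random stream per client, spawned from a single seed."""
    children = np.random.SeedSequence(seed).spawn(num_clients)
    return [np.random.default_rng(s) for s in children]
```

Spawning separate streams is one simple way to keep the clients' inputs independent of each other in simulation.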

What would settle it

An experiment on pendulum dynamics identification where the measured parameter error decay rate remains unchanged or worsens when the number of simulated clients is increased from one to ten under the stated i.i.d. input conditions.
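A minimal simulation of that test might look like the following. It assumes one-shot averaging of local least-squares estimates, an Euler-discretized pendulum with illustrative constants, and short rollouts from random initial states (an assumption added here to keep the regression well conditioned). None of these settings come from the paper.

```python
import numpy as np

DT, G_OVER_L = 0.02, 9.81   # hypothetical constants
THETA_STAR = np.array([[1.0, DT, 0.0, 0.0],
                       [0.0, 1.0, -DT * G_OVER_L, DT]])

def phi(x, u):
    return np.array([x[0], x[1], np.sin(x[0]), u])

def local_estimate(rng, rollouts=20, T=10, sigma_w=0.05):
    """Simulate one client under i.i.d. inputs and fit theta by least squares."""
    Phi, X_plus = [], []
    for _ in range(rollouts):
        x = rng.uniform([-np.pi, -2.0], [np.pi, 2.0])   # random initial state
        for _ in range(T):
            f = phi(x, rng.normal())                    # i.i.d. N(0, 1) input
            x = THETA_STAR @ f + sigma_w * rng.normal(size=2)
            Phi.append(f)
            X_plus.append(x)
    theta_T, *_ = np.linalg.lstsq(np.array(Phi), np.array(X_plus), rcond=None)
    return theta_T.T

def mean_error(num_clients, trials=20, seed=0):
    """Average Frobenius error of the client-averaged estimate over repeated trials."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        theta_bar = np.mean([local_estimate(rng) for _ in range(num_clients)], axis=0)
        errs.append(np.linalg.norm(theta_bar - THETA_STAR))
    return float(np.mean(errs))

errors = {n: mean_error(n) for n in (1, 5, 10)}
```

In this toy setting the claim predicts that `errors[10]` falls below `errors[1]`. A faithful replication in which the error stays flat or grows with the client count would count against it.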

Original abstract

We consider federated learning of linearly-parameterized nonlinear systems. We establish theoretical guarantees on the effectiveness of federated nonlinear system identification compared to centralized approaches, demonstrating that the convergence rate improves as the number of clients increases. Although the convergence rates in the linear and nonlinear cases differ only by a constant, this constant depends on the feature map $\phi$, which can be carefully chosen in the nonlinear setting to increase excitation and improve performance. We experimentally validate our theory in physical settings where client devices are driven by i.i.d. control inputs and control policies exhibiting i.i.d. random perturbations, ensuring non-active exploration. Experiments use trajectories from nonlinear dynamical systems characterized by real-analytic feature functions, including polynomial and trigonometric components, representative of physical systems including pendulum and quadrotor dynamics. We analyze the convergence behavior of the proposed method under varying noise levels and data distributions. Results show that federated learning consistently improves convergence of any individual client as the number of participating clients increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This applies federated averaging to linearly parameterized nonlinear system ID and derives that convergence improves with client count, with the constant tunable by feature map choice.

The main takeaway is that federated averaging on linearly parameterized nonlinear dynamics yields convergence rates that improve as the number of clients N grows, and the size of that improvement depends on how well the chosen feature map excites the system. The authors start from the usual model in which the next state is linear in features of the current state and input, then average local parameter estimates across clients. The analysis shows the error scales like 1/sqrt(total samples), so it shrinks as N grows because each client adds independent trajectories; in the nonlinear case the feature map enters through a multiplicative constant that can be bounded when the map is real-analytic. This specific combination of federated analysis plus tunable excitation for system ID is not in the earlier references they cite. The experiments on pendulum and quadrotor trajectories with i.i.d. inputs and perturbations give a concrete check that the trend holds under noise and varying data splits.

The assumptions are strong but stated clearly: i.i.d. control inputs and perturbations across clients, plus real-analytic features. If those hold, the math follows from standard least-squares bounds without circularity or missing terms. In practice the i.i.d. conditions may be optimistic, and picking a good φ still requires some system knowledge, but the paper does not overclaim its scope.

This is useful for people working at the intersection of federated methods and control or robotics. A reader who needs distributed identification with privacy constraints, or who wants to see how feature choice affects rates, will find the derivations and the physical-system tests worth reading. The work is grounded enough in standard techniques and relevant experiments to merit a serious referee rather than a desk reject.
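For reference, the setup described in the letter can be written out as follows. The notation is an assumption chosen to match the letter (N clients indexed by i, true parameter $\theta^\star$, process noise $w_t$) and may not match the paper's own symbols.

```latex
% Linearly-parameterized nonlinear dynamics on client i
x^{(i)}_{t+1} = \theta^\star \, \phi\big(x^{(i)}_t, u^{(i)}_t\big) + w^{(i)}_t

% Local least-squares estimate on client i
\hat\theta^{(i)} = \arg\min_{\theta} \sum_{t} \big\| x^{(i)}_{t+1} - \theta \, \phi\big(x^{(i)}_t, u^{(i)}_t\big) \big\|_2^2

% Server-side average across the N clients
\bar\theta = \frac{1}{N} \sum_{i=1}^{N} \hat\theta^{(i)}
```

The model is nonlinear in the state and input through $\phi$ but linear in $\theta^\star$, which is why each local problem reduces to ordinary least squares.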

Referee Report

1 major / 2 minor

Summary. The manuscript presents a federated learning approach for linearly-parameterized nonlinear system identification. It establishes theoretical guarantees that the convergence rate of the federated estimator improves with the number of clients N relative to centralized methods, with the linear and nonlinear rates differing only by a feature-map-dependent constant that can be tuned via choice of φ. Experiments validate the approach on physical systems (pendulum, quadrotor) under i.i.d. control inputs, i.i.d. perturbations, and real-analytic features, showing improved per-client convergence as more clients participate.

Significance. If the stated rates hold, the work is significant for distributed control and robotics applications where data cannot be centralized. It extends federated analysis to nonlinear dynamics. Its strengths include the explicit treatment of non-active exploration and the demonstration that federation yields a 1/sqrt(total samples) error scaling when each client contributes a fixed number of trajectories. The experimental results under varying noise and data distributions add practical value.

major comments (1)

[Theoretical guarantees] The claim that the convergence rate improves with N relies on standard least-squares aggregation. The manuscript should explicitly derive the total-sample scaling (e.g., O(1/sqrt(N M)), where M is the number of trajectories per client) and bound the φ-dependent constant to confirm that it does not cancel the improvement under the i.i.d. input and real-analytic assumptions.

minor comments (2)

[Experiments] Specify the precise number of trajectories per client, the exact noise variances tested, and the data-distribution shifts used, so that the reported convergence improvements can be reproduced.
[Notation] Ensure the feature map φ is defined consistently between the theoretical convergence statements and the polynomial/trigonometric examples in the physical-system experiments.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for minor revision. We address the single major comment below and will incorporate the requested clarifications into the revised manuscript.

Referee: Theoretical guarantees section: the claim that the convergence rate improves with N relies on standard least-squares aggregation; the manuscript should explicitly derive the total-sample scaling (e.g., O(1/sqrt(N M)) where M is trajectories per client) and bound the φ-dependent constant to confirm it does not cancel the improvement under the i.i.d. input and real-analytic assumptions.

Authors: We agree that an explicit derivation of the total-sample scaling would strengthen the presentation. In the revised version we will add a dedicated paragraph in the theoretical guarantees section deriving the federated least-squares error bound. Under the stated i.i.d. control inputs, i.i.d. perturbations, and real-analytic feature assumptions, the aggregated estimator across N clients each contributing M trajectories yields an estimation error of order O(1/sqrt(N M)) multiplied by a φ-dependent constant. We will also supply an explicit upper bound on this constant that remains independent of N and show that it can be made arbitrarily close to the linear-case constant by suitable choice of φ (e.g., via higher-order polynomial or trigonometric bases that increase excitation). This establishes that the 1/sqrt(N) improvement is not canceled by the nonlinear feature-map factor.

Revision: yes
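The scaling at issue in this exchange can be motivated by a short heuristic. It is not the paper's proof: it assumes the N local errors are independent and zero-mean, and that each has mean-squared error of about $c_\phi \sigma^2 / M$, where $c_\phi$ and $\sigma^2$ are placeholder symbols for the feature-map constant and the noise level.

```latex
% Cross terms vanish under independence and zero mean
\mathbb{E}\,\big\| \bar\theta - \theta^\star \big\|_F^2
  = \frac{1}{N^2} \sum_{i=1}^{N} \mathbb{E}\,\big\| \hat\theta^{(i)} - \theta^\star \big\|_F^2
  \approx \frac{1}{N^2} \cdot N \cdot \frac{c_\phi \, \sigma^2}{M}
  = \frac{c_\phi \, \sigma^2}{N M}

% Taking square roots gives the rate discussed above
\sqrt{\mathbb{E}\,\big\| \bar\theta - \theta^\star \big\|_F^2}
  \approx \frac{\sqrt{c_\phi}\, \sigma}{\sqrt{N M}}
```

Under these assumptions $c_\phi$ does not depend on N, so it rescales the bound without removing the $1/\sqrt{N}$ factor.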

Circularity Check

0 steps flagged

No significant circularity; standard federated analysis applied to parameterized model

The paper's central claim of improved convergence rates with increasing clients follows from standard least-squares or gradient-based analysis on aggregated parameter estimates under i.i.d. inputs and perturbations. The 1/sqrt(total samples) rate scales naturally with N when each client contributes fixed trajectories, and the nonlinear feature-map constant is bounded via analyticity without reducing to a fitted quantity defined by the target result. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the derivation chain. The result remains self-contained against external federated optimization benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

Based on abstract only; the central claim rests on the assumption that the nonlinear dynamics admit a linear parameterization with real-analytic features and that inputs are i.i.d. across clients with no active exploration.

free parameters (1)

feature map φ
Chosen to increase excitation; specific form (polynomial or trigonometric) affects the constant in the convergence rate but is not fitted to the target result.
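One common proxy for excitation is the smallest eigenvalue of the empirical second-moment matrix of the features. The sketch below uses that proxy to contrast two hypothetical feature maps on scalar data. It is an illustration only, and it is not the BMSB-style condition the paper itself works with.

```python
import numpy as np

def excitation(feature_map, states, inputs):
    """Smallest eigenvalue of the empirical second-moment matrix of the features."""
    Phi = np.array([feature_map(x, u) for x, u in zip(states, inputs)])  # (T, d)
    gram = Phi.T @ Phi / len(Phi)
    return float(np.linalg.eigvalsh(gram)[0])   # eigvalsh sorts ascending

def phi_trig(x, u):
    """Hypothetical map mixing a linear and a trigonometric term."""
    return np.array([x, np.sin(x), u])

def phi_redundant(x, u):
    """Hypothetical poorly chosen map: the second feature duplicates the first."""
    return np.array([x, 2.0 * x, u])

rng = np.random.default_rng(0)
states = rng.uniform(-np.pi, np.pi, size=500)
inputs = rng.normal(size=500)
lam_trig = excitation(phi_trig, states, inputs)
lam_redundant = excitation(phi_redundant, states, inputs)
```

The redundant map yields a singular matrix, so its parameters cannot be identified from any amount of data, while the trigonometric map keeps the smallest eigenvalue bounded away from zero on this data.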

axioms (2)

domain assumption: Nonlinear dynamical systems are linearly-parameterized with real-analytic feature functions.
Invoked to represent pendulum and quadrotor dynamics in the theoretical and experimental sections.
domain assumption: Control inputs and policy perturbations are i.i.d. across clients.
Used to ensure non-active exploration and to derive the client-count improvement in convergence.

pith-pipeline@v0.9.0 · 5699 in / 1358 out tokens · 36268 ms · 2026-05-18T21:36:47.727800+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

least-squares error estimate ... $\bar\theta_{\mathrm{LSE}} = \frac{1}{M} \sum_{i=1}^{M} \arg\min_{\theta} \| X^{(i)}_{+} - \theta \Phi^{(i)} \|_F^2$ (Eq. 2); finite-sample error $O\big(1/\sqrt{T \sum_i N_i}\big) + C_2 \epsilon$ (Thm 1)

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

Relation between the paper passage and the cited Recognition theorem.

feature vector $\phi(\cdot)$ ... real-analytic functions (Assumption 1); BMSB condition with $s_\phi$, $p_\phi$ (Lemma 1)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.