pith. machine review for the scientific record.

arxiv: 2602.12139 · v2 · submitted 2026-02-12 · 💻 cs.LG

Recognition: 2 Lean theorem links

Oscillators Are All You Need: Irregular Time Series Modelling via Damped Harmonic Oscillators with Closed-Form Solutions

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 02:14 UTC · model grok-4.3

classification 💻 cs.LG
keywords irregular time series · damped harmonic oscillators · closed-form solutions · continuous-time transformers · attention mechanisms · neural ODEs · resonance modeling

The pith

Damped harmonic oscillators with closed-form solutions replace neural ODEs in continuous-time transformers for irregular time series while preserving universal approximation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that irregular time series can be handled by modeling the hidden-state dynamics of a continuous-time transformer as a linear system of damped harmonic oscillators rather than general neural ODEs. Keys and values evolve as damped driven oscillators, while the query is expanded in a fixed sinusoidal basis; attention then emerges as a resonance effect between these modes. Because the oscillator system admits an explicit closed-form solution, numerical integration is eliminated and computation is orders of magnitude faster. The central proof shows that any attention matrix realizable by the original continuous-key formulation can still be approximated arbitrarily closely by a suitable choice of fixed oscillator modes, so expressivity is retained.

Core claim

By replacing the neural ODE component in continuous-time transformers with a linear damped harmonic oscillator that admits a closed-form solution, the model captures query-key interactions as resonance while maintaining the universal approximation property of continuous-time attention. Specifically, any discrete attention matrix realizable by ContiFormer's continuous keys can be approximated arbitrarily well by the fixed oscillator modes.

What carries the argument

The damped driven oscillator parameterization of keys and values together with the sinusoidal expansion of the query, which converts attention computation into a closed-form resonance calculation.
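A minimal sketch of this resonance reading, assuming the textbook steady-state amplitude |H(ω)| = 1/√((ω₀² − ω²)² + (2γω)²) of a damped driven oscillator; the key parameters, query frequency, and softmax scoring below are hypothetical illustrations, not the authors' exact parameterization:

```python
import numpy as np

def resonance_amplitude(omega, omega0, gamma):
    """Steady-state amplitude |H(omega)| of x'' + 2*gamma*x' + omega0**2*x = cos(omega*t)."""
    return 1.0 / np.sqrt((omega0**2 - omega**2)**2 + (2.0 * gamma * omega)**2)

# Hypothetical: four oscillator "keys" with learned natural frequencies and
# damping, and a query carrying one dominant frequency. Scoring each key by
# its resonance amplitude at the query frequency and normalizing with a
# softmax makes the frequency-matched key dominate the attention weights.
key_omega0 = np.array([1.0, 2.0, 4.0, 8.0])
key_gamma = np.array([0.1, 0.1, 0.1, 0.1])
query_omega = 4.0

scores = resonance_amplitude(query_omega, key_omega0, key_gamma)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
assert weights.argmax() == 2  # the key with omega0 == query_omega resonates
```

The bright ridge in the paper's phase-frequency attention maps corresponds to this amplitude peak near ω ≈ ω₀.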

If this is right

  • Numerical ODE solvers are no longer required, removing the dominant computational bottleneck and yielding orders-of-magnitude speedups.
  • The model achieves state-of-the-art accuracy on standard irregular time series benchmarks.
  • The resonance interpretation supplies a direct physical analogy for how query-key coupling occurs in the attention mechanism.
  • Because the solution is closed-form, the learned frequency and damping parameters become directly interpretable as temporal scales.
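The last point can be made concrete: in the underdamped regime (γ < ω₀), a key's learned parameters map directly to a decay time 1/γ and a damped period 2π/√(ω₀² − γ²). A sketch with hypothetical learned values:

```python
import math

def oscillator_timescales(omega0, gamma):
    """Interpret underdamped-oscillator parameters as temporal scales."""
    assert gamma < omega0, "underdamped regime assumed"
    omega_d = math.sqrt(omega0**2 - gamma**2)   # damped natural frequency
    return {"decay_time": 1.0 / gamma,          # e-folding time of the envelope
            "period": 2.0 * math.pi / omega_d}  # period of the oscillation

# Hypothetical learned parameters for one key oscillator:
scales = oscillator_timescales(omega0=2.0, gamma=0.5)
# envelope decays on a scale of 2.0 time units; oscillation period is about 3.24
```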

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same oscillator substitution could be tested in other architectures that currently rely on numerical integration for continuous dynamics.
  • The fixed sinusoidal basis for queries suggests a natural way to add frequency-domain regularization or interpretability constraints.
  • Extending the linear oscillator to weakly nonlinear variants might still preserve enough closed-form structure to remain tractable.

Load-bearing premise

Modeling keys and values as damped driven oscillators and expanding the query in a fixed sinusoidal basis up to a suitable number of modes is sufficient to capture the full range of dynamics needed for arbitrary irregular time series without loss of expressivity.
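This premise can be probed numerically. In the sketch below (hypothetical grids and signal, not the paper's construction), a dictionary of fixed damped sinusoidal modes fit by least squares on irregular time stamps reproduces a damped oscillation whenever a matching mode is present:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 5.0, size=60))    # irregular time stamps
target = np.exp(-0.3 * t) * np.cos(2.0 * t)    # signal to approximate

# Dictionary of damped sinusoidal modes over a small (gamma, omega) grid.
gammas = [0.1, 0.3, 0.5]
omegas = np.linspace(0.5, 4.0, 8)
cols = []
for g in gammas:
    for w in omegas:
        cols.append(np.exp(-g * t) * np.cos(w * t))
        cols.append(np.exp(-g * t) * np.sin(w * t))
A = np.stack(cols, axis=1)

coef, *_ = np.linalg.lstsq(A, target, rcond=None)
residual = np.linalg.norm(A @ coef - target) / np.linalg.norm(target)
assert residual < 1e-4  # a matching mode exists, so the fit is essentially exact
```

For a target whose frequency or damping falls outside the grid, the residual would be expected to shrink only as the mode grid is refined, which is the referee's density concern below.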

What would settle it

A concrete attention matrix or irregular time series example that can be realized by continuous keys in the original formulation but cannot be approximated to arbitrary precision by any choice of fixed oscillator modes.

Figures

Figures reproduced from arXiv:2602.12139 by Arghya Pathak, Aritra Das, Debayan Gupta, Reva Laxmi Chauhan, and Yashas Shende (Ashoka University).

Figure 1. Architecture pipeline. Each input generates an oscillator for its key and another for its value; those oscillators evolve in continuous time with closed-form solutions. The projections per head h ∈ [H] with d_h = d/H are Q_i = W_Q X_i + b_Q, K_i = W_K X_i + b_K, and V_i = W_V X_i + b_V, where Q_i, K_i, V_i ∈ R^{d_h}. The learnable parameters are the projection matrices and biases W_Q, W_K, …
Figure 2. Trajectories and Training Time Visualisations.
Figure 5. Phase–frequency attention α(ω, φ) for a representative key. The bright ridge in the (ω, φ) plane indicates the resonance region. (a) Sequence from the 1-D regression task: true underlying trajectory (line), irregular noisy observations (dots), final observation time, and the true versus predicted future target at T_future = 7. (b) Learned natural frequencies of the eight oscillator keys …
Figure 7. Forecast on the chaotic logistic map with …
Figure 8. Forecast on the chaotic logistic map with …
Original abstract

Transformers excel at time series modelling through attention mechanisms that capture long-term temporal patterns. However, they assume uniform time intervals and therefore struggle with irregular time series. Neural Ordinary Differential Equations (NODEs) effectively handle irregular time series by modelling hidden states as continuously evolving trajectories. ContiFormers arxiv:2402.10635 combine NODEs with Transformers, but inherit the computational bottleneck of the former by using heavy numerical solvers. This bottleneck can be removed by using a closed-form solution for the given dynamical system - but this is known to be intractable in general! We obviate this by replacing NODEs with a novel linear damped harmonic oscillator analogy - which has a known closed-form solution. We model keys and values as damped, driven oscillators and expand the query in a sinusoidal basis up to a suitable number of modes. This analogy naturally captures the query-key coupling that is fundamental to any transformer architecture by modelling attention as a resonance phenomenon. Our closed-form solution eliminates the computational overhead of numerical ODE solvers while preserving expressivity. We prove that this oscillator-based parameterisation maintains the universal approximation property of continuous-time attention; specifically, any discrete attention matrix realisable by ContiFormer's continuous keys can be approximated arbitrarily well by our fixed oscillator modes. Our approach delivers both theoretical guarantees and scalability, achieving state-of-the-art performance on irregular time series benchmarks while being orders of magnitude faster. Acknowledgement: This work was done in collaboration with Dirac Labs.
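The closed form invoked here is standard linear-ODE theory. As a sanity check of the unforced, underdamped case only (the paper's driven parameterization adds a forcing term), the explicit solution matches an RK4 numerical integration, the kind of solver the closed form removes:

```python
import numpy as np

# Unforced damped oscillator: x'' + 2*gamma*x' + omega0**2 * x = 0, gamma < omega0.
gamma, omega0 = 0.2, 3.0
wd = np.sqrt(omega0**2 - gamma**2)   # damped frequency
x0, v0 = 1.0, 0.0                    # initial position and velocity
c = (v0 + gamma * x0) / wd           # chosen so that x'(0) = v0

def closed_form(t):
    """Exact underdamped solution; no ODE solver required."""
    return np.exp(-gamma * t) * (x0 * np.cos(wd * t) + c * np.sin(wd * t))

def rk4(t_end, n_steps):
    """Reference RK4 integration of the same ODE."""
    h = t_end / n_steps
    y = np.array([x0, v0])
    f = lambda y: np.array([y[1], -2.0 * gamma * y[1] - omega0**2 * y[0]])
    for _ in range(n_steps):
        k1 = f(y); k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2); k4 = f(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[0]

assert abs(closed_form(5.0) - rk4(5.0, 2000)) < 1e-6
```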

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes replacing neural ODEs in ContiFormers with a damped harmonic oscillator model for keys and values, expanding queries in a fixed sinusoidal basis, to obtain closed-form solutions for irregular time series attention. Attention is reinterpreted as a resonance phenomenon. The central claims are that this parameterization eliminates numerical ODE solvers while preserving the universal approximation property of continuous-time attention (specifically, any discrete attention matrix realizable by ContiFormer's continuous keys can be approximated arbitrarily well by the oscillator modes) and delivers SOTA performance with orders-of-magnitude speedups on irregular time series benchmarks.

Significance. If the universal-approximation claim and empirical results hold, the work would be significant: it directly addresses the computational bottleneck of numerical solvers in hybrid NODE-Transformer models for irregular data, supplies a closed-form alternative grounded in linear ODE theory, and offers both theoretical guarantees and practical scalability.

major comments (2)
  1. [Abstract] The universal-approximation statement asserts that any discrete attention matrix realizable by ContiFormer's continuous keys can be approximated arbitrarily well by 'fixed oscillator modes.' A finite fixed basis cannot be dense in the space of continuous functions over arbitrary irregular time points; the mode count must be permitted to grow with decreasing approximation error. The manuscript must clarify whether the number of modes is treated as a fixed hyperparameter independent of epsilon or allowed to increase, and supply the corresponding density argument.
  2. [Abstract] Abstract and proof section: The claim of a 'proof' that the oscillator parameterization maintains the universal approximation property is asserted without derivation steps, explicit construction, error bounds, or analysis of how the damped-driven oscillator dynamics plus sinusoidal query expansion span the required function space. No intermediate lemmas or approximation-error analysis appear in the visible text.
minor comments (1)
  1. [Abstract] The final sentence claims 'state-of-the-art performance on irregular time series benchmarks' but provides no benchmark names, dataset sizes, or comparison baselines in the visible text; these details should be summarized even in the abstract.
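The first major comment concerns the epsilon-versus-mode-count trade-off. A toy least-squares fit of a non-smooth signal in a truncated sinusoidal basis (hypothetical signal and grid) shows why the error can only shrink as modes are added:

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
target = np.abs(t - np.pi)  # triangle-shaped signal with slowly decaying spectrum

def fit_error(n_modes):
    """Sup-norm error of the best least-squares fit using n_modes harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_modes + 1):
        cols += [np.cos(k * t), np.sin(k * t)]
    A = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.max(np.abs(A @ coef - target))

errs = [fit_error(n) for n in (1, 4, 16)]
assert errs[0] > errs[1] > errs[2]  # no fixed finite basis hits every epsilon
```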

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract] The universal-approximation statement asserts that any discrete attention matrix realizable by ContiFormer's continuous keys can be approximated arbitrarily well by 'fixed oscillator modes.' A finite fixed basis cannot be dense in the space of continuous functions over arbitrary irregular time points; the mode count must be permitted to grow with decreasing approximation error. The manuscript must clarify whether the number of modes is treated as a fixed hyperparameter independent of epsilon or allowed to increase, and supply the corresponding density argument.

    Authors: We appreciate this observation. In the manuscript, the number of oscillator modes is a tunable hyperparameter that can be increased to achieve higher approximation accuracy, similar to the number of terms in a Fourier series. We will clarify this point in the revised abstract and include a density argument in the theory section. Specifically, we will show that the solutions to the damped harmonic oscillator equations, being linear combinations of damped sinusoidal functions, can approximate continuous functions arbitrarily well on compact time intervals as the number of modes increases, thus preserving the universal approximation property. revision: yes

  2. Referee: [Abstract] Abstract and proof section: The claim of a 'proof' that the oscillator parameterization maintains the universal approximation property is asserted without derivation steps, explicit construction, error bounds, or analysis of how the damped-driven oscillator dynamics plus sinusoidal query expansion span the required function space. No intermediate lemmas or approximation-error analysis appear in the visible text.

    Authors: We acknowledge that the proof is presented at a high level in the current manuscript. To strengthen the presentation, we will expand the relevant section with a detailed derivation, including explicit construction of the approximation, intermediate lemmas on the expressivity of the oscillator modes, error bounds, and analysis showing how the damped-driven dynamics combined with the sinusoidal query expansion span the necessary function space for approximating any continuous attention matrix realizable by ContiFormer. revision: yes

Circularity Check

0 steps flagged

No circularity detected; the derivation uses standard ODE closed forms and states an external approximation proof.

full rationale

The paper replaces NODEs with a damped harmonic oscillator model whose closed-form solution follows directly from standard linear ODE theory. Keys/values are parameterized as damped driven oscillators and the query is expanded in a fixed sinusoidal basis; attention is interpreted as resonance. The universal-approximation claim is presented as a separate proof that any ContiFormer-realizable attention matrix can be approximated by the oscillator modes. No equation reduces to its own input by construction, no fitted parameter is relabeled as a prediction, and the ContiFormer reference is external (different authors, arXiv:2402.10635). The derivation chain therefore remains self-contained and does not rely on self-citation load-bearing or ansatz smuggling.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on the standard closed-form solution of linear damped driven harmonic oscillators and the assumption that a finite sinusoidal basis for the query is expressive enough; no new physical entities are postulated.

free parameters (2)
  • number of oscillator modes
    Chosen as a suitable finite number to expand the query; directly affects approximation quality and compute.
  • damping and driving coefficients
    Parameters of the oscillator dynamics for keys and values; learned or set per model.
axioms (2)
  • standard math The linear damped driven harmonic oscillator admits an exact closed-form solution.
    Invoked to replace numerical ODE integration; standard result from differential equations.
  • domain assumption A finite sinusoidal basis expansion of the query is sufficient to realize arbitrary continuous attention matrices up to arbitrary accuracy.
    Central to the universal-approximation claim; not derived in the abstract.
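The first axiom is checkable directly: the textbook steady-state solution of the driven damped oscillator satisfies the ODE identically. A numerical verification with hypothetical coefficients:

```python
import numpy as np

# Driven oscillator: x'' + 2*gamma*x' + omega0**2 * x = cos(w*t).
# Steady-state (particular) solution:
#   x_p(t) = |H(w)| * cos(w*t - phi)
#   |H(w)| = 1 / sqrt((omega0**2 - w**2)**2 + (2*gamma*w)**2)
#   phi    = atan2(2*gamma*w, omega0**2 - w**2)
gamma, omega0, w = 0.4, 2.0, 1.5
H = 1.0 / np.sqrt((omega0**2 - w**2)**2 + (2.0 * gamma * w)**2)
phi = np.arctan2(2.0 * gamma * w, omega0**2 - w**2)

t = np.linspace(0.0, 10.0, 1001)
x = H * np.cos(w * t - phi)
dx = -H * w * np.sin(w * t - phi)
ddx = -H * w**2 * np.cos(w * t - phi)

residual = ddx + 2.0 * gamma * dx + omega0**2 * x - np.cos(w * t)
assert np.max(np.abs(residual)) < 1e-12  # the closed form solves the ODE exactly
```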

pith-pipeline@v0.9.0 · 5601 in / 1435 out tokens · 87206 ms · 2026-05-16T02:14:16.601149+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
