Learning Individual Dynamics from Sparse Cross-Sectional Snapshots

Christian Lagemann; Kai Lagemann; Sach Mukherjee; Steven L. Brunton

arXiv: 2605.23470 · v1 · pith:7GIODNZJ · submitted 2026-05-22 · 💻 cs.LG · cs.AI · cs.CE

Pith reviewed 2026-05-25 05:23 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CE

keywords individual dynamics · cross-sectional data · trajectory inference · identifiability · sparse snapshots · probability flow ODE · mixture of experts · latent dynamics


The pith

Static individual contexts make dynamical parameters and routing jointly identifiable from single-timepoint snapshots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard approaches to learning how individuals evolve either require dense time-series tracking or, when working with population-level snapshots, lose all individual distinctions. This paper demonstrates that the trade-off can be avoided when each observation carries a static individual-level context. CADENCE anchors latent continuous-time dynamics to those contexts, removes spatial ambiguities with a bijective Probability Flow ODE, and uses a Soft Mixture-of-Experts router to assign each individual a convex mixture of expert dynamics. The resulting construction yields joint identifiability of the per-individual parameters and the routing function. On benchmarks ranging from physical systems to real biological data, the method matches or exceeds sequential models trained on complete trajectories.

Core claim

The paper establishes that individual dynamical parameters and the routing function are jointly identifiable from single-timepoint data when static individual contexts are available, by pairing a score-based spatial encoder realized as a bijective Probability Flow ODE with a Soft Mixture-of-Experts router. This construction recovers continuous individual trajectories without requiring longitudinal sequences.
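For intuition about the bijective encoder, here is a minimal sketch of a probability-flow ODE in a toy case where everything is available in closed form: 1-D Gaussian data, a variance-preserving SDE with a constant rate `BETA`, and the exact Gaussian score. CADENCE would instead use a learned score network; `BETA`, `T`, `marginal`, `pf_ode_velocity` and `integrate` are illustrative names, not the paper's code. Integrating forward maps data to an approximately standard-normal latent, and integrating the same ODE backwards inverts the map. The inversion is exact only up to solver error.

```python
import numpy as np

# Toy setting (assumption): 1-D data x0 ~ N(mu, s^2) diffused by a
# variance-preserving SDE dx = -0.5*BETA*x dt + sqrt(BETA) dW with constant BETA.
BETA = 1.0
T = 10.0  # BETA * T = 10, so the time-T marginal is close to N(0, 1)


def marginal(t, mu, s):
    """Mean and variance of the Gaussian marginal p_t under the VP SDE."""
    decay = np.exp(-BETA * t)
    return mu * np.sqrt(decay), s**2 * decay + (1.0 - decay)


def pf_ode_velocity(x, t, mu, s):
    """Probability-flow ODE drift: f(x, t) - 0.5 * g(t)^2 * score(x, t)."""
    mean, var = marginal(t, mu, s)
    score = -(x - mean) / var  # closed-form score of a Gaussian marginal
    return -0.5 * BETA * x - 0.5 * BETA * score


def integrate(x, t0, t1, mu, s, n_steps=2000):
    """Fixed-step RK4 from t0 to t1 (t1 < t0 runs the flow backwards)."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = pf_ode_velocity(x, t, mu, s)
        k2 = pf_ode_velocity(x + 0.5 * h * k1, t + 0.5 * h, mu, s)
        k3 = pf_ode_velocity(x + 0.5 * h * k2, t + 0.5 * h, mu, s)
        k4 = pf_ode_velocity(x + h * k3, t + h, mu, s)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x


mu, s = 3.0, 0.5
x0 = np.array([2.0, 3.0, 4.5])
z = integrate(x0, 0.0, T, mu, s)       # encode: data -> near-Gaussian latent
x_back = integrate(z, T, 0.0, mu, s)   # decode: run the same ODE backwards
print(z)                               # ≈ [-1.98, 0.02, 3.02], close to (x0 - mu) / s
print(np.abs(x_back - x0).max())       # small: only solver error remains
```

In this Gaussian case the flow keeps the standardized value (x - m_t)/sqrt(v_t) constant, which is why the latent is a shifted, rescaled copy of the data.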

What carries the argument

Bijective Probability Flow ODE paired with Soft Mixture-of-Experts router, which together eliminate diffeomorphic ambiguities and render per-individual parameters and routing jointly identifiable when anchored by static contexts.
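A minimal sketch of context-based soft routing, assuming a linear softmax gate `G` over `K` experts and a parameter dictionary `W` whose columns are expert parameter vectors. Each individual then gets convex weights `alpha(c)` and parameters `w = W @ alpha(c)`, matching the per-leaf form $\hat{w}(\lambda) = W\alpha(\lambda)$ quoted from Theorem 1. This is a per-individual softmax gate in the spirit of soft routing, not the slot-based dispatch/combine mechanism of Puigcerver et al.'s Soft MoE, and not necessarily CADENCE's gating network; `soft_route`, `expert_mixture`, `G` and `W` are hypothetical. The temperature `tau` plays the role of the annealing schedule in Corollary 2: small `tau` approaches one-hot routing.

```python
import numpy as np


def soft_route(c, G, tau=1.0):
    """Softmax gate: convex weights alpha over K experts for context vector c."""
    logits = G @ c / tau
    logits = logits - logits.max()  # subtract max for numerical stability
    alpha = np.exp(logits)
    return alpha / alpha.sum()


def expert_mixture(c, G, W, tau=1.0):
    """Per-individual parameters w = W @ alpha(c); columns of W are expert parameters."""
    alpha = soft_route(c, G, tau)
    return W @ alpha, alpha


G = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])  # hypothetical gate: 3 experts, 2-D context
W = np.array([[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]])     # hypothetical dictionary: 2 params x 3 experts
c = np.array([1.0, -0.5])                            # one individual's static context

w, alpha = expert_mixture(c, G, W, tau=1.0)             # soft: alpha strictly inside the simplex
w_hard, alpha_hard = expert_mixture(c, G, W, tau=1e-3)  # low temperature: (near) one-hot
```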

If this is right

Continuous individual trajectories become recoverable from isolated snapshots rather than requiring dense longitudinal sequences.
Performance on physical and biological benchmarks equals or exceeds that of state-of-the-art sequential models trained on full trajectories.
Joint identifiability holds for both the dynamical parameters and the routing function under the stated architectural choices.
The framework applies uniformly across domains once static context variables are recorded alongside each snapshot.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Data-collection protocols in aging or epidemiology studies could shift emphasis toward richer static covariates rather than repeated observations of the same individuals.
The identifiability argument may extend to other latent dynamical models that currently rely on temporal density to resolve ambiguities.
If contexts themselves contain measurement error, the joint identifiability guarantee would require an additional robustness analysis not supplied in the paper.

Load-bearing premise

Static individual-level contexts are sufficient to anchor the latent dynamics and, together with the bijective Probability Flow ODE and SMoE router, render individual dynamical parameters and the routing function jointly identifiable from single-timepoint data.

What would settle it

A dataset of single-timepoint observations with known ground-truth individual parameters where two distinct parameter sets produce identical observed distributions after routing, or where CADENCE performance falls below that of a dense-trajectory baseline on the same held-out trajectories.
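A toy instance of the first failure mode, assuming linear latent rotation dynamics dz/dt = ωJz (J the 90-degree rotation generator) and a zero-mean Gaussian initial distribution. A zero-mean Gaussian snapshot is fully described by its covariance, so equal covariances mean equal snapshot distributions. With an isotropic start, every rotation rate ω yields identical snapshots at every time, so ω cannot be recovered from cross-sections alone; an anisotropic start breaks the symmetry. This example is chosen here for illustration and is not the paper's Proposition 1; `rotation` and `snapshot_covariance` are hypothetical helpers.

```python
import numpy as np


def rotation(theta):
    """2-D rotation matrix; the flow of dz/dt = omega * J z is rotation(omega * t)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def snapshot_covariance(omega, t, cov0):
    """Covariance of the time-t snapshot when z_0 ~ N(0, cov0) is pushed through the flow."""
    R = rotation(omega * t)
    return R @ cov0 @ R.T


isotropic = np.eye(2)
anisotropic = np.diag([4.0, 1.0])
for t in (0.5, 1.0, 3.0):
    # Isotropic start: rates 0.5 and 2.0 give the same snapshot distribution.
    same = np.allclose(snapshot_covariance(0.5, t, isotropic),
                       snapshot_covariance(2.0, t, isotropic))
    # Anisotropic start: the snapshots now depend on the rate.
    differ = not np.allclose(snapshot_covariance(0.5, t, anisotropic),
                             snapshot_covariance(2.0, t, anisotropic))
    print(t, same, differ)
```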

Figures

Figures reproduced from arXiv: 2605.23470 by Christian Lagemann, Kai Lagemann, Sach Mukherjee, Steven L. Brunton.

**Figure 1.** CADENCE overview. Stage 1 maps each high-dimensional observation $x^i_{t_i}$ to a latent state $z^i_{t_i}$ via a score-based bijective PF-ODE, Gaussian-pinning the latent space. Stage 2 routes the realization's static context $c_i$ through the SMoE gating network to produce a convex expert mixture $w_i$, which conditions the Neural ODE. Forward integration yields individual future trajectories. Proposition 1 establishes …
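The forward-integration step of Stage 2 can be sketched as follows, assuming for illustration only that each expert is a damped-oscillator parameter pair (ω, γ) rather than a neural vector field, and that routing weights are already given. Two individuals share a latent starting state but receive different convex mixtures `W @ alpha`, and so follow different trajectories. `vector_field`, `rk4_trajectory` and all numbers are hypothetical; a real implementation would typically use an adaptive ODE solver.

```python
import numpy as np


def vector_field(z, w):
    """Hypothetical expert-conditioned field: damped oscillator with w = (omega, gamma)."""
    omega, gamma = w
    x, v = z
    return np.array([v, -omega**2 * x - gamma * v])


def rk4_trajectory(z0, w, t_grid):
    """Integrate dz/dt = vector_field(z, w) over t_grid with classical RK4."""
    zs = [np.asarray(z0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h = t1 - t0
        z = zs[-1]
        k1 = vector_field(z, w)
        k2 = vector_field(z + 0.5 * h * k1, w)
        k3 = vector_field(z + 0.5 * h * k2, w)
        k4 = vector_field(z + h * k3, w)
        zs.append(z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.stack(zs)


W = np.array([[1.0, 3.0],    # omega of expert 1, expert 2
              [0.1, 0.5]])   # gamma of expert 1, expert 2
alpha_a = np.array([0.9, 0.1])  # convex routing weights for individual a
alpha_b = np.array([0.2, 0.8])  # convex routing weights for individual b
t_grid = np.linspace(0.0, 5.0, 501)
z0 = np.array([1.0, 0.0])       # same latent starting state for both
traj_a = rk4_trajectory(z0, W @ alpha_a, t_grid)
traj_b = rk4_trajectory(z0, W @ alpha_b, t_grid)
```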

**Figure 2.** BM1 results comparing CADENCE against published baselines.

**Figure 3.** BM6 results comparing CADENCE against published baselines.

**Figure 4.** BM7 (LARRY haematopoiesis) results comparing CADENCE against published baselines.

Original abstract

Predicting how a dynamical unit evolves over time (how an individual ages, an epidemic spreads, or a physical system degrades) typically requires dense longitudinal tracking. When only extremely sparse or entirely cross-sectional data are available, inferring individualized, continuous-time trajectories is fundamentally ill-posed. Existing methods force a strict compromise: sequence models (e.g. latent ODEs) require dense longitudinal data, while cross-sectional methods (e.g. optimal transport or flow matching) map aggregate populations, losing individual dynamics. In this paper, we demonstrate that this dichotomy can be broken. We introduce CADENCE, a principled probabilistic framework that recovers continuous individual trajectories from isolated snapshots by anchoring latent dynamics to static, individual-level contexts. We provide novel identifiability guarantees for single-timepoint trajectory inference. By combining a score-based spatial encoder (bijective Probability Flow ODE) to eliminate diffeomorphic ambiguities with a Soft Mixture-of-Experts (SMoE) router, we show that individual dynamical parameters and the routing function are jointly identifiable. Across a suite of benchmarks spanning physical systems to real-world biological data, CADENCE, trained strictly on extremely sparse snapshots with context structure, matches or exceeds the performance of state-of-the-art sequential models trained on dense, full-trajectory data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

CADENCE claims joint identifiability of individual dynamics and routing from single snapshots via bijective PF ODE plus SMoE, but the guarantees rest on assumptions the stress-test flags as potentially fragile.


The main thing here is that CADENCE anchors latent dynamics to static individual contexts, then uses a bijective score-based encoder (Probability Flow ODE) and Soft Mixture-of-Experts router to claim that dynamical parameters and the routing function become jointly identifiable from isolated snapshots. It reports matching or beating dense-trajectory sequential models on physical and biological benchmarks despite training only on sparse cross-sectional data with context structure.

Referee Report

1 major / 2 minor

Summary. The paper introduces CADENCE, a probabilistic framework for recovering continuous individual trajectories from extremely sparse or cross-sectional snapshots. It anchors latent dynamics to static individual-level contexts, uses a score-based spatial encoder realized as a bijective Probability Flow ODE to remove diffeomorphic ambiguities, and routes dynamics through a Soft Mixture-of-Experts (SMoE) router. The central claims are (i) novel joint identifiability guarantees for individual dynamical parameters and the routing function from single-timepoint data, and (ii) empirical performance that matches or exceeds state-of-the-art sequential models trained on dense trajectories across physical and biological benchmarks.

Significance. If the identifiability result holds under the stated assumptions, the work would be significant for dynamical modeling in domains where dense longitudinal data are unavailable. It offers a concrete route to individualized continuous-time inference from cross-sectional snapshots by resolving latent ambiguities via bijective flows and context-anchored routing, potentially unifying cross-sectional and longitudinal paradigms.

major comments (1)

[Identifiability derivation (Methods/Appendix)] The joint identifiability claim for dynamical parameters and the routing function from single-timepoint data (abstract and presumably §3 or Appendix) rests on exact bijectivity of the Probability Flow ODE together with the SMoE decomposition under static context anchoring. The derivation must be checked for hidden assumptions on gating symmetries, approximate versus exact invertibility, and whether the router admits permutation or collapse modes; without an explicit assumptions list and a complete proof, the guarantee cannot be verified and remains the load-bearing step for the central contribution.

minor comments (2)

[Experiments section] Benchmark results should report error bars and statistical tests for the claimed superiority or parity with dense-trajectory baselines.
[Preliminaries/Methods] Notation for the routing function and its integration with the latent dynamics should be defined explicitly before the identifiability argument.
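One way to meet the first minor comment, sketched under the assumption that each method has been run over the same set of random seeds: compute the per-seed error difference and a percentile-bootstrap confidence interval for its mean. An interval lying entirely below zero would support a claim that method `a` has lower error than method `b`; a parity claim is better framed with a pre-specified equivalence margin. `paired_bootstrap_ci` is a hypothetical helper, and with only a handful of seeds the percentile bootstrap is a rough tool.

```python
import numpy as np


def paired_bootstrap_ci(err_a, err_b, n_boot=10_000, level=0.95, seed=0):
    """Mean paired difference (a - b) over seeds and a percentile-bootstrap CI."""
    diff = np.asarray(err_a, dtype=float) - np.asarray(err_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample seeds with replacement, keeping each seed's (a, b) pair together.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_means, [tail, 1.0 - tail])
    return diff.mean(), (lo, hi)
```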

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed and constructive review. The identifiability guarantees form the core theoretical contribution, and we address the concern regarding the derivation below by committing to explicit clarifications and expansions.


Referee: [Identifiability derivation (Methods/Appendix)] The joint identifiability claim for dynamical parameters and the routing function from single-timepoint data (abstract and presumably §3 or Appendix) rests on exact bijectivity of the Probability Flow ODE together with the SMoE decomposition under static context anchoring. The derivation must be checked for hidden assumptions on gating symmetries, approximate versus exact invertibility, and whether the router admits permutation or collapse modes; without an explicit assumptions list and a complete proof, the guarantee cannot be verified and remains the load-bearing step for the central contribution.

Authors: We agree that an explicit assumptions list and expanded proof are necessary for verifiability. In the revision we will add a dedicated Assumptions subsection (new §3.1) enumerating: (i) Lipschitz continuity of the latent vector field ensuring exact bijectivity of the Probability Flow ODE (not approximate), (ii) distinct static context embeddings that break gating symmetries and permutation modes in the SMoE, and (iii) bounded expert parameters together with the context-anchored score-matching objective that precludes collapse. The appendix proof will be extended with a dedicated lemma ruling out residual invariances. These additions directly address the referee's points and will be incorporated in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: identifiability claim rests on introduced components without reduction to inputs.


The paper's central claim of joint identifiability for dynamical parameters and routing function from single-timepoint snapshots is presented as arising from the combination of a bijective Probability Flow ODE (to remove diffeomorphic ambiguities) and an SMoE router, anchored by static individual contexts. No equations, derivations, or self-citations are exhibited in the provided text that reduce this guarantee to a fitted quantity, a prior self-citation chain, or a self-definitional loop. The framework is described as introducing new components to break the dense-vs-cross-sectional dichotomy, with performance claims benchmarked externally rather than internally forced. This matches the default expectation of a self-contained derivation; no load-bearing step reduces by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claims rest on the assumption that static contexts plus the chosen encoder and router suffice for identifiability; no free parameters are enumerated in the abstract, but the framework itself is a new constructed object.

axioms (1)

domain assumption: Static individual-level contexts are sufficient to anchor latent dynamics and eliminate diffeomorphic ambiguities when combined with a bijective Probability Flow ODE and SMoE router.
This premise is required for the single-timepoint identifiability claim to hold.

invented entities (2)

CADENCE framework (no independent evidence)
purpose: Recover continuous individual trajectories from sparse snapshots
New probabilistic framework introduced by the paper.
Soft Mixture-of-Experts (SMoE) router (no independent evidence)
purpose: Jointly identify the routing function with the dynamical parameters
Soft MoE itself predates the paper (Puigcerver et al., 2024); what is new here is its use as a context-anchored router for dynamics.

pith-pipeline@v0.9.0 · 5763 in / 1481 out tokens · 25481 ms · 2026-05-25T05:23:00.392345+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem:

We provide novel identifiability guarantees for single-timepoint trajectory inference. By combining a score-based spatial encoder (bijective Probability Flow ODE) to eliminate diffeomorphic ambiguities with a Soft Mixture-of-Experts (SMoE) router, we show that individual dynamical parameters and routing function are jointly identifiable.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem:

Under Assumptions 1–5, the per-leaf parameter $\hat{w}(\lambda) = W\alpha(\lambda)$ is identifiable from the ensemble distribution... (Theorem 1, informal)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


Appendices (excerpts)

A Proofs of Theoretical Results

A.1 Proof of Proposition 1 (Non-identifiability without structure). Let $\phi: \mathbb{R}^q \to \mathbb{R}^q$ be a smooth diffeomorph...
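The proof is truncated in the extracted text. To illustrate the kind of ambiguity the proposition's title refers to, the sketch below takes a linear special case, assuming latent dynamics dz/dt = Az, a linear decoder C, and an invertible reparameterization $\phi(z) = Mz$. The transformed model ($MAM^{-1}$, $CM^{-1}$, $Mz_0$) reproduces every observation exactly, so observations alone cannot tell the two latent descriptions apart. All matrices and the helpers `expm_taylor` and `observe` are hypothetical; `scipy.linalg.expm` would be the usual choice for the matrix exponential.

```python
import numpy as np


def expm_taylor(A, terms=60):
    """Matrix exponential via truncated Taylor series (fine for the small matrices used here)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result


def observe(A, C, z0, t):
    """Observation x(t) = C exp(A t) z0 for latent dynamics dz/dt = A z and decoder C."""
    return C @ expm_taylor(A * t) @ z0


A = np.array([[-0.2, 1.0], [-1.0, -0.2]])            # hypothetical latent dynamics
C = np.array([[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]])   # hypothetical linear decoder
z0 = np.array([1.0, -0.5])
M = np.array([[2.0, 1.0], [0.0, 0.5]])               # invertible reparameterization phi(z) = M z
M_inv = np.linalg.inv(M)

# Transformed model: latent M z, dynamics M A M^{-1}, decoder C M^{-1}.
A_alt, C_alt, z0_alt = M @ A @ M_inv, C @ M_inv, M @ z0
for t in (0.5, 1.0, 2.0):
    assert np.allclose(observe(A, C, z0, t), observe(A_alt, C_alt, z0_alt, t))
```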
Leaf-level argmin consistency (Proposition 4): Given a consistently estimated leaf assignment, the per-leaf MMD argmin is a consistent estimator of the FOA-faithful parameters $w^\star(\lambda)$. This follows from standard M-estimation under a quantitative FOA condition.
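A minimal sketch of a per-leaf MMD argmin, under strong simplifying assumptions: a single leaf, a single snapshot time, 1-D toy dynamics dz/dt = -kz, and a baseline distribution known up to sampling. It uses a Gaussian kernel, which is bounded and characteristic as the bounded-characteristic-kernel assumption requires, and a grid search in place of gradient-based optimization. The paper's loss instead integrates over enrollment times (see the reference-consistency assumption), so treat this only as an illustration of the estimator's shape; every name and number is hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between 1-D samples a and b (bounded and characteristic)."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * sigma**2))


def mmd2(x, y, sigma=0.5):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())


def pushforward(z0, k, t):
    """Flow map of the toy dynamics dz/dt = -k z, applied to baseline samples."""
    return z0 * np.exp(-k * t)


rng = np.random.default_rng(0)
t_obs, k_true = 1.0, 0.7
observed = pushforward(rng.normal(2.0, 0.3, size=300), k_true, t_obs)  # synthetic snapshot
z0_model = rng.normal(2.0, 0.3, size=300)  # fresh draws from the (assumed known) baseline
grid = np.linspace(0.1, 2.0, 96)
losses = [mmd2(pushforward(z0_model, k, t_obs), observed) for k in grid]
k_hat = grid[int(np.argmin(losses))]  # argmin estimate of the decay rate
```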
Dictionary recovery (Corollary 2): Given consistent leaf-level estimates $\hat{w}_N(\lambda) \to w^\star(\lambda)$, the basis $W^\star$ and routing $\alpha^\star(\lambda)$ are identified up to column permutation. We delineate two distinct convergence regimes governed by the annealing schedule: an anchor regime ($\tau_N \to 0^+$ for one-hot routing) and a simplex-interior regime ($\tau_N \to \tau_0 > 0$ to preserve soft routing). By...
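Identification "up to column permutation" reflects the symmetry $W\alpha = (WP)(P^\top\alpha)$ for any permutation matrix $P$: relabelling the experts changes nothing observable. When a recovered dictionary is compared with ground truth, its columns therefore need aligning first. A brute-force sketch for small $K$ follows (`align_columns` is a hypothetical helper; for larger $K$, `scipy.optimize.linear_sum_assignment` on a column-distance matrix solves the same matching in polynomial time).

```python
import itertools

import numpy as np


def align_columns(W_hat, W_true):
    """Brute-force search for the column permutation of W_hat closest to W_true."""
    K = W_true.shape[1]
    best_perm, best_err = None, np.inf
    for perm in itertools.permutations(range(K)):
        err = np.linalg.norm(W_hat[:, list(perm)] - W_true)
        if err < best_err:
            best_perm, best_err = list(perm), err
    return best_perm, best_err


rng = np.random.default_rng(1)
W_true = rng.normal(size=(3, 4))                            # hypothetical ground-truth dictionary
sigma = [2, 0, 3, 1]                                        # unknown relabelling of the experts
W_hat = W_true[:, sigma] + 1e-3 * rng.normal(size=(3, 4))   # "recovered" dictionary with noise
perm, err = align_columns(W_hat, W_true)
# W_hat[:, perm] now matches W_true; the routing weights would be relabelled the same way.
```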
Assumptions:

1. Compactness and continuity: $\Theta$ is compact and $w \mapsto (\Phi^w_{t,t_0})_\# \rho^\lambda_0$ is continuous in $\mathrm{MMD}_{k_\sigma}$, uniformly in $t \in \mathcal{T}_\lambda$. (Holds whenever the ODE flow is jointly continuous in $(w, t, z)$ and the baseline density is continuous.)
2. Bounded characteristic kernel: $k_\sigma$ is bounded and characteristic.
3. Sampling: enrollment times $\{t_j\}$ are i.i.d. from a density $f_T$ continuous and bounded below on $\mathcal{T}_\lambda$, and $\rho^\lambda_0$ has a continuous Lebesgue density bounded below on an open set $V \subseteq \mathbb{R}^q$.
4. Bandwidth: $h_N \to 0$, $N_\lambda h_N \to \infty$.
5. Quantitative FOA (7): $w^\star(\lambda)$ is a well-separated minimum of $\mathcal{L}_\lambda$.
6. Reference consistency: $\int_{\mathcal{T}_\lambda} \mathrm{MMD}^2_{k_\sigma}(\hat{\rho}^\lambda_{t,N}, \rho^\lambda_t)\, d\nu_\lambda(t) \xrightarrow{P} 0$ (Lemma 3 below). T...
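The excerpt does not show how the reference distributions $\hat{\rho}^\lambda_{t,N}$ are built. Assuming they come from kernel smoothing of snapshots over enrollment time with bandwidth $h_N$, as the bandwidth condition suggests, the sketch below shows the trade-off that condition encodes. The effective number of snapshots contributing at a target time grows roughly like $N h$, so $h_N \to 0$ controls smoothing bias while $N_\lambda h_N \to \infty$ keeps the effective sample size growing. `time_weights`, `effective_sample_size` and the uniform enrollment times are hypothetical.

```python
import numpy as np


def time_weights(t_enroll, t, h):
    """Normalized Gaussian-kernel weights of snapshots around target time t (bandwidth h)."""
    u = (np.asarray(t_enroll, dtype=float) - t) / h
    w = np.exp(-0.5 * u**2)
    return w / w.sum()


def effective_sample_size(w):
    """Kish effective sample size of a normalized weight vector."""
    return 1.0 / np.sum(w**2)


rng = np.random.default_rng(0)
t_enroll = rng.uniform(0.0, 1.0, size=2000)  # hypothetical enrollment times, N = 2000
for h in (0.01, 0.05, 0.2):
    w = time_weights(t_enroll, t=0.5, h=h)
    print(h, round(effective_sample_size(w)))  # grows roughly in proportion to N * h
```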
