A mean-field dynamical analysis of LoRA in transformers identifies phase transitions in catastrophic forgetting driven by perturbation norm and transformer depth.
Stable architectures for deep neural networks
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
method 1polarities
use method 1representative citing papers
NHODE framework learns partially observed dynamical systems by combining Hamiltonian neural networks with neural ODEs, enforcing energy conservation and improving long-horizon stability over data-driven baselines on mass-spring and three-body problems.
Dynamic Mode Decomposition shows that short contiguous spans of Vision Transformer blocks can be approximated by a low-rank linear operator K with high predictive fidelity for p<=4 steps, but this approximation fails to outperform an identity baseline when propagated to the final layer.
Effective depth, an operational count of sequential transformations, predicts CNN trainability better than nominal layer count because shortcuts and branches decouple the two.
citing papers explorer
-
Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics
A mean-field dynamical analysis of LoRA in transformers identifies phase transitions in catastrophic forgetting driven by perturbation norm and transformer depth.
-
Learning partially observed systems with neural Hamiltonian ordinary differential equations
NHODE framework learns partially observed dynamical systems by combining Hamiltonian neural networks with neural ODEs, enforcing energy conservation and improving long-horizon stability over data-driven baselines on mass-spring and three-body problems.
-
Dynamic Mode Decomposition along Depth in Vision Transformers
Dynamic Mode Decomposition shows that short contiguous spans of Vision Transformer blocks can be approximated by a low-rank linear operator K with high predictive fidelity for p<=4 steps, but this approximation fails to outperform an identity baseline when propagated to the final layer.
-
The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs
Effective depth, an operational count of sequential transformations, predicts CNN trainability better than nominal layer count because shortcuts and branches decouple the two.