Stable architectures for deep neural networks

Haber, E · 2017 · DOI 10.1088/1361-6420/aa9a90

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics

cs.LG · 2024-02-23 · unverdicted · novelty 7.0

A mean-field dynamical analysis of LoRA in transformers identifies phase transitions in catastrophic forgetting driven by perturbation norm and transformer depth.

Learning partially observed systems with neural Hamiltonian ordinary differential equations

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

NHODE framework learns partially observed dynamical systems by combining Hamiltonian neural networks with neural ODEs, enforcing energy conservation and improving long-horizon stability over data-driven baselines on mass-spring and three-body problems.

Dynamic Mode Decomposition along Depth in Vision Transformers

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

Dynamic Mode Decomposition shows that short contiguous spans of Vision Transformer blocks can be approximated by a low-rank linear operator K with high predictive fidelity for p<=4 steps, but this approximation fails to outperform an identity baseline when propagated to the final layer.

The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs

cs.CV · 2026-02-09 · unverdicted · novelty 4.0

Effective depth, an operational count of sequential transformations, predicts CNN trainability better than nominal layer count because shortcuts and branches decouple the two.

citing papers explorer

Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics · cs.LG · 2024-02-23 · unverdicted · none · ref 13
Learning partially observed systems with neural Hamiltonian ordinary differential equations · cs.LG · 2026-05-22 · unverdicted · none · ref 4
Dynamic Mode Decomposition along Depth in Vision Transformers · cs.CV · 2026-05-08 · unverdicted · none · ref 25
The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs · cs.CV · 2026-02-09 · unverdicted · none · ref 4
