Dynamic Mode Decomposition shows that short contiguous spans of Vision Transformer blocks can be approximated by a low-rank linear operator K with high predictive fidelity for p<=4 steps, but this approximation fails to outperform an identity baseline when propagated to the final layer.
2 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (2)
polarities: background (2)
representative citing papers
Koopman theory plus knowledge distillation yields linearized models from pre-trained nets that outperform standard least-squares Koopman approximations on MNIST and Fashion-MNIST in accuracy and stability.
citing papers explorer
- Dynamic Mode Decomposition along Depth in Vision Transformers
  Dynamic Mode Decomposition shows that short contiguous spans of Vision Transformer blocks can be approximated by a low-rank linear operator K with high predictive fidelity for p<=4 steps, but this approximation fails to outperform an identity baseline when propagated to the final layer.
- Extraction of linearized models from pre-trained networks via knowledge distillation
  Koopman theory plus knowledge distillation yields linearized models from pre-trained nets that outperform standard least-squares Koopman approximations on MNIST and Fashion-MNIST in accuracy and stability.