
arxiv: 2501.07451 · v4 · submitted 2025-01-13 · 💻 cs.CV

A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion

Pith reviewed 2026-05-23 05:36 UTC · model grok-4.3

classification 💻 cs.CV
keywords: dynamic neural networks, computer vision, sensor fusion, adaptive computation, model efficiency, taxonomy, multi-modal fusion, input-dependent computation

The pith

Dynamic neural networks adapt computations to each input's complexity instead of using fixed structures for all inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey organizes the large but scattered body of work on dynamic neural networks in computer vision. It groups methods into a taxonomy according to whether the output, the computation graph, or the input is made adaptive. The authors claim these networks bring particular gains in multi-modal sensor fusion by allowing the model to adjust to changing conditions, suppress noise from unreliable sensors, and focus on the most informative data streams. They review early fusion examples and supply a public list of papers with summaries and code links.

Core claim

Dynamic Neural Networks allow the number of computations to be conditioned on the specific input. The current literature on the topic is extensive and fragmented. We present a comprehensive survey that synthesizes and unifies existing Dynamic Neural Networks research in the context of Computer Vision. Additionally, we provide a logical taxonomy based on which component of the network is adaptive: the output, the computation graph, or the input. Furthermore, we argue that Dynamic Neural Networks are particularly beneficial in the context of Sensor Fusion for better adaptivity, noise reduction, and information prioritization. We present preliminary works in this direction.
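
As a rough illustration of what conditioning computation on the input can look like, the sketch below implements a confidence-thresholded early exit in plain Python. It is a toy under stated assumptions, not a method taken from the survey: the names `softmax` and `early_exit_predict`, the `(transform, classifier)` stage format, the 0.9 threshold, and the toy stages are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, stages, threshold=0.9):
    """Run stages in order and stop at the first confident classifier.

    `stages` is a hypothetical list of (transform, classifier) pairs:
    `transform` maps a feature to a deeper feature, `classifier` maps a
    feature to class logits. Returns (predicted_label, stages_used).
    """
    h = x
    for depth, (transform, classifier) in enumerate(stages, start=1):
        h = transform(h)
        probs = softmax(classifier(h))
        confidence = max(probs)
        # Exit early when confident, or unconditionally at the last stage.
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth


# Toy stages (assumed for illustration): each stage doubles the feature,
# so the two-class logits [h, -h] become more separated with depth.
toy_stages = [(lambda h: 2 * h, lambda h: [h, -h])] * 3

easy = early_exit_predict(2.0, toy_stages)    # confident after one stage
hard = early_exit_predict(-0.1, toy_stages)   # needs every stage
```

In this toy setup an input that the first classifier already separates confidently leaves after one stage, while an ambiguous input pays for all three.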

What carries the argument

Taxonomy that classifies dynamic networks by the adaptive component: output, computation graph, or input

If this is right

  • Static compression methods ignore that different inputs need different amounts of work.
  • In sensor fusion, adaptive computation can down-weight noisy channels and emphasize reliable ones.
  • A shared taxonomy reduces duplication of effort across vision and fusion papers.
  • The supplied repository lowers the barrier to reproducing and extending existing dynamic methods.
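
The idea of down-weighting noisy channels can be sketched as a softmax gate over per-sensor reliability scores. This is a minimal sketch under assumptions, not a fusion method from the surveyed papers: `reliability_weights`, `fuse`, the `temperature` parameter, and the example scores are hypothetical, and in a learned system the scores would typically come from a gating network rather than being supplied by hand.

```python
import math


def reliability_weights(scores, temperature=1.0):
    """Turn per-sensor reliability scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, scores):
    """Weighted sum of per-sensor feature vectors.

    `features` holds one equal-length vector per sensor; `scores` holds one
    reliability score per sensor (higher means more trusted).
    Returns (fused_vector, weights).
    """
    weights = reliability_weights(scores)
    dim = len(features[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical example: the first sensor is judged unreliable (score -5),
# so the fused vector is dominated by the second sensor.
camera = [1.0, 0.0]
lidar = [0.0, 1.0]
fused, weights = fuse([camera, lidar], scores=[-5.0, 0.0])
```

With equal scores the two sensors contribute equally; lowering one score shifts almost all of the weight to the other stream.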

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive-component taxonomy could be tested on sequential decision tasks outside vision.
  • Measuring average FLOPs saved on embedded hardware would make the fusion benefit concrete.
  • Dynamic prioritization might interact with uncertainty estimation techniques already used in robotics.
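
One way to make "average FLOPs saved" concrete for an early-exit model is to weight the cumulative cost of each exit by the fraction of inputs that leave there. The sketch below assumes that accounting; the function names and all numbers are hypothetical and are not measurements from the paper.

```python
def expected_flops(exit_fractions, cumulative_flops):
    """Average cost per input for an early-exit model.

    exit_fractions[i] is the share of inputs leaving at exit i (sums to 1);
    cumulative_flops[i] is the total cost of reaching exit i.
    """
    if len(exit_fractions) != len(cumulative_flops):
        raise ValueError("one cost is needed per exit")
    if abs(sum(exit_fractions) - 1.0) > 1e-9:
        raise ValueError("exit fractions must sum to 1")
    return sum(p * c for p, c in zip(exit_fractions, cumulative_flops))


def relative_saving(exit_fractions, cumulative_flops):
    """Fraction of compute saved versus always running to the last exit."""
    full_cost = cumulative_flops[-1]
    return 1.0 - expected_flops(exit_fractions, cumulative_flops) / full_cost


# Hypothetical numbers: half of the inputs exit after 1 GFLOP,
# 30% after 2 GFLOPs, and the remaining 20% run the full 4 GFLOPs.
avg = expected_flops([0.5, 0.3, 0.2], [1.0, 2.0, 4.0])
saving = relative_saving([0.5, 0.3, 0.2], [1.0, 2.0, 4.0])
```

Under these assumed numbers the average cost is about 1.9 GFLOPs per input, roughly a 52.5% saving over the static model. On real embedded hardware, measured latency or energy may not track FLOPs this cleanly.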

Load-bearing premise

The existing literature is fragmented enough that a taxonomy based on adaptive components will make the field easier to navigate and apply.

What would settle it

A controlled comparison in which models using the proposed taxonomy show no measurable gains in adaptivity, noise handling, or prioritization during multi-modal sensor fusion tasks.

Figures

Figures reproduced from arXiv: 2501.07451 by Fabio Montello, Lazaros Nalpantidis, Ronja Güldenring, Simone Scardapane.

Figure 1: The three types of Dynamic Neural Networks we consider in the survey. From left to right: (a) Early Exits networks decide at which point to output, (b) Dynamic routing networks use a Mixture-of-Experts and decide which computational path is optimal according to the input, (c) Token skimming networks decide which subset of tokens will attend the following blocks.
Figure 2: Overview of the publications considered in this survey, grouped by year and topic. In total, 148 publications have been reviewed: 62 for the Early Exits Section, 44 for the Dynamic Routing Section, 27 for the Token Skimming Section, and 15 in the Dynamic Sensor Fusion Section.
Figure 3: Taxonomy of Dynamic Neural Network techniques presented in this survey, categorized by application domain (Computer Vision and Sensor Fusion) and specific method. The diagram highlights key methods such as Early Exits (in green), Computational Routing (in blue), Token Skimming (in red), and their applications in various Sensor Fusion tasks (in yellow).
Figure 4: Illustrative example of MSDNet (Multi-Scale Dense Network). MSDNet processes the input image through a multi-scale architecture, enabling feature extraction at various resolutions. The feature maps shown in blue and classifiers highlighted in orange are active during this example, while the grey blocks indicate components that are not in use. For the complete scheme, please refer to Huang et al. (2018).
Original abstract

Model compression is essential in the deployment of large Computer Vision models on embedded devices. However, static optimization techniques (e.g. pruning, quantization, etc.) neglect the fact that different inputs have different complexities, thus requiring different amounts of computation. Dynamic Neural Networks allow the number of computations to be conditioned on the specific input. The current literature on the topic is very extensive and fragmented. We present a comprehensive survey that synthesizes and unifies existing Dynamic Neural Networks research in the context of Computer Vision. Additionally, we provide a logical taxonomy based on which component of the network is adaptive: the output, the computation graph or the input. Furthermore, we argue that Dynamic Neural Networks are particularly beneficial in the context of Sensor Fusion for better adaptivity, noise reduction and information prioritization. We present preliminary works in this direction. We complement this survey with a curated repository listing all the surveyed papers, each with a brief summary of the solution and the code base when available: https://github.com/DTU-PAS/awesome-dynn-for-cv.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript surveys dynamic neural networks (DyNNs) in computer vision, presenting a taxonomy that classifies methods according to the adaptive component (output, computation graph, or input). It argues that DyNNs offer particular advantages for multi-modal sensor fusion through improved adaptivity, noise reduction, and information prioritization, supports this with preliminary works, and provides a curated GitHub repository of surveyed papers with summaries and code links where available.

Significance. If the taxonomy proves comprehensive and the literature synthesis accurate, the survey would usefully consolidate an extensive and fragmented body of work while highlighting an underexplored application area in sensor fusion. The inclusion of a public repository with code availability is a concrete strength that enhances the practical utility of the contribution.

major comments (2)
  1. [Sensor Fusion discussion (likely §5 or equivalent)] The central argument that DyNNs are 'particularly beneficial' for sensor fusion (adaptivity, noise reduction, prioritization) rests on preliminary works; the manuscript should explicitly map each claimed benefit to specific cited methods or results in the sensor-fusion section to demonstrate that the benefits are evidenced rather than extrapolated.
  2. [Taxonomy definition (likely §3)] The taxonomy is presented as 'logical' and based on adaptive components, but without an explicit statement of the decision criteria used to assign papers to the three categories (output / graph / input), it is difficult to assess whether the taxonomy is exhaustive or whether borderline methods are handled consistently.
minor comments (2)
  1. [Abstract and repository description] The abstract states that the repository 'lists all the surveyed papers'; the manuscript should include a brief description of the search strategy, inclusion criteria, and cut-off date used to compile the list so readers can judge completeness.
  2. [Taxonomy overview] The figure or table summarizing the taxonomy would benefit from explicit counts or percentages of papers falling into each adaptive-component category, to give a quantitative sense of how the literature is distributed.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive feedback. We address both major comments below and will revise the manuscript accordingly to strengthen the sensor-fusion discussion and clarify the taxonomy.

Point-by-point responses
  1. Referee: [Sensor Fusion discussion (likely §5 or equivalent)] The central argument that DyNNs are 'particularly beneficial' for sensor fusion (adaptivity, noise reduction, prioritization) rests on preliminary works; the manuscript should explicitly map each claimed benefit to specific cited methods or results in the sensor-fusion section to demonstrate that the benefits are evidenced rather than extrapolated.

    Authors: We agree that explicit mappings will make the claims more rigorous. In the revision we will insert a structured table (or bulleted mapping) in the sensor-fusion section that directly links each of the three claimed benefits to the specific preliminary works cited, quoting the relevant results or mechanisms from those papers. revision: yes

  2. Referee: [Taxonomy definition (likely §3)] The taxonomy is presented as 'logical' and based on adaptive components, but without an explicit statement of the decision criteria used to assign papers to the three categories (output / graph / input), it is difficult to assess whether the taxonomy is exhaustive or whether borderline methods are handled consistently.

    Authors: We will add an explicit paragraph (or short subsection) at the beginning of §3 that states the decision criteria used to assign a method to one of the three categories. The criteria will be defined in terms of the primary adaptive component, with examples of how borderline cases (e.g., methods that adapt both output and graph) are classified to ensure consistency and transparency. revision: yes

Circularity Check

0 steps flagged

No significant circularity; survey is organizational

full rationale

This is a survey paper synthesizing existing Dynamic Neural Networks literature for CV and arguing benefits for sensor fusion via a three-way taxonomy on adaptive components. No derivations, equations, fitted parameters, or predictions are present that could reduce to inputs by construction. The central argument rests on coverage and synthesis of prior work rather than any self-referential step, self-citation chain, or ansatz. No load-bearing claim reduces to a fit or definition within the paper itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper synthesizing existing research; it introduces no new free parameters, axioms, or invented entities.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. An Algorithm for On-Sensor Agnostic Detection of Changes in Human Activity for Ultra-Low-Power Applications

    eess.SP 2026-04 unverdicted novelty 6.0

    A non-parametric change-detection gate based on dynamic template matching reduces HAR computational load by over 67% with 97-98% sensitivity on two public datasets while requiring only brief device calibration.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Selective Sensor Fusion for Neural Visual-Inertial Odometry

    End-to-End Object Detection with Transformers. Springer International Publishing, Cham. pp. 213–229. doi:10.1007/978-3-030-58452-8_13. Chen, C., Rosa, S., Miao, Y., Lu, C.X., Wu, W., Markham, A., Trigoni, N., 2019a. Selective Sensor Fusion for Neural Visual-Inertial Odometry. arXiv:1903.01534. Chen, L., Odema, M., Faruque, M.A.A., 2022. Romanus: Robus...

  2. [2]

    Han, X., Wei, L., Dou, Z., Wang, Z., Qiang, C., He, X., Sun, Y., Han, Z., Tian, Q., 2024

    doi:10.1109/CVPR.2017.540. Han, X., Wei, L., Dou, Z., Wang, Z., Qiang, C., He, X., Sun, Y., Han, Z., Tian, Q., 2024. ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts. doi:10.48550/arXiv.2410.15732, arXiv:2410.15732. Han, Y., Han, D., Liu, Z., Wang, Y., Pan, X., Pu, Y., Deng, C., Feng, J., Song, S., Huang, G., 2023. Dynamic Perceiver for...

  3. [3]

    Multi-Scale Dense Networks for Resource Efficient Image Classification

    Multi-Scale Dense Networks for Resource Efficient Image Classification. arXiv:1703.09844. Huang, X., Huang, Z., Zuo, Y., Gong, Y., Zhang, C., Liu, D., Fang, Y.,

  4. [4]

    Proceedings of the AAAI Conference on Artificial Intelligence 39, 3788–3796

    PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration. Proceedings of the AAAI Conference on Artificial Intelligence 39, 3788–3796. doi:10.1609/aaai.v39i4.32395. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E., 1991. Adaptive Mixtures of Local Experts. Neural Computation 3, 79–87. doi:10.1162/neco.1991.3.1.79. Jain, G., Hegde, N.,...

  5. [5]

    Jie, Z., Sun, P., Li, X., Feng, J., Liu, W., 2021

    doi:10.1109/TIP.2020.3018269. Jie, Z., Sun, P., Li, X., Feng, J., Liu, W., 2021. Anytime Recognition with Routing Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 1875–1886. doi:10.1109/TPAMI.2019.2959322. John, V., Boyali, A., Tehrani, H., Ishimaru, K., Konishi, M., Liu, Z., Mita, S.,

  6. [6]

    IEEE Transactions on Intelligent Vehicles 3, 571–584

    Estimation of Steering Angle and Collision Avoidance for Automated Driving Using Deep Mixture of Experts. IEEE Transactions on Intelligent Vehicles 3, 571–584. doi:10.1109/TIV.2018.2874555. Ju, W., Bao, W., Ge, L., Yuan, D., 2021. Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits, in: Proceedings of the 30th ACM ...

  7. [7]

    volume 13681, pp

    Springer Nature Switzerland, Cham. volume 13681, pp. 330–349. doi:10.1007/978-3-031-19803-8_20. Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet Classification with Deep Convolutional Neural Networks, in: Advances in Neural Information Processing Systems, Curran Associates, Inc. Kuhse, D., Teper, H., Buschjäger, S., Wang, C.Y., Chen, J.J., 20...

  8. [8]

    Li, Y., Geller, T., Kim, Y., Panda, P., 2023b

    doi:10.1609/aaai.v37i7.26042. Li, Y., Geller, T., Kim, Y., Panda, P., 2023b. SEENN: Towards Temporal Spiking Early-Exit Neural Networks. arXiv:2304.01230. Li, Y., Song, L., Chen, Y., Li, Z., Zhang, X., Wang, X., Sun, J., 2020b. Learning Dynamic Routing for Semantic Segmentation, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVP...

  9. [9]

    Schiebener, D., Morimoto, J., Asfour, T., Ude, A., 2013

    doi:10.1007/s12559-020-09734-4. Schiebener, D., Morimoto, J., Asfour, T., Ude, A., 2013. Integrating visual perception and manipulation for autonomous learning of object representations. Adaptive Behavior doi:10.1177/1059712313484502. Seol, K.S., Roh, S.D., Chung, K.S., 2023. Token Merging with Class Importance Score, in: IECON 2023- 49th Annual Co...

  10. [10]

    Valade, F., Hebiri, M., Gay, P., 2024

    doi:10.1109/ICRA.2017.7989540. Valade, F., Hebiri, M., Gay, P., 2024. EERO: Early Exit with Reject Option for Efficient Classification with limited budget. arXiv:2402.03779. Veit, A., Belongie, S., 2018. Convolutional Networks with Adaptive Inference Graphs. Verelst, T., Tuytelaars, T., 2020. Dynamic Convolutions: Exploiting Spatial Sparsity for Faster...

  11. [11]

    volume 13664, pp

    Springer Nature Switzerland, Cham. volume 13664, pp. 226–243. doi:10.1007/978-3-031-19772-7_14. Wang, Z., Bao, W., Yuan, D., Ge, L., Tran, N.H., Zomaya, A.Y., 2019b. SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage, in: Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile...

  12. [12]

    volume 12361, pp

    Springer International Publishing, Cham. volume 12361, pp. 275–

  13. [13]

    Xu, C., McAuley, J., 2023

    doi:10.1007/978-3-030-58517-4_17. Xu, C., McAuley, J., 2023. A Survey on Dynamic Neural Networks for Natural Language Processing, in: Vlachos, A., Augenstein, I. (Eds.), Findings of the Association for Computational Linguistics: EACL 2023, Association for Computational Linguistics, Dubrovnik, Croatia. pp. 2370–2381. doi:10.18653/v1/2023.findings-eacl....