FusionSense: Tri-Stage Near-Sensor Learning for Runtime-Adaptive Multimodal Edge Intelligence
Pith reviewed 2026-05-25 05:40 UTC · model grok-4.3
The pith
FusionSense trains near-sensor classifiers in three stages using server fusion insights to decide which modalities to transmit, sustaining task quality at far higher data reduction rates than uni-modal methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FusionSense establishes that a tri-stage procedure produces runtime decisions that jointly reduce compute and communication while preserving downstream task quality. The three stages are (i) server-side fusion model training, (ii) generation of filter-out-safe labels that quantify each modality's necessity relative to the fused decision, and (iii) compaction of an edge fusion model by injecting near-sensor predictions as auxiliary signals. On a dual-modality (RGB plus Depth/LiDAR) setup with SynDrone, the reported gains are up to 33x lower energy at 1% FoI prevalence, 11x at 10%, and a 92.3% reduction in quality loss at a fixed 30% data reduction.
What carries the argument
The filter-out-safe (FoS) labels that quantify each modality's necessity relative to the fused decision, used to guide compaction of the edge model with auxiliary near-sensor predictions.
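As a rough illustration of what a filter-out-safe label could look like, the Python sketch below marks a modality as safe to drop when masking it moves the fused score by no more than a threshold. The masking rule, the `threshold` value, and the names `fos_labels` and `toy_fused_predict` are assumptions made for illustration; the text reviewed here does not give the paper's actual extraction rule.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A sample maps a modality name to its features, or None if the modality is masked.
Sample = Dict[str, Optional[Sequence[float]]]


def fos_labels(
    fused_predict: Callable[[Sample], float],
    sample: Sample,
    modalities: List[str],
    threshold: float = 0.1,  # hypothetical tolerance on the fused score
) -> Dict[str, bool]:
    """Return {modality: True} when dropping that modality keeps the fused
    score within `threshold` of the all-modalities score."""
    reference = fused_predict(sample)
    labels = {}
    for name in modalities:
        masked = dict(sample)
        masked[name] = None  # simulate filtering this modality out
        labels[name] = abs(fused_predict(masked) - reference) <= threshold
    return labels


def toy_fused_predict(sample: Sample) -> float:
    """Stand-in for a server-side fusion model: strongest score among present modalities."""
    present = [max(v) for v in sample.values() if v is not None]
    return max(present) if present else 0.0


sample = {"rgb": [0.9, 0.2], "depth": [0.1, 0.3]}
print(fos_labels(toy_fused_predict, sample, ["rgb", "depth"]))
# → {'rgb': False, 'depth': True}
```

In this toy case the RGB stream carries the decision, so only the depth stream is labeled safe to filter out.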
If this is right
- The approach sustains task quality at substantially higher data-reduction rates than uni-modal filters.
- End-to-end energy use drops by up to 33 times at 1% event prevalence and 11 times at 10% prevalence.
- Quality loss falls by 92.3% at a fixed 30% data reduction compared with prior filtering baselines.
- The decision layer scales linearly with the number of sensors because cross-modal dependencies are handled at training time.
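To make the energy bullets easier to reason about, here is a minimal per-frame energy model in Python. It is a sketch under stated assumptions, not the paper's energy accounting: the baseline transmits every frame, the filtered pipeline runs a near-sensor filter on every frame and transmits only a fraction of them, and all numbers in the example are hypothetical.

```python
def energy_ratio(e_transmit: float, e_filter: float, transmit_fraction: float) -> float:
    """Per-frame energy of always-transmit divided by that of a filtered pipeline.

    Assumed model: the baseline pays e_transmit on every frame; the filtered
    pipeline pays e_filter on every frame plus e_transmit on the fraction of
    frames it forwards.
    """
    if not 0.0 <= transmit_fraction <= 1.0:
        raise ValueError("transmit_fraction must lie in [0, 1]")
    if e_transmit <= 0 or e_filter <= 0:
        raise ValueError("energies must be positive")
    filtered = e_filter + transmit_fraction * e_transmit
    return e_transmit / filtered


# Hypothetical units: transmitting costs 1.0, filtering costs 0.05,
# and 5% of frames are forwarded.
print(round(energy_ratio(1.0, 0.05, 0.05), 2))  # → 10.0
```

Under this model the ratio can never exceed `e_transmit / e_filter`, which is why savings grow as prevalence falls but eventually saturate at the cost of the filter itself.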
Where Pith is reading between the lines
- If FoS labels remain reliable on new tasks, the same labels could support dynamic sensor activation when environmental conditions change.
- The linear scaling property suggests the method could handle three or more modalities without a proportional rise in edge compute.
- Deployment on physical hardware would allow direct measurement of latency reductions that simulation alone cannot capture.
Load-bearing premise
The filter-out-safe labels produced by the server-side fusion model accurately capture each modality's necessity for the downstream task without introducing bias that would degrade the compacted edge model.
What would settle it
Running the compacted edge model on held-out SynDrone sequences at a fixed 30% data reduction, measuring its quality loss against an always-transmit reference, and checking whether that loss is 92.3% lower than the filtering baselines' loss, as claimed, would settle the performance claims.
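One way to compute the figure being tested is sketched below in Python. The definition used here (quality loss measured against a full-data reference, then compared between a baseline filter and the method) is an assumption, since the text does not spell out the paper's exact metric; the function name and the example scores are hypothetical.

```python
def quality_loss_reduction(q_full: float, q_baseline: float, q_method: float) -> float:
    """Relative reduction in quality loss versus a baseline filter.

    q_full:     task quality with all data transmitted
    q_baseline: quality of the baseline filter at the fixed data reduction
    q_method:   quality of the evaluated method at the same data reduction
    """
    loss_baseline = q_full - q_baseline
    loss_method = q_full - q_method
    if loss_baseline <= 0:
        raise ValueError("baseline shows no quality loss to reduce")
    return (loss_baseline - loss_method) / loss_baseline


# Hypothetical scores: the baseline loses 0.20, the method loses 0.05.
print(round(quality_loss_reduction(0.80, 0.60, 0.75), 2))  # → 0.75
```

A result of 0.923 on held-out sequences would correspond to the claimed 92.3% reduction.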
Original abstract
Autonomous systems and smart-industry deployments increasingly split computation across near-sensor, edge, and cloud resources, where tight energy, latency, and reliability budgets demand run-time adaptivity. In practice, deciding what to compute and transmit at each point is pivotal; yet as multimodal sensor suites (cameras, LiDAR/depth, etc.) proliferate at the edge, most prior approaches either (i) fuse modalities on powerful servers or (ii) apply uni-modal near-sensor filters that ignore cross-modal dependencies, leading to redundant transmissions or missed events. We present FusionSense, a fusion-aware intelligent sensing framework for energy-constrained autonomous edge systems. Lightweight near-sensor classifiers are trained via a three-step procedure: (i) a server-side fusion model learns the downstream task, (ii) filter-out-safe (FoS) labels quantify each modality's necessity relative to the fused decision, and (iii) an edge-side fusion model is compacted by injecting near-sensor predictions as auxiliary signals. The result is a run-time decision layer that jointly reduces compute and communication while scaling linearly with sensor count. On a dual-modality (RGB+Depth/LiDAR) setup with SynDrone, FusionSense sustains task quality at substantially higher data-reduction rates than uni-modal filters and delivers large end-to-end gains: up to 33x lower energy at 1% FoI prevalence, 11x at 10%, a 92.3% reduction in quality loss at a fixed 30% data reduction, and roughly 1.5x higher energy savings than the best prior filtering baseline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FusionSense, a tri-stage near-sensor learning framework for runtime-adaptive multimodal edge intelligence. A server-side fusion model is trained on the downstream task; filter-out-safe (FoS) labels are then derived to quantify each modality's necessity relative to the fused output; finally, an edge-side model is compacted by injecting near-sensor predictions as auxiliary signals. On a dual-modality (RGB + Depth/LiDAR) SynDrone setup, the approach is claimed to sustain task quality at higher data-reduction rates than uni-modal filters, yielding up to 33× lower energy at 1% FoI prevalence, 11× at 10%, a 92.3% reduction in quality loss at 30% data reduction, and ~1.5× higher energy savings than the best prior baseline.
Significance. If the quantitative claims are reproducible and the FoS-label transfer is shown to be unbiased, the work would offer a practical advance for energy- and bandwidth-constrained multimodal edge systems by exploiting cross-modal dependencies rather than treating modalities independently. The linear scaling with sensor count and the explicit three-stage training recipe are potentially useful for deployment.
major comments (2)
- [Tri-stage procedure (server fusion → FoS labeling → edge compaction)] The central performance claims (33× energy reduction, 92.3% quality-loss reduction, etc.) rest on the correctness of the FoS-label generation step. No quantitative validation is supplied that FoS labels agree with modality importance measured by ablation or leave-one-modality-out experiments; if the server fusion surface over-weights one modality or the implicit thresholding introduces correlated label noise, the compacted edge model will inherit the same bias and the reported data-reduction gains will not generalize. This assumption is load-bearing for all end-to-end numbers.
- [Experiments on SynDrone] The experimental section reports aggregate energy and quality metrics but supplies no dataset splits, number of runs, error bars, or explicit baseline implementations. Without these, the claimed superiority over “uni-modal filters” and “best prior filtering baseline” cannot be assessed for statistical significance or implementation fairness.
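The validation requested in the first major comment could be scored along the following lines. This Python sketch compares per-sample FoS labels with labels obtained from a leave-one-modality-out ablation; the boolean encoding (True means "safe to drop"), the function name, and the example data are assumptions for illustration only.

```python
from typing import Dict, List


def agreement_metrics(fos: List[bool], ablation: List[bool]) -> Dict[str, float]:
    """Compare FoS labels with ablation-derived labels (True = safe to drop).

    Returns the overall agreement rate and the precision of the FoS
    'safe to drop' decisions, treating the ablation labels as ground truth.
    """
    if len(fos) != len(ablation) or not fos:
        raise ValueError("label lists must be non-empty and of equal length")
    true_pos = sum(f and a for f, a in zip(fos, ablation))
    false_pos = sum(f and not a for f, a in zip(fos, ablation))
    matches = sum(f == a for f, a in zip(fos, ablation))
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else float("nan")
    return {"agreement": matches / len(fos), "precision": precision}


# Hypothetical labels for four samples of one modality.
fos = [True, True, False, True]
ablation = [True, False, False, True]
print(agreement_metrics(fos, ablation))
```

Precision matters most here: a false "safe to drop" label is the error that causes a missed event at runtime.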
minor comments (2)
- [Method] Notation for FoS label extraction (thresholding relative to fused output) is described only at a high level; a precise algorithmic statement or pseudocode would improve reproducibility.
- [Abstract] The abstract states “roughly 1.5× higher energy savings” without specifying the exact prior baseline or the operating point at which the comparison is made.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of validation and reproducibility. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.
Point-by-point responses
- Referee: The central performance claims (33× energy reduction, 92.3% quality-loss reduction, etc.) rest on the correctness of the FoS-label generation step. No quantitative validation is supplied that FoS labels agree with modality importance measured by ablation or leave-one-modality-out experiments; if the server fusion surface over-weights one modality or the implicit thresholding introduces correlated label noise, the compacted edge model will inherit the same bias and the reported data-reduction gains will not generalize. This assumption is load-bearing for all end-to-end numbers.
  Authors: We agree that direct quantitative validation of FoS labels against ablation studies is necessary to confirm they accurately capture cross-modal necessity without bias. In the revised manuscript we will add leave-one-modality-out ablation experiments on the server fusion model and report agreement metrics (e.g., rank correlation or precision of modality importance) between these results and the derived FoS labels. This will strengthen the claims and allow readers to assess potential bias.
  Revision: yes
- Referee: The experimental section reports aggregate energy and quality metrics but supplies no dataset splits, number of runs, error bars, or explicit baseline implementations. Without these, the claimed superiority over “uni-modal filters” and “best prior filtering baseline” cannot be assessed for statistical significance or implementation fairness.
  Authors: We acknowledge that the current experimental reporting lacks sufficient detail for reproducibility. In the revision we will explicitly state the SynDrone train/validation/test splits, the number of independent runs performed, include error bars or standard deviations on all reported metrics, and provide implementation details or references for the uni-modal filters and prior baselines to enable fair comparison and statistical assessment.
  Revision: yes
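The promised error bars amount to reporting a mean and a sample standard deviation over independent runs. A minimal Python sketch using the standard library follows; the function name and the per-run scores are hypothetical and stand in for whatever metric the revision reports.

```python
import statistics
from typing import List, Tuple


def summarize_runs(values: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric across runs."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical task-quality scores from three independent runs.
mean, std = summarize_runs([0.70, 0.72, 0.74])
print(f"{mean:.3f} ± {std:.3f}")  # → 0.720 ± 0.020
```

Reporting the number of runs alongside these two values lets readers judge whether a gap between methods exceeds run-to-run variation.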
Circularity Check
No circularity: pipeline described without equations or self-referential reductions
Full rationale
The paper describes a tri-stage procedure (server fusion model, FoS label generation, edge compaction) in prose only. No equations, derivations, fitted parameters renamed as predictions, or self-citations appear in the provided text. The FoS labeling step is presented as an independent quantification step rather than a self-definition or fitted-input prediction. The central claims rest on empirical gains on SynDrone rather than any load-bearing mathematical reduction to inputs. This is the normal self-contained case.