{"paper":{"title":"Efficient Sensor Fusion for Gesture Recognition on Resource-Constrained Devices","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Fusing low-resolution ToF depth and IR thermal data with grouped convolutions lets microcontrollers classify seven gestures at 92.3 percent accuracy.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andrea Giudici, Christian Veronesi, Franco Zappa, Pietro Bartoli, Tommaso Bondini","submitted_at":"2026-05-13T12:53:22Z","abstract_excerpt":"Gesture recognition is a cornerstone of Human-Computer Interaction (HCI) for smart eyewear, enabling natural and device-free control in augmented reality environments. Traditional vision-based approaches face significant challenges regarding power consumption, computational latency, and user privacy. This paper proposes a lightweight, privacy-preserving gesture recognition system based on the fusion of low-resolution Time-of-Flight (ToF) and Infrared (IR) thermal sensors. We used an 8×8 multizone ToF sensor (VL53L8CH) and an 8×8 IR array (AMG8833) to capture complementary depth and"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The proposed fusion strategy significantly outperforms single-sensor baselines with an accuracy of 92.3% and a macro F1-score of 0.93.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The custom dataset of 7 static gestures and k-fold cross-validation results are representative of real-world wearable use cases and that the grouped-convolution architecture provides optimal fusion without overfitting.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Fusing 8x8 ToF and IR sensors with a 6343-parameter CNN achieves 92.3% accuracy and 0.93 macro F1 on 7 static gestures while running at millisecond latency and 50 mW on 
STM32 MCUs.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Fusing low-resolution ToF depth and IR thermal data with grouped convolutions lets microcontrollers classify seven gestures at 92.3 percent accuracy.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"82a03ad6595aad33c9b404a7d70a4c476084f88ccc50f55cd35a72307b0a2010"},"source":{"id":"2605.13462","kind":"arxiv","version":1},"verdict":{"id":"612f6149-1354-4f07-9c49-71036f14175e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:17:49.911125Z","strongest_claim":"The proposed fusion strategy significantly outperforms single-sensor baselines with an accuracy of 92.3% and a macro F1-score of 0.93.","one_line_summary":"Fusing 8x8 ToF and IR sensors with a 6343-parameter CNN achieves 92.3% accuracy and 0.93 macro F1 on 7 static gestures while running at millisecond latency and 50 mW on STM32 MCUs.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The custom dataset of 7 static gestures and k-fold cross-validation results are representative of real-world wearable use cases and that the grouped-convolution architecture provides optimal fusion without overfitting.","pith_extraction_headline":"Fusing low-resolution ToF depth and IR thermal data with grouped convolutions lets microcontrollers classify seven gestures at 92.3 percent accuracy."},"references":{"count":26,"sample":[{"doi":"","year":2025,"title":"Hand gesture recognition on edge devices: Sensor technologies, algorithms, and processing hardware,","work_id":"d0e23be1-f0d3-42d7-a94e-df8eead05d9b","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Augmented reality smart glasses use and acceptance: A literature 
review,","work_id":"2a14eb20-373e-46b8-88c5-9c50af935f9d","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.3390/app9153171","year":2019,"title":"User interactions for augmented reality smart glasses: A comparative evaluation of visual contexts and interaction gestures,","work_id":"6b30bd8a-60dc-4414-8097-3c8c3b56b545","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Speculative privacy concerns about ar glasses data collection,","work_id":"767d2d15-e4aa-4093-8a94-821904dd8b10","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Energy-aware human activity recognition for wearable devices: A comprehensive review,","work_id":"e870070f-ff61-4f92-978a-03416683faea","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":26,"snapshot_sha256":"eeb2821d3146dab63ebf84e9dd8f9c4426ce3c59dc4254e971dceb3d4115f77b","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"05838e00eec51a92f6a46b72ff9a15bcef7cc62adc640550f16aff2364888a0e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}