Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices

Alexander Gräfe; Ding Huo; Johannes Berger; Marco Zimmerling; Sebastian Trimpe; Vincent de Bakker

arxiv: 2605.15694 · v2 · pith:3772BMCJnew · submitted 2026-05-15 · 💻 cs.LG


Pith reviewed 2026-05-20 20:11 UTC · model grok-4.3

classification 💻 cs.LG

keywords: distributed inference, transformer models, ultra-low-power devices, wireless IoT, model parallelism, communication primitive, SomeGather, edge AI


The pith

CATS brings distributed transformer inference to ultra-low-power wireless devices, letting networks of up to 16 nodes run models up to 14 times larger than a single device can.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CATS as a framework that lets multiple ultra-low-power wireless devices collaborate to execute transformer models too large for any single device. It centers on SomeGather, a pruned primitive that selectively broadcasts only necessary activation columns to cut communication and memory costs. Message-dropout training during the process builds robustness to the packet losses typical in wireless settings. A sympathetic reader cares because this co-design of partitioning, communication, and training could bring large-model capabilities to cheap battery-powered IoT hardware without relying on powerful central servers or constant connectivity.

Core claim

CATS is a communication-aware distributed transformer inference scheme co-designed across transformer partitioning, wireless communication and training. It employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy. Building on SomeGather, it designs a partitioning method that exploits this primitive for efficient model parallelism and uses message-dropout during training to yield models robust to message loss during inference.
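To make "model parallelism" concrete, the minimal NumPy sketch below splits one linear layer column-wise across devices. This is the generic tensor-parallel pattern, assumed here for illustration only; the excerpt does not say how CATS actually partitions attention and FFN weights, and the helper names `split_columns` and `column_parallel_linear` are hypothetical.

```python
import numpy as np

def split_columns(w, n_devices):
    """Give each device a contiguous block of output columns of weight matrix w."""
    return np.array_split(w, n_devices, axis=1)

def column_parallel_linear(x, w_shards):
    """Each device multiplies the full input x by its own weight shard,
    producing only its slice of the output features."""
    return [x @ w_i for w_i in w_shards]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32))        # 8 tokens, 32 input features (replicated on every device)
w = rng.standard_normal((32, 64))       # full layer weight, never stored on a single device
shards = split_columns(w, n_devices=4)  # each device stores a 32x16 block
partial_outputs = column_parallel_linear(x, shards)

# If the next layer is also split by columns, every device needs the full
# 8x64 activation as its input, which a gather step over the radio must provide.
y = np.concatenate(partial_outputs, axis=1)
```

Each device stores only its own 32×16 block, which is where the memory saving comes from; the price is a gather at the layer boundary, which is the traffic SomeGather is said to prune.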

What carries the argument

SomeGather, a pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy.
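A minimal sketch of the difference between a full all-gather and a selective, SomeGather-style broadcast follows. The selection rule used here (each device sends the columns of its shard with the largest L2 norm, chosen per input) is an assumption for illustration; the excerpt only says that SomeGather "selectively broadcasts activation columns". The function names are hypothetical.

```python
import numpy as np

def all_gather(shards):
    """Baseline: every device broadcasts its whole column shard."""
    values_sent = sum(s.size for s in shards)
    return np.concatenate(shards, axis=1), values_sent

def some_gather(shards, keep_per_device):
    """Hypothetical SomeGather-style step: each device broadcasts only the
    keep_per_device columns of its shard with the largest L2 norm.
    Receivers treat columns that were not sent as zeros."""
    received, values_sent = [], 0
    for s in shards:
        order = np.argsort(np.linalg.norm(s, axis=0))      # ascending column norms
        keep = order[len(order) - keep_per_device:]        # largest-norm columns
        sparse = np.zeros_like(s)
        sparse[:, keep] = s[:, keep]
        received.append(sparse)
        values_sent += s.shape[0] * keep_per_device
    return np.concatenate(received, axis=1), values_sent

rng = np.random.default_rng(0)
shards = [rng.standard_normal((8, 16)) for _ in range(4)]  # 4 devices, 8 tokens, 16 columns each
full, sent_all = all_gather(shards)
approx, sent_some = some_gather(shards, keep_per_device=4)
print(sent_all, sent_some)  # 512 128
```

Zeroed columns would hurt an untrained model, so the accuracy claim behind Figure 7 depends on training with the pruning in place, which this sketch does not show. The `values_sent` count also ignores any header telling receivers which columns arrived.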

If this is right

Networks of up to 16 devices can execute transformer models 14 times larger than what fits on one device.
Message-dropout training produces models that retain accuracy despite packet losses during inference.
Partitioning built around SomeGather achieves efficient model parallelism with lower bandwidth and RAM demands.
The approach demonstrates the first real-world deployments of distributed transformer inference on ultra-low-power wireless hardware.
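The reported figures (up to 14× larger models with up to 16 devices) would sit below linear scaling if both maxima come from the same configuration. The back-of-envelope sketch below, with hypothetical layer sizes that are not taken from the paper, shows one generic reason: weights divide across devices, but a per-device activation buffer does not. It also shows how broadcasting fewer columns can shrink that buffer, assuming receivers store only the columns they actually receive.

```python
def per_device_bytes(d_model, d_ff, n_layers, seq_len, n_devices,
                     keep_ratio=1.0, bytes_per_value=1):
    """Rough RAM estimate for one device under an even weight split.

    Weights: attention Q, K, V, O (4 * d_model^2) plus a two-matrix FFN
    (2 * d_model * d_ff) per layer; biases and norms are ignored.
    Gather buffer: one seq_len x d_model activation, scaled by the fraction
    of columns actually broadcast (assumes only received columns are stored)."""
    weights = n_layers * (4 * d_model**2 + 2 * d_model * d_ff) * bytes_per_value / n_devices
    gather_buffer = seq_len * d_model * keep_ratio * bytes_per_value
    return weights + gather_buffer

full = per_device_bytes(64, 256, n_layers=2, seq_len=32, n_devices=4)
pruned = per_device_bytes(64, 256, n_layers=2, seq_len=32, n_devices=4, keep_ratio=0.25)
print(full, pruned)  # 26624.0 25088.0
```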

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The selective-broadcast idea could extend to other model families if analogous pruning rules are found for their layer operations.
Scaling to larger networks or mobile scenarios would likely need additional handling of device mobility and clock drift not tested here.
Combining CATS with local energy harvesting could support longer-running deployments in variable environments.

Load-bearing premise

SomeGather's selective column broadcasting combined with message-dropout training preserves model accuracy under real-world wireless packet losses and device constraints without hidden overheads that would negate the size gains.

What would settle it

An experiment on a real wireless testbed measuring whether accuracy remains within acceptable bounds at observed packet loss rates while total latency and energy stay below single-device baselines for equivalent model size.

Figures

Figures reproduced from arXiv: 2605.15694 by Alexander Gräfe, Ding Huo, Johannes Berger, Marco Zimmerling, Sebastian Trimpe, Vincent de Bakker.

**Figure 2.** Device Operation. Devices operate in synchronized rounds…

**Figure 5.** Model Size Scaling. Colored regions indicate feasible com…

**Figure 8.** Test Loss of Models Versus Message Loss. We train trans…

**Figure 7.** Comparison of Normal Pruning and SomeGather on Accuracy Versus Communication Trade-off. SomeGather's accuracy remains approximately constant with decreasing communication, whereas normal pruning's accuracy degrades significantly. … 0% to 90% accelerates the attention block by 3.08× at 256 features and 4.37× at 512 features, and the residual block by 3.21× and 4.68×, respectively. Overall, pruned in…

Original abstract

Transformer models are rapidly becoming a cornerstone of modern Internet of Things (IoT) applications, yet their computational and memory demands far exceed the capabilities of a single typical ultra-low-power IoT device. We present CATS, a framework for distributed transformer inference on ultra-low-power wireless devices, enabling multiple devices to collaboratively execute models far larger than what a single device can sustain. At its core, CATS is a communication-aware distributed transformer inference scheme co-designed across transformer partitioning, wireless communication and training. It employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy. Building on SomeGather, we design a partitioning method that exploits this primitive for efficient model parallelism. To cope with unreliable wireless communication, CATS employs message-dropout during training, which mimics packet losses and yields models that are robust to message loss during inference. In real-world experiments, we show that CATS brings distributed transformer inference to ultra-low-power wireless devices for the first time, with deployments on up to 16 devices that collaboratively execute transformer models up to 14 times larger than what a single device can run.
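The message-dropout idea can be sketched in a few lines: during training, each peer's incoming activation shard is independently replaced by zeros with some probability, so the model learns to cope with missing inputs. Zero-filling, the independent per-message Bernoulli drop, and the name `message_dropout` are assumptions for this sketch; the excerpt does not specify how CATS represents a lost message.

```python
import numpy as np

def message_dropout(shards, own_index, p_loss, rng):
    """Training-time sketch: drop each peer's message independently with
    probability p_loss by replacing it with zeros; the device's own shard
    is always available locally and is never dropped."""
    received = []
    for i, shard in enumerate(shards):
        lost = i != own_index and rng.random() < p_loss
        received.append(np.zeros_like(shard) if lost else shard)
    return np.concatenate(received, axis=1)

rng = np.random.default_rng(0)
shards = [np.ones((8, 16)) for _ in range(4)]   # 4 devices, 8 tokens, 16 columns each
x_train = message_dropout(shards, own_index=0, p_loss=0.2, rng=rng)
```

Unlike standard inverted dropout, this sketch does not rescale surviving messages by 1/(1 - p), on the reasoning that a packet lost at inference time is not compensated either; whether CATS rescales is not stated in the excerpt.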

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

CATS shows real hardware runs of 14x larger transformers across 16 low-power wireless nodes, but the accuracy cost of SomeGather pruning plus message dropout is the part that still needs numbers.


The main thing to take from this paper is that they actually deployed a distributed transformer setup on real ultra-low-power wireless devices, letting up to 16 nodes handle models 14 times bigger than any single one could manage. The core idea is SomeGather, a pruned primitive that sends only selected activation columns instead of full broadcasts, combined with a partitioning scheme and message-dropout training that simulates packet losses so the model stays usable when links drop data. They co-designed the communication, the split, and the training to fit the constraints of tiny radios and limited RAM. The fact that they ran this on actual hardware rather than just in simulation is what stands out as practical progress for edge IoT work. It directly tackles the bandwidth and memory walls that normally force cloud offloading for anything transformer-sized. The selective column approach looks like a reasonable way to shrink communication without a full all-gather at every step. Where the evidence is thinner is on whether accuracy holds up once the pruning and real losses hit. The abstract claims success but skips concrete retention numbers, baseline comparisons, ablation results on which columns matter most for attention or FFN layers, and measured packet-loss distributions from the target hardware. If the selective broadcast drops information that turns out to be load-bearing, the 14x size gain could come with a performance hit that makes the whole thing less useful in practice. No signs of circular fitting or invented math here; the claims rest on implemented experiments. This is the kind of paper that would interest people building distributed ML for sensor networks or constrained wireless systems. A reader focused on system-level tradeoffs in edge AI would get concrete takeaways from the deployments.
It deserves peer review because the hardware validation gives it enough substance to warrant detailed feedback on the missing metrics and potential overheads.

Referee Report

2 major / 2 minor

Summary. The paper presents CATS, a framework for distributed transformer inference on ultra-low-power wireless IoT devices. It introduces SomeGather, a new pruned communication primitive for selective column broadcasting to reduce bandwidth and RAM, a partitioning scheme for model parallelism, and message-dropout training to handle wireless packet losses. Real-world experiments claim deployments on up to 16 devices that collaboratively run transformer models up to 14 times larger than a single device can support.

Significance. If the accuracy preservation claims hold, the work could enable substantially larger models on constrained wireless devices, advancing practical edge AI for IoT. The co-design of communication primitives, partitioning, and robust training, together with actual multi-device deployments rather than simulations, represents a concrete strength that goes beyond typical theoretical proposals in this area.

major comments (2)

[Abstract] Abstract: the central 14x size-gain claim rests on SomeGather plus message-dropout preserving end-to-end accuracy under real packet losses, yet no accuracy metrics, baselines, error bars, per-layer statistics, or ablation results on pruned columns are reported; without these the practical value of the size increase cannot be assessed.
[SomeGather] SomeGather description: the selective column broadcasting is presented as accuracy-neutral, but no analysis is given of which columns are dropped, whether the selection is input-dependent, or how it interacts with attention and FFN layers; this directly affects whether the reported communication savings are sustainable without hidden accuracy costs.

minor comments (2)

[Terminology] The name 'SomeGather' is introduced without explaining its relation to standard gather primitives (e.g., AllGather) or the rationale for the name, which may hinder readability for readers from distributed systems.
[Related Work] The manuscript would benefit from explicit comparison tables against prior distributed inference systems for non-transformer models to better highlight the novelty of the wireless co-design.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and describe the revisions we will incorporate to improve clarity and support for the claims.


Referee: [Abstract] Abstract: the central 14x size-gain claim rests on SomeGather plus message-dropout preserving end-to-end accuracy under real packet losses, yet no accuracy metrics, baselines, error bars, per-layer statistics, or ablation results on pruned columns are reported; without these the practical value of the size increase cannot be assessed.

Authors: We agree that the abstract would benefit from explicit quantitative support for the accuracy claim. The body of the manuscript reports end-to-end accuracy under real packet losses together with single-device baselines and message-dropout ablations; however, these details are not summarized in the abstract. We will revise the abstract to include representative accuracy figures, reference to error bars from repeated runs, and a brief mention of the ablation results on pruned columns, while cross-referencing the experimental section for per-layer statistics. revision: yes
Referee: [SomeGather] SomeGather description: the selective column broadcasting is presented as accuracy-neutral, but no analysis is given of which columns are dropped, whether the selection is input-dependent, or how it interacts with attention and FFN layers; this directly affects whether the reported communication savings are sustainable without hidden accuracy costs.

Authors: The manuscript describes SomeGather as a magnitude-based pruning primitive applied uniformly across layers and states that it preserves accuracy when combined with message-dropout training. To strengthen the presentation, we will add a dedicated paragraph (or short subsection) that (i) specifies the column-selection criterion, (ii) clarifies that selection is performed on a per-activation basis and is therefore input-dependent, and (iii) discusses its application to both attention and FFN blocks, including why the chosen columns do not materially degrade the subsequent matrix multiplications. This addition will be supported by the existing end-to-end accuracy results rather than new experiments. revision: yes
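If the column selection is input-dependent, as this response states, receivers must learn which columns arrived, and that signaling is one of the "hidden overheads" the referee asks about. The sketch below estimates the bytes one device sends per broadcast; the one-bit-per-column bitmask header and the function name are assumptions, and a fixed selection agreed before deployment would need no header at all.

```python
import math

def broadcast_bytes(rows, cols, kept_cols, bytes_per_value=1, input_dependent=True):
    """Bytes one device puts on air when sending kept_cols of its rows x cols shard.

    With input-dependent selection, a bitmask (one bit per column) tells
    receivers which columns were sent; with a fixed selection, no header is needed."""
    payload = rows * kept_cols * bytes_per_value
    header = math.ceil(cols / 8) if input_dependent else 0
    return payload + header

print(broadcast_bytes(32, 16, 4))                         # 130
print(broadcast_bytes(32, 16, 4, input_dependent=False))  # 128
```

In this toy setting the header is small next to the payload (2 of 130 bytes), but real packet formats, index encodings, and per-packet radio overheads could change that ratio.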

Circularity Check

0 steps flagged

No circularity; claims rest on experimental system validation


The paper describes an engineering framework (CATS) for distributed transformer inference on wireless IoT devices, with core contributions being the SomeGather primitive, partitioning method, and message-dropout training. These are introduced as co-designed techniques and validated directly via real-world deployments on up to 16 devices executing models up to 14x larger than single-device capacity. No mathematical derivation chain, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the abstract or described content. The central claims are grounded in implemented experiments rather than reducing to inputs by construction, rendering the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on a new communication primitive and a training modification whose effectiveness is validated experimentally rather than derived from first principles; no free parameters are visible in the abstract.

axioms (1)

Domain assumption: Packet losses in wireless channels can be adequately mimicked by random message dropout during training to produce inference-time robustness.
Invoked to justify the message-dropout component of the training procedure.

invented entities (1)

SomeGather (no independent evidence)
purpose: Pruned communication primitive that selectively broadcasts activation columns to reduce bandwidth and RAM usage.
Newly introduced component central to the partitioning and communication scheme.

pith-pipeline@v0.9.0 · 5748 in / 1336 out tokens · 77625 ms · 2026-05-20T20:11:14.608127+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

CATS employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns... message-dropout during training

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page 2024

[10] [10]

Efficient network flooding and time synchronization with Glossy

[Ferrariet al., 2011 ] Federico Ferrari, Marco Zimmerling, Lothar Thiele, and Olga Saukh. Efficient network flooding and time synchronization with Glossy. InProceedings of the 10th ACM/IEEE International Conference on Informa- tion Processing in Sensor Networks,

work page 2011

[11] [11]

Monash time series forecasting archive

[Godahewaet al., 2021 ] Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I Webb, Rob J Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. arXiv preprint arXiv:2105.06643,

work page arXiv 2021

[12] [12]

RockNet: Distributed learning on ultra-low-power devices.ACM Transactions on Cyber-Physical Systems,

[Gräfeet al., 2026 ] Alexander Gräfe, Fabian Mager, Marco Zimmerling, and Sebastian Trimpe. RockNet: Distributed learning on ultra-low-power devices.ACM Transactions on Cyber-Physical Systems,

work page 2026

[13] [13]

DNN partitioning for cooperative infer- ence in edge intelligence: Modeling, solutions, toolchains

[Haoet al., 2025 ] Yuntao Hao, Nan Ding, Weiguo Xia, Hong- wei Ge, and Li Xu. DNN partitioning for cooperative infer- ence in edge intelligence: Modeling, solutions, toolchains. ACM Computing Surveys,

work page 2025

[14] [14]

Mixer: Efficient many-to-all broad- cast in dynamic wireless mesh networks

[Herrmannet al., 2018 ] Carsten Herrmann, Fabian Mager, and Marco Zimmerling. Mixer: Efficient many-to-all broad- cast in dynamic wireless mesh networks. In16th ACM Con- ference on Embedded Networked Sensor Systems. ACM,

work page 2018

[15] [15]

Karger, Michelle Effros, Jun Shi, and Ben Leong

[Hoet al., 2006 ] Tracey Ho, Muriel Médard, Ralf Koetter, David R. Karger, Michelle Effros, Jun Shi, and Ben Leong. A random linear network coding approach to multicast. IEEE Transactions on Information Theory,

work page 2006

[16] [16]

Loss-adapter: Addressing network packet loss in distributed inference for lossy IoT environments.IEEE Internet of Things Journal,

[Hou and Ohtsuki, 2025] Zhangcheng Hou and Tomoaki Oht- suki. Loss-adapter: Addressing network packet loss in distributed inference for lossy IoT environments.IEEE Internet of Things Journal,

work page 2025

[17] [17]

When the edge meets transformers: Distributed inference with trans- former models

[Hu and Li, 2024] Chenghao Hu and Baochun Li. When the edge meets transformers: Distributed inference with trans- former models. In44th International Conference on Dis- tributed Computing Systems (ICDCS). IEEE,

work page 2024

[18] [18]

Communication-oriented model fine-tuning for packet-loss resilient distributed in- ference under highly lossy IoT networks.IEEE Access,

[Itaharaet al., 2022 ] Sohei Itahara, Takayuki Nishio, Yusuke Koda, and Koji Yamamoto. Communication-oriented model fine-tuning for packet-loss resilient distributed in- ference under highly lossy IoT networks.IEEE Access,

work page 2022

[19] [19]

Challenges, applications, and future of wireless sensors in Internet of Things: A review.IEEE Sensors Journal,

[Jamshedet al., 2022 ] Muhammad Ali Jamshed, Kamran Ali, Qammer H Abbasi, Muhammad Ali Imran, and Masood Ur- Rehman. Challenges, applications, and future of wireless sensors in Internet of Things: A review.IEEE Sensors Journal,

work page 2022

[20] [20]

Communication-aware DNN pruning

[Jianet al., 2023 ] Tong Jian, Debashri Roy, Batool Salehi, Nasim Soltani, Kaushik Chowdhury, and Stratis Ioannidis. Communication-aware DNN pruning. InIEEE Conference on Computer Communications,

work page 2023

[21] [21]

Opti- mization framework for splitting DNN inference jobs over computing networks.Computer Networks,

[Jung and Lee, 2023] Sehun Jung and Hyang-Won Lee. Opti- mization framework for splitting DNN inference jobs over computing networks.Computer Networks,

work page 2023

[22] [22]

Survey on computer vision techniques for Internet of Things devices

[Kaur and Jadhav, 2023] Ishmeet Kaur and Adwaita Janard- han Jadhav. Survey on computer vision techniques for Internet of Things devices. InInternational Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT). IEEE,

work page 2023

[23] [23]

Kingma and Jimmy Ba

[Kingma and Ba, 2015] Diederik P. Kingma and Jimmy Ba. ADAM: A method for stochastic optimization. InInterna- tional Conference on Learning Representations,

work page 2015

[24] [24]

CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs

[Laiet al., 2018 ] Liangzhen Lai, Naveen Suda, and Vikas Chandra. CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs.arXiv preprint arXiv:1801.06601,

work page internal anchor Pith review Pith/arXiv arXiv 2018

[25] [25]

The capture effect in FM receivers.IEEE Transactions on Communications,

[Leentvaar and Flint, 1976] Krijn Leentvaar and Jan Flint. The capture effect in FM receivers.IEEE Transactions on Communications,

work page 1976

[26] [26]

Communication-efficient multi-device in- ference acceleration for transformer models.arXiv preprint arXiv:2505.19342,

[Liuet al., 2025b ] Xiao Liu, Lijun Zhang, Deepak Ganesan, and Hui Guan. Communication-efficient multi-device in- ference acceleration for transformer models.arXiv preprint arXiv:2505.19342,

work page internal anchor Pith review arXiv

[27] [27]

MoDNN: Local dis- tributed mobile computing system for deep neural network

[Maoet al., 2017 ] Jiachen Mao, Xiang Chen, Kent W Nixon, Christopher Krieger, and Yiran Chen. MoDNN: Local dis- tributed mobile computing system for deep neural network. InDesign, Automation & Test in Europe Conference & Exhibition (DATE). IEEE,

work page 2017

[28] [28]

A time series is worth 64 words: Long-term forecasting with transformers

[Nieet al., 2023 ] Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In The 11th International Conference on Learning Represen- tations,

work page 2023

[29] [29]

Siracusa: A 16 nm heterogenous RISC-V SoC for extended reality with at-MRAM neural engine.IEEE Journal of Solid-State Circuits,

[Prasadet al., 2024 ] Arpan Suravi Prasad, Moritz Scherer, Francesco Conti, Davide Rossi, Alfio Di Mauro, Manuel Eggimann, Jorge Tomás Gómez, Ziyun Li, Syed Shakib Sarwar, Zhao Wang, et al. Siracusa: A 16 nm heterogenous RISC-V SoC for extended reality with at-MRAM neural engine.IEEE Journal of Solid-State Circuits,

work page 2024

[30] [30]

Disco: Distributed inference with sparse communications.arXiv preprint arXiv:2302.11180,

[Qinet al., 2023 ] Minghai Qin, Chao Sun, Jaco Hofmann, and Dejan Vucinic. Disco: Distributed inference with sparse communications.arXiv preprint arXiv:2302.11180,

work page arXiv 2023

[31] [31]

Wireless sensor networks in agri- culture through machine learning: A survey.Computers and Electronics in Agriculture,

[Rahaman and Azharuddin, 2022] Md Mohinur Rahaman and Md Azharuddin. Wireless sensor networks in agri- culture through machine learning: A survey.Computers and Electronics in Agriculture,

work page 2022

[32] [32]

DISNET: Distributed micro-split deep learn- ing in heterogeneous dynamic IoT.IEEE Internet of Things Journal,

[Samikwaet al., 2023 ] Eric Samikwa, Antonio Di Maio, and Torsten Braun. DISNET: Distributed micro-split deep learn- ing in heterogeneous dynamic IoT.IEEE Internet of Things Journal,

work page 2023

[33] [33]

Energy harvest- ing techniques for Internet of Things (IoT).IEEE Access,

[Sanislavet al., 2021 ] Teodora Sanislav, George Dan Mois, Sherali Zeadally, and Silviu Corneliu Folea. Energy harvest- ing techniques for Internet of Things (IoT).IEEE Access,

work page 2021

[34] [34]

Structural health monitoring using wireless smart sensor network – an overview.Mechanical Systems and Signal Processing,

[Sofiet al., 2022 ] A Sofi, J Jane Regita, Bhagyesh Rane, and Hieng Ho Lau. Structural health monitoring using wireless smart sensor network – an overview.Mechanical Systems and Signal Processing,

work page 2022

[35] [35]

An empirical study of low-power wireless.ACM Transactions on Sensor Net- works (TOSN),

[Srinivasanet al., 2010 ] Kannan Srinivasan, Prabal Dutta, Ar- salan Tavakoli, and Philip Levis. An empirical study of low-power wireless.ACM Transactions on Sensor Net- works (TOSN),

work page 2010

[36] [36]

DeeperThings: Fully distributed CNN infer- ence on resource-constrained edge devices.International Journal of Parallel Programming,

[Stahlet al., 2021 ] Rafael Stahl, Alexander Hoffman, Daniel Mueller-Gritschneder, Andreas Gerstlauer, and Ulf Schlichtmann. DeeperThings: Fully distributed CNN infer- ence on resource-constrained edge devices.International Journal of Parallel Programming,

work page 2021

[37] [37]

Self- organizing maps for anomaly localization and predictive maintenance in cyber-physical production systems.Proce- dia CIRP,

[V on Birgelenet al., 2018] Alexander V on Birgelen, Davide Buratti, Jens Mager, and Oliver Niggemann. Self- organizing maps for anomaly localization and predictive maintenance in cyber-physical production systems.Proce- dia CIRP,

work page 2018

[38] [38]

Communication-efficient model parallelism for distributed in-situ transformer inference

[Weiet al., 2024 ] Yuanxin Wei, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang, Jiangsu Du, and Yutong Lu. Communication-efficient model parallelism for distributed in-situ transformer inference. InDesign, Automation & Test in Europe Conference & Exhibition (DATE). IEEE,

work page 2024

[39] [39]

EasyViT: An adaptive collaborative edge computing framework for vision trans- former.IEEE Internet of Things Journal,

[Wenet al., 2025 ] Dong Wen, Guanping Liang, Tianyun Li, Lin Chen, Junnan Li, and Tao Li. EasyViT: An adaptive collaborative edge computing framework for vision trans- former.IEEE Internet of Things Journal,

work page 2025

[40] [40]

DeViT: Decompos- ing vision transformers for collaborative inference in edge devices.IEEE Transactions on Mobile Computing,

[Xuet al., 2023 ] Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, Jianping An, and Shiwen Mao. DeViT: Decompos- ing vision transformers for collaborative inference in edge devices.IEEE Transactions on Mobile Computing,

work page 2023

[41] [41]

Communication- efficient distributed on-device LLM inference over wireless networks.arXiv preprint arXiv:2503.14882,

[Zhanget al., 2025a ] Kai Zhang, Hengtao He, Shenghui Song, Jun Zhang, and Khaled B Letaief. Communication- efficient distributed on-device LLM inference over wireless networks.arXiv preprint arXiv:2503.14882,

work page arXiv

[42] [42]

Informer: Beyond efficient transformer for long sequence time-series forecasting

[Zhouet al., 2021 ] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. InProceedings of the AAAI Conference on Artificial Intelligence,

work page 2021

[43] [43]

Synchronous transmissions in low- power wireless: A survey of communication protocols and network services.ACM Computing Surveys,

[Zimmerlinget al., 2020 ] Marco Zimmerling, Luca Mottola, and Silvia Santini. Synchronous transmissions in low- power wireless: A survey of communication protocols and network services.ACM Computing Surveys,

work page 2020

[44] [44]

Integrat- ing large language models with Internet of Things: Appli- cations.Discover Internet of Things, 2025

[Zonget al., 2025 ] Mingyu Zong, Arvin Hekmati, Michael Guastalla, Yiyi Li, and Bhaskar Krishnamachari. Integrat- ing large language models with Internet of Things: Appli- cations.Discover Internet of Things, 2025

work page 2025