arxiv: 2605.15694 · v2 · pith:3772BMCJnew · submitted 2026-05-15 · 💻 cs.LG

Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices

Pith reviewed 2026-05-20 20:11 UTC · model grok-4.3

classification 💻 cs.LG
keywords distributed inference · transformer models · ultra-low-power devices · wireless IoT · model parallelism · communication primitive · SomeGather · edge AI

The pith

CATS enables distributed transformer inference on ultra-low-power wireless devices: up to 16 devices collaboratively run models up to 14 times larger than a single device can hold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CATS, a framework that lets multiple ultra-low-power wireless devices collaborate to execute transformer models too large for any single device. It centers on SomeGather, a pruned communication primitive that broadcasts only the necessary activation columns to cut communication and memory costs. Message dropout applied during training builds robustness to the packet losses typical of wireless settings. A sympathetic reader cares because this co-design of partitioning, communication, and training could bring large-model capabilities to cheap battery-powered IoT hardware without relying on powerful central servers or constant connectivity.

Core claim

CATS is a communication-aware distributed transformer inference scheme co-designed across transformer partitioning, wireless communication and training. It employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy. Building on SomeGather, it designs a partitioning method that exploits this primitive for efficient model parallelism and uses message-dropout during training to yield models robust to message loss during inference.

What carries the argument

SomeGather, a pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy.
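
To make the idea concrete, here is a minimal sketch of a pruned gather next to a full all-gather. It assumes each device owns a slice of the activation columns and that the set of columns to share (`keep`) is already known. The paper's actual selection rule, message format, and radio protocol are not reproduced here, and every name (`all_gather`, `some_gather`, `keep`, `payload`) is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical layout: a shard maps a global column index to that column's
# values (one float per token) held by one device.
Shard = Dict[int, List[float]]


def all_gather(shards: List[Shard]) -> Shard:
    """Baseline: every device broadcasts every column it owns."""
    gathered: Shard = {}
    for shard in shards:
        gathered.update(shard)
    return gathered


def some_gather(shards: List[Shard], keep: List[Set[int]]) -> Shard:
    """Pruned gather: device d broadcasts only the columns listed in keep[d].

    How `keep` is chosen is an assumption of this sketch, not the paper's rule.
    """
    gathered: Shard = {}
    for shard, wanted in zip(shards, keep):
        for col, values in shard.items():
            if col in wanted:
                gathered[col] = values
    return gathered


def payload(gathered: Shard) -> int:
    """Number of scalar values that had to be transmitted."""
    return sum(len(values) for values in gathered.values())


# Two devices, four activation columns, three tokens per column.
shards = [
    {0: [0.1, 0.2, 0.3], 1: [1.0, 1.1, 1.2]},
    {2: [2.0, 2.1, 2.2], 3: [3.0, 3.1, 3.2]},
]
keep = [{1}, {2}]  # illustrative choice: one column per device

full = all_gather(shards)
pruned = some_gather(shards, keep)
print(payload(full), payload(pruned))  # prints: 12 6
```

In this toy case the pruned exchange moves 6 values instead of 12, and the gathered buffer holds two columns rather than four, which is the bandwidth and RAM effect the primitive is credited with.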

If this is right

  • Networks of up to 16 devices can execute transformer models 14 times larger than what fits on one device.
  • Message-dropout training produces models that retain accuracy despite packet losses during inference.
  • Partitioning built around SomeGather achieves efficient model parallelism with lower bandwidth and RAM demands.
  • The approach demonstrates the first real-world deployments of distributed transformer inference on ultra-low-power wireless hardware.
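
The message-dropout point in the list above can be sketched in a few lines. The sketch assumes that each peer's message is dropped independently with probability `p_drop` and that a lost message is replaced by zeros; how the paper fills in lost messages is not stated on this page, so treat both choices, and the function name `message_dropout`, as assumptions.

```python
import random
from typing import List


def message_dropout(messages: List[List[float]], p_drop: float,
                    rng: random.Random, training: bool = True) -> List[List[float]]:
    """Simulate packet loss on the messages received from peer devices.

    Each message is dropped independently with probability p_drop and
    replaced by zeros (an assumption of this sketch). With training=False
    the messages pass through unchanged, because at inference time the
    real channel decides what is lost.
    """
    if not training:
        return [list(m) for m in messages]
    out = []
    for m in messages:
        dropped = rng.random() < p_drop
        out.append([0.0] * len(m) if dropped else list(m))
    return out


rng = random.Random(0)  # fixed seed so runs are repeatable
peer_messages = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
noisy = message_dropout(peer_messages, p_drop=0.3, rng=rng)
print(noisy)  # some rows may be zeroed; shapes are always preserved
```

Unlike ordinary dropout, which masks individual activations, this masks whole messages, matching the unit in which a wireless link actually fails.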

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The selective-broadcast idea could extend to other model families if analogous pruning rules are found for their layer operations.
  • Scaling to larger networks or mobile scenarios would likely need additional handling of device mobility and clock drift not tested here.
  • Combining CATS with local energy harvesting could support longer-running deployments in variable environments.

Load-bearing premise

SomeGather's selective column broadcasting combined with message-dropout training preserves model accuracy under real-world wireless packet losses and device constraints without hidden overheads that would negate the size gains.

What would settle it

An experiment on a real wireless testbed measuring whether accuracy remains within acceptable bounds at observed packet loss rates while total latency and energy stay below single-device baselines for equivalent model size.

Figures

Figures reproduced from arXiv: 2605.15694 by Alexander Gräfe, Ding Huo, Johannes Berger, Marco Zimmerling, Sebastian Trimpe, Vincent de Bakker.

Figure 1: Collaborative Inference at the Sensor-level (…
Figure 2: Device Operation. Devices operate in synchronized rounds…
Figure 5: Model Size Scaling. Colored regions indicate feasible com…
Figure 8: Test Loss of Models Versus Message Loss. We train trans…
Figure 7: Comparison of Normal Pruning and SomeGather on Accuracy Versus Communication Trade-off. SomeGather's accuracy remains approximately constant with decreasing communication, whereas normal pruning's accuracy degrades significantly.
… 0 % to 90 % accelerates the attention block by 3.08× at 256 features and 4.37× at 512 features, and the residual block by 3.21× and 4.68×, respectively. Overall, pruned in…

Original abstract

Transformer models are rapidly becoming a cornerstone of modern Internet of Things (IoT) applications, yet their computational and memory demands far exceed the capabilities of a single typical ultra-low-power IoT device. We present CATS, a framework for distributed transformer inference on ultra-low-power wireless devices, enabling multiple devices to collaboratively execute models far larger than what a single device can sustain. At its core, CATS is a communication-aware distributed transformer inference scheme co-designed across transformer partitioning, wireless communication and training. It employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy. Building on SomeGather, we design a partitioning method that exploits this primitive for efficient model parallelism. To cope with unreliable wireless communication, CATS employs message-dropout during training, which mimics packet losses and yields models that are robust to message loss during inference. In real-world experiments, we show that CATS brings distributed transformer inference to ultra-low-power wireless devices for the first time, with deployments on up to 16 devices that collaboratively execute transformer models up to 14 times larger than what a single device can run.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents CATS, a framework for distributed transformer inference on ultra-low-power wireless IoT devices. It introduces SomeGather, a new pruned communication primitive for selective column broadcasting to reduce bandwidth and RAM, a partitioning scheme for model parallelism, and message-dropout training to handle wireless packet losses. Real-world experiments claim deployments on up to 16 devices that collaboratively run transformer models up to 14 times larger than a single device can support.

Significance. If the accuracy preservation claims hold, the work could enable substantially larger models on constrained wireless devices, advancing practical edge AI for IoT. The co-design of communication primitives, partitioning, and robust training, together with actual multi-device deployments rather than simulations, represents a concrete strength that goes beyond typical theoretical proposals in this area.

major comments (2)
  1. [Abstract] Abstract: the central 14x size-gain claim rests on SomeGather plus message-dropout preserving end-to-end accuracy under real packet losses, yet no accuracy metrics, baselines, error bars, per-layer statistics, or ablation results on pruned columns are reported; without these the practical value of the size increase cannot be assessed.
  2. [SomeGather] SomeGather description: the selective column broadcasting is presented as accuracy-neutral, but no analysis is given of which columns are dropped, whether the selection is input-dependent, or how it interacts with attention and FFN layers; this directly affects whether the reported communication savings are sustainable without hidden accuracy costs.
minor comments (2)
  1. [Terminology] The name 'SomeGather' is introduced without explaining its relation to standard gather primitives (such as all-gather) or the rationale for the name, which may hinder readability for readers from distributed systems.
  2. [Related Work] The manuscript would benefit from explicit comparison tables against prior distributed inference systems for non-transformer models to better highlight the novelty of the wireless co-design.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and describe the revisions we will incorporate to improve clarity and support for the claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central 14x size-gain claim rests on SomeGather plus message-dropout preserving end-to-end accuracy under real packet losses, yet no accuracy metrics, baselines, error bars, per-layer statistics, or ablation results on pruned columns are reported; without these the practical value of the size increase cannot be assessed.

    Authors: We agree that the abstract would benefit from explicit quantitative support for the accuracy claim. The body of the manuscript reports end-to-end accuracy under real packet losses together with single-device baselines and message-dropout ablations; however, these details are not summarized in the abstract. We will revise the abstract to include representative accuracy figures, reference to error bars from repeated runs, and a brief mention of the ablation results on pruned columns, while cross-referencing the experimental section for per-layer statistics. revision: yes

  2. Referee: [SomeGather] SomeGather description: the selective column broadcasting is presented as accuracy-neutral, but no analysis is given of which columns are dropped, whether the selection is input-dependent, or how it interacts with attention and FFN layers; this directly affects whether the reported communication savings are sustainable without hidden accuracy costs.

    Authors: The manuscript describes SomeGather as a magnitude-based pruning primitive applied uniformly across layers and states that it preserves accuracy when combined with message-dropout training. To strengthen the presentation, we will add a dedicated paragraph (or short subsection) that (i) specifies the column-selection criterion, (ii) clarifies that selection is performed on a per-activation basis and is therefore input-dependent, and (iii) discusses its application to both attention and FFN blocks, including why the chosen columns do not materially degrade the subsequent matrix multiplications. This addition will be supported by the existing end-to-end accuracy results rather than new experiments. revision: yes
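
The rebuttal's description of a magnitude-based, input-dependent selection can be illustrated with a short sketch. It is an assumption here that "magnitude" means the L2 norm of each activation column and that a fixed number `k` of columns is kept; the manuscript's actual criterion is precisely what the referee asks the authors to spell out, and `select_columns` is a hypothetical name.

```python
import math
from typing import List


def select_columns(activations: List[List[float]], k: int) -> List[int]:
    """Return the indices of the k columns with the largest L2 norm.

    `activations` is row-major: activations[token][column]. Using the L2
    norm as the magnitude measure is an assumption of this sketch.
    """
    n_cols = len(activations[0])
    norms = [math.sqrt(sum(row[c] ** 2 for row in activations))
             for c in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda c: norms[c], reverse=True)
    return sorted(ranked[:k])


# Two different inputs to the same layer (2 tokens x 4 columns each).
x_a = [[0.1, 2.0, 0.0, -1.5],
       [0.2, -1.0, 0.1, 2.5]]
x_b = [[3.0, 0.1, 1.0, 0.0],
       [-2.0, 0.0, 1.5, 0.2]]

print(select_columns(x_a, k=2))  # prints: [1, 3]
print(select_columns(x_b, k=2))  # prints: [0, 2]
```

The two inputs select different columns, which is what "input-dependent" means in practice. Under such a rule a receiver would also have to learn which columns were sent, an overhead that falls under the referee's concern about hidden costs.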

Circularity Check

0 steps flagged

No circularity; claims rest on experimental system validation

The paper describes an engineering framework (CATS) for distributed transformer inference on wireless IoT devices, with core contributions being the SomeGather primitive, partitioning method, and message-dropout training. These are introduced as co-designed techniques and validated directly via real-world deployments on up to 16 devices executing models up to 14x larger than single-device capacity. No mathematical derivation chain, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the abstract or described content. The central claims are grounded in implemented experiments rather than reducing to inputs by construction, rendering the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on a new communication primitive and a training modification whose effectiveness is validated experimentally rather than derived from first principles; no free parameters are visible in the abstract.

axioms (1)
  • domain assumption: Packet losses in wireless channels can be adequately mimicked by random message dropout during training to produce inference-time robustness.
    Invoked to justify the message-dropout component of the training procedure.
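
What this assumption could gloss over becomes visible when independent (Bernoulli) dropout is compared with a bursty loss process of the same average rate. The sketch below uses a two-state, Gilbert-Elliott-style channel purely as an illustrative stand-in for correlated wireless loss; it does not come from the paper, and the parameter values are made up.

```python
import random
from typing import List


def bernoulli_losses(n: int, p_loss: float, rng: random.Random) -> List[bool]:
    """Independent loss per message, as random message dropout assumes."""
    return [rng.random() < p_loss for _ in range(n)]


def bursty_losses(n: int, p_enter_bad: float, p_leave_bad: float,
                  rng: random.Random) -> List[bool]:
    """Two-state channel: every message sent in the bad state is lost.

    The long-run loss rate is p_enter_bad / (p_enter_bad + p_leave_bad).
    """
    losses = []
    bad = False
    for _ in range(n):
        if bad:
            if rng.random() < p_leave_bad:
                bad = False
        elif rng.random() < p_enter_bad:
            bad = True
        losses.append(bad)
    return losses


def longest_burst(losses: List[bool]) -> int:
    """Length of the longest run of consecutive lost messages."""
    longest = current = 0
    for lost in losses:
        current = current + 1 if lost else 0
        longest = max(longest, current)
    return longest


rng = random.Random(0)
iid = bernoulli_losses(1000, 0.1, rng)
bursty = bursty_losses(1000, 0.025, 0.225, rng)  # long-run loss rate 0.1
print(sum(iid) / 1000, longest_burst(iid))
print(sum(bursty) / 1000, longest_burst(bursty))
```

Both processes lose about one message in ten on average, but the bursty channel concentrates losses into runs. Whether a model trained with independent dropout tolerates such runs is the empirical question behind this axiom.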
invented entities (1)
  • SomeGather (no independent evidence)
    purpose: Pruned communication primitive that selectively broadcasts activation columns to reduce bandwidth and RAM usage.
    Newly introduced component central to the partitioning and communication scheme.

pith-pipeline@v0.9.0 · 5748 in / 1336 out tokens · 77625 ms · 2026-05-20T20:11:14.608127+00:00 · methodology


