pith. machine review for the scientific record.

arxiv: 2602.22525 · v2 · submitted 2026-02-26 · 💻 cs.CR

Recognition: 2 Lean theorem links

Systems-Level Attack Surface of Edge Agent Deployments on IoT

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 19:30 UTC · model grok-4.3

classification 💻 cs.CR
keywords edge deployment · IoT security · LLM agents · attack surfaces · deployment architecture · home automation · MQTT · sovereignty

The pith

IoT agent security hinges on deployment architecture, not model choice

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines security risks when large language model agents run directly on IoT devices instead of in the cloud. It compares three architectures—cloud-hosted, edge-local swarm, and hybrid—on a home automation testbed using MQTT messaging between devices and an Android phone for edge inference. The work identifies five systems-level attack surfaces, including coordination failures between devices and trust erosion when fallback mechanisms activate. Security is measured through concrete metrics such as data leaving the local system, time windows during recovery, integrity of control boundaries, and traceability of actions. The central result is that these placement choices affect risk more than the details of the model or its prompts.

Core claim

Edge deployment of LLM agents on IoT hardware introduces attack surfaces absent from cloud-hosted orchestration. The empirical analysis of cloud-hosted, edge-local swarm, and hybrid architectures on a multi-device home-automation testbed identifies five systems-level attack surfaces, including coordination-state divergence and induced trust erosion. Edge-local deployments eliminate routine cloud data exposure but silently degrade sovereignty when fallback mechanisms trigger, with boundary crossings invisible at the application layer. Provenance chains remain complete under cooperative operation yet are trivially bypassed without cryptographic enforcement. Failover windows create transient blind spots exploitable for unauthorised actuation.

What carries the argument

Three deployment architectures (cloud-hosted, edge-local swarm, hybrid) are tested on a multi-device home-automation testbed with local MQTT messaging and an Android edge node, and tracked via four metrics: data egress volume, failover window exposure, sovereignty boundary integrity, and provenance chain completeness.
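The four metrics lend themselves to direct computation over an instrumentation log. A minimal sketch, assuming a hypothetical event schema (the paper's actual instrumentation is not specified here):

```python
# Hypothetical sketch: computing two of the paper's systems metrics from an
# event log. Field names and event shapes are illustrative assumptions.

def data_egress_bytes(events):
    """Total payload bytes that crossed the local boundary to the cloud."""
    return sum(e["bytes"] for e in events if e["dst"] == "cloud")

def failover_window_seconds(events):
    """Total time spent between a fallback trigger and recovery."""
    total, start = 0.0, None
    for e in events:
        if e["type"] == "fallback_start":
            start = e["t"]
        elif e["type"] == "fallback_end" and start is not None:
            total += e["t"] - start
            start = None
    return total

log = [
    {"type": "msg", "dst": "local", "bytes": 120, "t": 0.0},
    {"type": "fallback_start", "dst": "local", "bytes": 0, "t": 1.0},
    {"type": "msg", "dst": "cloud", "bytes": 512, "t": 1.5},  # fallback egress
    {"type": "fallback_end", "dst": "local", "bytes": 0, "t": 3.0},
]
print(data_egress_bytes(log))        # 512
print(failover_window_seconds(log))  # 2.0
```

Note how the two metrics interact: the only egress in this toy log happens inside the failover window, which is exactly the "silent sovereignty degradation on fallback" pattern the paper reports.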

If this is right

  • Edge-local swarms avoid routine cloud data exposure compared to cloud-hosted setups.
  • Fallback triggers in edge and hybrid setups silently reduce sovereignty with invisible boundary crossings.
  • Provenance chains stay reliable only under cooperative operation and need cryptographic enforcement.
  • Failover windows create transient blind spots that allow unauthorized device actuation.
  • Security risk in agent-controlled IoT depends primarily on deployment architecture.
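The provenance bullet implies a concrete mitigation. A minimal sketch of cryptographic enforcement via an HMAC hash chain over agent actions, with an assumed pre-shared key; this illustrates the general technique, not the paper's mechanism:

```python
import hashlib
import hmac

# Illustrative only: each log entry's MAC covers the previous MAC plus the
# action, so editing or dropping any entry breaks verification downstream.

KEY = b"device-shared-secret"  # assumed pre-shared key, for illustration

def append(chain, action):
    prev = chain[-1]["mac"] if chain else b"\x00" * 32
    mac = hmac.new(KEY, prev + action.encode(), hashlib.sha256).digest()
    chain.append({"action": action, "mac": mac})

def verify(chain):
    prev = b"\x00" * 32
    for entry in chain:
        expected = hmac.new(KEY, prev + entry["action"].encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True

chain = []
for act in ["unlock_door", "set_thermostat:21", "lock_door"]:
    append(chain, act)
print(verify(chain))                      # True
chain[1]["action"] = "set_thermostat:30"  # tamper with one entry
print(verify(chain))                      # False
```

Without a construction like this, a provenance chain is only as complete as every node's willingness to log honestly, which is the "trivially bypassed" failure mode above.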

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar architecture risks are likely present in other edge AI systems such as autonomous vehicles or industrial controls.
  • Designers could reduce hidden problems by adding visible boundary monitoring at the application layer.
  • Standards for IoT agents may need to specify deployment architecture requirements separately from model robustness.
  • Routine security evaluations that test only prompts and models will miss major practical vulnerabilities.
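The boundary-monitoring suggestion can be made concrete. A hypothetical sketch that classifies each outbound connection against local address ranges and surfaces crossings at the application layer; the network ranges and event shapes are assumptions, not the paper's design:

```python
import ipaddress

# Hypothetical application-layer boundary monitor: flag any connection whose
# destination lies outside the local (sovereign) address space, rather than
# letting fallback traffic leave silently.

LOCAL_NETS = [ipaddress.ip_network("192.168.0.0/16"),
              ipaddress.ip_network("10.0.0.0/8")]

def crosses_boundary(dst_ip: str) -> bool:
    addr = ipaddress.ip_address(dst_ip)
    return not any(addr in net for net in LOCAL_NETS)

def audit(connections):
    """Return the subset of connections that cross the sovereignty boundary."""
    return [c for c in connections if crosses_boundary(c["dst"])]

conns = [{"agent": "thermostat", "dst": "192.168.1.20"},
         {"agent": "camera", "dst": "34.117.59.81"}]  # cloud fallback
print(audit(conns))  # only the camera's external connection is flagged
```
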

Load-bearing premise

The home-automation testbed with local MQTT and Android smartphone as edge node accurately represents real-world edge agent deployments on IoT hardware.

What would settle it

A follow-up study on a different IoT setup (for example, industrial sensors without MQTT) showing that model or prompt changes reduce security risk more than architecture changes would disprove the primary-determinant claim.

Figures

Figures reproduced from arXiv: 2602.22525 by Hamed Haddadi, Krinos Li, Yefan Zhang, Zhonghao Zhan.

Figure 1. Testbed topology. Inter-agent traffic traverses …
Figure 2. MQTT message flow and supervision architecture. Agents communicate via per-agent inbox topics …
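The per-agent inbox topics in Figure 2 rely on standard MQTT topic filtering. A simplified matcher for the `+` (single level) and `#` (remainder) wildcards, omitting edge cases such as `$`-prefixed system topics; the topic names are illustrative, not taken from the paper:

```python
def mqtt_match(topic_filter: str, topic: str) -> bool:
    """Minimal MQTT wildcard matching: '+' matches one level, '#' the rest."""
    f_parts, t_parts = topic_filter.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":                      # multi-level wildcard: match rest
            return True
        if i >= len(t_parts):             # filter longer than topic
            return False
        if f != "+" and f != t_parts[i]:  # literal level must match exactly
            return False
    return len(f_parts) == len(t_parts)

# Hypothetical per-agent inbox topic scheme:
print(mqtt_match("agents/+/inbox", "agents/thermostat/inbox"))  # True
print(mqtt_match("agents/+/inbox", "agents/thermostat/state"))  # False
print(mqtt_match("agents/#", "agents/camera/inbox/cmd"))        # True
```

A supervisor subscribed to `agents/#` sees all inbox traffic, which is one plausible place to hang the boundary and provenance monitoring discussed above.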
read the original abstract

Edge deployment of LLM agents on IoT hardware introduces attack surfaces absent from cloud-hosted orchestration. We present an empirical security analysis of three architectures (cloud-hosted, edge-local swarm, and hybrid) using a multi-device home-automation testbed with local MQTT messaging and an Android smartphone as an edge inference node. We identify five systems-level attack surfaces, including two emergent failures observed during live testbed operation: coordination-state divergence and induced trust erosion. We frame core security properties as measurable systems metrics: data egress volume, failover window exposure, sovereignty boundary integrity, and provenance chain completeness. Our measurements show that edge-local deployments eliminate routine cloud data exposure but silently degrade sovereignty when fallback mechanisms trigger, with boundary crossings invisible at the application layer. Provenance chains remain complete under cooperative operation yet are trivially bypassed without cryptographic enforcement. Failover windows create transient blind spots exploitable for unauthorised actuation. These results demonstrate that deployment architecture, not just model or prompt design, is a primary determinant of security risk in agent-controlled IoT systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents an empirical security analysis of LLM agent deployments on IoT hardware, comparing three architectures (cloud-hosted, edge-local swarm, and hybrid) via a multi-device home-automation testbed using local MQTT messaging and an Android smartphone as the edge inference node. It identifies five systems-level attack surfaces, including two emergent failures (coordination-state divergence and induced trust erosion) observed during live operation, and frames security properties as measurable metrics including data egress volume, failover window exposure, sovereignty boundary integrity, and provenance chain completeness. Measurements indicate that edge-local deployments eliminate routine cloud data exposure but introduce silent sovereignty degradation on fallback, with provenance chains trivially bypassable absent cryptographic enforcement and failover windows creating exploitable transient blind spots. The central claim is that deployment architecture, rather than model or prompt design alone, is a primary determinant of security risk in agent-controlled IoT systems.

Significance. If the testbed observations prove reproducible and generalizable, the work would usefully highlight architecture-driven risks (such as invisible boundary crossings and coordination failures) that are distinct from prompt-level vulnerabilities, providing concrete metrics for evaluating edge agent deployments. The empirical framing and identification of emergent failures represent a strength in shifting focus from isolated model attacks to systems-level properties.

major comments (3)
  1. [Abstract] Abstract and testbed description: the central claim that architecture is the primary determinant rests on observations from a single multi-device home-automation setup with MQTT and Android edge node; without explicit controls or comparisons isolating architectural effects from MQTT broker semantics, smartphone resource limits, or home-automation device interactions, the measurements (e.g., silent sovereignty degradation on fallback) may be configuration-specific rather than architecture-driven.
  2. [Abstract] Measurements section (implied by abstract): no detailed quantitative data, error bars, statistical analysis, or verification steps are described for the reported metrics such as data egress volume, failover window exposure, or sovereignty boundary integrity, leaving open whether the differences between cloud, edge-local, and hybrid architectures are significant or reproducible.
  3. [Abstract] Emergent failures (coordination-state divergence and induced trust erosion): these are presented as architecture-induced, yet the manuscript provides no ablation or alternative configuration (e.g., with cryptographic enforcement or different messaging) to demonstrate that they arise independently of the specific testbed choices rather than from the absence of enforcement mechanisms.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief enumeration of the five attack surfaces to allow readers to map them directly to the reported metrics.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the valuable feedback on our manuscript. We have addressed each of the major comments by clarifying our experimental controls, enhancing the quantitative presentation, and providing additional context on the emergent failures. The revised manuscript incorporates these improvements.

read point-by-point responses
  1. Referee: [Abstract] Abstract and testbed description: the central claim that architecture is the primary determinant rests on observations from a single multi-device home-automation setup with MQTT and Android edge node; without explicit controls or comparisons isolating architectural effects from MQTT broker semantics, smartphone resource limits, or home-automation device interactions, the measurements (e.g., silent sovereignty degradation on fallback) may be configuration-specific rather than architecture-driven.

    Authors: We acknowledge that the analysis uses a single testbed configuration. However, all three architectures were evaluated using identical hardware, the same MQTT broker, and the same home-automation devices, providing an internal control that isolates architectural effects from the underlying messaging semantics, resource limits, and device interactions. Differences in metrics such as data egress volume and sovereignty degradation are therefore attributable to the deployment architecture. We have revised the testbed description to explicitly state these controls and added a limitations paragraph on generalizability. revision: partial

  2. Referee: [Abstract] Measurements section (implied by abstract): no detailed quantitative data, error bars, statistical analysis, or verification steps are described for the reported metrics such as data egress volume, failover window exposure, or sovereignty boundary integrity, leaving open whether the differences between cloud, edge-local, and hybrid architectures are significant or reproducible.

    Authors: The manuscript body contains the quantitative measurements, but we agree that error bars, statistical analysis, and explicit verification steps were insufficiently detailed. In the revised version we have expanded the Measurements section with tables reporting means and standard deviations across repeated runs, p-values for architecture comparisons, and a step-by-step verification protocol for each metric to establish reproducibility and significance. revision: yes
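The statistical treatment the authors describe can be sketched with synthetic numbers: means and standard deviations across repeated runs, plus a permutation test for the architecture comparison. The measurements below are invented for illustration only:

```python
import random
import statistics

# Invented repeated-run measurements of a single metric (e.g., failover
# window seconds) for two architectures; not data from the paper.
edge  = [4.1, 3.8, 4.5, 4.0, 4.3]
cloud = [1.2, 1.5, 1.1, 1.4, 1.3]

def perm_test(a, b, n=10_000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled, k = a + b, len(a)
    hits = 0
    for _ in range(n):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        if diff >= observed:
            hits += 1
    return hits / n

print(round(statistics.mean(edge), 2), round(statistics.stdev(edge), 2))
print(perm_test(edge, cloud))  # small p: difference unlikely under chance
```

With only a handful of runs per architecture, a permutation test is a reasonable nonparametric choice; parametric p-values would lean on normality assumptions the testbed data may not satisfy.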

  3. Referee: [Abstract] Emergent failures (coordination-state divergence and induced trust erosion): these are presented as architecture-induced, yet the manuscript provides no ablation or alternative configuration (e.g., with cryptographic enforcement or different messaging) to demonstrate that they arise independently of the specific testbed choices rather than from the absence of enforcement mechanisms.

    Authors: The failures were observed exclusively in the edge-local and hybrid architectures during live operation and were absent from the cloud-hosted baseline, indicating they stem from architectural features such as distributed coordination and fallback logic. We agree that explicit ablations would strengthen the claim. We have added a discussion clarifying the architectural origin of each failure and a note on an alternative messaging configuration in which similar divergence was observed; a full ablation study with cryptographic enforcement is identified as future work. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical testbed study with direct measurements

full rationale

The paper conducts an empirical security analysis of three deployment architectures using a home-automation testbed with MQTT and Android edge node. It reports observed attack surfaces and failures (coordination-state divergence, induced trust erosion) and measures concrete metrics (data egress volume, failover window exposure, sovereignty boundary integrity, provenance chain completeness) from live operation. No equations, derivations, fitted parameters, or self-referential definitions appear; claims follow from direct testbed observations rather than any reduction to inputs by construction. No self-citation chains or uniqueness theorems are invoked as load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the described testbed captures general edge deployment behaviors; no free parameters, invented entities, or additional axioms are introduced beyond standard domain assumptions about IoT messaging and inference nodes.

axioms (1)
  • domain assumption: The multi-device home-automation testbed with MQTT and Android edge node represents typical real-world edge agent deployments.
    Invoked to generalize empirical observations to broader IoT systems.

pith-pipeline@v0.9.0 · 5479 in / 1264 out tokens · 15964 ms · 2026-05-15T19:30:48.708907+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

  1. [1] Omar Alrawi, Chaz Lever, Manos Antonakakis, and Fabian Monrose
  2. [2] SoK: Security evaluation of home-based IoT deployments. (2019), 1362–1380
  3. [3] Syaiful Andy, Budi Rahardjo, and Bagus Hanindhito. 2017. Attack scenarios and security analysis of MQTT communication protocol in IoT system. In 2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI). IEEE, 1–6
  4. [4] Davide Calvaresi, Kevin Appoggetti, Luca Lustrissimini, Mauro Marinoni, Paolo Sernani, Aldo Franco Dragoni, Michael Schumacher, et al.
  5. [5] Multi-Agent Systems' Negotiation Protocols for Cyber-Physical Systems: Results from a Systematic Literature Review. ICAART (1), 224–235
  6. [6] Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. (2024), 82895–82920
  7. [7] Jason A. Donenfeld. 2017. WireGuard: Next generation kernel network tunnel. In NDSS. 1–12
  8. [8] Ali Dorri, Salil S. Kanhere, Raja Jurdak, and Praveen Gauravaram. 2017. Blockchain for IoT security and privacy: The case study of a smart home. In 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 618–623
  9. [9] Syed Naeem Firdous, Zubair Baig, Craig Valli, and Ahmed Ibrahim
  10. [10] Modelling and evaluation of malicious attacks against the IoT MQTT protocol. In 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). IEEE, 748–755
  11. [11] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 79–90
  12. [12] Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. 2024. Large language model based multi-agents: A survey of progress and challenges. (2024)
  13. [13] M. S. Harsha, B. M. Bhavani, and K. R. Kundhavai. 2018. Analysis of vulnerabilities in MQTT security using Shodan API and implementation of its countermeasures via authentication and ACLs. In 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2244–2250
  14. [14] Sean Hollister. 2026. The DJI Romo robovac had security so poor, this man remotely accessed thousands of them. https://www.theverge.com/tech/879088/dji-romo-hack-vulnerability-remote-control-camera-access-mqtt
  15. [15] Home Assistant. 2026. Model Context Protocol. Home Assistant documentation. https://www.home-assistant.io/integrations/mcp/ Introduced in Home Assistant 2025.2
  16. [16] Francesca Meneghello, Matteo Calore, Daniel Zucchetto, Michele Polese, and Andrea Zanella. 2019. IoT: Internet of threats? A survey of practical security vulnerabilities in real IoT devices. IEEE Internet of Things Journal 6, 5 (2019), 8182–8201
  17. [17] Biswajeeban Mishra and Attila Kertesz. 2020. The use of MQTT in M2M and IoT systems: A survey. IEEE Access 8 (2020), 201071–201086
  18. [18] OpenClaw Project. 2026. OpenClaw. https://github.com/openclaw/openclaw GitHub repository
  19. [19] OpenClaw Project. 2026. Security. OpenClaw documentation. https://docs.openclaw.ai/gateway/security
  20. [20] OWASP Foundation. 2025. OWASP Top 10 for Agentic Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/
  21. [21] Rodrigo Roman, Javier Lopez, and Masahiro Mambo. 2018. Mobile edge computing, fog et al.: A survey and analysis of security threats and challenges. Future Generation Computer Systems 78 (2018), 680–698
  22. [22] Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, and Tatsunori Hashimoto. 2023. Identifying the risks of LM agents with an LM-emulated sandbox. arXiv preprint arXiv:2309.15817 (2023)
  23. [23] Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. 2016. Edge computing: Vision and challenges. IEEE Internet of Things Journal 3, 5 (2016), 637–646
  24. [24] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. (2023), 8634–8652
  25. [25] Keith Stouffer, Joe Falco, Karen Scarfone, et al. 2011. Guide to industrial control systems (ICS) security. NIST Special Publication 800, 82 (2011), 16–16
  26. [26] Milijana Surbatovich, Jassim Aljuraidan, Lujo Bauer, Anupam Das, and Limin Jia. 2017. Some recipes can do more than spoil your appetite: Analyzing the security and privacy risks of IFTTT recipes. (2017), 1501–1510
  27. [27] SwitchBot. 2026. SwitchBot AI Hub | Matter-Compatible Smart Home Hub with Local AI. Official product page. https://us.switch-bot.com/products/switchbot-ai-hub
  28. [28] Yashar Talebirad and Amirhossein Nadiri. 2023. Multi-agent collaboration: Harnessing the power of intelligent LLM agents. (2023)
  29. [29] Yuhang Wang, Feiming Xu, Zheng Lin, Guangyu He, Yuzhe Huang, Haichang Gao, Zhenxing Niu, Shiguo Lian, and Zhaoxiang Liu. 2026. From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent. arXiv preprint arXiv:2602.08412 (2026)
  30. [30] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, et al. 2025. The rise and potential of large language model based agents: A survey. Science China Information Sciences 68, 2 (2025), 121101
  31. [31] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. (2022)
  32. [32] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Findings of the Association for Computational Linguistics: ACL 2024. 10471–10506

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. 2024. Injeca- gent: Benchmarking indirect prompt injections in tool-integrated large language model agents. InFindings of the Association for Computational Linguistics: ACL 2024. 10471–10506. A Extended Testbed and Measurements Following the measurements reported in the main text (three- node testbe...