LLM-assisted Agentic Edge Intelligence Framework

Alaa Saleh; Chinmaya Kumar Dehury; Praveen Kumar Donta; Qiyang Zhang; Siddharth Singh Kushwaha

arxiv: 2604.09607 · v1 · submitted 2026-03-11 · 💻 cs.DC

Pith reviewed 2026-05-15 13:52 UTC · model grok-4.3

classification 💻 cs.DC

keywords edge intelligence · large language models · dynamic code generation · edge computing · agentic systems · resource efficiency · adaptive analytics

The pith

A cloud LLM generates and deploys tailored lightweight programs on edge devices so the logic updates automatically when conditions change.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the LLM-assisted Edge Intelligence (LEI) framework to address the rigidity of conventional edge analytics. Most edge systems rely on hard-coded scripts that must be manually rewritten and redeployed whenever data patterns shift or new questions arise, which raises costs and limits scale. LEI instead keeps a cloud LLM in the loop: it interprets the requirements, sample data, metadata, and resource limits, then generates and pushes a fresh lightweight program to each device. Experiments on air-quality, temperature and humidity, wind, and soil datasets with several LLM backends report low average CPU and memory utilization during execution, which the authors read as evidence that the system adapts efficiently.

Core claim

The LEI framework removes the need for manually specified business logic by letting a cloud-hosted LLM coordinate the creation and update of device-side code. For each edge device the LLM receives sample data, metadata, context, and current resource constraints, generates candidate lightweight programs, validates them, and deploys the selected version. This process repeats as requirements evolve, allowing every device to run a program that is specific to its current situation rather than a static, hand-written script.
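
The generate, validate, deploy cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `DeviceContext`, `validate`, `generate_and_deploy`, the `analyze` entry point, and `stub_llm` are hypothetical names, and the stub stands in for a real cloud LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DeviceContext:
    """Inputs the text says the LLM receives for one edge device."""
    sample_data: list[dict]
    metadata: dict
    requirement: str
    resource_limits: dict  # hypothetical representation; not checked in this sketch


def validate(source: str, ctx: DeviceContext) -> bool:
    """Hypothetical check: the candidate must define `analyze` and run on the sample data.

    exec() on untrusted code is unsafe; a real system would need a sandbox.
    """
    namespace: dict = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
        result = namespace["analyze"](ctx.sample_data)
    except Exception:
        return False
    return result is not None


def generate_and_deploy(
    ctx: DeviceContext,
    generate: Callable[[DeviceContext, int], str],
    deploy: Callable[[str], None],
    max_candidates: int = 3,
) -> Optional[str]:
    """Ask for candidates until one validates, deploy it, and return its source."""
    for attempt in range(max_candidates):
        candidate = generate(ctx, attempt)
        if validate(candidate, ctx):
            deploy(candidate)
            return candidate
    return None  # nothing passed validation; the device keeps its current program


def stub_llm(ctx: DeviceContext, attempt: int) -> str:
    """Stand-in for the cloud LLM: the first candidate is broken, the second works."""
    if attempt == 0:
        return "def analyze(rows):\n    return rows[0]['missing_key']"
    return "def analyze(rows):\n    return sum(r['pm25'] for r in rows) / len(rows)"
```

Calling `generate_and_deploy(ctx, stub_llm, deployed.append)` with a small air-quality sample rejects the first candidate and deploys the second. Rerunning the same call when the requirement or data changes is what the repeated update step amounts to in this sketch.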

What carries the argument

The LLM-assisted Edge Intelligence (LEI) framework, which uses a cloud LLM to generate, validate, and deploy device-specific lightweight programs based on local data and constraints.

If this is right

Edge deployments become scalable to large heterogeneous fleets because each device receives its own tailored program without central manual updates.
Iteration speed increases because new questions or data shifts trigger automatic code regeneration instead of engineer-written scripts.
Operating costs drop by reducing the frequency of human oversight and physical redeployments across resource-constrained devices.
The same mechanism supports multiple LLM backends, allowing the system to swap models as capabilities or pricing change.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Frequent LLM calls could be batched or cached to reduce latency in time-critical monitoring scenarios.
The approach naturally extends to privacy-sensitive settings if the cloud LLM operates only on anonymized summaries rather than raw device data.
Resource-aware program selection might be generalized beyond CPU and memory to include energy or bandwidth budgets on battery-powered nodes.
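
To make the last extension concrete, here is one possible way to select among candidate programs under several budgets at once. It is an editorial sketch, not something the paper describes: the `Profile` and `Budget` types, the four resource names, and the "smallest worst-case utilization" rule are all assumptions chosen for illustration, and budgets are assumed to be strictly positive.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical resource dimensions; the paper itself reports only CPU and memory.
RESOURCES = ("cpu_pct", "memory_mb", "energy_mj", "bandwidth_kb")


@dataclass(frozen=True)
class Profile:
    """Measured or estimated cost of one candidate program."""
    name: str
    cpu_pct: float
    memory_mb: float
    energy_mj: float
    bandwidth_kb: float


@dataclass(frozen=True)
class Budget:
    """Per-device limits; every value must be positive."""
    cpu_pct: float
    memory_mb: float
    energy_mj: float
    bandwidth_kb: float


def fits(profile: Profile, budget: Budget) -> bool:
    """A candidate is feasible only if it stays within every budget."""
    return all(getattr(profile, r) <= getattr(budget, r) for r in RESOURCES)


def worst_utilization(profile: Profile, budget: Budget) -> float:
    """Largest fraction of any single budget the candidate consumes."""
    return max(getattr(profile, r) / getattr(budget, r) for r in RESOURCES)


def select(candidates: list[Profile], budget: Budget) -> Optional[Profile]:
    """Pick the feasible candidate with the most headroom on its tightest resource."""
    feasible = [p for p in candidates if fits(p, budget)]
    if not feasible:
        return None
    return min(feasible, key=lambda p: worst_utilization(p, budget))
```

Minimizing the worst-case utilization is only one reasonable policy; a battery-powered node might instead weight energy more heavily than the other resources.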

Load-bearing premise

A cloud LLM can reliably produce correct, safe, and resource-efficient lightweight programs for diverse edge hardware directly from sample data and constraints.

What would settle it

Deploy the programs generated by the LLM on the four tested datasets and measure whether they produce errors, security violations, or higher CPU/memory usage than the original hardcoded versions.
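
A rough harness for that comparison might look like the sketch below. It is an assumption-laden stand-in: the paper reports CPU and memory utilization, whereas this uses the standard-library `time.process_time` and `tracemalloc`, which capture CPU seconds and peak Python-level allocations only. `measure`, `handwritten_mean`, and `generated_mean` are hypothetical names.

```python
import time
import tracemalloc
from typing import Callable, Optional


def measure(fn: Callable[[list], object], rows: list) -> dict:
    """Run fn(rows) once and record CPU time, peak traced memory, and any error.

    tracemalloc adds overhead and sees only Python allocations, so the numbers
    are useful for relative comparison rather than as absolute device figures.
    """
    error: Optional[str] = None
    tracemalloc.start()
    cpu_start = time.process_time()
    try:
        fn(rows)
    except Exception as exc:  # a crash counts as a failed program, not a harness failure
        error = type(exc).__name__
    cpu_seconds = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"cpu_seconds": cpu_seconds, "peak_bytes": peak_bytes, "error": error}


def handwritten_mean(rows: list) -> float:
    """Hypothetical hard-coded baseline."""
    return sum(r["temp"] for r in rows) / len(rows)


def generated_mean(rows: list) -> float:
    """Hypothetical LLM-generated equivalent that materializes an extra list."""
    values = [r["temp"] for r in rows]
    return sum(values) / len(values)
```

Running `measure` on both functions with the same rows gives one paired observation; security violations are not covered here and would need a separate check.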

Original abstract

Edge intelligence delivers low-latency inference, yet most edge analytics remain hard-coded and must be redeployed as conditions change. When data patterns shift or new questions arise, engineers often need to write new scripts and push updates to devices, which slows iteration and raises operating costs. This limited adaptability reduces scalability and autonomy in large, heterogeneous, and resource-constrained edge deployments, and it increases reliance on human oversight. Meanwhile, large language models (LLMs) can interpret instructions and generate code, but their compute and memory requirements typically prevent direct deployment on edge devices. We address this gap with the LLM-assisted Edge Intelligence (LEI) framework, which removes the need for manually specified business logic. In LEI, a cloud-hosted LLM coordinates the creation and update of device-side logic as requirements evolve. The system generates candidate lightweight programs, checks them against available data and constraints, and then deploys the selected version to each device. This lets each device receive a tailored program based on sample data, metadata, context, and current resource limits. We evaluate LEI on four heterogeneous datasets, including air quality, temperature & humidity, wind, and soil datasets using multiple LLM backends. The experimental results show that the framework maintains low average CPU and memory utilization during the execution. These results indicate that the framework adapts efficiently to changing conditions while maintaining resource efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the LLM-assisted Edge Intelligence (LEI) framework, in which a cloud-hosted LLM generates, validates, and deploys lightweight, device-specific programs to edge nodes so that analytics logic can adapt to shifting data patterns or new queries without manual redeployment. Evaluation on four heterogeneous datasets (air quality, temperature & humidity, wind, soil) with multiple LLM backends reports low average CPU and memory utilization during execution, which the authors interpret as evidence of efficient adaptation and resource efficiency.

Significance. If the central claim holds, the work would offer a practical route to greater autonomy and scalability in large-scale, heterogeneous edge deployments by automating the creation of tailored lightweight code, thereby reducing engineering overhead and operating costs. The approach is timely given the growing interest in agentic systems that combine cloud-scale reasoning with constrained edge execution.

major comments (2)

[Abstract] Abstract: The claim that LEI 'adapts efficiently to changing conditions' rests entirely on reported low average CPU and memory utilization. No quantitative data are supplied on program-generation success rate, validation failure counts, runtime errors on target devices, or any safety/security checks performed on the LLM-produced code. Without these metrics the utilization figures cannot substantiate functional adaptation.
[Abstract] Abstract: The weakest assumption—that a cloud LLM can reliably produce correct, safe, and resource-efficient programs for diverse edge hardware from sample data and constraints—is not tested. The evaluation supplies no baselines, error bars, or comparison against hand-written equivalents, leaving the efficiency claims unverifiable from the reported results.

minor comments (1)

[Abstract] The abstract lists four datasets but does not indicate their sizes, heterogeneity metrics, or how 'changing conditions' were simulated; adding these details would strengthen the experimental description.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify opportunities to strengthen the evaluation of adaptation and reliability in the LEI framework. We address each point below and will revise the manuscript to incorporate additional details and metrics where feasible.

Referee: [Abstract] Abstract: The claim that LEI 'adapts efficiently to changing conditions' rests entirely on reported low average CPU and memory utilization. No quantitative data are supplied on program-generation success rate, validation failure counts, runtime errors on target devices, or any safety/security checks performed on the LLM-produced code. Without these metrics the utilization figures cannot substantiate functional adaptation.

Authors: We agree that the current presentation relies primarily on resource utilization as evidence of adaptation. The framework performs validation and constraint checking before deployment, and successful execution across all four datasets indicates operational programs were produced. In the revision we will add a dedicated evaluation subsection reporting program-generation success rates, validation failure counts, observed runtime errors, and the specific safety checks (syntax validation, resource-bound enforcement, and basic security scanning) applied to generated code. This will directly substantiate the functional adaptation claim.

revision: yes

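
For readers wondering what two of the named checks could look like, the sketch below covers syntax validation and a basic static scan using Python's `ast` module. The rebuttal does not say how the checks are implemented, so the allow-list, the deny-list, and `static_check` are hypothetical, and resource-bound enforcement is left out because it has to happen at run time.

```python
import ast

# Hypothetical policy; the source does not specify which modules or calls are permitted.
ALLOWED_IMPORTS = {"math", "statistics", "json"}
BANNED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def static_check(source: str) -> list[str]:
    """Return a list of problems found in generated code; an empty list means it passed."""
    try:
        tree = ast.parse(source)  # syntax validation
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems: list[str] = []
    for node in ast.walk(tree):
        # Collect imported module names from both import forms.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"disallowed import: {module}")

        # Flag direct calls to banned built-ins such as eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            problems.append(f"banned call: {node.func.id}")
    return problems
```

A scan like this is easy to evade (for example through attribute lookups), so it should be read as a first filter in front of sandboxed execution rather than as a security guarantee.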
Referee: [Abstract] Abstract: The weakest assumption—that a cloud LLM can reliably produce correct, safe, and resource-efficient programs for diverse edge hardware from sample data and constraints—is not tested. The evaluation supplies no baselines, error bars, or comparison against hand-written equivalents, leaving the efficiency claims unverifiable from the reported results.

Authors: The referee correctly identifies the absence of explicit baselines and comparisons. Our multi-LLM, multi-dataset results show consistent low utilization, but direct verification against hand-written code is missing. In revision we will add, for at least two representative datasets, side-by-side comparisons of LLM-generated versus hand-written programs on correctness of analytics output and resource consumption. Error bars from repeated runs will also be included. Full safety verification remains an assumption we will discuss more explicitly rather than claim to have exhaustively tested.

revision: partial
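
The promised error bars could be computed along these lines. This is a small sketch using the standard-library `statistics` module; `summarize` and `compare` are hypothetical helpers, and any numbers passed to them in an example are illustrative placeholders, not measurements from the paper.

```python
import statistics


def summarize(samples: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) for one metric over repeated runs."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to compute an error bar")
    return statistics.mean(samples), statistics.stdev(samples)


def compare(generated: list[float], handwritten: list[float]) -> dict:
    """Side-by-side summary of LLM-generated versus hand-written runs for one metric."""
    g_mean, g_sd = summarize(generated)
    h_mean, h_sd = summarize(handwritten)
    return {
        "generated": (g_mean, g_sd),
        "handwritten": (h_mean, h_sd),
        # Positive values mean the generated program used more of the resource.
        "mean_difference": g_mean - h_mean,
    }
```

Reporting the mean with its standard deviation for both variants lets a reader judge whether a difference in utilization is larger than run-to-run noise; a formal significance test would be a further step.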

Circularity Check

0 steps flagged

No circularity: framework proposal evaluated on external datasets

The paper introduces the LEI framework as a new architecture in which a cloud LLM generates and deploys lightweight programs to edge devices. Evaluation consists of running the system on four independent public datasets (air quality, temperature & humidity, wind, soil) and reporting measured CPU/memory utilization. No equations, fitted parameters, or predictions are defined in terms of themselves; no self-citation chain is used to justify core claims; and no uniqueness theorems or ansatzes from prior author work are invoked. The reported results are direct experimental observations rather than quantities that reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that LLMs can produce deployable edge code; no explicit free parameters or invented entities beyond the framework itself are detailed in the abstract.

axioms (1)

domain assumption: LLMs can interpret instructions and generate correct lightweight programs suitable for edge device constraints
Invoked as the core mechanism enabling the framework's operation without manual scripting.

invented entities (1)

LEI framework · no independent evidence
purpose: Coordinates cloud LLM to generate, validate, and deploy tailored programs to edge devices
The primary new system introduced to solve the adaptability problem in edge intelligence.

pith-pipeline@v0.9.0 · 5555 in / 1253 out tokens · 48054 ms · 2026-05-15T13:52:54.141601+00:00 · methodology

