Cloud to Edge: Benchmarking LLM Inference On Hardware-Accelerated Single-Board Computers

Fouad Trad; Harri Renney; Michael Mattarock; Zena Wood

arXiv: 2604.24785 · v1 · submitted 2026-04-24 · cs.AR · cs.AI · cs.DC · cs.PF

Pith reviewed 2026-05-08 09:19 UTC · model grok-4.3

keywords: LLM inference, edge computing, hardware accelerators, single-board computers, benchmarking, power efficiency, token throughput, privacy-sensitive deployment

The pith

Hardware accelerators on single-board computers raise LLM token throughput over CPU-only inference, and the gain has to be weighed against power draw and device size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a structured way to benchmark large language model inference on edge hardware that includes accelerators. It tests four single-board computer setups designed for IoT use and measures performance together with power draw and physical dimensions rather than speed alone. This approach matters for environments where sending data to the cloud raises privacy risks or connectivity is unreliable, such as in unmanned vehicles or rugged field operations. The evaluation shows that NPUs and GPUs deliver higher throughput than CPU-only runs while making the resulting trade-offs visible. The work supplies concrete guidance for choosing hardware that fits the constraints of those settings.

Core claim

The paper claims that a multi-dimensional benchmarking methodology, applied to four IoT-suitable edge platform configurations, demonstrates the benefits of hardware accelerators such as NPUs and GPUs for LLM inference on single-board computers. It quantifies trade-offs among power efficiency, physical device size, and token throughput, thereby providing practical guidance for deploying generative AI in privacy-sensitive and connectivity-limited environments such as unmanned vehicles and portable, ruggedised operations.

What carries the argument

A multi-dimensional benchmarking methodology that jointly evaluates inference performance, power efficiency, physical size, and token throughput across edge platforms with hardware accelerators.
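As a rough illustration of what evaluating these dimensions jointly can look like, the sketch below derives three per-run figures from raw measurements: tokens per second, tokens per joule, and tokens per second per cubic centimetre. The `RunResult` class, its field names, and the example numbers are assumptions made for illustration; this summary does not state the paper's exact metric definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """One benchmark run on one platform (hypothetical record layout)."""
    platform: str
    tokens_generated: int
    wall_time_s: float   # generation time in seconds
    mean_power_w: float  # average device power during the run, in watts
    volume_cm3: float    # physical volume of the device, in cubic centimetres

    @property
    def tokens_per_second(self) -> float:
        return self.tokens_generated / self.wall_time_s

    @property
    def tokens_per_joule(self) -> float:
        # 1 W = 1 J/s, so (tokens/s) / W is tokens per joule.
        return self.tokens_per_second / self.mean_power_w

    @property
    def tokens_per_second_per_cm3(self) -> float:
        return self.tokens_per_second / self.volume_cm3


# Invented numbers, not measurements from the paper.
run = RunResult(
    platform="hypothetical-sbc",
    tokens_generated=512,
    wall_time_s=64.0,
    mean_power_w=8.0,
    volume_cm3=100.0,
)
print(run.tokens_per_second, run.tokens_per_joule, run.tokens_per_second_per_cm3)
```

Keeping the raw measurements in the record and deriving the ratios on demand means a different normalisation (for example per gram instead of per cubic centimetre) can be added without re-running anything.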

If this is right

Hardware accelerators such as NPUs and GPUs increase token throughput relative to CPU-only inference on the tested platforms.
Joint measurement of power, size, and speed allows concrete selection of hardware for given deployment constraints.
The resulting data supports local LLM use in unmanned vehicles and portable rugged operations where cloud access is restricted.
Local inference reduces data transmission, lowering both latency and privacy exposure compared with cloud-centric approaches.
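One way the hardware-selection point above could be put into practice is a two-step filter: discard configurations that are dominated on every dimension, then keep only those that fit a deployment's power and size budget. The sketch below is a minimal version of that idea; the `Candidate` type, the function names, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    tokens_per_second: float  # higher is better
    power_w: float            # lower is better
    volume_cm3: float         # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = (
        a.tokens_per_second >= b.tokens_per_second
        and a.power_w <= b.power_w
        and a.volume_cm3 <= b.volume_cm3
    )
    strictly_better = (
        a.tokens_per_second > b.tokens_per_second
        or a.power_w < b.power_w
        or a.volume_cm3 < b.volume_cm3
    )
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def feasible(cands: List[Candidate], max_power_w: float, max_volume_cm3: float) -> List[Candidate]:
    """Keep only candidates inside the power and size budget."""
    return [c for c in cands if c.power_w <= max_power_w and c.volume_cm3 <= max_volume_cm3]


# Invented example values for illustration only.
candidates = [
    Candidate("cpu-only", 2.0, 6.0, 100.0),
    Candidate("npu", 6.0, 7.0, 120.0),
    Candidate("gpu", 15.0, 20.0, 400.0),
    Candidate("slow-big", 1.5, 8.0, 150.0),
]
front = pareto_front(candidates)
shortlist = feasible(front, max_power_w=10.0, max_volume_cm3=200.0)
best = max(shortlist, key=lambda c: c.tokens_per_second)
print([c.name for c in front], best.name)
```

Filtering by dominance first keeps the shortlist honest: a configuration that is slower, hungrier, and larger than another never reaches the budget check.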

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same benchmarking approach could be reused on newer accelerator chips or different model sizes to keep recommendations current.
Adding metrics such as sustained operation under battery limits or thermal constraints would strengthen applicability to field deployments.
The quantified trade-offs suggest a path toward standardized test suites for edge LLM hardware in industrial and defence contexts.
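The battery-limit extension suggested above can be approximated with back-of-envelope arithmetic from measured power and throughput. The sketch below assumes a constant power draw and ignores converter losses, other loads on the battery, and thermal throttling; the function names and numbers are hypothetical.

```python
def runtime_hours(battery_wh: float, mean_power_w: float) -> float:
    """Hours of operation on one charge, assuming constant power draw."""
    if mean_power_w <= 0:
        raise ValueError("mean_power_w must be positive")
    return battery_wh / mean_power_w


def tokens_per_charge(battery_wh: float, mean_power_w: float, tokens_per_second: float) -> float:
    """Tokens generated on one charge if inference runs continuously."""
    return tokens_per_second * runtime_hours(battery_wh, mean_power_w) * 3600.0


# Invented example: a 50 Wh pack, 10 W average draw, 4 tokens/s.
hours = runtime_hours(50.0, 10.0)
total_tokens = tokens_per_charge(50.0, 10.0, 4.0)
print(hours, total_tokens)
```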

Load-bearing premise

The four selected IoT-suitable edge platform configurations and the chosen evaluation tasks adequately represent real-world operational technology and defence use cases.

What would settle it

Repeating the benchmarks on additional single-board computers, or with tasks drawn directly from unmanned vehicle operations, would settle it: finding either no consistent throughput gains from accelerators or markedly different trade-off patterns would undermine the offered guidance.

Figures

Figures reproduced from arXiv: 2604.24785 by Fouad Trad, Harri Renney, Michael Mattarock, Zena Wood.

**Figure 1.** Comparison of power efficiency between CPU and available hardware accelerators on the Raspberry Pi 5

**Figure 2.** Device power consumption (W) against token throughput per second (T/s) where bubbles are sized according ...

**Figure 3.** Multi-dimensional throughput ratios across hardware configurations for all shared supported LLMs: (a) raw ...

Original abstract

Large language models (LLMs) are becoming increasingly capable at small parameter scales. At the same time, conventional cloud-centric deployment introduces challenges around data privacy, latency, and cost that are acute in operational technology and defence environments. Advances in model distillation, quantisation, and affordable edge accelerators now make local LLM inference on single-board computers feasible, but the high dimensionality of the configuration space makes identifying optimal deployments difficult without structured evaluation. Existing LLM-specific edge benchmarking efforts rely on CPU-only inference, poor coverage of genuine single-board computers, and generic evaluation tasks that lack multi-dimensional assessment of hardware effectiveness. This paper proposes a multi-dimensional benchmarking methodology that jointly evaluates inference performance and hardware efficiency across four IoT-suitable edge platform configurations testing single-board computers with the latest available hardware accelerators. Our results reveal the benefits of using hardware accelerators such as NPUs and GPUs, along with multi-dimensional evaluations quantifying the trade-offs between power efficiency, physical device size and token throughput; offering practical guidance for deploying generative AI in privacy-sensitive and connectivity-limited environments such as unmanned vehicles and portable, ruggedised operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives some concrete benchmark numbers for small LLMs on four accelerator-equipped single-board computers, but its claims about practical guidance for defence and unmanned vehicle deployments rest on unexamined assumptions about representativeness.

The main things to know are that this is an empirical benchmarking study of LLM inference on four single-board computers with hardware accelerators, and that it finds NPUs and GPUs offer better efficiency and throughput than CPU alone, with some multi-dimensional metrics on power and size. It does well by addressing the gaps in prior edge LLM benchmarks, which were CPU-only and used generic tasks without looking at hardware variety or multiple efficiency dimensions. Using actual IoT-suitable platforms is a step forward for practical guidance in privacy-sensitive settings.

The soft spots are around how well the setup represents the target use cases. The paper selects four configurations but doesn't lay out explicit criteria or map them to constraints such as power budgets in unmanned vehicles or rugged operations. The tasks seem standard rather than domain-specific, so the trade-offs might not generalise as claimed. The abstract talks about results without showing data or error bars, and since the full-text details aren't in the summary, it's difficult to judge the soundness or reproducibility fully. That said, the circularity burden is low because the results are direct measurements, not fitted models.

This paper is aimed at researchers and engineers working on edge AI for constrained environments. Someone needing initial data on hardware choices for small LLMs on SBCs would get value from the comparisons. It deserves peer review because it brings new empirical insights to a growing area, even if the generalisation claims need bolstering with more justification. I would recommend engaging with it by sending it to referees who can evaluate the methods and data in detail.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a multi-dimensional benchmarking methodology for LLM inference on four IoT-suitable single-board computer configurations equipped with hardware accelerators (NPUs and GPUs). It evaluates trade-offs among token throughput, power efficiency, and physical device size, claiming that the results demonstrate benefits of accelerators and provide practical guidance for deploying generative AI in privacy-sensitive, connectivity-limited environments such as unmanned vehicles and ruggedised operations.

Significance. If the measurements are reproducible and the platforms/tasks representative, the work would address a genuine gap in edge LLM benchmarking by extending beyond CPU-only studies and supplying joint performance-efficiency metrics. The emphasis on multi-dimensional evaluation is a positive contribution that could inform hardware selection for constrained deployments.

major comments (1)

Abstract: The headline claim that the benchmarking 'offer[s] practical guidance for deploying generative AI in privacy-sensitive and connectivity-limited environments such as unmanned vehicles and portable, ruggedised operations' is load-bearing on the assumption that the four chosen edge platforms and evaluation tasks capture the relevant constraints (power budgets, thermal envelopes, form-factor limits, and realistic workloads) of those target domains. The manuscript supplies no explicit selection criteria for the platforms, no mapping of their specifications to OT/defence requirements, and no domain-specific task suite, leaving the generalisation unsupported.

minor comments (1)

Abstract: The summary of results mentions benefits and trade-offs but does not list concrete metrics, error bars, or exclusion criteria, reducing immediate verifiability of the empirical claims.
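One conventional way to address this comment is to report each throughput figure as a mean over repeated runs together with a measure of spread. The sketch below computes the mean, sample standard deviation, and standard error of the mean; the `summarise` helper and the sample values are invented for illustration and do not come from the manuscript.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarise(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two repeated runs to estimate spread")
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)  # sample (n - 1) standard deviation
    sem = stdev / math.sqrt(n)
    return mean, stdev, sem


# Invented tokens/s readings from five repeated runs of one configuration.
throughputs = [10.0, 12.0, 11.0, 13.0, 14.0]
mean, stdev, sem = summarise(throughputs)
print(f"{mean:.2f} tokens/s, sd {stdev:.2f}, sem {sem:.2f}")
```

Stating the number of repeats alongside the spread lets a reader judge whether a gap between two platforms is larger than run-to-run noise.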

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the single major comment below, acknowledging where clarification and revision are warranted to strengthen the manuscript.

Referee: Abstract: The headline claim that the benchmarking 'offer[s] practical guidance for deploying generative AI in privacy-sensitive and connectivity-limited environments such as unmanned vehicles and portable, ruggedised operations' is load-bearing on the assumption that the four chosen edge platforms and evaluation tasks capture the relevant constraints (power budgets, thermal envelopes, form-factor limits, and realistic workloads) of those target domains. The manuscript supplies no explicit selection criteria for the platforms, no mapping of their specifications to OT/defence requirements, and no domain-specific task suite, leaving the generalisation unsupported.

Authors: We agree that the abstract claim would benefit from stronger grounding. The four platforms were chosen as commercially available single-board computers equipped with NPUs or GPUs that are representative of current edge hardware used in IoT and constrained deployments; the tasks consist of standard LLM inference workloads measuring token throughput under varying batch sizes and quantisation levels. However, the manuscript does not include explicit selection criteria or a direct mapping of specifications (e.g., power envelopes or thermal limits) to OT/defence use cases. We will revise the abstract to qualify the scope of the practical guidance and add a short subsection in the methodology explaining the platform selection rationale together with references to typical power budgets and form-factor constraints encountered in unmanned and ruggedised settings. This will make the generalisation explicit and supported by the text.

Revision: yes
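The workload described in this response, throughput measured across batch sizes and quantisation levels, amounts to a sweep over a configuration grid. The sketch below shows one way such a sweep could be organised; the model name, the quantisation labels, and the `fake_measure` stand-in are all hypothetical, and a real harness would replace `fake_measure` with an actual inference run.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

# (model name, batch size, quantisation label); all values are illustrative.
Config = Tuple[str, int, str]


def build_grid(
    models: Sequence[str], batch_sizes: Sequence[int], quant_levels: Sequence[str]
) -> List[Config]:
    """Every combination of model, batch size, and quantisation level."""
    return list(itertools.product(models, batch_sizes, quant_levels))


def run_sweep(grid: Sequence[Config], measure: Callable[[Config], float]) -> Dict[Config, float]:
    """Call the measurement function once per configuration."""
    return {cfg: measure(cfg) for cfg in grid}


def fake_measure(cfg: Config) -> float:
    """Stand-in that returns a made-up tokens/s value instead of running a model."""
    _, batch, quant = cfg
    return float(batch) * (2.0 if quant == "int4" else 1.0)


grid = build_grid(["model-a"], [1, 4], ["int4", "int8"])
results = run_sweep(grid, fake_measure)
for cfg, tps in results.items():
    print(cfg, tps)
```

Passing the measurement function in as a parameter keeps the sweep logic identical across platforms, so only the backend call changes between a CPU, NPU, or GPU run.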

Circularity Check

0 steps flagged

Empirical benchmarking study with direct measurements; no derivations or self-referential predictions

The paper is a hardware benchmarking study that reports direct measurements of token throughput, power efficiency, and device size on four single-board computer configurations. No equations, fitted parameters, predictions, or first-principles derivations are present in the abstract or described methodology. Claims rest on observed experimental data rather than any reduction to inputs by construction. Existing benchmarking efforts are cited only for context, not as load-bearing self-citations. The representativeness concern raised by the skeptic is a question of external validity, not circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmarking paper with no mathematical derivations or theoretical claims; no free parameters, axioms, or invented entities are introduced.
