FlexServe: A Fast and Secure LLM Serving System for Mobile Devices with Flexible Resource Isolation
Pith reviewed 2026-05-15 14:18 UTC · model grok-4.3
The pith
FlexServe lets ARM TrustZone protect mobile LLM inference by switching memory pages and the NPU between protected and unprotected modes on demand, cutting average time to first token by about 10× versus a basic TrustZone strawman and by 2.44× versus an optimized one.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FlexServe constructs Flexible Secure Memory and Flexible Secure NPU through a Flexible Resource Isolation mechanism that supports fast mode switches. Inside TrustZone's secure world it adds LLM-Aware Memory Management and a Secure Inference Pipeline for single-model acceleration, plus a Multi-Model Scheduler for agent-style workflows. Prototype measurements report an average 10.05× TTFT speedup over a basic TrustZone strawman and 2.44× over an optimized strawman with the pipeline and secure NPU enabled.
What carries the argument
Flexible Resource Isolation mechanism that switches memory pages and the NPU between unprotected and protected modes
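To make the idea of on-demand mode switching concrete, here is a toy Python model of a resource that can be flipped between unprotected and protected modes. It is only an illustrative sketch: the names `Mode`, `FlexResource`, and `switch_to`, and the switch counter, are assumptions introduced here and do not come from the paper, which realizes the mechanism in TrustZone hardware and secure-world software rather than in Python.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    UNPROTECTED = "unprotected"  # usable by the normal world
    PROTECTED = "protected"      # usable only by the secure world


@dataclass
class FlexResource:
    """Hypothetical stand-in for a memory page or the NPU."""
    name: str
    mode: Mode = Mode.UNPROTECTED
    switches: int = 0  # number of real mode changes performed

    def switch_to(self, target: Mode) -> bool:
        """Switch modes; return True only if a change actually happened."""
        if self.mode is target:
            return False  # already in the requested mode, nothing to pay
        self.mode = target
        self.switches += 1
        return True


# Usage: protect the NPU for secure inference, then hand it back.
npu = FlexResource("npu")
npu.switch_to(Mode.PROTECTED)
npu.switch_to(Mode.PROTECTED)    # no-op, not counted
npu.switch_to(Mode.UNPROTECTED)
print(npu.switches)  # prints 2
```

Counting only real transitions mirrors the quantity the referee asks about further down: how many switches an inference step actually incurs.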
Load-bearing premise
The overhead and security properties of rapid mode switches between protected and unprotected states remain stable when measured on production mobile hardware and under realistic kernel attacks.
What would settle it
If benchmarks on additional devices with live kernel exploits show that mode-switch latency or data exposure exceeds the reported gains, the central speedup and security claims would fail.
Original abstract
Device-side Large Language Models (LLMs) have witnessed explosive growth, offering higher privacy and availability compared to cloud-side LLMs. During LLM inference, both model weights and user data are valuable, and attackers may even compromise the OS kernel to steal them. ARM TrustZone is the de facto hardware-based isolation technology on mobile devices, used to protect sensitive applications from a compromised OS. However, protecting LLM inference with TrustZone incurs significant overhead due to its inflexible isolation of memory and the NPU. To address these challenges, this paper introduces FlexServe, a fast and secure LLM serving system for mobile devices. It first introduces a Flexible Resource Isolation mechanism to construct Flexible Secure Memory (Flex-Mem) and Flexible Secure NPU (Flex-NPU). Both memory pages and the NPU can be efficiently switched between unprotected and protected modes. Based on these mechanisms, FlexServe designs a fast and secure LLM inference framework within TrustZone's secure world. The LLM-Aware Memory Management and Secure Inference Pipeline are introduced to accelerate inference. A Multi-Model Scheduler is proposed to optimize multi-model workflows. We implement a prototype of FlexServe and compare it with two TrustZone-based strawman designs. The results show that FlexServe achieves an average $10.05\times$ speedup in Time to First Token (TTFT) compared to the strawman, and an average $2.44\times$ TTFT speedup compared to an optimized strawman with pipeline and secure NPU enabled. For multi-model agent workflows, the end-to-end speedup is up to $24.30\times$ and $4.05\times$ compared to the strawman and optimized strawman, respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents FlexServe, a secure LLM serving system for mobile devices that uses ARM TrustZone with a new Flexible Resource Isolation mechanism. This enables efficient dynamic switching of memory pages (Flex-Mem) and the NPU (Flex-NPU) between protected and unprotected modes. Building on these, the system adds LLM-Aware Memory Management, a Secure Inference Pipeline, and a Multi-Model Scheduler. A prototype implementation is evaluated against two TrustZone-based strawman designs, reporting average TTFT speedups of 10.05× versus the basic strawman and 2.44× versus an optimized strawman (with pipeline and secure NPU), plus end-to-end gains up to 24.30× and 4.05× for multi-model agent workflows.
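As a rough sanity check on the two headline averages, the snippet below divides one by the other to estimate how much faster the optimized strawman is than the basic one. This is an assumption-laden estimate, not a figure from the paper: it presumes both averages cover the same workloads and that the ratios compose, which averages of per-workload ratios do not guarantee. The variable names are chosen here for illustration.

```python
# Average TTFT speedups as reported in the abstract.
speedup_vs_basic = 10.05      # FlexServe vs. basic strawman
speedup_vs_optimized = 2.44   # FlexServe vs. optimized strawman

# Rough implied gain of the optimized strawman over the basic one.
# Assumption: both averages cover the same workloads and the ratios compose.
implied_optimized_vs_basic = speedup_vs_basic / speedup_vs_optimized
print(round(implied_optimized_vs_basic, 2))  # prints 4.12
```

Under those assumptions, pipelining and a statically secure NPU would account for roughly a 4× gain on their own, which is why the ablation requested below matters for attributing the remainder.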
Significance. If the performance claims are supported by complete characterization of mode-switching costs, this work would be significant for practical on-device LLM deployment. It directly addresses the tension between strong hardware isolation (TrustZone) and inference efficiency on resource-constrained mobile devices, offering a concrete prototype that demonstrates flexible isolation can deliver substantial speedups while maintaining security guarantees.
major comments (2)
- [Evaluation] Evaluation section: The headline TTFT claims (10.05× vs strawman, 2.44× vs optimized strawman) and multi-model gains (up to 24.30× / 4.05×) attribute improvements to Flexible Resource Isolation, yet no microbenchmark data, switch counts per inference step, or ablation isolating Flex-Mem/Flex-NPU switching latency from LLM-Aware Memory Management or the pipeline is provided. Without these, it is impossible to confirm that mode-switching overheads (e.g., TLB invalidation or NPU reconfiguration) are negligible relative to inference time.
- [§4.3] §4.3 (Secure Inference Pipeline): The integration of Flex-NPU mode switching with pipeline stages is described at a high level, but the paper does not quantify reconfiguration costs or their accumulation across token generation steps. This is load-bearing for the central claim that flexible isolation accelerates inference without eroding the reported speedups.
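The accounting both major comments ask for can be sketched in a few lines of Python. The functions below compute the share of each step spent on mode switches and the cost accumulated over a run of token-generation steps. The function names, parameters, and all input values are hypothetical and serve only to exercise the arithmetic; the paper reports none of these quantities.

```python
def switch_overhead_fraction(switches_per_step: int,
                             switch_latency_ms: float,
                             compute_ms_per_step: float) -> float:
    """Fraction of each step's wall time spent on mode switches."""
    overhead = switches_per_step * switch_latency_ms
    return overhead / (overhead + compute_ms_per_step)


def cumulative_overhead_ms(num_steps: int,
                           switches_per_step: int,
                           switch_latency_ms: float) -> float:
    """Total switch cost accumulated over a run of token-generation steps."""
    return num_steps * switches_per_step * switch_latency_ms


# Hypothetical inputs, chosen only to exercise the functions.
frac = switch_overhead_fraction(switches_per_step=2,
                                switch_latency_ms=0.5,
                                compute_ms_per_step=49.0)
total = cumulative_overhead_ms(num_steps=100,
                               switches_per_step=2,
                               switch_latency_ms=0.5)
print(frac)   # prints 0.02
print(total)  # prints 100.0
```

With measured switch counts and latencies substituted for the placeholders, a small `frac` would support the claim that switching is negligible, while a large `total` over long generations would indicate erosion of the reported speedups.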
minor comments (2)
- [Abstract] The abstract and introduction refer to 'strawman designs' without a concise summary of their key limitations; adding one sentence would improve accessibility for readers.
- [Evaluation] Performance figures lack error bars, standard deviations, or details on workload selection and measurement methodology, which are standard for empirical systems papers.
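A minimal sketch of the reporting this comment has in mind, using Python's standard `statistics` module: repeated TTFT measurements are summarized as mean and sample standard deviation before a speedup is computed. The sample values and the names `summarize`, `baseline_ms`, and `system_ms` are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev


def summarize(samples_ms):
    """Return (mean, sample standard deviation) for repeated measurements."""
    return mean(samples_ms), stdev(samples_ms)


# Hypothetical TTFT samples in milliseconds from repeated runs.
baseline_ms = [810.0, 790.0, 800.0]
system_ms = [210.0, 190.0, 200.0]

base_mean, base_sd = summarize(baseline_ms)
sys_mean, sys_sd = summarize(system_ms)
speedup = base_mean / sys_mean

print(f"baseline: {base_mean:.1f} ± {base_sd:.1f} ms")
print(f"system:   {sys_mean:.1f} ± {sys_sd:.1f} ms")
print(f"speedup:  {speedup:.2f}x")
```

Reporting the spread alongside each mean lets a reader judge whether a claimed speedup exceeds run-to-run noise.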
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the evaluation. We agree that additional microbenchmark data and quantifications will strengthen the paper and will revise the manuscript accordingly to address both major points.
- Referee: [Evaluation] Evaluation section: The headline TTFT claims (10.05× vs strawman, 2.44× vs optimized strawman) and multi-model gains (up to 24.30× / 4.05×) attribute improvements to Flexible Resource Isolation, yet no microbenchmark data, switch counts per inference step, or ablation isolating Flex-Mem/Flex-NPU switching latency from LLM-Aware Memory Management or the pipeline is provided. Without these, it is impossible to confirm that mode-switching overheads (e.g., TLB invalidation or NPU reconfiguration) are negligible relative to inference time.
  Authors: We agree that microbenchmark data would better isolate contributions and confirm negligible overheads. In the revised manuscript we will add: (1) microbenchmarks measuring Flex-Mem and Flex-NPU switching latencies including TLB invalidation and NPU reconfiguration costs; (2) the exact number of mode switches per inference step for representative workloads; and (3) an ablation study separating Flexible Resource Isolation from LLM-Aware Memory Management and the pipeline. These additions will directly show that switching costs remain negligible relative to inference time and support the reported speedups.
  Revision: yes
- Referee: [§4.3] §4.3 (Secure Inference Pipeline): The integration of Flex-NPU mode switching with pipeline stages is described at a high level, but the paper does not quantify reconfiguration costs or their accumulation across token generation steps. This is load-bearing for the central claim that flexible isolation accelerates inference without eroding the reported speedups.
  Authors: We acknowledge the need for explicit quantification. In the revision we will expand §4.3 with measured Flex-NPU reconfiguration latencies and an analysis of their cumulative impact across successive token-generation steps. The new data will demonstrate that these costs do not erode the overall speedups delivered by flexible isolation, thereby reinforcing the central performance claim.
  Revision: yes
Circularity Check
No significant circularity; claims rest on empirical prototype benchmarks
The paper describes a systems implementation (Flexible Resource Isolation, LLM-Aware Memory Management, Secure Inference Pipeline, Multi-Model Scheduler) and reports measured speedups from a prototype against strawman baselines. No equations, first-principles derivations, or predictions appear that reduce by construction to fitted inputs or self-referential definitions. Performance numbers are direct experimental results, not outputs of any model that was calibrated on the same quantities. Self-citations, if present, are not load-bearing for the central claims.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: ARM TrustZone provides hardware-based isolation between secure and normal worlds that protects against a compromised OS kernel.
invented entities (2)
- Flex-Mem: no independent evidence
- Flex-NPU: no independent evidence