Agentic Witnessing: Pragmatic and Scalable TEE-Enabled Privacy-Preserving Auditing
Pith reviewed 2026-05-08 02:53 UTC · model grok-4.3
The pith
An LLM auditor inside a trusted execution environment lets outsiders verify high-level properties of private data using only yes-or-no questions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agentic Witnessing shifts verification from attested execution to attested reasoning. An LLM-based Auditor runs inside a TEE, uses the Model Context Protocol to inspect the Prover's private dataset on demand, answers a limited set of binary true/false questions from the Verifier, and emits a yes/no verdict together with a cryptographic transcript. The transcript is a signed hash chain that ties the reasoning steps to the original dataset and the TEE's hardware root of trust. The framework was demonstrated by auditing five high-level properties across the codebases of 21 peer-reviewed computer science papers, treating the source code as confidential.
What carries the argument
The TEE-isolated LLM Auditor that dynamically inspects datasets via the Model Context Protocol, answers boolean queries, and produces a signed hash-chain transcript bound to both the data and the hardware root of trust.
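The transcript construction can be made concrete. Below is a minimal sketch of a signed hash chain that binds a sequence of reasoning steps to a dataset digest, with an HMAC standing in for the TEE's attestation signature; the paper does not specify its primitives, so every name here is illustrative, not the authors' implementation.

```python
import hashlib
import hmac


def hash_chain_transcript(dataset: bytes, steps: list, signing_key: bytes) -> dict:
    """Chain each reasoning step onto a dataset digest, then sign the head.

    The HMAC is a stand-in for the TEE's hardware-rooted attestation
    signature (primitive unspecified in the paper).
    """
    link = hashlib.sha256(dataset).hexdigest()  # chain root: digest of the private dataset
    chain = [link]
    for step in steps:
        # each link commits to the previous link and the current reasoning step
        link = hashlib.sha256((link + step).encode()).hexdigest()
        chain.append(link)
    tag = hmac.new(signing_key, link.encode(), hashlib.sha256).hexdigest()
    return {"chain": chain, "signature": tag}


def verify_transcript(dataset: bytes, steps: list, transcript: dict,
                      signing_key: bytes) -> bool:
    """Recompute the chain from the claimed dataset and steps; check the signature."""
    rebuilt = hash_chain_transcript(dataset, steps, signing_key)
    return hmac.compare_digest(rebuilt["signature"], transcript["signature"])
```

Any change to the dataset or to a single reasoning step changes the chain head, so the signature check fails, which is the binding property the framework relies on.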
Load-bearing premise
The LLM auditor can reliably inspect the dataset through the context protocol and deliver accurate yes/no verdicts on high-level qualitative properties.
What would settle it
Run the auditor on a set of codebases whose true answers to the target properties are already known to an external party, then check whether the auditor's yes/no outputs match those known answers while the raw code remains hidden from the verifier.
read the original abstract
Auditing the semantic properties of proprietary data creates a fundamental tension: verification requires transparent access, while proprietary rights demand confidentiality. While Zero-Knowledge Proofs (ZKPs) ensure privacy, they are typically limited to precise algebraic constraints and are ill-suited for verifying qualitative, unstructured properties, such as the logic within a codebase. We propose Agentic Witnessing, a framework that moves verification from attested execution to attested reasoning. The system is composed of three agents: a Verifier (who wants to check properties of a dataset), a Prover (who owns the dataset) and an Auditor (that inspects the dataset). The Verifier is allowed to ask a limited number of simple binary true/false questions to the auditor. By isolating an LLM-based Auditor within a Trusted Execution Environment (TEE), the system enables the Verifier to query a Prover's private data via simple Boolean queries, without exposing the raw dataset. The Auditor uses the Model Context Protocol (MCP) to dynamically inspect the target dataset, producing a yes/no verdict accompanied by a cryptographic transcript: a signed hash chain binding the reasoning trace to both the original dataset and the TEE's hardware root of trust. We demonstrate this architecture by automating the artifact evaluation process for 21 peer-reviewed computer science papers with released codebases on GitHub (e.g. Does the codebase implement the system described in the paper?). We verified five high-level properties of these codebases described in the corresponding publications, treating the source code as private. Our results show that TEE-enabled agentic auditing provides a mechanism for privacy-preserving oversight, effectively decoupling qualitative verification from the need for data disclosure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Agentic Witnessing, a framework for privacy-preserving verification of high-level semantic properties on proprietary datasets. A Verifier poses simple Boolean queries to an LLM-based Auditor isolated inside a TEE; the Auditor uses the Model Context Protocol (MCP) to inspect the Prover's private data and returns yes/no verdicts accompanied by a signed hash chain that binds the reasoning trace to the dataset and the TEE hardware root of trust. The architecture is demonstrated by automating artifact evaluation on 21 peer-reviewed CS papers, verifying five qualitative properties (e.g., whether released code implements the system described in the paper) while treating the codebases as private.
Significance. If the LLM Auditor can be shown to produce reliable verdicts, the approach would offer a pragmatic complement to ZKPs for qualitative auditing tasks that resist algebraic formalization. The TEE-attested reasoning model and the concrete demonstration on real GitHub repositories are strengths that could influence practical privacy-preserving oversight in software and data auditing.
major comments (3)
- [§5 (Demonstration)] The 21-paper demonstration (abstract and §5) reports no quantitative metrics—accuracy, precision, recall, false-positive rate, or inter-rater agreement with human experts—for the LLM Auditor’s yes/no verdicts on the five high-level properties. Without these data the central claim that the system “provides a mechanism for privacy-preserving oversight” rests on an untested assumption.
- [§4 (Architecture)] The signed hash chain (abstract and §4) attests execution and data binding but cannot attest semantic correctness of the LLM trace. No analysis of hallucination, misinterpretation of code semantics, or failure modes on qualitative properties (e.g., “does the codebase implement the described system”) is provided, leaving the reliability of the attested reasoning unaddressed.
- [§3 (System Design)] The manuscript contains no security analysis or threat model for the TEE + MCP integration. It is unclear how the Auditor is prevented from leaking raw data through side channels or how the limited Boolean-query interface is enforced inside the enclave.
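For reference, the metrics the first major comment asks for are straightforward to compute once ground-truth labels exist for the audited properties. A minimal sketch, with a hypothetical helper not taken from the manuscript:

```python
def verdict_metrics(predicted: list, truth: list) -> dict:
    """Accuracy, precision, recall, and false-positive rate for yes/no
    auditor verdicts scored against known ground-truth answers.

    `predicted` and `truth` are parallel lists of booleans, one pair per
    (codebase, property) query.
    """
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
        "fpr": fp / (fp + tn) if fp + tn else None,
    }
```

Reporting these four numbers per property, plus inter-rater agreement with human artifact evaluators, would directly address the objection.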
minor comments (2)
- [Abstract] The abstract states “our results show” yet supplies no numerical outcomes or tables summarizing the 21-paper evaluation; a results table or summary statistics should be added.
- [§4] Notation for the five verified properties and the exact MCP interface calls is introduced only informally; a concise table or pseudocode listing would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of attested reasoning in privacy-preserving auditing. We agree that the manuscript would benefit from stronger empirical support, explicit discussion of LLM limitations, and a security analysis. We address each major comment below and will incorporate revisions accordingly.
read point-by-point responses
-
Referee: [§5 (Demonstration)] The 21-paper demonstration (abstract and §5) reports no quantitative metrics—accuracy, precision, recall, false-positive rate, or inter-rater agreement with human experts—for the LLM Auditor’s yes/no verdicts on the five high-level properties. Without these data the central claim that the system “provides a mechanism for privacy-preserving oversight” rests on an untested assumption.
Authors: We agree that the absence of quantitative metrics leaves the reliability of the LLM Auditor’s verdicts unquantified. The demonstration in §5 was designed as an end-to-end feasibility study on real GitHub repositories rather than a controlled benchmark of LLM accuracy. In the revised manuscript we will add a new evaluation subsection that reports inter-rater agreement with human experts on a representative subset of the 21 codebases, together with precision, recall, and false-positive rates for the five properties. We will also explicitly qualify the central claim to reflect that the system provides attested execution of reasoning rather than verified correctness. revision: yes
-
Referee: [§4 (Architecture)] The signed hash chain (abstract and §4) attests execution and data binding but cannot attest semantic correctness of the LLM trace. No analysis of hallucination, misinterpretation of code semantics, or failure modes on qualitative properties (e.g., “does the codebase implement the described system”) is provided, leaving the reliability of the attested reasoning unaddressed.
Authors: The referee is correct that the hash chain provides cryptographic attestation of execution integrity and data binding inside the TEE but does not guarantee semantic correctness of the LLM trace. We will expand §4 with a dedicated paragraph on failure modes, including hallucinations, misinterpretation of code semantics, and the inherent limits of qualitative property verification. The revision will clarify that the contribution lies in attested reasoning under a restricted Boolean-query interface, note that the Verifier can issue follow-up queries to probe suspicious verdicts, and reference existing literature on LLM reliability in code analysis. revision: yes
-
Referee: [§3 (System Design)] The manuscript contains no security analysis or threat model for the TEE + MCP integration. It is unclear how the Auditor is prevented from leaking raw data through side channels or how the limited Boolean-query interface is enforced inside the enclave.
Authors: We acknowledge the lack of an explicit threat model. The original text relies on the standard isolation guarantees of TEEs and the design of the Model Context Protocol. In the revision we will insert a concise threat-model subsection in §3 that (i) states the assumed TEE security properties, (ii) describes how the Boolean-query interface is enforced at the protocol level to prevent arbitrary output, and (iii) discusses side-channel leakage risks together with standard mitigations. We will also note that a full formal analysis of the TEE+MCP composition is left for future work. revision: yes
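One way to realize the protocol-level enforcement described in point (ii) is an output gate that rejects anything other than a bare yes/no token before it leaves the enclave. A minimal sketch, illustrative only and not the authors' implementation:

```python
def enforce_boolean_interface(raw_output: str) -> str:
    """Protocol-level gate on the Auditor's response channel.

    Collapses the LLM's raw response to a single 'yes'/'no' token so that
    no free-form text (and thus no raw dataset content) can exit the
    enclave. Anything that is not an unambiguous Boolean answer is
    rejected rather than forwarded to the Verifier.
    """
    token = raw_output.strip().lower().rstrip(".!")
    if token == "yes":
        return "yes"
    if token == "no":
        return "no"
    raise ValueError("response rejected: not a bare yes/no answer")
```

A gate like this constrains the overt channel only; it does not address timing or other side channels, which is why the promised threat-model subsection still matters.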
Circularity Check
Architectural proposal with case study exhibits no circularity
full rationale
The paper advances an architectural framework (Agentic Witnessing) that isolates an LLM Auditor in a TEE to answer Verifier Boolean queries on Prover data without disclosure, using MCP for inspection and a signed hash chain for attestation. It supports the proposal with a case-study demonstration on 21 GitHub codebases for five high-level properties. No mathematical derivations, equations, fitted parameters, predictions of related quantities, or self-citation chains appear in the provided text or abstract. The central claim is a pragmatic system design whose validity rests on the external TEE guarantees and the (unvalidated) LLM reasoning reliability, not on any internal reduction to its own inputs by construction. This is a standard non-circular engineering proposal.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Trusted Execution Environments provide hardware-rooted isolation, attestation, and tamper resistance for code execution.
- ad hoc to paper An LLM auditor can accurately determine binary truth values for high-level semantic properties of code or data when given inspection access via MCP.
invented entities (2)
-
Agentic Witnessing framework
no independent evidence
-
Model Context Protocol (MCP)
pre-existing open protocol; independent documentation exists, so it is not an invention of this paper
Reference graph
Works this paper leans on
-
[1]
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and R. Ramjee. 2024. Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 117–134. doi:10.48550/arXiv.2403.02310
-
[2]
Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, Eran Tromer, and Madars Virza. 2013. SNARKs for C: Verifying Program Executions Succinctly and in Zero Knowledge. In CRYPTO (2013), 90–108. doi:10.1007/978-3-642-40084-1_6
-
[3]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, et al. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 33.
-
[4]
A. D. Camuto and J. Morton. 2023. EZKL. https://github.com/zkonduit/ezkl Accessed: January 12, 2026
-
[5]
David Cerdeira, Nuno Santos, Pedro Fonseca, and Sandro Pinto. 2020. SoK: Understanding the Prevailing Security Vulnerabilities in TrustZone-assisted TEE Systems. InIEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, CA, 1416–1432. doi:10.1109/sp40000.2020.00061
-
[6]
Yinwei Dai, Rui Pan, Anand Iyer, Kai Li, and Ravi Netravali. 2024. Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving. In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP ’24). 607–623. doi:10.1145/3694715.3695963
-
[7]
D. Dolev and A. C. Yao. 1981. On the security of public key protocols.IEEE Transactions on Information Theory29, 2 (1981), 198–208. doi:10.1109/sfcs.1981.32
-
[8]
European Union. 2024. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence. Official Journal of the European Union L (2024), 1–144. https://tinyurl.com/4atj6det
-
[9]
Flashbots. 2022. The Future of MEV is SUAVE. https://writings.flashbots.net/the-future-of-mev-is-suave/
-
[10]
Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai. 2024. ServerlessLLM: Low-Latency Serverless Inference for Large Language Models. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 135–153. http://arxiv.org/abs/2401.14351v2
-
[11]
S. Goldwasser, S. Micali, and C. Rackoff. 1985. The knowledge complexity of interactive proof-systems. In Proceedings of the seventeenth annual ACM symposium on Theory of Computing, Vol. 18. 291–304. doi:10.1145/22145.22178
-
[12]
Google. 2026. Gemini CLI. https://google-gemini.github.io/gemini-cli/ Command Line Interface for Gemini models
-
[13]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (2023), 79–90. doi:10.1145/3605764.3623985
-
[14]
Antonio Gullí. 2025. Model Context Protocol. 147–162 pages. doi:10.1007/978-3-032-01402-3_10 Technical Specification
-
[15]
Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, and Luo Mai. 2025. WaferLLM: Large Language Model Inference at Wafer Scale. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). 257–273. http://arxiv.org/abs/2502.04563v3
-
[16]
Zhisheng Hu, Pengfei Zuo, Yizou Chen, Chao Wang, Junliang Hu, and Ming-Chang Yang. 2024. Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores. In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP ’24). 127–143. doi:10.1145/3694715.3695951
-
[17]
Shengyi Huang, Rousslan Fernand Julien Dossa, Chang Ye, and Jeff Braga. 2022. CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. Journal of Machine Learning Research 23, 274 (2022), 1–18. https://jmlr.org/papers/volume23/21-1342/21-1342.pdf
-
[18]
J. Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, Benjamin Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, and Dario Amodei. 2020. Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361 (2020). doi:10.48550/arXiv.2001.08361
-
[19]
Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In 2019 IEEE Symposium on Security and Privacy (SP). 1–19. doi:10.1109/sp.2019.00002
-
[20]
Modulus Labs. 2023. The Cost of Intelligence: Proving Machine Learning Inference with Zero-Knowledge. https://github.com/Modulus-Labs/Papers Accessed: January 12, 2026
-
[21]
Leslie Lamport. 2002. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Professional. doi:10.1109/mc.2002.1033032
-
[22]
Andrea Lattuada, Travis Hance, Jay Bosamiya, Matthias Brun, Chanhee Cho, Hayley LeBlanc, Pranav Srinivasan, Reto Achermann, Tej Chajed, Chris Hawblitzel, Jon Howell, Jacob R. Lorch, Oded Padon, and Bryan Parno. 2024. Verus: A Practical Foundation for Systems Verification. In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP ’24).
-
[23]
Chenghao Liu, Wenjing Yang, Himanshu Mittal, Manpreet Singh, Doyen Sahoo, and S. Hoi. 2023. PyRCA: A Library for Metric-based Root Cause Analysis. arXiv:2306.11417 [cs.AI] doi:10.48550/arXiv.2306.11417
-
[24]
Haoran Ma, Yifan Qiao, Shiafun Liu, Shan Yu, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, and Harry Xu. 2024. DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 97–115. doi:10.48550/arXiv.2406.02803
-
[25]
Simon Meier, Benedikt Schmidt, Cas Cremers, and David Basin. 2013. The TAMARIN Prover for the Symbolic Analysis of Security Protocols. In Computer Aided Verification. Springer, 696–701. doi:10.1007/978-3-642-39799-8_48
-
[26]
Kun Ren, Dennis Li, and Daniel J. Abadi. 2019. SLOG: serializable, low-latency, geo-replicated transactions. Proceedings of the VLDB Endowment 12, 11 (2019), 1747–1761. doi:10.14778/3342263.3342647
-
[27]
Yixin Song, Zeyu Mi, Haotong Xie, and Haibo Chen. 2024. PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU. In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP ’24). 590–606. doi:10.1145/3694715.3695964
-
[28]
Emil Stefanov, Marten Van Dijk, Elaine Shi, T.-H. Hubert Chan, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. 2018. Path ORAM: An Extremely Simple Oblivious RAM Protocol. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security (CCS). 299–310. doi:10.1145/3177872
-
[29]
Florian Tramer and Dan Boneh. 2018. Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware. arXiv preprint arXiv:1806.03287 (2018). http://arxiv.org/abs/1806.03287v2
-
[30]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and I. Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 30.
-
[31]
Ke Wang, Felix Qu, Libin Xia, Zishuo Zhao, Chris Tong, Lynn Ai, and Eric Yang. 2025. VeriLLM: A Lightweight Framework for Publicly Verifiable Decentralized Inference. arXiv preprint arXiv:2509.24257 (2025). doi:10.48550/arXiv.2509.24257
-
[33]
Zibo Wang, Pinghe Li, C. Liang, Feng Wu, and Francis Y. Yan. 2024. Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). 149–165. http://arxiv.org/abs/2212.12180v5
-
[34]
Hao Wu, Yue Yu, Jun Deng, Shadi Ibrahim, Song Wu, Haoqiang Fan, Ziyue Cheng, and Hai Jin. 2024. StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow. In 2024 USENIX Annual Technical Conference (USENIX ATC 24). 59–73. https://www.usenix.org/conference/atc24/presentation/wu-hao
-
[35]
Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, and Shangguang Wang. 2024. FwdLLM: Efficient Federated Finetuning of Large Language Models with Perturbed Inferences. In 2024 USENIX Annual Technical Conference (USENIX ATC 24). 579–596. https://www.usenix.org/conference/atc24/presentation/xu-mengwei
-
[36]
Francis Y. Yan, Hudson Ayers, Chenzhi Zhu, Sadjad Fouladi, James Hong, Keyi Zhang, P. Levis, and Keith Winstein. 2019. Learning in situ: a randomized experiment in video streaming. In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2020). 495–511. http://arxiv.org/abs/1906.01113v2
-
[37]
Yifan Yang, Lin He, Jiasheng Zhou, Xiaoyi Shi, Jiamin Cao, and Ying Liu. 2024. P4runpro: Enabling Runtime Programmability for RMT Programmable Switches. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). 393–409. doi:10.1145/3651890.3672230
-
[38]
Dingyan Zhang, Haotian Wang, Yang Liu, Xingda Wei, Yizhou Shan, Rong Chen, and Haibo Chen. 2024. BlitzScale: Fast and Live Large Model Autoscaling with O(1) Host Caching. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). 275–293. http://arxiv.org/abs/2412.17246v2
-
[39]
Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, and Wei Lin. 2024. HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis. In 2024 USENIX Annual Technical Conference (USENIX ATC 24). 19–36. doi:10.1145/3627703.3629580
-
[40]
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, E. Xing, Haotong Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems, Vol. 36. http://arxiv.org/abs/2306.05685v4
-
[41]
Wenting Zheng, Ankur Dave, J. Beekman, Raluca A. Popa, Joseph E. Gonzalez, and Ion Stoica. 2017. Opaque: An Oblivious and Encrypted Distributed Analytics Platform. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). 283–298.
-
[42]
Yusheng Zheng, Tong Yu, Yiwei Yang, Yanpeng Hu, Xiaozheng Lai, and Andrew Quinn. 2023. bpftime: userspace eBPF Runtime for Uprobe, Syscall and Kernel-User Interactions. arXiv:2311.07923 doi:10.48550/arXiv.2311.07923
-
[43]
Yusheng Zheng, Tong Yu, Yiwei Yang, Yanpeng Hu, Xiaozheng Lai, Dan Williams, and Andi Quinn. 2025. Extending Applications Safely and Efficiently. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). USENIX Association, Boston, MA, USA, 557–574. https://www.usenix.org/system/files/osdi25-zheng-yusheng.pdf
-
[44]
Ziqiao Zhou, Anjali, Weiteng Chen, Sishuai Gong, Chris Hawblitzel, and Weidong Cui. 2024. VeriSMo: A Verified Security Module for Confidential VMs. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 599–614. https://www.usenix.org/conference/osdi24/presentation/zhou
-
[45]
Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tianyi Liu, Zihao Ye, Keisuke Kamahori, Chien-Nan Chen, Stephanie Wang, and Baris Kasikci. 2024. NanoFlow: Towards Optimal Large Language Model Serving Throughput. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). 749–765. doi:10.48550/...
discussion (0)