DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback

Dong Du; Haibo Chen; Jingkai He; Si Yu; Yubin Xia; Yunpeng Dong; Yuze Hou; Zhonghu Xu

arxiv: 2605.22781 · v1 · pith:YLLJHBWDnew · submitted 2026-05-21 · 💻 cs.OS · cs.AI

Yunpeng Dong, Jingkai He, Yuze Hou, Dong Du, Zhonghu Xu, Si Yu, Yubin Xia, Haibo Chen

Pith reviewed 2026-05-22 02:40 UTC · model grok-4.3

keywords: AI agents, checkpoint rollback, sandbox, OS mechanisms, LLM agents, state exploration, tree search, reinforcement learning

The pith

DeltaBox enables millisecond-level checkpoint and rollback for AI agent sandboxes by duplicating only changes between similar states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper targets a major slowdown for LLM agents that must rapidly explore many possible states, such as during tree search or reinforcement learning. Existing checkpoint and rollback methods copy the entire sandbox state each time, which takes hundreds of milliseconds to seconds per operation and limits how far agents can search in a fixed time window. The authors observe that consecutive checkpoints in these workloads are usually highly similar, so they replace full copies with incremental updates. They introduce DeltaBox, built on new OS mechanisms that track only the differences, and report 14 ms checkpoints and 5 ms rollbacks. If correct, agents could evaluate many more states without exhausting their time budget.

Core claim

The paper claims that an OS-level abstraction called DeltaState supports change-based transactional checkpoint and rollback. DeltaFS turns the filesystem into layers so that each checkpoint freezes the current writable layer and starts a new one, turning updates into copy-on-write operations and making rollback a simple layer switch. DeltaCR performs incremental process-state dumps and accelerates rollback by forking directly from a frozen template process instead of replaying logs. These two mechanisms together allow DeltaBox to capture and restore the full sandbox state, including files and process memory, at millisecond latency.
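The layering idea can be illustrated with a toy in-memory model. This is a minimal sketch, not DeltaFS itself: the names `Layer`, `LayeredStore`, and `_WHITEOUT` are hypothetical, files are plain strings, and the real mechanism operates on a kernel filesystem rather than Python dictionaries. It only shows why a checkpoint can be "freeze the writable layer, start a new one" and why a rollback can be a layer switch.

```python
from typing import Dict, List, Optional

_WHITEOUT = object()  # marks a deleted path, similar in spirit to an overlay whiteout


class Layer:
    """One layer: the changes made in it plus a link to the layer below."""

    def __init__(self, parent: Optional["Layer"]) -> None:
        self.parent = parent
        self.files: Dict[str, object] = {}


class LayeredStore:
    """Toy layered store (hypothetical); checkpoint id 0 is the base image."""

    def __init__(self, base: Dict[str, str]) -> None:
        root = Layer(None)
        root.files.update(base)
        self._frozen: List[Layer] = [root]  # checkpoint id -> frozen layer
        self._upper = Layer(root)           # current writable layer

    def write(self, path: str, data: str) -> None:
        # Changes land only in the writable layer; frozen layers are untouched.
        self._upper.files[path] = data

    def delete(self, path: str) -> None:
        self._upper.files[path] = _WHITEOUT

    def read(self, path: str) -> Optional[str]:
        # Walk from the newest layer down to the base until the path is found.
        layer: Optional[Layer] = self._upper
        while layer is not None:
            if path in layer.files:
                value = layer.files[path]
                return None if value is _WHITEOUT else str(value)
            layer = layer.parent
        return None

    def checkpoint(self) -> int:
        # Freeze the writable layer and stack a fresh empty one on top of it.
        self._frozen.append(self._upper)
        ckpt_id = len(self._frozen) - 1
        self._upper = Layer(self._frozen[ckpt_id])
        return ckpt_id

    def rollback(self, ckpt_id: int) -> None:
        # Drop the current writable layer and start a new one on the checkpoint.
        self._upper = Layer(self._frozen[ckpt_id])
```

In this toy, both `checkpoint` and `rollback` cost the same regardless of how many files exist, while `read` may walk the whole chain of layers. Sibling branches of a search tree can share the same frozen parent because frozen layers are never modified.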

What carries the argument

DeltaState, a new OS-level abstraction that treats checkpoint and rollback as transactional operations on the differences between consecutive states, implemented through the paired mechanisms of DeltaFS for layered file management and DeltaCR for incremental process forking.
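The process-state half relies on the fact that `fork()` gives the child a copy-on-write view of the parent's memory. The sketch below is an illustration under assumptions, not DeltaCR: `run_from_template` is a hypothetical helper, the "template" is simply the calling process, and results are returned over a pipe with `pickle`. It requires a POSIX system because it uses `os.fork`.

```python
import os
import pickle
from typing import Callable


def run_from_template(template_state: dict, action: Callable[[dict], None]) -> dict:
    """Fork a child from the current process, mutate state there, return the result.

    The parent plays the role of a frozen template: its copy of
    template_state is never modified, so every call starts from the same state.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees a copy-on-write snapshot of the parent's memory.
        os.close(read_fd)
        try:
            action(template_state)
            payload = pickle.dumps(template_state)
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
        finally:
            os._exit(0)  # leave without running the parent's cleanup handlers
    # Parent: collect the child's resulting state and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        payload = inp.read()
    os.waitpid(pid, 0)
    return pickle.loads(payload)
```

Each call behaves like a rollback to the template followed by one step: the child's edits never reach the parent, so no log replay or image reload is needed to get back to the saved state. If `action` raises in the child, no payload is written and `pickle.loads` fails in the parent; a fuller version would report the error explicitly.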

If this is right

Agents can explore substantially more nodes in search trees or RL episodes under any fixed time limit.
High-frequency state exploration becomes feasible for test-time scaling methods that previously hit latency walls.
Full sandbox state including files, memory, and contexts can be saved and restored without duplicating unchanged portions.
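A rough back-of-the-envelope model shows how C/R latency translates into explored nodes. The 14 ms and 5 ms figures come from the paper's abstract; the 600 s budget, the 1 s per-node step time, and the 500 ms full-copy latency are illustrative assumptions, as is the simplification that every node pays exactly one checkpoint and one rollback.

```python
def nodes_within_budget(budget_s: float, step_s: float,
                        checkpoint_ms: float, rollback_ms: float) -> int:
    """Number of whole nodes that fit in the budget under a simple additive cost model."""
    per_node_s = step_s + (checkpoint_ms + rollback_ms) / 1000.0
    return int(budget_s // per_node_s)


BUDGET_S = 600.0  # assumed time budget, not from the paper
STEP_S = 1.0      # assumed LLM-plus-tool time per node, not from the paper

# Latencies reported in the abstract for DeltaBox.
deltabox_nodes = nodes_within_budget(BUDGET_S, STEP_S, 14.0, 5.0)

# Assumed full-copy baseline in the "hundreds of milliseconds" range.
full_copy_nodes = nodes_within_budget(BUDGET_S, STEP_S, 500.0, 500.0)
```

Under these assumed numbers the model yields 588 nodes versus 300. The gap narrows as the per-node step time grows, so the benefit depends on how much of each step is C/R rather than LLM latency.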

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same incremental approach could apply to other repeated-simulation workloads such as debugging sessions or multi-agent environments.
Lower C/R latency may reduce the total compute hours needed when scaling agent training or evaluation across clusters.
Operating systems might eventually expose similar change-tracking primitives as standard facilities for stateful AI code.

Load-bearing premise

Subsequent checkpoints in AI agent workloads remain highly similar, so that the cost of tracking and managing the incremental changes stays far below the cost of full duplication.

What would settle it

A direct measurement of checkpoint similarity on standard agent benchmarks such as SWE-bench showing low overlap between consecutive states, with the resulting incremental overhead exceeding the savings from full copies.
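One way such a measurement could be approximated at file granularity is sketched below. It is an assumption-laden illustration: `snapshot_digest` and `unchanged_fraction` are hypothetical helpers, snapshots are taken to be plain directory trees, and similarity is counted per file rather than per block or per memory page, which is coarser than what the paper's mechanisms track.

```python
import hashlib
from pathlib import Path
from typing import Dict


def snapshot_digest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = str(path.relative_to(root))
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def unchanged_fraction(prev: Dict[str, str], curr: Dict[str, str]) -> float:
    """Fraction of paths (union of both snapshots) whose contents are identical."""
    all_paths = set(prev) | set(curr)
    if not all_paths:
        return 1.0
    same = sum(1 for p in all_paths if prev.get(p) == curr.get(p))
    return same / len(all_paths)
```

A value near 1.0 across consecutive checkpoints would support the premise; consistently low values on real agent trajectories would undercut it. Added and deleted files count as changed because they appear in only one of the two snapshots.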

Figures

Figures reproduced from arXiv: 2605.22781 by Dong Du, Haibo Chen, Jingkai He, Si Yu, Yubin Xia, Yunpeng Dong, Yuze Hou, Zhonghu Xu.

**Figure 1.** Pass rate on SWE-bench Verified. (a) Linear ReAct vs. MCTS across three coding models. (b) Base vs. RL-trained across three open-weight model families.

Adjacent text: … tree search and RL workloads. We propose the key insight of change-based DeltaState management. We design DeltaFS, a runtime-reconfigurable overlayfs extension enabling unmount-free layer switching and lazy file descriptor redirection. We design DeltaCR…

**Figure 2.** Design Overview. DeltaBox utilizes diff-based checkpoint/restore (i.e., deltaCheckpoint and deltaRestore) to enable millisecond-level checkpoint/rollback. An agent application can fully run inside a sandbox, or utilize a sandbox to execute a set of tools and use the C/R capabilities of DeltaBox to ensure prompt rollback when necessary.

Diagram labels: Layer 4: Search Strategy (Linear / BoN / MCTS); Layer 2: DeltaFS (overla…

**Figure 3.** The DeltaBox architecture. The StateManager coordinates DeltaFS (Layer 2, filesystem state) and DeltaCR (Layer 3, process state) to maintain consistent (filesystem, memory) state pairs at each search tree node. Base storage (Layer 1) provides the real storage functionalities. It would be better to adopt XFS (with reflink) to achieve block-level CoW and eliminate write amplification.

Adjacent text: 1. The search strateg…

**Figure 4.** DeltaFS architecture. (a) Traditional overlayfs will prepare an upper layer file system which is writable for apps, and maintain a lower layer file system which is read-only and basically includes everything in the sandbox image. (b) DeltaFS extends the idea to support dynamic overlay, i.e., when an agent finishes a step of task and needs to make a checkpoint, instead of duplicating all files, DeltaFS inse…

**Figure 5.** DeltaCR architecture. Checkpointing creates both a CRIU image chain and a frozen template. Restores use the template fast path on hit, or the CRIU chain on miss; NPD keeps external I/O off the agent path.

Adjacent text: Dual-path checkpoint. At every checkpoint, DeltaCR simultaneously performs an asynchronous CRIU incremental dump and a template-creating fork(). The CRIU dump provides a durable image for crash recovery …

**Figure 6.** Per-event blocking-time CDF, pooled across the 9 trajectory replays underlying Table 2 (3 workloads × 3 reps): (a) checkpoint, (b) restore. DeltaBox’s distribution is shifted 1–3 orders of magnitude left of every coupled backend; the gap holds into the tail (no event-class crossover at any percentile).

Plot labels from the adjacent figure: instances flask, sympy, django, astropy, matplotlib; y-axis "Time / LLM-only floor (×)".

**Figure 7.** End-to-end time for a 100-iteration MCTS trajectory (Qwen3-Coder-30B) on five SWE-bench Verified instances, normalized per-instance to the LLM-only floor (1.0× = pure LLM RTT sum). FC-Diff+dm and CHV+dm pair each VMM with dm-snapshot for filesystem coupling; FC-Diff+dm’s chain merge follows only the MCTS ancestor path. See §6.2.1 for the CubeSandbox approximation rationale.

Adjacent text: … at the largest fan-out (p99=14.…

**Figure 8.** RL training fan-out characterisation, 5 substrates on the same GPU cluster. (a) Substrate 1:𝑁 fan-out latency on a single GPU (144 MB synthetic-anonymous parent template; 𝐾=10 prefork actions × 5 MB mid-state; 5 reps). The parent here is larger than DeltaBox’s realistic ∼15 MB agent.py measured in Table 3, so this panel stress-tests the substrate primitives rather than DeltaBox’s production parent RSS. (b)…

**Figure 9.** Per-event ckpt/restore latency vs. per-event …

**Figure 10.** Per-event CoW fault absorption vs. post-restore idle window (50 swe-search MCTS restore events). Shaded band: p1–p99 idle range from 2,311 LLM-driven restore events. Reflink-aware copy-up shares extents with the lower layer; only the 4 KB blocks the partial-write actually dirties contribute to duplicated bytes, so the reflink curve sits below the no-reflink lines but tracks them in slope. The benefit gro…

**Figure 13.** End-of-trajectory CRIU dump storage on 9 SWE-bench instances (3 per archetype, averaged), replayed through real criu dump on a 5 MB Python process matched to the Mode A footprint. Comparison: reachability-aware GC (§5.2.1) versus retaining every checkpoint.

Adjacent text: GC effectiveness. In a stress test with moderate branching factor and depth, the GC mechanism reclaims snapshot storage at each prune event. GC runs …

**Figure 11.** Per-edit copy-up bytes (a) and physical I/O bytes (b) vs. edited-file size (log–log); real SWE-bench agent edits across three filesystem configurations, per-bin medians. Shaded band marks the typical agent edit range. ext4 and XFS-without-reflink coincide on (a): copy-up benefit comes entirely from reflink, not XFS.

Plot labels from the adjacent figure: x-axis "Per-event checkpoint latency (ms)" (log scale); y-axis "ckpt events"; series "Standard (n=…".

**Figure 12.** Per-event checkpoint latency on 1,689 ckpt events from 87 MCTS runs across 9 SWE-bench Verified repositories. Adaptive: pure-read cmds (LW, blue; 𝑛=1047) skip the dump; FS-mutating cmds (std, orange; 𝑛=642) take the full incremental dump. Standard (gray dashed): same events forced through the std path; 62.0% of events route to the LW peak.

Adjacent text: Lightweight skip ratio. Across 87 production MCTS runs spanning 9 …
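The Figure 13 caption mentions reachability-aware GC but the excerpt does not describe the algorithm. The sketch below is therefore an assumption about what such a collector could look like, not the paper's design: checkpoints form a tree through parent links, a checkpoint is kept if it is a live search node or an ancestor of one (since incremental images build on their ancestors), and everything else is reclaimable. The names `reachable_checkpoints` and `collect_garbage` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def reachable_checkpoints(parent: Dict[int, Optional[int]],
                          live: Iterable[int]) -> Set[int]:
    """Return every live checkpoint plus all of its ancestors."""
    keep: Set[int] = set()
    for node in live:
        current: Optional[int] = node
        # Stop early once we hit an ancestor that is already marked.
        while current is not None and current not in keep:
            keep.add(current)
            current = parent[current]
    return keep


def collect_garbage(parent: Dict[int, Optional[int]],
                    live: Iterable[int]) -> List[int]:
    """Return the checkpoint ids that no live node depends on, in sorted order."""
    keep = reachable_checkpoints(parent, live)
    return sorted(set(parent) - keep)
```

For a tree where checkpoint 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1, and 5 is a child of 2, keeping only node 3 alive retains the chain 0, 1, 3 and marks 2, 4, and 5 as reclaimable.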

Original abstract

LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learning), relying on rapid checkpoint and rollback (C/R) of the complete sandbox state, including files and process state (e.g., memory and contexts). Existing mechanisms duplicate the entire state, causing hundreds of milliseconds to seconds of latency per C/R, which severely bottlenecks deep search and large-scale fan-outs. This paper observes that subsequent checkpoints in AI agents are highly similar. Therefore, instead of full duplication, a sandbox should only duplicate the changes between consecutive checkpoints (Key Insight). However, it is non-trivial to realize the idea, mainly due to missing OS support. This paper proposes a new OS-level abstraction, DeltaState, to enable change-based transactional C/R for AI agents with two co-designed OS mechanisms. First, DeltaFS enables change-based filesystem C/R by organizing the file states into layers and dynamically freezing the writable layer and inserting a new one during checkpoint, reducing file updates to copy-on-write, and making rollback a simple layer switch. Second, DeltaCR enables change-based process state C/R using incremental dumps, and accelerates rollback by bypassing traditional pipelines to directly fork() from a frozen template process. We then present DeltaBox, a novel agent sandbox achieving millisecond-level C/R through the two new mechanisms. Evaluations on SWE-bench and RL micro-benchmarks show DeltaBox completes checkpoint and rollback at millisecond-level latency (14 ms and 5 ms, respectively), empowering agents to explore substantially more nodes under fixed time budgets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

DeltaBox shows a practical way to cut sandbox C/R to milliseconds by copying only changes, which could help high-frequency agent search if the similarity assumption holds up in the full data.

The core takeaway is that this paper gets checkpoint and rollback down to 14 ms and 5 ms for AI-agent sandboxes by building DeltaFS for layered file states and DeltaCR for incremental process forking instead of full duplication. That directly targets the bottleneck in tree search and RL loops where you need to explore many similar states quickly on things like SWE-bench. The co-design of the two mechanisms is the actual new piece; prior work either did full copies or generic VM snapshots that stayed too slow for this use case. The evaluations report concrete speedups in node exploration under fixed time budgets, which is the kind of result that matters for systems work on agent infrastructure. The mechanisms look grounded in real OS primitives like copy-on-write layers and direct fork from templates, and the paper ships the latency numbers rather than just theory.

The soft spot is the load-bearing claim that consecutive agent checkpoints stay highly similar. The abstract states this as the key insight, but if delta sizes turn out larger than expected or if layer overhead grows with depth, the millisecond numbers would shrink and the extra nodes explored would be less impressive. The stress-test note is right to flag the lack of reported delta-size histograms or ablation on layer costs; those details need to be in the full evaluation to make the speedup claim stick across workloads.

Readers who build LLM agents or work on OS primitives for stateful execution will get the most out of it. The paper is clear enough on its own terms and addresses a stated gap without circular reasoning or fitted parameters. It deserves a serious referee because the problem is timely and the proposed mechanisms are specific enough to evaluate and build on.

Referee Report

3 major / 1 minor

Summary. The paper presents DeltaBox, a new OS-level sandbox for stateful LLM agents that achieves millisecond-scale checkpoint/rollback by exploiting high similarity between consecutive agent states. It introduces the DeltaState abstraction together with two mechanisms: DeltaFS, which organizes filesystem state into layers and uses copy-on-write plus layer switching for incremental C/R, and DeltaCR, which performs incremental process-state dumps and accelerates rollback via direct fork from a frozen template. Evaluations on SWE-bench and RL micro-benchmarks report 14 ms checkpoint and 5 ms rollback latencies, enabling agents to explore substantially more nodes under fixed time budgets.

Significance. If the reported latencies and the underlying similarity assumption hold across realistic agent workloads, the work would meaningfully advance scalable tree search and reinforcement learning for agents by removing a major C/R bottleneck. The co-design of OS abstractions (DeltaFS layering and DeltaCR incremental fork) is a concrete engineering contribution that could be adopted beyond the immediate AI-agent setting.

major comments (3)

[Abstract] The central performance claims (14 ms checkpoint, 5 ms rollback) are presented without error bars, number of runs, baseline latencies (e.g., CRIU or full-state duplication), or evaluation methodology, preventing verification of the stated orders-of-magnitude improvement.
[Abstract] The load-bearing claim that 'subsequent checkpoints in AI agents are highly similar' is asserted without any supporting measurements (average delta sizes, similarity ratios, delta-size histograms, or ablation of layer-management overhead), leaving the speedup mechanism unverified even on the reported SWE-bench and RL workloads.
The manuscript does not discuss or quantify potential cumulative overheads of maintaining growing numbers of DeltaFS layers or the cost of incremental dumps in DeltaCR as checkpoint depth increases, which could erode the millisecond advantage in long-horizon agent sessions.

minor comments (1)

[Abstract] The abstract would be strengthened by a one-sentence comparison of the new latencies against a standard baseline such as CRIU or Docker checkpoint.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to strengthen the presentation and verification of our results.

Referee: [Abstract] The central performance claims (14 ms checkpoint, 5 ms rollback) are presented without error bars, number of runs, baseline latencies (e.g., CRIU or full-state duplication), or evaluation methodology, preventing verification of the stated orders-of-magnitude improvement.

Authors: We agree that the abstract would benefit from greater specificity to aid verification. In the revised manuscript we will update the abstract to note that the 14 ms and 5 ms figures are averages across multiple runs on the SWE-bench and RL micro-benchmarks, explicitly reference the baselines (CRIU and full-state duplication), and point to the Evaluation section for the full methodology, error bars, and per-run data. These details already exist in the body of the paper; the abstract revision will make them visible at the summary level.

Revision: yes

Referee: [Abstract] The load-bearing claim that 'subsequent checkpoints in AI agents are highly similar' is asserted without any supporting measurements (average delta sizes, similarity ratios, delta-size histograms, or ablation of layer-management overhead), leaving the speedup mechanism unverified even on the reported SWE-bench and RL workloads.

Authors: The similarity property is the foundation of the reported speedups, and the millisecond latencies on the evaluated workloads serve as indirect evidence. To make the claim directly verifiable, we will add a dedicated paragraph (and accompanying figure) in the Evaluation section that reports average delta sizes, similarity ratios, delta-size histograms, and an ablation of layer-management overhead for both SWE-bench and RL workloads. A brief reference to these measurements will also be inserted into the abstract.

Revision: yes

Referee: The manuscript does not discuss or quantify potential cumulative overheads of maintaining growing numbers of DeltaFS layers or the cost of incremental dumps in DeltaCR as checkpoint depth increases, which could erode the millisecond advantage in long-horizon agent sessions.

Authors: This is a valid concern for long-horizon use cases. The current manuscript emphasizes per-operation latency but does not explicitly measure cumulative effects. We will add a new subsection in the Evaluation section that quantifies layer-maintenance and incremental-dump overhead as a function of checkpoint depth, together with experiments on extended agent sessions (hundreds of checkpoints) to demonstrate that the millisecond advantage is retained. We will also discuss any practical limits and mitigation strategies.

Revision: yes
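The depth concern can be made concrete with a toy model of layered lookup. This is a worst-case illustration under assumptions, not a measurement of DeltaFS: `build_chain` and `lookup_steps` are hypothetical helpers, each checkpoint is assumed to add one layer containing one edited file, and no lookup caching is modeled, whereas a real filesystem may cache resolved paths.

```python
from typing import Dict, List


def build_chain(depth: int) -> List[Dict[str, str]]:
    """Base layer plus `depth` checkpoint layers, each touching one distinct file."""
    layers: List[Dict[str, str]] = [{"base.txt": "base"}]
    for i in range(depth):
        layers.append({f"edit_{i}.txt": str(i)})
    return layers


def lookup_steps(layers: List[Dict[str, str]], path: str) -> int:
    """Count layers inspected, newest first, until `path` is found (or all are searched)."""
    for steps, layer in enumerate(reversed(layers), start=1):
        if path in layer:
            return steps
    return len(layers)
```

In this model, a file that lives only in the base layer costs `depth + 1` inspections after `depth` checkpoints, while a recently edited file costs one. An evaluation of the kind the authors describe would show whether the real system stays flat or follows this linear trend.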

Circularity Check

0 steps flagged

No significant circularity: claims rest on new mechanisms and direct measurements

The paper introduces DeltaState, DeltaFS, and DeltaCR as new OS abstractions motivated by an empirical observation of checkpoint similarity in AI agent workloads. The millisecond-level C/R latencies (14 ms checkpoint, 5 ms rollback) are presented as results of evaluations on SWE-bench and RL micro-benchmarks rather than any derived prediction, fitted parameter, or equation that reduces to prior inputs. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the provided text; the design and performance claims remain independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The central claim rests on the domain assumption of high checkpoint similarity and the introduction of two new OS abstractions whose correctness and performance are asserted via the described mechanisms.

axioms (1)

Domain assumption: Subsequent checkpoints in AI agents are highly similar.
Stated as the key insight enabling change-based rather than full-duplication C/R.

invented entities (3)

DeltaState (no independent evidence)
purpose: New OS-level abstraction for change-based transactional checkpoint/rollback
Proposed to realize the similarity insight; no independent evidence outside the paper.

DeltaFS (no independent evidence)
purpose: Layered filesystem for change-based file C/R via copy-on-write and layer switching
New mechanism co-designed for the abstraction; no independent evidence outside the paper.

DeltaCR (no independent evidence)
purpose: Incremental process state C/R with direct fork from frozen template
New mechanism co-designed for the abstraction; no independent evidence outside the paper.


Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 6 internal anchors

[1]

Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2024. SWE-bench: Can Language Models Resolve Real-world Github Issues?. In In- ternational Conference on Learning Representations , B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun (Eds.), Vol. 2024. 54107–54157. https://proceedings.i...

work page 2024
[2]

Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. 2024. WebArena: A Re- alistic Web Environment for Building Autonomous Agents. InIn- ternational Conference on Learning Representations , B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. S...

work page 2024
[3]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL] https://arxiv. org/abs/2210.03629

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

E2B. 2024. E2B: The Enterprise AI Agent Cloud.https://e2b.dev

work page 2024
[5]

Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar

work page
[6]

In In- ternational Conference on Learning Representations , Y

Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning. In In- ternational Conference on Learning Representations , Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu (Eds.), Vol. 2025. 10131– 10165. https://proceedings.iclr.cc/paper_files/paper/2025/file/ 1b623663fd9b874366f3ce019fdfdd44-Paper-Conference.pdf

work page 2025
[7]

Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. 2024. Language agent tree search unifies rea- soning, acting, and planning in language models. InProceedings of the 41st International Conference on Machine Learning (Vienna, Aus- tria)(ICML’24). JMLR.org, Article 2572, 23 pages

work page 2024
[8]

Cheng Zhang, Erhu Feng, Xi Zhao, Yisheng Zhao, Wangbo Gong, Jiahui Sun, Dong Du, Zhichao Hua, Yubin Xia, and Haibo Chen

work page
[9]

arXiv:2509.00531 [cs.MA] https://arxiv.org/abs/2509.00531

MobiAgent: A Systematic Framework for Customizable Mobile Agents. arXiv:2509.00531 [cs.MA] https://arxiv.org/abs/2509.00531

work page arXiv
[10]

The OpenClaw Project. 2026. openclaw/openclaw: Your own personal AI assistant. Any OS. Any Platform. The lobster way.https://github. com/openclaw/openclaw

work page 2026
[11]

OpenAI. 2024. OpenAI o1 System Card. arXiv: 2412.16720 [cs.AI] https://arxiv.org/abs/2412.16720

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

DeepSeek-AI. 2025. DeepSeek-R1: Incentivizing Reasoning Capabil- ity in LLMs via Reinforcement Learning. arXiv: 2501.12948 [cs.AI] https://arxiv.org/abs/2501.12948

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, and Azalia Mirhoseini. 2024. Large Lan- guage Monkeys: Scaling Inference Compute with Repeated Sampling. arXiv:2407.21787 [cs.LG] https://arxiv.org/abs/2407.21787

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Jingkai He, Tianjian Li, Erhu Feng, Dong Du, Qian Liu, Tao Liu, Yu- bin Xia, and Haibo Chen. 2026. History Doesn’t Repeat Itself but Roll- outs Rhyme: Accelerating Reinforcement Learning with RhymeRL. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (USA) (ASPLOS ’26...

work page doi:10.1145/3779212.3790172 2026
[15]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathemat- ical Reasoning in Open Language Models. arXiv:2402.03300 [cs.CL] https://arxiv.org/abs/2402.03300

work page internal anchor Pith review Pith/arXiv arXiv 2024
[16]

Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, juncai liu, LingJun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Ru Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Yuxuan Song, Xiangpeng Wei, Hao...

work page 2025
[17]

Lixiang Ao, George Porter, and Geoffrey M. Voelker. 2022. FaaSnap: FaaS made fast using snapshot-based VMs. InProceedings of the Sev- enteenth European Conference on Computer Systems (Rennes, France) (EuroSys ’22). Association for Computing Machinery, New York, NY, USA, 730–746. https://doi.org/10.1145/3492321.3524270

work page doi:10.1145/3492321.3524270 2022
[18]

Dong Du, Tianyi Yu, Yubin Xia, Binyu Zang, Guanglu Yan, Cheng- gang Qin, Qixuan Wu, and Haibo Chen. 2020. Catalyzer: Sub- millisecond Startup for Serverless Computing with Initialization-less Booting. InProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland)(...

work page arXiv 2020
[19]

Dmitrii Ustiugov, Plamen Petrov, Marios Kogias, Edouard Bugnion, and Boris Grot. 2021. Benchmarking, analysis, and optimization of serverless function snapshots. InProceedings of the 26th ACM Inter- national Conference on Architectural Support for Programming Lan- guages and Operating Systems (Virtual, USA)(ASPLOS ’21). Associa- tion for Computing Machine...

work page doi:10.1145/3445814.3446714 2021
[20]

Xiaohu Chai, Tianyu Zhou, Keyang Hu, Jianfeng Tan, Tiwei Bie, Anqi Shen, Dawei Shen, Qi Xing, Shun Song, Tongkai Yang, Le Gao, Feng Yu, Zhengyu He, Dong Du, Yubin Xia, Kang Chen, and Yu Chen. 2025. Fork in the road: reflections and optimizations for cold start latency in production serverless systems. InProceedings of the 19th USENIX Conference on Operati...

work page 2025
[21]

Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, and Ana Klimovic. 2024. Dirigent: Lightweight Serverless Orches- tration. InProceedings of the ACM SIGOPS 30th Symposium on Op- erating Systems Principles (Austin, TX, USA)(SOSP ’24). Association for Computing Machinery, New York, NY, USA, 369–384. https: //doi.org/10.1145/3694715.3695966

work page doi:10.1145/3694715.3695966 2024
[22]

Dong Du, Qingyuan Liu, Xueqiang Jiang, Yubin Xia, Binyu Zang, and Haibo Chen. 2022. Serverless computing on heterogeneous comput- ers. In Proceedings of the 27th ACM International Conference on Archi- tectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland)(ASPLOS ’22) . Association for Computing Machinery, New York, NY, US...

work page arXiv 2022
[23]

Zijun Li, Jiagan Cheng, Quan Chen, Eryu Guan, Zizheng Bian, Yi Tao, Bin Zha, Qiang Wang, Weidong Han, and Minyi Guo. 2022. RunD: A Lightweight Secure Container Runtime for High-density Deploy- ment and High-concurrency Startup in Serverless Computing. In2022 USENIX Annual Technical Conference (USENIX ATC 22) . USENIX As- sociation, Carlsbad, CA, 53–68.htt...

work page 2022
[24]

Hanfei Yu, Rohan Basu Roy, Christian Fontenot, Devesh Tiwari, Jian Li, Hong Zhang, Hao Wang, and Seung-Jong Park. 2024. Rainbow- Cake: Mitigating Cold-starts in Serverless with Layer-wise Container Caching and Sharing. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1...

work page doi:10.1145/3617232.3624871 2024
[25]

Jialiang Huang, MingXing Zhang, Teng Ma, Zheng Liu, Sixing Lin, Kang Chen, Jinlei Jiang, Xia Liao, Yingdi Shan, Ning Zhang, Mengt- ing Lu, Tao Ma, Haifeng Gong, and YongWei Wu. 2024. TrEnv: Trans- parently Share Serverless Execution Environments Across Different Functions and Nodes. InProceedings of the ACM SIGOPS 30th Sym- posium on Operating Systems Pri...

work page doi:10.1145/3694715.3695967 2024
[26]

Frans Kaashoek

Ariel Szekely, Adam Belay, Robert Morris, and M. Frans Kaashoek

work page
[27]

In Proceedings of the ACM SIGOPS 30th Symposium on Operating Sys- tems Principles (Austin, TX, USA) (SOSP ’24)

Unifying serverless and microservice workloads with SigmaOS. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Sys- tems Principles (Austin, TX, USA) (SOSP ’24) . Association for Com- puting Machinery, New York, NY, USA, 385–402. https://doi.org/10. 1145/3694715.3695947

work page arXiv
[28]

E2B. 2026. E2B Sandbox persistence. https://e2b.dev/docs/sandbox/ persistence

work page 2026
[29]

The CRIU Project. 2011. CRIU: Checkpoint/Restore In Userspace. https://criu.org

work page 2011
[30]

[1]

Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2024. SWE-bench: Can Language Models Resolve Real-world Github Issues? In International Conference on Learning Representations, B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. Sun (Eds.), Vol. 2024. 54107–54157. https://proceedings.i...

[2]

Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. 2024. WebArena: A Realistic Web Environment for Building Autonomous Agents. In International Conference on Learning Representations, B. Kim, Y. Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y. S...

[3]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL] https://arxiv.org/abs/2210.03629

[4]

E2B. 2024. E2B: The Enterprise AI Agent Cloud. https://e2b.dev

[5]

Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. 2025. Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning. In International Conference on Learning Representations, Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu (Eds.), Vol. 2025. 10131–10165. https://proceedings.iclr.cc/paper_files/paper/2025/file/1b623663fd9b874366f3ce019fdfdd44-Paper-Conference.pdf

[7]

Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. 2024. Language agent tree search unifies reasoning, acting, and planning in language models. In Proceedings of the 41st International Conference on Machine Learning (Vienna, Austria) (ICML’24). JMLR.org, Article 2572, 23 pages.

[8]

Cheng Zhang, Erhu Feng, Xi Zhao, Yisheng Zhao, Wangbo Gong, Jiahui Sun, Dong Du, Zhichao Hua, Yubin Xia, and Haibo Chen. 2025. MobiAgent: A Systematic Framework for Customizable Mobile Agents. arXiv:2509.00531 [cs.MA] https://arxiv.org/abs/2509.00531

[10]

The OpenClaw Project. 2026. openclaw/openclaw: Your own personal AI assistant. Any OS. Any Platform. The lobster way. https://github.com/openclaw/openclaw

[11]

OpenAI. 2024. OpenAI o1 System Card. arXiv:2412.16720 [cs.AI] https://arxiv.org/abs/2412.16720

[12]

DeepSeek-AI. 2025. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948 [cs.AI] https://arxiv.org/abs/2501.12948

[13]

Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, and Azalia Mirhoseini. 2024. Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. arXiv:2407.21787 [cs.LG] https://arxiv.org/abs/2407.21787

[14]

Jingkai He, Tianjian Li, Erhu Feng, Dong Du, Qian Liu, Tao Liu, Yubin Xia, and Haibo Chen. 2026. History Doesn’t Repeat Itself but Rollouts Rhyme: Accelerating Reinforcement Learning with RhymeRL. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (USA) (ASPLOS ’26...

doi:10.1145/3779212.3790172

[15]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv:2402.03300 [cs.CL] https://arxiv.org/abs/2402.03300

[16]

Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, juncai liu, LingJun Liu, Xin Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Guangming Sheng, Yuxuan Tong, Chi Zhang, Mofan Zhang, Ru Zhang, Wang Zhang, Hang Zhu, Jinhua Zhu, Jiaze Chen, Jiangjie Chen, Chengyi Wang, Hongli Yu, Yuxuan Song, Xiangpeng Wei, Hao...

[17]

Lixiang Ao, George Porter, and Geoffrey M. Voelker. 2022. FaaSnap: FaaS made fast using snapshot-based VMs. In Proceedings of the Seventeenth European Conference on Computer Systems (Rennes, France) (EuroSys ’22). Association for Computing Machinery, New York, NY, USA, 730–746. https://doi.org/10.1145/3492321.3524270

[18]

Dong Du, Tianyi Yu, Yubin Xia, Binyu Zang, Guanglu Yan, Chenggang Qin, Qixuan Wu, and Haibo Chen. 2020. Catalyzer: Sub-millisecond Startup for Serverless Computing with Initialization-less Booting. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (...

[19]

Dmitrii Ustiugov, Plamen Petrov, Marios Kogias, Edouard Bugnion, and Boris Grot. 2021. Benchmarking, analysis, and optimization of serverless function snapshots. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Virtual, USA) (ASPLOS ’21). Association for Computing Machine...

doi:10.1145/3445814.3446714

[20]

Xiaohu Chai, Tianyu Zhou, Keyang Hu, Jianfeng Tan, Tiwei Bie, Anqi Shen, Dawei Shen, Qi Xing, Shun Song, Tongkai Yang, Le Gao, Feng Yu, Zhengyu He, Dong Du, Yubin Xia, Kang Chen, and Yu Chen. 2025. Fork in the road: reflections and optimizations for cold start latency in production serverless systems. In Proceedings of the 19th USENIX Conference on Operati...

[21]

Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, and Ana Klimovic. 2024. Dirigent: Lightweight Serverless Orchestration. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (Austin, TX, USA) (SOSP ’24). Association for Computing Machinery, New York, NY, USA, 369–384. https://doi.org/10.1145/3694715.3695966

[22]

Dong Du, Qingyuan Liu, Xueqiang Jiang, Yubin Xia, Binyu Zang, and Haibo Chen. 2022. Serverless computing on heterogeneous computers. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS ’22). Association for Computing Machinery, New York, NY, US...

[23]

Zijun Li, Jiagan Cheng, Quan Chen, Eryu Guan, Zizheng Bian, Yi Tao, Bin Zha, Qiang Wang, Weidong Han, and Minyi Guo. 2022. RunD: A Lightweight Secure Container Runtime for High-density Deployment and High-concurrency Startup in Serverless Computing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 53–68. htt...

[24]

Hanfei Yu, Rohan Basu Roy, Christian Fontenot, Devesh Tiwari, Jian Li, Hong Zhang, Hao Wang, and Seung-Jong Park. 2024. RainbowCake: Mitigating Cold-starts in Serverless with Layer-wise Container Caching and Sharing. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1...

doi:10.1145/3617232.3624871

[25]

Jialiang Huang, MingXing Zhang, Teng Ma, Zheng Liu, Sixing Lin, Kang Chen, Jinlei Jiang, Xia Liao, Yingdi Shan, Ning Zhang, Mengting Lu, Tao Ma, Haifeng Gong, and YongWei Wu. 2024. TrEnv: Transparently Share Serverless Execution Environments Across Different Functions and Nodes. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Pri...

doi:10.1145/3694715.3695967

[26]

Ariel Szekely, Adam Belay, Robert Morris, and M. Frans Kaashoek. 2024. Unifying serverless and microservice workloads with SigmaOS. In Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles (Austin, TX, USA) (SOSP ’24). Association for Computing Machinery, New York, NY, USA, 385–402. https://doi.org/10.1145/3694715.3695947

[28]

E2B. 2026. E2B Sandbox persistence. https://e2b.dev/docs/sandbox/persistence

[29]

The CRIU Project. 2011. CRIU: Checkpoint/Restore In Userspace. https://criu.org

[30]

Alexandru Agache, Marc Brooker, Andreea Florescu, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. 2020. Firecracker: lightweight virtualization for serverless applications. In Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation (Santa Clara, CA, USA) (NSDI’20). USENIX Ass...

[31]

DeepSeek-AI. 2026. DeepSeek-V4 Technical Report. Technical Report. DeepSeek-AI. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

[32]

Tencent Cloud. 2026. CubeSandbox. https://github.com/TencentCloud/CubeSandbox

[33]

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, and Tao Yu. 2024. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. In Advances in Neural Infor...

doi:10.52202/079017-1650

[34]

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. 2024. AgentBench: Evaluating LLMs as Agents. In International Conference on Learning Repre...

[35]

John Yang, Carlos Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (Eds.), Vol. 37. Curran Assoc...

doi:10.52202/079017-1601

[36]

Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Daniel Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. 2025. OpenHands: An Open Platform for AI So...

[37]

Paul Gauthier. 2023. Aider: AI Pair Programming in Your Terminal. https://aider.chat/

[38]

Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2018. SOCK: Rapid Task Provisioning with Serverless-Optimized Containers. In 2018 USENIX Annual Technical Conference (ATC). USENIX Association, 57–70. https://www.usenix.org/conference/atc18/presentation/oakes

[39]

James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, and Jonathan Appavoo. 2020. SEUSS: skip redundant paths to make serverless fast. In Proceedings of the Fifteenth European Conference on Computer Systems (Heraklion, Greece) (EuroSys ’20). Association for Computing Machinery, New York, NY, USA, Article 32, 15 pages. https://doi.org/10.1145/3342...

doi:10.1145/3342195.3392698

[40]

Nikita Lazarev, Varun Gohil, James Tsai, Andy Anderson, Bhushan Chitlur, Zhiru Zhang, and Christina Delimitrou. 2024. Sabre: hardware-accelerated snapshot compression for serverless MicroVMs. In Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (Santa Clara, CA, USA) (OSDI’24). USENIX Association, USA, Article 1, 18 pages.

[41]

LangChain, Inc. 2024. LangGraph: Building Stateful, Multi-Actor Applications with LLMs. https://github.com/langchain-ai/langgraph

[42]

Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, and Yuqing Yang. 2025. Agent Lightning: Train ANY AI Agents with Reinforcement Learning. arXiv:2508.03680 [cs.AI] https://arxiv.org/abs/2508.03680

[43]

Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, and William Wang. 2025. SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement. In International Conference on Learning Representations, Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu (Eds.), Vol. 2025. 64485–64515. https://proceedings.iclr.cc/p...

[44]

LangChain, Inc. 2022. LangChain: Building Applications with LLMs through Composability. https://github.com/langchain-ai/langchain

[45]

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles (SOSP ’23).

[46]

Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, and Chuan Wu. 2025. HybridFlow: A Flexible and Efficient RLHF Framework. In Proceedings of the Twentieth European Conference on Computer Systems (EuroSys ’25).

[47]

Jian Hu, Xibin Wu, Zilin Zhu, Xianyu, Weixun Wang, Dehao Zhang, and Yu Cao. 2024. OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework. arXiv:2405.11143 [cs.AI] https://arxiv.org/abs/2405.11143

[48]

Daytona. 2024. Daytona. https://daytona.io

[49]

ZeroBoot. 2026. ZeroBoot: Sub-millisecond VM Sandboxes for AI Agents via Copy-on-Write Forking. https://github.com/zerobootdev/zeroboot

[50]

Boyang Yan. 2025. Fault-Tolerant Sandboxing for AI Coding Agents: A Transactional Approach to Safe Autonomous Execution. arXiv:2512.12806 [cs.AI] https://arxiv.org/abs/2512.12806

[51]

Jialiang Huang, Teng Ma, Zheng Liu, Sixing Lin, Kang Chen, Jinlei Jiang, Xia Liao, Yingdi Shan, Yongwei Wu, Ning Zhang, Mengting Lu, Tao Ma, Haifeng Gong, and Mingxing Zhang. 2026. TrEnv-X: Transparently Share Serverless Execution Environments Across Different Functions and Nodes. ACM Transactions on Computer Systems (March 2026). https://doi.org/10.1145/3805475

[52]

Ben Holmes, Baltasar Dinis, Lana Honcharuk, Joshua Fried, and Adam Belay. 2025. Taming Serverless Cold Starts Through OS Co-Design. arXiv:2509.14292 [cs.OS] https://arxiv.org/abs/2509.14292

[53]

Yanning Yang, Dong Du, Haitao Song, and Yubin Xia. 2024. On-demand and Parallel Checkpoint/Restore for GPU Applications. In Proceedings of the 2024 ACM Symposium on Cloud Computing (Redmond, WA, USA) (SoCC ’24). Association for Computing Machinery, New York, NY, USA, 415–433. https://doi.org/10.1145/3698038.3698510

[54]

P. Tullmann, J. Lepreau, B. Ford, and M. Hibler. 1996. User-level checkpointing through exportable kernel state. In Proceedings of the Fifth International Workshop on Object-Orientation in Operating Systems. 85– https://doi.org/10.1109/IWOOOS.1996.557874

[56]

Dirk Vogt, Armando Miraglia, Georgios Portokalidis, Herbert Bos, Andy Tanenbaum, and Cristiano Giuffrida. 2015. Speculative Memory Checkpointing. In Proceedings of the 16th Annual Middleware Conference (Vancouver, BC, Canada) (Middleware ’15). Association for Computing Machinery, New York, NY, USA, 197–209. https://doi.org/10.1145/2814576.2814802

[57]

A. Dearle and D. Hulse. 1995. On page-based optimistic process checkpointing. In Proceedings of International Workshop on Object Orientation in Operating Systems. 24–32. https://doi.org/10.1109/IWOOS.1995.470583

[58]

Emil Tsalapatis, Ryan Hancock, Tavian Barnes, and Ali José Mashtizadeh. 2021. The Aurora Single Level Store Operating System. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (Virtual Event, Germany) (SOSP ’21). Association for Computing Machinery, New York, NY, USA, 788–803. https://doi.org/10.1145/3477132.3483563

[59]

James S. Plank, Micah Beck, Gerry Kingsley, and Kai Li. 1995. Libckpt: transparent checkpointing under Unix. In Proceedings of the USENIX 1995 Technical Conference Proceedings (New Orleans, Louisiana) (TCON’95). USENIX Association, USA, 18.

[60]

Jason Ansel, Kapil Arya, and Gene Cooperman. 2009. DMTCP: Transparent checkpointing for cluster computations and the desktop. In Proceedings of the 2009 IEEE International Symposium on Parallel and Distributed Processing (IPDPS ’09). IEEE Computer Society, USA, 1–12. https://doi.org/10.1109/IPDPS.2009.5161063

[61]

The Btrfs Project. 2009. Btrfs Documentation. https://btrfs.readthedocs.io

[62]

OpenZFS. 2013. OpenZFS Documentation. https://openzfs.github.io/openzfs-docs/

[63]

Linux Kernel Project. 2018. EROFS: Enhanced Read-Only File System. https://erofs.docs.kernel.org