arxiv: 2605.18697 · v1 · pith:CVTDHRF2new · submitted 2026-05-18 · 💻 cs.DC · cs.AI · cs.PL

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

Pith reviewed 2026-05-20 07:59 UTC · model grok-4.3

classification 💻 cs.DC · cs.AI · cs.PL
keywords compound AI · Python parallelism · dynamic dispatch · runtime optimization · external model calls · end-to-end latency · opportunistic parallelization

The pith

PopPy finds safe ways to run Python compound AI calls in parallel for up to 6.4 times faster end-to-end execution while keeping sequential behavior unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PopPy to locate parallel execution opportunities inside Python programs that string together calls to machine learning models and other slow external services. These compound applications spend most of their time waiting on the external parts, so ordinary language compilers offer little help. PopPy pairs an ahead-of-time compiler with a runtime that tracks dynamic method selection and variable changes to decide when independent external calls can safely overlap. It requires only minimal extra annotations from the programmer. A reader would care because many practical AI tools are built this way and shorter total run times would make them more responsive without forcing developers to rewrite their code or accept different results.

Core claim

PopPy uncovers parallelization opportunities in Python applications that invoke heavy external components by using an ahead-of-time compiler paired with a runtime. The system manages language complexity, dynamic dispatch, and variable mutation to extract parallelism while maintaining the original sequential semantics. Experiments on real-world compound AI applications demonstrate end-to-end speedups reaching 6.4 times over standard Python execution with only minimal developer input.

What carries the argument

An ahead-of-time compiler combined with a runtime that tracks dependencies to safely parallelize external calls in dynamic Python code.
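
As a rough illustration of the effect described above, the sketch below overlaps independent external calls by hand while returning results in the order sequential execution would produce. It does not use PopPy's API, which this page does not show: `slow_external_call`, the prompt list, and the thread pool are hypothetical stand-ins for what PopPy's compiler and runtime are said to do automatically.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_external_call(prompt: str) -> str:
    """Hypothetical stand-in for an LLM or other slow external service."""
    time.sleep(0.05)  # simulated network/model latency
    return prompt.upper()


prompts = ["expand a", "expand b", "expand c", "expand d"]


def run_sequential(items):
    # Baseline: each call waits for the previous one to finish.
    return [slow_external_call(p) for p in items]


def run_overlapped(items):
    # The calls share no data, so they can be in flight at the same time.
    # Executor.map yields results in input order, so the caller sees the
    # same list it would have seen under sequential execution.
    with ThreadPoolExecutor(max_workers=len(items)) as pool:
        return list(pool.map(slow_external_call, items))


start = time.perf_counter()
sequential_results = run_sequential(prompts)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
overlapped_results = run_overlapped(prompts)
overlapped_seconds = time.perf_counter() - start

print("same results:", sequential_results == overlapped_results)
print(f"sequential {sequential_seconds:.2f}s, overlapped {overlapped_seconds:.2f}s")
```

The point of the sketch is only that overlapping is safe here because no call reads anything another call writes; deciding that automatically in the presence of mutation and dynamic dispatch is the hard part the paper addresses.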

If this is right

  • End-to-end latency of compound AI applications drops because independent external model calls can execute at the same time.
  • Programs that use dynamic dispatch and mutate variables still receive speedups without losing correctness.
  • Developers obtain the gains after adding only small annotations rather than restructuring their code.
  • The original sequential behavior of the program remains exactly the same.
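
The latency argument in the first bullet can be made concrete with a small calculation: sequential time is the sum of all call latencies, while overlapped time is bounded below by the longest dependency chain. The call names, latencies, and dependency graph below are invented for illustration and do not come from the paper.

```python
from functools import lru_cache

# Hypothetical call graph: latency in seconds, and which calls each one waits on.
latency = {"expand_1": 2.0, "expand_2": 2.0, "expand_3": 2.0, "score": 1.0}
depends_on = {
    "expand_1": [],
    "expand_2": [],
    "expand_3": [],
    "score": ["expand_1", "expand_2", "expand_3"],
}


def sequential_time() -> float:
    # One call at a time: total time is the sum of all latencies.
    return sum(latency.values())


@lru_cache(maxsize=None)
def finish_time(call: str) -> float:
    # A call starts once everything it depends on has finished.
    start = max((finish_time(dep) for dep in depends_on[call]), default=0.0)
    return start + latency[call]


def overlapped_time() -> float:
    # With unlimited parallelism, total time is the longest dependency chain.
    return max(finish_time(call) for call in latency)


sequential_total = sequential_time()
overlapped_total = overlapped_time()
print(sequential_total, overlapped_total, sequential_total / overlapped_total)
```

In this made-up graph the three expansions overlap and only the scoring call must wait, so the achievable speedup depends entirely on how much of the program lies off the critical path.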

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same opportunistic tracking of dependencies could be applied to other dynamic languages that orchestrate external AI services.
  • Integration into standard Python runtimes might eventually make this style of parallelism available without any code changes at all.
  • Testing the approach on larger or more varied workloads could expose additional safe parallel patterns not yet considered.

Load-bearing premise

The runtime can accurately detect all data dependencies and mutation effects across the supported Python fragment so that parallel schedules never alter observable results.
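
One classical way to state when two operations may safely overlap is Bernstein's conditions: neither may write what the other reads or writes. The sketch below applies that test to hand-declared read and write sets. It is a generic illustration of the kind of dependence a sound system must not miss, not a description of PopPy's analysis; the `Effect` record and the example operations are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Effect:
    """Hypothetical summary of the variables an operation reads and writes."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def may_overlap(a: Effect, b: Effect) -> bool:
    # Bernstein's conditions: no write/read, read/write, or write/write overlap.
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


summarize = Effect("summarize", reads=frozenset({"doc"}), writes=frozenset({"summary"}))
translate = Effect("translate", reads=frozenset({"doc"}), writes=frozenset({"translation"}))
critique = Effect("critique", reads=frozenset({"summary"}), writes=frozenset({"review"}))

print(may_overlap(summarize, translate))  # both only read "doc"
print(may_overlap(summarize, critique))   # critique reads what summarize writes
```

If a read or write set is incomplete, `may_overlap` can wrongly return `True`, which is exactly the failure mode the premise above rules out: a missed dependence changes results rather than merely losing speed.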

What would settle it

The claim would be undermined if running PopPy on the evaluated compound AI applications showed either no improvement over ordinary Python or any change in final outputs relative to sequential execution.

Figures

Figures reproduced from arXiv: 2605.18697 by David Mell, Konstantinos Kallas, Osbert Bastani, Stephen Mell, Steve Zdancewic.

Figure 1: Tree-of-Thoughts [74] implementation in Python. The @_ lines are the annotations that need to be added to a program to be supported by PopPy. L1-L29 are provided by the application developer, while L32-L35 are provided by library developers. The colored bars indicate specific code blocks referenced in …
Figure 2: Illustration of the execution of get_values with states = ("a", "a", "b", "b"), after queueing all external calls but before any have resolved. Internal code execution tree (left): Each code block that is executed at runtime is shown, including duplicates from different loop iterations; block borders are colored to correspond to source blocks in …
Figure 3: The PopPy system architecture. Code (static) and processes (dynamic) are shown in boxes; transformations are shown in bubbles.
Figure 4: An example of the second compiler phase. Mutation, sequencing, and control-flow are converted to function calls. In $\lambda^O$, load, store, and ite are external functions. … the program state during execution. The call to store takes a memory state M0 and returns a new memory state, but where the key x has value "foo". Sequencing of External Calls. The second challenge is that $\lambda^O$ has no execution order, i.e., it…
Figure 5: Median speedup of PopPy execution over Python across 10 trials. From CaMeL (C-$n$) we only include applications that make at least one LLM call. Error bars show minimum to maximum speedup across trials.
Figure 6: A single execution trace of ToT (with 2 steps of search and beam size 3), showing selected external calls. Dashed lines indicate the time between queueing and dispatch; solid lines indicate the time between dispatch and resolution. (LLM calls dispatch immediately; print calls resolve immediately.) Calls are sorted from top to bottom by the order in which they would be executed under sequential execution.…
Figure 7: Absolute execution time overhead of PopPy vs the Python execution time, for each benchmark (median over 10 trials). Overhead is the time spent inside the $\lambda^O$ interpreter, with all external calls annotated as sequential.
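
Figure 4's caption says that mutation and control flow are converted to calls to `load`, `store`, and `ite`, with `store` taking a memory state and returning a new one. The sketch below shows that store-passing idea in plain Python under simplifying assumptions: the memory state is an immutable mapping, `ite` selects between two already-computed values, and the translated program is written by hand. The paper's actual encoding may differ.

```python
from types import MappingProxyType

EMPTY_MEMORY = MappingProxyType({})


def store(memory, key, value):
    """Return a new memory state with key bound to value; the input is unchanged."""
    updated = dict(memory)
    updated[key] = value
    return MappingProxyType(updated)


def load(memory, key):
    """Read a variable from a memory state."""
    return memory[key]


def ite(condition, then_value, else_value):
    """If-then-else as a function (simplified: both branches are already computed)."""
    return then_value if condition else else_value


# Imperative original (hypothetical):
#     x = "foo"
#     if flag:
#         x = x + "!"
flag = True
M0 = EMPTY_MEMORY
M1 = store(M0, "x", "foo")
M2 = store(M1, "x", load(M1, "x") + "!")
M3 = ite(flag, M2, M1)

print(load(M3, "x"))
```

Because every step names the memory state it consumes and the one it produces, the data dependencies between steps become explicit arguments rather than implicit side effects.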
Original abstract

Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their end-to-end latency a critical bottleneck. In contrast to traditional applications, execution time is dominated by the external components, which cannot be handled by traditional language optimization systems, like optimizing compilers. To address this problem, we develop PopPy, a system that can uncover parallelization opportunities in Python applications that invoke these heavy external components, including those used in compound AI applications. PopPy supports a very expressive fragment of Python and requires minimal developer input to uncover parallelism. It combines an ahead-of-time compiler with a runtime, addressing three key challenges in extracting parallelism from Python applications: language complexity, dynamic dispatch, and variable mutation. On a set of real-world compound AI applications, PopPy achieves up to $6.4\times$ speedups in end-to-end execution time compared to standard Python execution while preserving the sequential program semantics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents PopPy, a system designed to uncover and exploit parallelism in Python compound AI applications that invoke external ML models and other heavy components. By combining an ahead-of-time compiler with a runtime system, PopPy addresses challenges of language complexity, dynamic dispatch, and variable mutation with minimal developer input. The main result is that on real-world compound AI applications, it achieves up to 6.4× speedups in end-to-end execution time compared to standard Python while preserving the sequential program semantics.

Significance. If the soundness of the dependency tracking holds, this work has the potential to significantly impact the performance of latency-sensitive compound AI systems by automatically parallelizing independent external calls without requiring changes to the program's observable behavior. The support for an expressive fragment of Python and the evaluation on real-world applications are positive aspects. The opportunistic nature of the parallelization could make it practical for developers working on software engineering and enterprise automation tasks.

major comments (2)
  1. §4.2: The description of the combined AOT and runtime analysis for tracking variable mutations does not include a formal argument or sufficient test cases demonstrating that all possible mutation effects under dynamic dispatch are captured; this is load-bearing for the semantic preservation claim since a missed dependence would lead to incorrect results rather than just reduced performance.
  2. §5.1: The evaluation on real-world applications reports speedups but provides limited details on the methodology, such as the specific benchmarks used, number of runs, or error bars, which are necessary to substantiate the 6.4× claim.
minor comments (2)
  1. Abstract: Consider adding the number of applications evaluated to give context to the 'up to 6.4×' speedup.
  2. §3: The notation used for describing the supported Python fragment could benefit from additional examples to improve clarity.
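
Major comment 1 turns on the interaction between dynamic dispatch and mutation: which method body runs, and therefore whether the receiver is mutated, is known only once the receiver's type is known at run time. The sketch below illustrates that point with a hypothetical effect table consulted at call time. The classes, the table, and the conservative default are all assumptions for illustration, not PopPy's mechanism.

```python
class CachedModel:
    """Hypothetical client whose ask() mutates the receiver."""

    def __init__(self):
        self.history = []

    def ask(self, prompt):
        self.history.append(prompt)  # mutation of receiver state
        return f"cached:{prompt}"


class StatelessModel:
    """Hypothetical client whose ask() has no effect on the receiver."""

    def ask(self, prompt):
        return f"fresh:{prompt}"


# Hypothetical effect annotations, keyed by (receiver type, method name).
MUTATES_RECEIVER = {
    (CachedModel, "ask"): True,
    (StatelessModel, "ask"): False,
}


def call_mutates_receiver(receiver, method_name):
    # The same source expression `model.ask(p)` resolves differently depending
    # on the runtime type. Unknown combinations are treated as mutating, which
    # can only cost parallelism, never correctness.
    return MUTATES_RECEIVER.get((type(receiver), method_name), True)


for model in (CachedModel(), StatelessModel()):
    print(type(model).__name__, call_mutates_receiver(model, "ask"))
```

Defaulting to "mutates" for anything unannotated mirrors the referee's concern: an over-conservative answer loses speed, while an over-optimistic one would change program results.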

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and positive assessment of PopPy's potential impact. We address each major comment below and describe the changes planned for the revised manuscript.

Point-by-point responses
  1. Referee: §4.2: The description of the combined AOT and runtime analysis for tracking variable mutations does not include a formal argument or sufficient test cases demonstrating that all possible mutation effects under dynamic dispatch are captured; this is load-bearing for the semantic preservation claim since a missed dependence would lead to incorrect results rather than just reduced performance.

    Authors: We agree that soundness of the mutation tracking is critical to the semantic-preservation guarantee. The current manuscript explains the AOT conservative analysis for identifying mutation sites together with runtime checks that resolve dynamic dispatch and actual mutations at execution time. We did not supply a formal proof or an exhaustive test suite in the submission. In the revision we will expand §4.2 with a new subsection containing additional concrete test cases that exercise a range of dynamic-dispatch and mutation patterns; we will also articulate the key invariants maintained by the combined analysis. A complete mechanized proof for the full supported Python fragment is beyond the scope of this systems paper and will be noted as future work. revision: partial

  2. Referee: §5.1: The evaluation on real-world applications reports speedups but provides limited details on the methodology, such as the specific benchmarks used, number of runs, or error bars, which are necessary to substantiate the 6.4× claim.

    Authors: We thank the referee for highlighting this presentational gap. The revised §5.1 will explicitly list each benchmark (including source repositories or application names), state that each measurement is the median of ten runs, and include error bars showing the min/max or standard deviation across those runs. These additions will directly support the reported speedups. revision: yes
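
The reporting scheme named in this response (median of ten runs with min/max error bars) can be computed as in the sketch below. The timing values are invented for illustration and are not measurements from the paper; pairing trial i of the baseline with trial i of the optimized run is also an assumption.

```python
from statistics import median

# Made-up wall-clock times in seconds for ten trials of each configuration.
baseline_seconds = [8.0, 8.0, 9.0, 8.0, 10.0, 8.0, 8.0, 9.0, 8.0, 8.0]
parallel_seconds = [2.0, 4.0, 3.0, 2.0, 4.0, 2.0, 4.0, 3.0, 2.0, 1.0]


def summarize_speedup(baseline, optimized):
    """Per-trial speedups reduced to a median with min/max error-bar bounds."""
    if len(baseline) != len(optimized):
        raise ValueError("both configurations need the same number of trials")
    speedups = [b / o for b, o in zip(baseline, optimized)]
    return {
        "median": median(speedups),
        "min": min(speedups),
        "max": max(speedups),
    }


summary = summarize_speedup(baseline_seconds, parallel_seconds)
print(summary)
```

Reporting the median alongside the min/max range keeps a single fast or slow trial from being mistaken for the typical speedup, which matters for an "up to" headline number.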

Circularity Check

0 steps flagged

No circularity detected in system architecture or evaluation claims

Full rationale

The paper presents PopPy as an implemented engineering system that combines ahead-of-time compilation with runtime analysis to identify parallelization opportunities in an expressive Python fragment. Claims of up to 6.4× speedups rest on empirical measurements against standard Python execution on real-world compound AI applications, not on any mathematical derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the paper's own inputs by construction. The soundness of dependency tracking for mutation and dynamic dispatch is an engineering precondition evaluated externally via application benchmarks rather than internally derived.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the correctness of the PopPy implementation for handling Python language features and external calls; no free parameters, axioms, or invented entities beyond the system itself are described in the abstract.

invented entities (1)
  • PopPy system · no independent evidence
    purpose: Uncover and exploit parallelization opportunities in Python applications with external components
    The paper introduces PopPy as the core new artifact that addresses language complexity, dynamic dispatch, and variable mutation.

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 6 internal anchors

  1. [1]

    Official Repo of Tree of Thoughts

    2023. Official Repo of Tree of Thoughts. https://github.com/princeton-nlp/tree-of-thought-llm

  2. [2]

    Duane A. Adams. 1968. A Computation Model with Data-Sequenced Control. Technical Report. Stanford University. Technical Report CGTM 45

  3. [3]

    Duane A. Adams. 1969. A Computation Model with Data Flow Sequencing. Ph.D. Dissertation

  4. [4]

    Jason Ansel, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, Peter Bell, David Berard, Evgeni Burovski, et al. 2024. Pytorch 2: Faster machine learning through dynamic python bytecode transformation and graph compilation. In Proceedings of the 29th ACM international conference on architectural support for programmin...

  5. [5]

    Anthropic. 2024. The Claude 3 Model Family: Opus, Sonnet, Haiku. https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf

  6. [6]

    Anthropic. 2025. How we built our multi-agent research system. https://www.anthropic.com/engineering/multi-agent-research-system. Accessed: 2026-04-01

  7. [7]

    Sotiris Apostolakis, Ziyang Xu, Greg Chan, Simone Campanoni, and David I August. 2020. Perspective: A sensible approach to speculative automatic parallelization. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems. 351–367

  8. [8]

    Andrew W. Appel. 1991. Compiling with Continuations. Cambridge University Press

  9. [9]

    Rishiyur S Nikhil Arvind. 1992. Id: a language with implicit parallelism. In A Comparative Study of Parallel Programming Languages . Elsevier, 169–215

  10. [10]

    Stefanos Baziotis, Daniel Kang, and Charith Mendis. 2024. Dias: Dynamic rewriting of Pandas code. Proceedings of the ACM on Management of Data 2, 1 (2024), 1–27

  11. [11]

    Luca Beurer-Kellner, Marc Fischer, and Martin T. Vechev. 2023. Prompting Is Programming: A Query Language for Large Language Models. In Proceedings of the 44th ACM SIGPLAN International Conference on Programming Language Design and Implementation. ACM, 1946–1969. https://doi.org/10.1145/3591300

  12. [12]

    Carl Friedrich Bolz, Antonio Cuni, Maciej Fijalkowski, and Armin Rigo. 2009. Tracing the Meta-Level: PyPy’s Tracing JIT Compiler. In Proceedings of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems (ICOOOLPS). ACM, 18–25. https://doi.org/10.1145/1565824.1565827

  13. [13]

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, et al. 2021. Jax: Autograd and xla. Astrophysics Source Code Library (2021), ascl–2111

  14. [14]

    Brandt Bucher and Savannah Ostrowski. 2024. PEP 744: JIT Compilation. https://peps.python.org/pep-0744/. Python Enhancement Proposal, Draft status

  15. [15]

    Harrison Chase. 2023. LangChain. https://github.com/langchain-ai/langchain

  16. [16]

    Gohar Irfan Chaudhry, Esha Choukse, Íñigo Goiri, Rodrigo Fonseca, Adam Belay, and Ricardo Bianchini. 2025. Towards Resource-Efficient Compound AI Systems. In Proceedings of the 2025 Workshop on Hot Topics in Operating Systems (Banff, AB, Canada) (HotOS ’25). Association for Computing Machinery, New York, NY, USA, 218–224. https://doi.org/10.1145/3713082.3730377

  17. [17]

    Alonzo Church. 1941. The Calculi of Lambda-Conversion. Annals of Mathematics Studies, Vol. 6. Princeton University Press

  18. [18]

    Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré

  19. [19]

    FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. In Advances in Neural Information Processing Systems, Vol. 35. 16344–16359

  20. [20]

    Alan L. Davis and Robert M. Keller. 1982. Data Flow Program Graphs. Computer 15, 02 (2 1982), 26–41

  21. [21]

    Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. 2026. Defeating Prompt Injections by Design. arXiv preprint arXiv:2503.18813. In IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). https://arxiv.org/abs/2503.18813

  22. [22]

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents. Advances in Neural Information Processing Systems 37 (2024), 82895–82920

  23. [23]

    Jack B. Dennis. 1974. First version of a data flow procedure language. In Programming Symposium, B. Robinet (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 362–376

  24. [24]

    Yu Feng, Phu Mon Htut, Zheng Qi, Wei Xiao, Manuel Mager, Nikolaos Pappas, Kishaloy Halder, Yang Li, Yassine Benajiba, and Dan Roth

  25. [25]

    Rethinking LLM Uncertainty: A Multi-Agent Approach to Estimating Black-Box Model Uncertainty. In Findings of the Association for Computational Linguistics: EMNLP 2025, Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguistics, Suzhou, China, 12349–12375. https://doi.org/10.18653/v1...

  26. [26]

    Yu Feng, Ben Zhou, Weidong Lin, and Dan Roth. 2025. BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models. In The Thirteenth International Conference on Learning Representations. https://openreview.net/forum?id=fAAaT826Vv

  27. [27]

    John T. Feo, David C. Cann, and Rodney R. Oldehoeft. 1990. A report on the sisal language project. J. Parallel and Distrib. Comput. 10, 4 (1990), 349–366. https://doi.org/10.1016/0743-7315(90)90035-N Data-flow Processing

  28. [28]

    Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen

  29. [29]

    The essence of compiling with continuations. In Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (Albuquerque, New Mexico, USA) (PLDI ’93). Association for Computing Machinery, New York, NY, USA, 237–247. https://doi.org/10.1145/155090.155113

  30. [30]

    Gemini Team, Google. 2023. Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805 (2023)

  31. [31]

    GitHub Staff. 2025. Octoverse: A new developer joins GitHub every second as AI leads TypeScript to #1. https://github.blog/news-insights/octoverse/. Accessed: 2026-03-30

  32. [32]

    Robert H. Halstead. 1985. MULTILISP: a language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst. 7, 4 (Oct. 1985), 501–538. https://doi.org/10.1145/4472.4478

  33. [33]

    Kang He and Kaushik Roy. 2025. LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models. arXiv preprint arXiv:2504.14089 (2025)

  34. [34]

    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

    Carlos E. Jimenez, John Yang, S. Friedman, et al. 2024. SWE-bench: Can Language Models Resolve Real-World GitHub Issues? arXiv preprint arXiv:2310.06770 (2024)

  35. [35]

    Tian Jin, Ellie Y Cheng, Zachary Ankner, Nikunj Saunshi, Blake M Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, and Michael Carbin. 2025. Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding. In Forty-second International Conference on Machine Learning. https://openreview.net...

  36. [36]

    Michael Jungmair, Alexis Engelke, and Jana Giceva. 2024. HiPy: Extracting High-Level Semantics from Python Code for Data Processing. Proc. ACM Program. Lang. 8, OOPSLA2, Article 297 (Oct. 2024), 27 pages. https://doi.org/10.1145/3689737

  37. [37]

    Konstantinos Kallas, Tammam Mustafa, Jan Bielak, Dimitris Karnikis, Thurston HY Dang, Michael Greenberg, and Nikos Vasilakis. 2022. Practically correct, Just-in-Time shell script parallelization. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). 769–785

  38. [38]

    Richard M. Karp and Raymond Miller. 1966. Properties of a Model for Parallel Computation: Determinacy, Termination, Queueing. SIAM J. of Applied Mathematics 14, 6 (11 1966), 1390–1411

  39. [39]

    Richard M. Karp and Raymond Miller. 1969. Parallel Program Schemata. J. Comput. System Sci. 3 (1969), 147–195

  40. [40]

    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. 2024. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. The Twelfth International Conference on Learning Representations

  41. [41]

    Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. 2015. Numba: A llvm-based python jit compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC . 1–6

  42. [42]

    LangChain Inc. 2024. LangGraph: Build Resilient Language Agents as Graphs. https://github.com/langchain-ai/langgraph

  43. [43]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33 (2020), 9459–9474

  44. [44]

    Shuo Li, Sangdon Park, Insup Lee, and Osbert Bastani. 2024. TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Kevin Duh, Helena Gomez, and Steven Betha...

  45. [45]

    Hao Liang, Xiaochen Ma, Zhou Liu, Zhen Hao Wong, Zhengyang Zhao, Zimo Meng, Runming He, Chengyu Shen, Qifeng Cai, Zhaoyang Han, et al. 2025. DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI. arXiv preprint arXiv:2512.16676 (2025)

  46. [46]

    Mingdao Liu, Aohan Zeng, Bowen Wang, Peng Zhang, Jie Tang, and Yuxiao Dong. 2024. APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding. arXiv:2401.06761 [cs.CL] https://arxiv.org/abs/2401.06761

  47. [47]

    Shail Aditya Arvind Jan-Willem Maessen, Lennart Augustsson, and Rishiyur S Nikhil. 1995. Semantics of pH: A parallel dialect of Haskell. In Proceedings from the Haskell Workshop (at FPCA 95). 35–49

  48. [48]

    James R McGraw. 1982. The VAL language: Description and analysis. ACM Transactions on Programming Languages and Systems (TOPLAS) 4, 1 (1982), 44–82

  49. [49]

    Stephen Mell, Konstantinos Kallas, Steve Zdancewic, and Osbert Bastani. 2025. Opportunistically Parallel Lambda Calculus. Proc. ACM Program. Lang. 9, OOPSLA2, Article 365 (Oct. 2025), 27 pages. https://doi.org/10.1145/3763143

  50. [50]

    Tammam Mustafa, Konstantinos Kallas, Pratyush Das, and Nikos Vasilakis. 2023. DiSh: Dynamic Shell-Script Distribution. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 341–356

  51. [51]

    n8n.io. 2026. n8n: Fair-code workflow automation platform with native AI capabilities. https://github.com/n8n-io/n8n

  52. [52]

    Ziyi Ni, Yifan Li, Ning Yang, Dou Shen, Pin Lyu, and Daxiang Dong

  53. [53]

    Tree-of-code: A self-growing tree framework for end-to-end code generation and execution in complex tasks. In Findings of the Association for Computational Linguistics: ACL 2025. 9804–9819

  54. [54]

    Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, and Yu Wang. 2024. Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation. In The Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=mqVgBbNCm9

  55. [55]

    OpenAI. 2023. GPT-4 Technical Report. Technical Report. OpenAI. arXiv preprint arXiv:2303.08774

  56. [56]

    OpenAI. 2025. OpenAI Agents SDK. https://github.com/openai/openai-agents-python

  57. [57]

    Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, and Matei Zaharia. 2017. Weld: A common runtime for high performance data analytics. (2017)

  58. [58]

    Shoumik Palkar and Matei Zaharia. 2019. Optimizing Data-Intensive Computations in Existing Libraries with Split Annotations. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (Huntsville, ON, Canada) (SOSP ’19). ACM, 291–305. https://doi.org/10.1145/3341301.3359652

  59. [59]

    Joe Gibbs Politz, Alejandro Martinez, Mae Milano, Sumner Warren, Daniel Patterson, Junsong Li, Anand Chitipothu, and Shriram Krishnamurthi. 2013. Python: the full monty. SIGPLAN Not. 48, 10 (Oct. 2013), 217–232. https://doi.org/10.1145/2544173.2509536

  60. [60]

    Deepti Raghavan, Sadjad Fouladi, Philip Levis, and Matei Zaharia

  61. [61]

    POSH: A Data-Aware Shell. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 617–631

  62. [62]

    Jorge E. Rodriguez. 1969. A Graph Model for Parallel Computations. Ph.D. Dissertation. MIT-LCS-TR64

  63. [63]

    doi: 10.1038/s41586-023-06924-6

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. 2024. Mathematical discoveries from program search with large language models. Nature 625, 7995 (2024), 468–475. https://doi.org/10.10...

  64. [64]

    Keshav Santhanam, Deepti Raghavan, Muhammad Shahir Rahman, Thejas Venkatesh, Neha Kunjal, Pratiksha Thaker, Philip Levis, and Matei Zaharia. 2024. ALTO: An Efficient Network Orchestrator for Compound AI Systems. In Proceedings of the 4th Workshop on Machine Learning and Systems (Athens, Greece) (EuroMLSys ’24). Association for Computing Machinery, New Yor...

  65. [65]

    A.V.S. Sastry and Roy D.C. Ju. 1998. A New Algorithm for Scalar Register Promotion Based on SSA Form. PLDI ’98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation (1998)

  66. [66]

    Ariya Shajii, Gabriel Ramirez, Haris Smajlović, Jessica Ray, Bonnie Berger, Saman Amarasinghe, and Ibrahim Numanagić. 2023. Codon: A Compiler for High-Performance Pythonic Applications and DSLs. In Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction (Montréal, QC, Canada) (CC 2023). Association for Computing Machinery, Ne...

  67. [67]

    Jonathan Silva, Qin Ma, Jordi Cabot, Pierre Kelsen, and Henderik A. Proper. 2024. Application of the Tree-of-Thoughts Framework to LLM-Enabled Domain Modeling. In Conceptual Modeling: 43rd International Conference, ER 2024, Pittsburgh, PA, USA, October 28–31, 2024, Proceedings (Pittsburg, PA, USA). Springer-Verlag, Berlin, Heidelberg, 94–111. https://do...

  68. [68]

    Leonhard Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf, and Tim Kraska. 2021. Tuplex: Data Science in Python at Native Code Speed. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD ’21). Association for Computing Machinery, New York, NY, USA, 1718–1731. https://doi.org/10.1145/3448016.3457244

  69. [69]

    David Suris et al. 2023. ViperGPT: Visual Inference via Python Execution for Reasoning. arXiv preprint arXiv:2303.08128 (2023).

  70. [70]

    Trieu Trinh, Yuhuai Wu, Quoc V. Le, He He, and Minh-Thang Luong

  71. [71]

    Solving Olympiad Geometry without Human Demonstrations. Nature 625 (2024), 476–482. https://doi.org/10.1038/s41586-023-06747-5

  72. [72]

    Nikos Vasilakis, Konstantinos Kallas, Konstantinos Mamouras, Achilles Benetopoulos, and Lazar Cvetković. 2021. PaSh: Light-Touch Data-Parallel Shell Processing. In Proceedings of the Sixteenth European Conference on Computer Systems (Online Event, United Kingdom) (EuroSys ’21). Association for Computing Machinery, New York, NY, USA, 49–66. https://doi.o...

  73. [73]

    Nikos Vasilakis, Ben Karel, Yash Palkhiwala, John Sonchack, André DeHon, and Jonathan M. Smith. 2019. Ignis: Scaling Distribution-Oblivious Systems with Light-Touch Distribution. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). ACM, 1010–1026. https://doi.org/10.1145/33142...

  74. [74]

    Philip Wadler. 1990. Comprehending monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming (Nice, France) (LFP ’90). Association for Computing Machinery, New York, NY, USA, 61–78. https://doi.org/10.1145/91556.91592

  75. [75]

    Paul G. Whiting and Robert S. V. Pascoe. 1994. A history of data-flow languages. IEEE Annals of the History of Computing 16 (1994), 38–59. https://api.semanticscholar.org/CorpusID:7384421

  76. [76]

    Brandon T. Willard et al. 2023. Guidance: A Guidance Language for Controlling Large Language Models. https://github.com/guidance-ai/guidance

  77. [77]

    Mengdi Wu, Xinhao Cheng, Shengyu Liu, Chunan Shi, Jianan Ji, Kit Ao, Praveen Velliengiri, Xupeng Miao, Oded Padon, and Zhihao Jia. 2025. Mirage: A Multi-Level Superoptimizer for Tensor Programs. In 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). USENIX Association, 1–18

  78. [78]

    Wenjiang Xu, Cindy Wang, Rui Fang, Mingkang Zhang, Lusong Li, Jing Xu, Jiayuan Gu, Zecui Zeng, and Rui Chen. 2025. Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model. arXiv:2512.08188 [cs.RO] https://arxiv.org/abs/2512.08188

  79. [79]

    John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. Swe-agent: Agent-computer interfaces enable automated software engineering. Advances in Neural Information Processing Systems 37 (2024), 50528–50652

  80. [80]

    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. Advances in neural information processing systems 36 (2023), 11809–11822

Showing first 80 references.