Recognition: 2 theorem links · Lean Theorem
SEVerA: Verified Synthesis of Self-Evolving Agents
Pith reviewed 2026-05-15 00:31 UTC · model grok-4.3
The pith
SEVerA embeds first-order logic contracts into self-evolving agents to guarantee zero violations while raising task performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SEVerA formulates agentic code generation as constrained learning and introduces Formally Guarded Generative Models that wrap each generative call in a rejection sampler backed by a verified fallback, ensuring every output satisfies a first-order logic contract for any input and parameter setting. The three-stage framework searches for candidate parametric programs containing such guarded calls, verifies correctness with respect to the hard constraints for all parameter values, and finally applies scalable gradient-based optimization to improve the soft objective while the verified contracts remain intact.
What carries the argument
Formally Guarded Generative Models (FGGM), which attach a first-order logic output contract to each generative model call and enforce it through rejection sampling plus a verified fallback.
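The guard described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `guarded_call`, `noisy_model`, `in_range`, and `safe_default` are hypothetical, the contract is assumed to be checkable by an executable predicate, and the fallback is simply assumed to have been verified elsewhere.

```python
import random
from typing import Callable, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def guarded_call(
    generate: Callable[[X], Y],        # untrusted generative model (hypothetical)
    contract: Callable[[X, Y], bool],  # executable check of the output contract
    fallback: Callable[[X], Y],        # ASSUMED verified to satisfy the contract
    x: X,
    max_attempts: int = 4,
) -> Y:
    """Return an output y such that contract(x, y) holds."""
    # Rejection sampling: keep only samples that pass the contract check.
    for _ in range(max_attempts):
        y = generate(x)
        if contract(x, y):
            return y
    # Attempt budget exhausted: defer to the fallback.
    return fallback(x)


# Toy contract (illustrative): the output must be an integer in [0, x].
def in_range(x: int, y: int) -> bool:
    return 0 <= y <= x


def noisy_model(x: int) -> int:
    # Stand-in for an LLM call; it often violates the contract.
    return random.randint(-x, 2 * x)


def safe_default(x: int) -> int:
    # Trivially inside [0, x] whenever x >= 0.
    return 0


result = guarded_call(noisy_model, in_range, safe_default, 10)
```

The returned value passes the contract regardless of how the generator behaves or how it is tuned, which is the property the paper attributes to FGGM calls; in this sketch that property depends entirely on the fallback being correct.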
If this is right
- Zero constraint violations on Dafny verification, symbolic math synthesis, and policy-compliant tool use.
- Performance gains over both unconstrained self-evolving agents and current state-of-the-art baselines.
- Formal contracts steer the search toward higher-quality agents rather than merely restricting them.
- After verification the remaining problem reduces to ordinary unconstrained optimization.
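One way to write down the reduction in the last bullet is sketched below. The symbols are illustrative and do not come from the paper: $P_\theta$ stands for the synthesized program with parameters $\theta$, $\varphi$ for the hard contract, $U$ for the soft utility, and $\mathcal{D}$ for the task input distribution.

```latex
% Constrained form: hard contract \varphi, soft utility U (illustrative symbols).
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D}}\big[\, U(x, P_\theta(x)) \,\big]
\quad \text{s.t.} \quad \forall x.\ \varphi\big(x, P_\theta(x)\big)

% Once verification establishes \forall \theta.\ \forall x.\ \varphi(x, P_\theta(x)),
% every \theta is feasible, so the constraint can be dropped:
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D}}\big[\, U(x, P_\theta(x)) \,\big]
```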
Where Pith is reading between the lines
- The same guarding technique could be applied to other generative systems that produce executable code or plans.
- If verification scales, the approach might support longer-horizon agent behaviors that currently lack safety assurances.
- Success hinges on whether first-order logic remains expressive enough for the contracts that real deployments require.
Load-bearing premise
A planner LLM can write first-order logic contracts that accurately capture the desired behavior, and verification can prove that those contracts hold for every possible parameter value without prohibitive cost.
What would settle it
A synthesized agent that violates one of its declared contracts on at least one input, or a verification step that fails to certify correctness for all parameter values within feasible compute limits.
Original abstract
Recent advances have shown the effectiveness of self-evolving LLM agents on tasks such as program repair and scientific discovery. In this paradigm, a planner LLM synthesizes an agent program that invokes parametric models, including LLMs, which are then tuned per task to improve performance. However, existing self-evolving agent frameworks provide no formal guarantees of safety or correctness. Because such programs are often executed autonomously on unseen inputs, this lack of guarantees raises reliability and security concerns. We formulate agentic code generation as a constrained learning problem, combining hard formal specifications with soft objectives capturing task utility. We introduce Formally Guarded Generative Models (FGGM), which allow the planner LLM to specify a formal output contract for each generative model call using first-order logic. Each FGGM call wraps the underlying model in a rejection sampler with a verified fallback, ensuring every returned output satisfies the contract for any input and parameter setting. Building on FGGM, we present SEVerA (Self-Evolving Verified Agents), a three-stage framework: Search synthesizes candidate parametric programs containing FGGM calls; Verification proves correctness with respect to hard constraints for all parameter values, reducing the problem to unconstrained learning; and Learning applies scalable gradient-based optimization, including GRPO-style fine-tuning, to improve the soft objective while preserving correctness. We evaluate SEVerA on Dafny program verification, symbolic math synthesis, and policy-compliant agentic tool use ($\tau^2$-bench). Across tasks, SEVerA achieves zero constraint violations while improving performance over unconstrained and SOTA baselines, showing that formal behavioral constraints not only guarantee correctness but also steer synthesis toward higher-quality agents.
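The Search, Verification, and Learning stages named in the abstract can be mimicked on a toy problem. The sketch below is a loose illustration only, and every name in it is hypothetical. Two substitutions are made for brevity: an exhaustive check over a small finite grid stands in for a real verifier such as Dafny (a grid check is not a proof), and finite-difference gradient ascent on one scalar parameter stands in for GRPO-style fine-tuning.

```python
from typing import Callable, List

Program = Callable[[float, float], float]  # (theta, x) -> y


def contract(x: float, y: float) -> bool:
    # Hard constraint (illustrative): the output must lie in [0, x].
    return 0.0 <= y <= x


def unguarded(theta: float, x: float) -> float:
    return theta * x


def guarded(theta: float, x: float) -> float:
    # Same parametric model, but a violating output is replaced by a fallback.
    y = theta * x
    return y if contract(x, y) else 0.0


def search() -> List[Program]:
    # Stage 1: propose candidate parametric programs.
    return [unguarded, guarded]


def verify(p: Program, inputs: List[float], thetas: List[float]) -> bool:
    # Stage 2 stand-in: check the contract on a finite grid (NOT a proof).
    return all(contract(x, p(t, x)) for t in thetas for x in inputs)


def utility(p: Program, theta: float, inputs: List[float]) -> float:
    # Soft objective (illustrative): stay close to 0.7 * x.
    return -sum((p(theta, x) - 0.7 * x) ** 2 for x in inputs) / len(inputs)


def learn(p: Program, inputs: List[float], theta: float = 0.1,
          lr: float = 0.01, steps: int = 200, eps: float = 1e-4) -> float:
    # Stage 3 stand-in: finite-difference gradient ascent on the utility.
    for _ in range(steps):
        grad = (utility(p, theta + eps, inputs)
                - utility(p, theta - eps, inputs)) / (2 * eps)
        theta += lr * grad
    return theta


inputs = [0.0, 1.0, 2.0, 3.0]
thetas = [-1.0, 0.0, 0.5, 1.0, 2.0]
verified = [p for p in search() if verify(p, inputs, thetas)]
theta_star = learn(verified[0], inputs)
```

In this toy run only the guarded candidate passes the check, and the learning step then tunes `theta` without ever consulting the contract, which mirrors the separation of hard constraints from the soft objective that the abstract describes.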
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SEVerA, a three-stage framework (Search, Verification, Learning) for synthesizing self-evolving LLM agents with formal guarantees. It defines Formally Guarded Generative Models (FGGM) that wrap parametric model calls (including LLMs) in first-order logic contracts enforced by rejection sampling plus verified fallbacks, claiming this ensures zero constraint violations for any input and any parameter values. Verification reduces the problem to unconstrained learning, after which gradient-based optimization (including GRPO-style fine-tuning) improves task utility. Experiments on Dafny program verification, symbolic math synthesis, and policy-compliant tool use report zero violations together with gains over unconstrained and SOTA baselines.
Significance. If the zero-violation guarantee can be shown to hold at practical cost without restricting expressiveness, the work would provide a concrete route to formally safe self-evolving agents, combining hard constraints with scalable learning in a way that could influence both verification and agent-synthesis research.
major comments (3)
- [Abstract / FGGM definition] Abstract and FGGM construction: the guarantee that every output satisfies the FOL contract for arbitrary inputs and parameter settings rests on the rejection sampler terminating or the fallback being proven correct independently of the model; no termination argument, complexity bound, or fallback proof for LLM generators is supplied.
- [Verification stage] Verification stage: the claim that verification reduces the synthesis problem to unconstrained learning is central, yet the manuscript supplies no concrete argument, termination condition, or cost analysis showing the reduction succeeds without excessive overhead or loss of expressiveness for the three evaluated domains.
- [Evaluation section] Evaluation: performance gains are reported over unconstrained and SOTA baselines, but exact baseline implementations, fallback invocation frequencies, and verification scalability metrics (e.g., time or success rate per task) are not detailed, leaving the practical magnitude of the claimed improvements difficult to assess.
minor comments (2)
- [FGGM / SEVerA framework] Clarify notation for the FOL contracts and the precise interface between the planner LLM and the FGGM wrapper.
- [Evaluation] Add standard deviations or statistical tests for the reported performance improvements.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the formal guarantees, verification reduction, and evaluation details. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract / FGGM definition] Abstract and FGGM construction: the guarantee that every output satisfies the FOL contract for arbitrary inputs and parameter settings rests on the rejection sampler terminating or the fallback being proven correct independently of the model; no termination argument, complexity bound, or fallback proof for LLM generators is supplied.
  Authors: We agree the manuscript does not supply an explicit termination argument or complexity bound for the rejection sampler with LLM generators. The FGGM design relies on a bounded number of rejection attempts (with timeout triggering the verified fallback) and assumes the fallback is proven correct independently of the generator. We will add a new subsection under FGGM definition providing the termination condition (fixed attempt limit plus fallback), a brief complexity discussion (linear in attempts, with practical termination rates observed), and note that the overall zero-violation guarantee holds because the fallback is verified. This addresses the gap without altering the core claim.
  revision: yes
- Referee: [Verification stage] Verification stage: the claim that verification reduces the synthesis problem to unconstrained learning is central, yet the manuscript supplies no concrete argument, termination condition, or cost analysis showing the reduction succeeds without excessive overhead or loss of expressiveness for the three evaluated domains.
  Authors: The verification stage proves that each FGGM-wrapped program satisfies its FOL contracts for all parameter values using sound verifiers (e.g., Dafny for the first domain), which decouples hard constraints from soft-objective optimization. We will expand the Verification section with a concrete argument (inductive proof over FGGM calls), termination condition (verifier completeness for the supported FOL fragment), and cost analysis (reporting average verification times and success rates per domain, showing overhead remains practical without restricting expressiveness).
  revision: yes
- Referee: [Evaluation section] Evaluation: performance gains are reported over unconstrained and SOTA baselines, but exact baseline implementations, fallback invocation frequencies, and verification scalability metrics (e.g., time or success rate per task) are not detailed, leaving the practical magnitude of the claimed improvements difficult to assess.
  Authors: We acknowledge the need for greater detail on baselines and metrics. We will add a dedicated evaluation subsection specifying exact baseline implementations (unconstrained agents omit FGGM guards entirely; SOTA baselines follow published code where available), report fallback invocation frequencies (observed to be low across runs), and include scalability metrics such as per-task verification time, success rate, and wall-clock overhead. These additions will allow readers to better assess the magnitude of improvements.
  revision: yes
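The attempt limit and the fallback invocation frequency discussed in these responses can be related with a small calculation. The sketch below assumes, purely for illustration, that each rejection attempt passes the contract independently with a fixed probability `p_accept`; real generators need not behave this way, and all function names are hypothetical.

```python
import random


def fallback_probability(p_accept: float, max_attempts: int) -> float:
    """Chance that every attempt is rejected, so the fallback is used."""
    return (1.0 - p_accept) ** max_attempts


def expected_generator_calls(p_accept: float, max_attempts: int) -> float:
    """Mean number of generator calls per guarded call.

    Attempt k+1 happens only if the first k attempts were all rejected,
    which has probability (1 - p_accept) ** k.
    """
    return sum((1.0 - p_accept) ** k for k in range(max_attempts))


def measure_fallback_rate(p_accept: float, max_attempts: int,
                          trials: int, seed: int = 0) -> float:
    """Empirical fallback frequency from simulated guarded calls."""
    rng = random.Random(seed)
    fallbacks = 0
    for _ in range(trials):
        accepted = any(rng.random() < p_accept for _ in range(max_attempts))
        if not accepted:
            fallbacks += 1
    return fallbacks / trials


analytic = fallback_probability(0.5, 3)       # 0.125
mean_calls = expected_generator_calls(0.5, 3) # 1.75
empirical = measure_fallback_rate(0.5, 3, trials=20000)
```

Under this assumption the number of generator calls never exceeds the attempt limit, so the cost is at most linear in that limit, and the fallback rate falls geometrically as the limit grows.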
Circularity Check
No significant circularity; verification reduces to unconstrained learning via external formal methods without tautological reduction of performance claims.
full rationale
The paper's derivation chain is self-contained: FGGM wraps models in rejection samplers plus verified fallbacks to enforce FOL contracts for all inputs and parameters, and the Verification stage then uses external tools (e.g., Dafny) to prove correctness and reduce the problem to unconstrained optimization of the soft objective. No equations or self-citations make the claimed zero-violation and performance results equivalent to the inputs by construction; the separation of hard constraints from learning is explicit and externally verifiable. No load-bearing self-citations, ansatzes smuggled in via prior work, or renamings of known results appear in the provided text.
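The chain described above can be compressed into a single inference, sketched here with illustrative symbols that are not taken from the paper: $g_i^{\theta}$ is the $i$-th guarded call under parameters $\theta$, $\varphi_i$ is its contract, $P_\theta$ is the agent program, and $\Phi$ is the program-level specification. The middle premise assumes the verifier reasons about each call only through its contract.

```latex
% Illustrative symbols; a sketch of the argument, not the paper's theorem.
\underbrace{\forall \theta.\ \forall u.\ \varphi_i\big(u, g_i^{\theta}(u)\big)}_{\text{FGGM guarantee, for each call } i}
\;\wedge\;
\underbrace{\Big(\textstyle\bigwedge_i \forall u.\ \varphi_i\big(u, g_i(u)\big)\Big)
  \Rightarrow \forall x.\ \Phi\big(x, P(x)\big)}_{\text{discharged by the verifier, treating each } g_i \text{ as opaque}}
\;\Longrightarrow\;
\forall \theta.\ \forall x.\ \Phi\big(x, P_\theta(x)\big)
```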
Axiom & Free-Parameter Ledger
free parameters (1)
- task-specific model parameters
axioms (2)
- domain assumption: First-order logic suffices to express output contracts for the generative models used
- domain assumption: Verification can establish correctness for all parameter values after the Search stage
invented entities (2)
- Formally Guarded Generative Model (FGGM): no independent evidence
- SEVerA three-stage framework: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "Each FGGM call wraps the underlying model in a rejection sampler with a verified fallback, ensuring every returned output satisfies the contract for any input and parameter setting."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "We prove that SEVerA is sound: any agent returned satisfies the behavioral specification for all inputs and all parameter values (Theorem 5.4)."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents
  The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.