Recognition: 2 Lean theorem links
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Pith reviewed 2026-05-16 08:08 UTC · model grok-4.3
The pith
An LLM can improve prompting by evolving both the task prompts and the mutation rules that generate them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Promptbreeder evolves a population of task-prompts whose mutation is governed by mutation-prompts that the LLM itself generates and refines in a self-referential loop, yielding prompts that outperform Chain-of-Thought and Plan-and-Solve strategies on arithmetic and commonsense reasoning benchmarks while also producing intricate prompts for hate-speech classification.
What carries the argument
The self-referential evolutionary loop in which the LLM simultaneously mutates task-prompts and improves the mutation-prompts that control those mutations.
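For concreteness, that loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm` is a hypothetical text-completion function, selection is simple truncation rather than the paper's binary tournament, and the prompt templates are invented.

```python
import random

def evolve_prompts(llm, task_prompts, mutation_prompts, fitness, generations=10):
    """Sketch of a self-referential prompt-evolution loop.

    llm(text) -> str is a hypothetical completion function;
    fitness(task_prompt) -> float scores a prompt on a training set.
    """
    for _ in range(generations):
        # Mutate every task-prompt with a randomly chosen mutation-prompt.
        children = [
            llm(f"{random.choice(mutation_prompts)}\nINSTRUCTION: {tp}\nNEW INSTRUCTION:")
            for tp in task_prompts
        ]
        # Keep the fitter half of parents and children (truncation selection
        # here; the paper uses binary tournaments).
        pool = sorted(task_prompts + children, key=fitness, reverse=True)
        task_prompts = pool[:len(task_prompts)]
        # Self-referential step: the LLM also rewrites one of its own
        # mutation-prompts, so the mutation operator itself evolves.
        i = random.randrange(len(mutation_prompts))
        mutation_prompts[i] = llm(
            f"Improve this prompt-mutation instruction:\n{mutation_prompts[i]}"
        )
    return task_prompts
```

The distinctive move is the final step inside the loop: ordinary prompt search would hold the mutation operator fixed, whereas here it is itself subject to LLM rewriting.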
If this is right
- Prompt engineering for new tasks can be automated instead of requiring human design of strategies such as Chain-of-Thought.
- The same evolutionary process can discover non-obvious prompt structures for difficult classification problems such as hate-speech detection.
- Performance gains on arithmetic and commonsense reasoning tasks are obtained without changing the underlying LLM weights.
- The approach supplies a general template for self-referential improvement that can be applied to other prompt-based capabilities.
Where Pith is reading between the lines
- If the self-referential loop scales, future systems could iteratively refine their own interaction protocols without external intervention.
- The method suggests that prompt spaces may contain discoverable structure that evolutionary search can locate more efficiently than manual trial-and-error.
- Similar self-referential evolution could be tested on code-generation or tool-use prompts to check whether the same loop yields gains outside reasoning benchmarks.
Load-bearing premise
The LLM generates mutations that are useful on average and evaluates prompt fitness on a training set without systematic errors that would collapse the evolutionary search.
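That premise concerns fitness estimation, which in loops like this is typically accuracy over a small random sample of training examples. A minimal sketch, assuming a hypothetical `llm` completion function and exact-substring answer matching as a stand-in for real answer extraction:

```python
import random

def prompt_fitness(llm, task_prompt, train_set, sample_size=20):
    """Estimate a task-prompt's fitness as accuracy on a random sample of
    (question, answer) pairs. llm(text) -> str is a hypothetical completion
    function; substring matching simplifies real answer extraction."""
    batch = random.sample(train_set, min(sample_size, len(train_set)))
    correct = sum(
        answer in llm(f"{task_prompt}\nQ: {question}\nA:")
        for question, answer in batch
    )
    return correct / len(batch)
```

Because fitness is a noisy sample estimate, systematic scoring errors of the kind the premise rules out would bias selection rather than merely add variance, which is what could collapse the search.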
What would settle it
Running the full Promptbreeder procedure on a held-out reasoning benchmark for a fixed number of generations and finding that the final evolved prompts score no higher than a standard Chain-of-Thought prompt would falsify the central performance claim.
original abstract

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Promptbreeder, an LLM-driven evolutionary framework that maintains a population of task-prompts whose mutations are themselves generated and refined by a second population of self-referential mutation-prompts. Fitness is assessed by accuracy on a training set; the process is claimed to yield prompts that outperform Chain-of-Thought and Plan-and-Solve prompting on arithmetic and commonsense reasoning benchmarks and to produce effective prompts for hate-speech classification.
Significance. If the empirical gains prove robust and transferable, the self-referential evolutionary loop offers a general, largely automated route to prompt optimization that could reduce reliance on hand-crafted strategies. The absence of free parameters in the core loop and the explicit evolution of the mutation operator itself are notable strengths that distinguish the work from prior prompt-search methods.
major comments (2)
- [Abstract, §4] Abstract and §4 (Experiments): the central claim that Promptbreeder outperforms CoT and Plan-and-Solve is stated without any numerical results, standard deviations, or statistical tests in the abstract and is only cursorily supported in the experiments section; without these data the magnitude and reliability of the improvement cannot be evaluated.
- [§3.2, §4.3] §3.2 (Evolutionary loop) and §4.3 (Hate-speech task): the same LLM family is used both to generate mutations and to score fitness on a small training set; no cross-model transfer experiments or out-of-distribution hold-out sets are reported, leaving open the possibility that evolved prompts exploit model-specific token biases rather than general reasoning improvements.
minor comments (2)
- [§3.1] §3.1: population size, number of generations, and exact selection/replacement rules are described only at a high level; explicit pseudocode or parameter values would aid reproducibility.
- [Figure 2, §4.2] Figure 2 and §4.2: axis labels and legend entries are too small to read at standard print size; enlarge or split the figure.
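The selection scheme the paper names (a binary tournament genetic algorithm, per its Harvey 2011 citation) is simple to state. The sketch below is one common formulation; the paper's exact variant, population size, and mutation operator are assumptions here, which is precisely the reproducibility gap the minor comment above flags.

```python
import random

def binary_tournament_step(population, fitness, mutate):
    """One binary-tournament step: pick two individuals at random and
    overwrite the loser with a mutated copy of the winner."""
    i, j = random.sample(range(len(population)), 2)
    if fitness(population[i]) < fitness(population[j]):
        i, j = j, i  # i is now the winner, j the loser
    population[j] = mutate(population[i])
    return population
```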
Simulated Author's Rebuttal
Thank you for the constructive referee report. We address each major comment below and indicate the corresponding revisions to the manuscript.
point-by-point responses
- Referee: [Abstract, §4] Abstract and §4 (Experiments): the central claim that Promptbreeder outperforms CoT and Plan-and-Solve is stated without any numerical results, standard deviations, or statistical tests in the abstract and is only cursorily supported in the experiments section; without these data the magnitude and reliability of the improvement cannot be evaluated.
Authors: We agree that the abstract and experiments section would benefit from more concrete quantitative support. In the revised manuscript we will update the abstract to report specific accuracy improvements (with standard deviations) on the arithmetic and commonsense benchmarks. We will also expand §4 with additional tables that include means, standard deviations across runs, and statistical significance tests to substantiate the reliability of the gains over CoT and Plan-and-Solve. revision: yes
- Referee: [§3.2, §4.3] §3.2 (Evolutionary loop) and §4.3 (Hate-speech task): the same LLM family is used both to generate mutations and to score fitness on a small training set; no cross-model transfer experiments or out-of-distribution hold-out sets are reported, leaving open the possibility that evolved prompts exploit model-specific token biases rather than general reasoning improvements.
Authors: This is a valid concern about generalizability. While the final test benchmarks are distinct from the small training sets used for fitness (and therefore constitute an out-of-distribution evaluation), we did not conduct cross-model transfer experiments. In the revision we will add explicit discussion in §3.2 and §4.3 clarifying the train/test separation, acknowledging the possibility of model-specific biases, and listing cross-model evaluation as an important direction for future work. Full cross-model experiments are not feasible within the current resource budget. revision: partial
Circularity Check
No circularity: empirical evolutionary loop with external LLM fitness
full rationale
The paper describes an empirical procedure in which an LLM generates mutations to a population of task-prompts and mutation-prompts, then scores fitness on a training set, with final evaluation on separate test benchmarks. No equations, derivations, or self-referential definitions appear that would reduce the reported benchmark gains to a fitted parameter or to the input data by construction. The central claims rest on experimental results rather than on any mathematical identity or self-citation chain that collapses the method onto itself. This is the normal case of a self-contained experimental algorithm.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can generate useful prompt mutations and evaluate their fitness on a training set without introducing systematic bias
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "We employ a binary tournament genetic algorithm framework (Harvey, 2011)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 20 Pith papers
- Teaching and Learning under Deductive Errors · Extends PAC machine teaching to handle deductive errors by requiring teachers to select sets that lead to approximately correct hypotheses with high probability despite learner mistakes, with complexity results and LL...
- Learning, Fast and Slow: Towards LLMs That Adapt Continually · Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard...
- TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments · TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.
- Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems · PUDA enables effective promotion of unpopular target items in black-box LLM sequential recommenders by using evolutionary LLM refinement to infer hidden prompts, training a surrogate model, and combining adversarial t...
- AlphaEvolve: A coding agent for scientific and algorithmic discovery · AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, ...
- Large Language Models as Optimizers · Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-des...
- Learning, Fast and Slow: Towards LLMs That Adapt Continually · Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.
- EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems · EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...
- PrismaDV: Automated Task-Aware Data Unit Test Generation · PrismaDV generates task-aware data unit tests by jointly analyzing downstream code and dataset profiles, outperforming task-agnostic baselines on new benchmarks spanning 60 tasks, with SIFTA enabling automatic prompt ...
- Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems · Prompt optimization in compound AI systems is statistically indistinguishable from random chance except when tasks have exploitable output structure; a two-stage diagnostic predicts success.
- LLM-Guided Prompt Evolution for Password Guessing · LLM-guided evolutionary prompt optimization using MAP-Elites and island models raises password cracking rates from 2.02% to 8.48% on a RockYou-derived test set across local, cloud, and ensemble LLM setups.
- Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees · POES frames prompt evaluation as online adaptive testing and uses a provably submodular objective to pick informative examples, delivering 6.2% higher average accuracy and 35-60% token savings versus naive full-set scoring.
- TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution · TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solu...
- Pioneer Agent: Continual Improvement of Small Language Models in Production · Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on ...
- ExecTune: Effective Steering of Black-Box LLMs with Guide Models · ExecTune trains guide models via acceptance sampling, supervised fine-tuning, and structure-aware RL to boost executability of strategies for black-box LLMs, yielding up to 9.2% higher accuracy and 22.4% lower cost on...
- Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies · Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
- Diversifying Toxicity Search in Large Language Models Through Speciation · ToxSearch-S applies unsupervised speciation to evolutionary prompt search, maintaining capacity-limited species with exemplar leaders and species-aware selection to achieve higher peak toxicity and broader semantic co...
- EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents · EGL-SCA co-evolves instructions and tools via structural credit assignment in graph reasoning agents and reports 92% average success on four benchmarks.
- A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence · The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models · The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.