Recognition: 2 Lean theorem links
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Pith reviewed 2026-05-16 08:08 UTC · model grok-4.3
The pith
An LLM can improve prompting by evolving both the task prompts and the mutation rules that generate them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Promptbreeder evolves a population of task-prompts whose mutation is governed by mutation-prompts that the LLM itself generates and refines in a self-referential loop, yielding prompts that outperform Chain-of-Thought and Plan-and-Solve strategies on arithmetic and commonsense reasoning benchmarks while also producing intricate prompts for hate-speech classification.
What carries the argument
The self-referential evolutionary loop in which the LLM simultaneously mutates task-prompts and improves the mutation-prompts that control those mutations.
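For concreteness, that loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm` is a hypothetical text-completion function, selection is simple truncation rather than the paper's binary tournament, and the prompt templates are invented.

```python
import random

def evolve_prompts(llm, task_prompts, mutation_prompts, fitness, generations=10):
    """Sketch of a self-referential prompt-evolution loop.

    llm(text) -> str is a hypothetical completion function;
    fitness(task_prompt) -> float scores a prompt on a training set.
    """
    for _ in range(generations):
        # Mutate every task-prompt with a randomly chosen mutation-prompt.
        children = [
            llm(f"{random.choice(mutation_prompts)}\nINSTRUCTION: {tp}\nNEW INSTRUCTION:")
            for tp in task_prompts
        ]
        # Keep the fitter half of parents and children (truncation selection
        # here; the paper uses binary tournaments).
        pool = sorted(task_prompts + children, key=fitness, reverse=True)
        task_prompts = pool[:len(task_prompts)]
        # Self-referential step: the LLM also rewrites one of its own
        # mutation-prompts, so the mutation operator itself evolves.
        i = random.randrange(len(mutation_prompts))
        mutation_prompts[i] = llm(
            f"Improve this prompt-mutation instruction:\n{mutation_prompts[i]}"
        )
    return task_prompts
```

The distinctive move is the final step inside the loop: ordinary prompt search would hold the mutation operator fixed, whereas here it is itself subject to LLM rewriting.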
If this is right
- Prompt engineering for new tasks can be automated instead of requiring human design of strategies such as Chain-of-Thought.
- The same evolutionary process can discover non-obvious prompt structures for difficult classification problems such as hate-speech detection.
- Performance gains on arithmetic and commonsense reasoning tasks are obtained without changing the underlying LLM weights.
- The approach supplies a general template for self-referential improvement that can be applied to other prompt-based capabilities.
Where Pith is reading between the lines
- If the self-referential loop scales, future systems could iteratively refine their own interaction protocols without external intervention.
- The method suggests that prompt spaces may contain discoverable structure that evolutionary search can locate more efficiently than manual trial-and-error.
- Similar self-referential evolution could be tested on code-generation or tool-use prompts to check whether the same loop yields gains outside reasoning benchmarks.
Load-bearing premise
The LLM generates mutations that are useful on average and evaluates prompt fitness on a training set without systematic errors that would collapse the evolutionary search.
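That premise concerns fitness estimation, which in loops like this is typically accuracy over a small random sample of training examples. A minimal sketch, assuming a hypothetical `llm` completion function and exact-substring answer matching as a stand-in for real answer extraction:

```python
import random

def prompt_fitness(llm, task_prompt, train_set, sample_size=20):
    """Estimate a task-prompt's fitness as accuracy on a random sample of
    (question, answer) pairs. llm(text) -> str is a hypothetical completion
    function; substring matching simplifies real answer extraction."""
    batch = random.sample(train_set, min(sample_size, len(train_set)))
    correct = sum(
        answer in llm(f"{task_prompt}\nQ: {question}\nA:")
        for question, answer in batch
    )
    return correct / len(batch)
```

Because fitness is a noisy sample estimate, systematic scoring errors of the kind the premise rules out would bias selection rather than merely add variance, which is what could collapse the search.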
What would settle it
Running the full Promptbreeder procedure on a held-out reasoning benchmark for a fixed number of generations and finding that the final evolved prompts score no higher than a standard Chain-of-Thought prompt would falsify the central performance claim.
original abstract

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Promptbreeder, an LLM-driven evolutionary framework that maintains a population of task-prompts whose mutations are themselves generated and refined by a second population of self-referential mutation-prompts. Fitness is assessed by accuracy on a training set; the process is claimed to yield prompts that outperform Chain-of-Thought and Plan-and-Solve prompting on arithmetic and commonsense reasoning benchmarks and to produce effective prompts for hate-speech classification.
Significance. If the empirical gains prove robust and transferable, the self-referential evolutionary loop offers a general, largely automated route to prompt optimization that could reduce reliance on hand-crafted strategies. The absence of free parameters in the core loop and the explicit evolution of the mutation operator itself are notable strengths that distinguish the work from prior prompt-search methods.
major comments (2)
- [Abstract, §4] Abstract and §4 (Experiments): the central claim that Promptbreeder outperforms CoT and Plan-and-Solve is stated without any numerical results, standard deviations, or statistical tests in the abstract and is only cursorily supported in the experiments section; without these data the magnitude and reliability of the improvement cannot be evaluated.
- [§3.2, §4.3] §3.2 (Evolutionary loop) and §4.3 (Hate-speech task): the same LLM family is used both to generate mutations and to score fitness on a small training set; no cross-model transfer experiments or out-of-distribution hold-out sets are reported, leaving open the possibility that evolved prompts exploit model-specific token biases rather than general reasoning improvements.
minor comments (2)
- [§3.1] §3.1: population size, number of generations, and exact selection/replacement rules are described only at a high level; explicit pseudocode or parameter values would aid reproducibility.
- [Figure 2, §4.2] Figure 2 and §4.2: axis labels and legend entries are too small to read at standard print size; enlarge or split the figure.
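The selection scheme the paper names (a binary tournament genetic algorithm, per its Harvey 2011 citation) is simple to state. The sketch below is one common formulation; the paper's exact variant, population size, and mutation operator are assumptions here, which is precisely the reproducibility gap the minor comment above flags.

```python
import random

def binary_tournament_step(population, fitness, mutate):
    """One binary-tournament step: pick two individuals at random and
    overwrite the loser with a mutated copy of the winner."""
    i, j = random.sample(range(len(population)), 2)
    if fitness(population[i]) < fitness(population[j]):
        i, j = j, i  # i is now the winner, j the loser
    population[j] = mutate(population[i])
    return population
```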
Simulated Author's Rebuttal
Thank you for the constructive referee report. We address each major comment below and indicate the corresponding revisions to the manuscript.
point-by-point responses
- Referee: [Abstract, §4] Abstract and §4 (Experiments): the central claim that Promptbreeder outperforms CoT and Plan-and-Solve is stated without any numerical results, standard deviations, or statistical tests in the abstract and is only cursorily supported in the experiments section; without these data the magnitude and reliability of the improvement cannot be evaluated.
Authors: We agree that the abstract and experiments section would benefit from more concrete quantitative support. In the revised manuscript we will update the abstract to report specific accuracy improvements (with standard deviations) on the arithmetic and commonsense benchmarks. We will also expand §4 with additional tables that include means, standard deviations across runs, and statistical significance tests to substantiate the reliability of the gains over CoT and Plan-and-Solve. revision: yes
- Referee: [§3.2, §4.3] §3.2 (Evolutionary loop) and §4.3 (Hate-speech task): the same LLM family is used both to generate mutations and to score fitness on a small training set; no cross-model transfer experiments or out-of-distribution hold-out sets are reported, leaving open the possibility that evolved prompts exploit model-specific token biases rather than general reasoning improvements.
Authors: This is a valid concern about generalizability. While the final test benchmarks are distinct from the small training sets used for fitness (and therefore constitute an out-of-distribution evaluation), we did not conduct cross-model transfer experiments. In the revision we will add explicit discussion in §3.2 and §4.3 clarifying the train/test separation, acknowledging the possibility of model-specific biases, and listing cross-model evaluation as an important direction for future work. Full cross-model experiments are not feasible within the current resource budget. revision: partial
Circularity Check
No circularity: empirical evolutionary loop with external LLM fitness
full rationale
The paper describes an empirical procedure in which an LLM generates mutations to a population of task-prompts and mutation-prompts, then scores fitness on a training set, with final evaluation on separate test benchmarks. No equations, derivations, or self-referential definitions appear that would reduce the reported benchmark gains to a fitted parameter or to the input data by construction. The central claims rest on experimental results rather than on any mathematical identity or self-citation chain that collapses the method onto itself. This is the normal case of a self-contained experimental algorithm.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can generate useful prompt mutations and evaluate their fitness on a training set without introducing systematic bias
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "We employ a binary tournament genetic algorithm framework (Harvey, 2011)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 20 Pith papers
- Teaching and Learning under Deductive Errors · Extends PAC machine teaching to handle deductive errors by requiring teachers to select sets that lead to approximately correct hypotheses with high probability despite learner mistakes, with complexity results and LL...
- Learning, Fast and Slow: Towards LLMs That Adapt Continually · Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard...
- TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments · TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.
- Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems · PUDA enables effective promotion of unpopular target items in black-box LLM sequential recommenders by using evolutionary LLM refinement to infer hidden prompts, training a surrogate model, and combining adversarial t...
- AlphaEvolve: A coding agent for scientific and algorithmic discovery · AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, ...
- Large Language Models as Optimizers · Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-des...
- Learning, Fast and Slow: Towards LLMs That Adapt Continually · Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.
- EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems · EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...
- PrismaDV: Automated Task-Aware Data Unit Test Generation · PrismaDV generates task-aware data unit tests by jointly analyzing downstream code and dataset profiles, outperforming task-agnostic baselines on new benchmarks spanning 60 tasks, with SIFTA enabling automatic prompt ...
- Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems · Prompt optimization in compound AI systems is statistically indistinguishable from random chance except when tasks have exploitable output structure; a two-stage diagnostic predicts success.
- LLM-Guided Prompt Evolution for Password Guessing · LLM-guided evolutionary prompt optimization using MAP-Elites and island models raises password cracking rates from 2.02% to 8.48% on a RockYou-derived test set across local, cloud, and ensemble LLM setups.
- Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees · POES frames prompt evaluation as online adaptive testing and uses a provably submodular objective to pick informative examples, delivering 6.2% higher average accuracy and 35-60% token savings versus naive full-set scoring.
- TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution · TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solu...
- Pioneer Agent: Continual Improvement of Small Language Models in Production · Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on ...
- ExecTune: Effective Steering of Black-Box LLMs with Guide Models · ExecTune trains guide models via acceptance sampling, supervised fine-tuning, and structure-aware RL to boost executability of strategies for black-box LLMs, yielding up to 9.2% higher accuracy and 22.4% lower cost on...
- Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies · Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
- Diversifying Toxicity Search in Large Language Models Through Speciation · ToxSearch-S applies unsupervised speciation to evolutionary prompt search, maintaining capacity-limited species with exemplar leaders and species-aware selection to achieve higher peak toxicity and broader semantic co...
- EGL-SCA: Structural Credit Assignment for Co-Evolving Instructions and Tools in Graph Reasoning Agents · EGL-SCA co-evolves instructions and tools via structural credit assignment in graph reasoning agents and reports 92% average success on four benchmarks.
- A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence · The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models · The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.