{"paper":{"title":"Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution","license":"http://creativecommons.org/licenses/by/4.0/","headline":"An LLM can improve prompting by evolving both the task prompts and the mutation rules that generate them.","cross_cats":["cs.AI","cs.LG","cs.NE"],"primary_cat":"cs.CL","authors_text":"Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rockt\\\"aschel","submitted_at":"2023-09-28T19:01:07Z","abstract_excerpt":"Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that th"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the LLM can generate useful mutations and provide reliable fitness evaluations on a training set without systematic biases or errors that would derail the evolutionary process.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An LLM can improve prompting by evolving both the task prompts and the mutation rules that generate them.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"af8a4608b93492c9a639316a2a8363210f11448aa1c8d3fcc9d2985291fd12a3"},"source":{"id":"2309.16797","kind":"arxiv","version":1},"verdict":{"id":"7356eb50-875e-4a35-873b-bf88b81cd52d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T08:08:25.069041Z","strongest_claim":"Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.","one_line_summary":"Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the LLM can generate useful mutations and provide reliable fitness evaluations on a training set without systematic biases or errors that would derail the evolutionary process.","pith_extraction_headline":"An LLM can improve prompting by evolving both the task prompts and the mutation rules that generate them."},"references":{"count":296,"sample":[{"doi":"","year":2021,"title":"Show Your Work: Scratchpads for Intermediate Computation with Language Models","work_id":"a05b1e60-8e76-4f26-9bea-28927a5f8620","ref_index":1,"cited_arxiv_id":"2112.00114","is_internal_anchor":true},{"doi":"","year":1995,"title":"The Hitchhiker's Guide to the Galaxy , author=. 1995 , publisher=","work_id":"07683f0c-cb34-47d6-83ae-f5d0726ac43a","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"NeurIPS , year =","work_id":"387c2ec4-3205-43fa-9107-bd3febe774bc","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"The Eleventh International Conference on Learning Representations,","work_id":"399e38b9-d994-4207-a188-550020e608cf","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"gradient descent","work_id":"8f5910e5-bea1-4761-87ec-05f692dd6f04","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":296,"snapshot_sha256":"db2d6045761ca65dfd1b0fc1282cac50449235add2e3e1a7df7c663909c2df89","internal_anchors":67},"formal_canon":{"evidence_count":2,"snapshot_sha256":"34973332bdf97d1893a8162d3a15016cc6de881333fbca73ea85f05de7f36b4e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}