hub Mixed citations

EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan · 2023 · cs.CL · arXiv 2309.08532

Mixed citation behavior. Most common role is background (43%).

28 Pith papers citing it

Background 43% of classified citations

open full Pith review browse 28 citing papers arXiv PDF

abstract

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 31 datasets covering language understanding, generation tasks, as well as BIG-Bench Hard (BBH) tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation (e.g., up to 25% on BBH). Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 dataset 1 method 1

citation-polarity summary

background 3 unclear 2 use dataset 1 use method 1

representative citing papers

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

cs.CL · 2023-10-05 · conditional · novelty 8.0

DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

TextReg mitigates prompt distributional overfitting via regularized text-space optimization, reporting up to +16.5% OOD accuracy gains over prior methods on reasoning benchmarks.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

cs.SE · 2026-05-04 · unverdicted · novelty 7.0

TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.

Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation

cs.CR · 2026-04-20 · unverdicted · novelty 7.0

DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to induce soft-failure responses in RAG systems, achieving over 79% soft attack success rate with under 15% hard failures and high stealth across models and datasets.

Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA and fact-checking datasets.

PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

PromptEvolver recovers high-fidelity natural language prompts for given images by evolving them via genetic algorithm guided by a vision-language model, outperforming prior methods on benchmarks.

RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition

cs.CV · 2025-12-06 · conditional · novelty 7.0

RAG-HAR combines retrieval-augmented generation with LLMs to deliver state-of-the-art human activity recognition across six benchmarks without any model training or fine-tuning.

Large Language Models as Optimizers

cs.LG · 2023-09-07 · unverdicted · novelty 7.0

Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.

Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

cs.CL · 2026-05-15 · conditional · novelty 6.0

NCCE reframes context engineering as instance-level recommendation via bootstrapped anchor contexts and a co-evolving neural collaborative filtering router that assigns specialized contexts per input.

OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

cs.AI · 2026-05-14 · conditional · novelty 6.0 · 2 refs

OpenDeepThink uses Bradley-Terry aggregation of LLM pairwise judgments to rank and evolve parallel reasoning traces, improving Gemini 3.1 Pro Codeforces Elo by 405 points over eight rounds.

EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and outperforms static baselines on GAIA, HLE, and DeepResearcher.

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

FitText embeds memetic evolutionary retrieval inside the agent's reasoning loop to iteratively refine pseudo-tool descriptions, raising retrieval rank from 8.81 to 2.78 on ToolRet and pass rate to 0.73 on StableToolBench.

AgentGA: Evolving Code Solutions in Agent-Seed Space

cs.AI · 2026-04-16 · unverdicted · novelty 6.0 · 2 refs

AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.

Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

MemJack achieves 71.48% attack success rate on unmodified COCO val2017 images against Qwen3-VL-Plus by coordinating agents to map visual entities to malicious intents, apply multi-angle camouflage, and filter refusals via iterative nullspace projection while transferring strategies through a shared

AI-Driven Research for Databases

cs.DB · 2026-04-08 · unverdicted · novelty 6.0

Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.

Diversifying Toxicity Search in Large Language Models Through Speciation

cs.NE · 2026-01-28 · unverdicted · novelty 6.0

ToxSearch-S applies unsupervised speciation to evolutionary prompt search, maintaining capacity-limited species with exemplar leaders and species-aware selection to achieve higher peak toxicity and broader semantic coverage than standard methods.

FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data

cs.AI · 2025-10-29 · unverdicted · novelty 6.0

FELA deploys specialized LLM agents in an evolutionary framework to generate, validate, and refine explainable features from heterogeneous industrial event logs, improving downstream model performance.

LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

cs.LG · 2025-03-18 · unverdicted · novelty 6.0

LLM-FE is a framework that treats feature engineering as LLM-driven program search with data feedback, reporting consistent gains over baselines on classification and regression tabular tasks.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization

cs.AI · 2026-05-19 · conditional · novelty 5.0

MOCHA combines Chebyshev scalarization with exponential annealing to optimize LLM agent skills across performance and platform constraints, improving mean correctness by 7.5% over baselines on six tasks while finding more Pareto-optimal variants.

Prompt Optimization for LLM Code Generation via Reinforcement Learning

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

cs.AI · 2026-05-15 · unverdicted · novelty 5.0

FORGE is a staged population protocol that evolves prompt-injected memory (Rules, Examples, or Mixed) for ReAct agents via reflection and broadcast, yielding 1.7-7.7× gains over zero-shot and 29-72% over Reflexion on CybORG CAGE-2.

GEAR: Genetic AutoResearch for Agentic Code Evolution

cs.NE · 2026-05-08 · unverdicted · novelty 5.0

GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.

citing papers explorer

Showing 28 of 28 citing papers.

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines cs.CL · 2023-10-05 · conditional · none · ref 18 · internal anchor
DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization cs.CL · 2026-05-20 · unverdicted · none · ref 23 · internal anchor
TextReg mitigates prompt distributional overfitting via regularized text-space optimization, reporting up to +16.5% OOD accuracy gains over prior methods on reasoning benchmarks.
Learning, Fast and Slow: Towards LLMs That Adapt Continually cs.LG · 2026-05-12 · unverdicted · none · ref 21 · 2 links · internal anchor
Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.
TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments cs.SE · 2026-05-04 · unverdicted · none · ref 9 · internal anchor
TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.
Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation cs.CR · 2026-04-20 · unverdicted · none · ref 1 · internal anchor
DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to induce soft-failure responses in RAG systems, achieving over 79% soft attack success rate with under 15% hard failures and high stealth across models and datasets.
Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS cs.CL · 2026-04-12 · unverdicted · none · ref 2 · internal anchor
Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA and fact-checking datasets.
PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space cs.LG · 2026-04-03 · unverdicted · none · ref 14 · internal anchor
PromptEvolver recovers high-fidelity natural language prompts for given images by evolving them via genetic algorithm guided by a vision-language model, outperforming prior methods on benchmarks.
RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition cs.CV · 2025-12-06 · conditional · none · ref 16 · internal anchor
RAG-HAR combines retrieval-augmented generation with LLMs to deliver state-of-the-art human activity recognition across six benchmarks without any model training or fine-tuning.
Large Language Models as Optimizers cs.LG · 2023-09-07 · unverdicted · none · ref 11 · internal anchor
Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-designed baselines.
Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering cs.CL · 2026-05-15 · conditional · none · ref 9 · internal anchor
NCCE reframes context engineering as instance-level recommendation via bootstrapped anchor contexts and a co-evolving neural collaborative filtering router that assigns specialized contexts per input.
OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation cs.AI · 2026-05-14 · conditional · none · ref 7 · 2 links · internal anchor
OpenDeepThink uses Bradley-Terry aggregation of LLM pairwise judgments to rank and evolve parallel reasoning traces, improving Gemini 3.1 Pro Codeforces Elo by 405 points over eight rounds.
EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems cs.AI · 2026-05-09 · unverdicted · none · ref 8 · internal anchor
EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and outperforms static baselines on GAIA, HLE, and DeepResearcher.
FitText: Evolving Agent Tool Ecologies via Memetic Retrieval cs.AI · 2026-05-04 · unverdicted · none · ref 11 · internal anchor
FitText embeds memetic evolutionary retrieval inside the agent's reasoning loop to iteratively refine pseudo-tool descriptions, raising retrieval rank from 8.81 to 2.78 on ToolRet and pass rate to 0.73 on StableToolBench.
AgentGA: Evolving Code Solutions in Agent-Seed Space cs.AI · 2026-04-16 · unverdicted · none · ref 8 · 2 links · internal anchor
AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.
Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs cs.AI · 2026-04-14 · unverdicted · none · ref 12 · internal anchor
MemJack achieves 71.48% attack success rate on unmodified COCO val2017 images against Qwen3-VL-Plus by coordinating agents to map visual entities to malicious intents, apply multi-angle camouflage, and filter refusals via iterative nullspace projection while transferring strategies through a shared
AI-Driven Research for Databases cs.DB · 2026-04-08 · unverdicted · none · ref 29 · internal anchor
Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.
Diversifying Toxicity Search in Large Language Models Through Speciation cs.NE · 2026-01-28 · unverdicted · none · ref 12 · internal anchor
ToxSearch-S applies unsupervised speciation to evolutionary prompt search, maintaining capacity-limited species with exemplar leaders and species-aware selection to achieve higher peak toxicity and broader semantic coverage than standard methods.
FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data cs.AI · 2025-10-29 · unverdicted · none · ref 41 · internal anchor
FELA deploys specialized LLM agents in an evolutionary framework to generate, validate, and refine explainable features from heterogeneous industrial event logs, improving downstream model performance.
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers cs.LG · 2025-03-18 · unverdicted · none · ref 15 · internal anchor
LLM-FE is a framework that treats feature engineering as LLM-driven program search with data feedback, reporting consistent gains over baselines on classification and regression tabular tasks.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code cs.SE · 2024-03-12 · unverdicted · none · ref 192 · internal anchor
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization cs.AI · 2026-05-19 · conditional · none · ref 12 · internal anchor
MOCHA combines Chebyshev scalarization with exponential annealing to optimize LLM agent skills across performance and platform constraints, improving mean correctness by 7.5% over baselines on six tasks while finding more Pareto-optimal variants.
Prompt Optimization for LLM Code Generation via Reinforcement Learning cs.SE · 2026-05-18 · unverdicted · none · ref 15 · internal anchor
A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast cs.AI · 2026-05-15 · unverdicted · none · ref 6 · internal anchor
FORGE is a staged population protocol that evolves prompt-injected memory (Rules, Examples, or Mixed) for ReAct agents via reflection and broadcast, yielding 1.7-7.7× gains over zero-shot and 29-72% over Reflexion on CybORG CAGE-2.
GEAR: Genetic AutoResearch for Agentic Code Evolution cs.NE · 2026-05-08 · unverdicted · none · ref 7 · internal anchor
GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.
Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction cs.CL · 2026-05-05 · unverdicted · none · ref 41 · internal anchor
Small open-weight language models can self-optimize prompts for clinical named entity recognition in dental notes, reaching micro F1 of 0.864 after DPO on Qwen2.5-14B.
ORFS-agent: Tool-Using Agents for Chip Design Optimization cs.AI · 2025-06-10 · unverdicted · none · ref 16 · internal anchor
ORFS-agent uses LLM agents to tune parameters in chip design flows, improving geometric-mean wirelength, clock period, and co-optimization objectives by up to 2.7% over OR-AutoTuner with 40% fewer iterations on ASAP7 and SKY130HD benchmarks.
Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation cs.SE · 2026-04-23 · accept · none · ref 16 · internal anchor
Execution feedback in refinement loops improves 1-3B code generation performance far more than complex pipeline topologies discovered via evolutionary search on HumanEval and sanitized MBPP.
Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems cs.AI · 2026-04-16 · unreviewed · ref 3 · internal anchor

EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer