hub Mixed citations

TextGrad: Automatic "Differentiation" via Text

Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin · 2024 · cs.CL · arXiv 2406.07496

Mixed citation behavior. Most common role is background (62%).

67 Pith papers citing it

Background 62% of classified citations

open full Pith review browse 67 citing papers arXiv PDF

abstract

AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in its early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic ``differentiation'' via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy-to-use. It works out-of-the-box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from $51\%$ to $55\%$, yields $20\%$ relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next-generation of AI systems.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 5 method 2 baseline 1

citation-polarity summary

background 5 use method 2 baseline 1

representative citing papers

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

cs.RO · 2026-06-03 · unverdicted · novelty 7.0

VASO is a verification-guided self-evolution framework for LLM robot skill contracts that reaches 97.2% formal-specification compliance on Jackal and quadcopter tasks using under 100 samples.

When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

cs.LG · 2026-05-29 · unverdicted · novelty 7.0 · 2 refs

PromptPO shows LLMs can act as black-box policy optimizers for sequential RL when leveraging prior knowledge, matching baselines in exploration and robotics but underperforming in MuJoCo.

Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Life-Harness evolves reusable interventions from training trajectories to enhance frozen LLM agents on unseen tasks across seven deterministic environments, yielding 88.5% average relative improvement in 116 of 126 model-environment settings.

Diagnosis Is Not Prescription: Linguistic Co-Adaptation Explains Patching Hazards in LLM Pipelines

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

Causal diagnosis identifies the routing module as bottleneck in LLM agents but prompt patching there degrades results due to linguistic co-adaptation, while upstream patching improves them.

TextReg: Mitigating Prompt Distributional Overfitting via Regularized Text-Space Optimization

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

TextReg mitigates prompt distributional overfitting via regularized text-space optimization, reporting up to +16.5% OOD accuracy gains over prior methods on reasoning benchmarks.

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

ReElicit uses LLMs to elicit adaptive feature embeddings for Gaussian process Bayesian optimization of system prompts under aggregate-only feedback, outperforming baselines across ten tasks with a 30-evaluation budget.

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

More Is Not Always Better: Cross-Component Interference in LLM Agent Scaffolding

cs.AI · 2026-05-07 · conditional · novelty 7.0

Full factorial testing of five LLM agent components reveals that the complete 'All-In' combination is consistently outperformed by smaller subsets due to cross-component interference, with optimal subsets being task- and scale-dependent.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

cs.SE · 2026-05-04 · unverdicted · novelty 7.0

TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.

RosettaSearch: Multi-Objective Inference-Time Search for Protein Sequence Design

cs.LG · 2026-04-19 · unverdicted · novelty 7.0

RosettaSearch applies LLM-driven multi-objective search at inference time to improve backbone-conditioned protein sequences, recovering designs with 18-68% better structural fidelity and 2.5x higher success rates than single-pass models like LigandMPNN.

Meta-Harness: End-to-End Optimization of Model Harnesses

cs.AI · 2026-03-30 · unverdicted · novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.

FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification

cs.AR · 2026-03-06 · unverdicted · novelty 7.0

FVRuleLearner introduces an Operator Reasoning Tree to learn operator-specific rules that improve natural-language to SystemVerilog assertion generation, raising syntax correctness by 3.95% and functional correctness by 31.17% over baselines.

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

cs.AI · 2026-01-22 · conditional · novelty 7.0

DeepVerifier enables self-evolving deep research agents via rubric-guided verification at test time, delivering 8-11% accuracy gains on GAIA and XBench-DeepSearch subsets.

Gen-n-Val: Agentic Image Data Generation and Validation

cs.CV · 2025-06-05 · conditional · novelty 7.0

Gen-n-Val uses LLM and VLLM agents with Layer Diffusion and TextGrad to generate and validate synthetic instance data, cutting invalid samples from 50% to 7% and improving rare-class performance on LVIS and COCO benchmarks.

Human-LLM Compound System for Scientific Ideation through Facet Recombination and Novelty Evaluation

cs.HC · 2024-09-23 · unverdicted · novelty 7.0

Scideator enables facet-based scientific ideation through LLM-driven extraction, human-guided recombination, analogous retrieval, and facet-grounded novelty verification, showing significantly higher creativity support than a baseline LLM in a user study with CS researchers.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Teaching LLMs to Recommend and Defer in Underrepresented Epilepsy Care

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

MANANA prompt-learning framework improves LLM prescription prediction accuracy by 4-8 points on Ugandan epilepsy cohorts and supports selective deferral at 95-99% precision on high-confidence cases.

Automating the Design of Embodied AgentArchitectures

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Automated architecture search for embodied agents produces directional success-rate gains on vision-language and manipulation tasks while exposing limits from simulation noise and incomplete credit assignment.

Towards Spec Learning: Inference-Time Alignment from Preference Pairs

cs.CL · 2026-06-22 · unverdicted · novelty 6.0

Proposes compiling preference pairs into readable natural-language specifications for inference-time LLM alignment, claiming outperformance over DPO on dense-preference domains.

VTOS: Learning to Orchestrate Vision Tools by Co-Searching Solutions and Observers

cs.CV · 2026-06-17 · unverdicted · novelty 6.0

VTOS jointly searches solution and observer programs to adaptively orchestrate vision tools, outperforming static pipelines on dense object counting and zero-shot plant disease segmentation.

EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

cs.AI · 2026-06-02 · unverdicted · novelty 6.0

EvoDS adds autonomous skill acquisition via synthesis-validation-reuse and adaptive context compression via learned control within a two-stage multi-agent RL scheme, claiming 28.9% average gains over prior agents on four benchmarks plus elimination of out-of-token failures.

MUSE: A Unified Agentic Harness for MLLMs

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

MUSE is a unified agentic harness that improves off-the-shelf MLLMs on visual spatial planning, perception, multimodal reasoning, and fine-grained discrimination benchmarks through structured execution modules and verifier-guided repair without model retraining.

citing papers explorer

Showing 18 of 18 citing papers after filters.

Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents cs.AI · 2026-05-21 · unverdicted · none · ref 36 · internal anchor
Life-Harness evolves reusable interventions from training trajectories to enhance frozen LLM agents on unseen tasks across seven deterministic environments, yielding 88.5% average relative improvement in 116 of 126 model-environment settings.
Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts cs.AI · 2026-05-18 · unverdicted · none · ref 11 · internal anchor
ReElicit uses LLMs to elicit adaptive feature embeddings for Gaussian process Bayesian optimization of system prompts under aggregate-only feedback, outperforming baselines across ten tasks with a 30-evaluation budget.
Harnessing Agentic Evolution cs.AI · 2026-05-13 · unverdicted · none · ref 38 · internal anchor
AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.
Meta-Harness: End-to-End Optimization of Model Harnesses cs.AI · 2026-03-30 · unverdicted · none · ref 55 · internal anchor
Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.
EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management cs.AI · 2026-06-02 · unverdicted · none · ref 74 · internal anchor
EvoDS adds autonomous skill acquisition via synthesis-validation-reuse and adaptive context compression via learned control within a two-stage multi-agent RL scheme, claiming 28.9% average gains over prior agents on four benchmarks plus elimination of out-of-token failures.
FALAT: Tracing Failures in LLM Agent Trajectories via Dependency-Guided Search cs.AI · 2026-05-30 · unverdicted · none · ref 37 · internal anchor
FALAT improves failure attribution in LLM agent trajectories via dependency-guided search, achieving 46.0% step-level accuracy on algorithm-generated and 29.1% on hand-crafted trajectories in the Who&When benchmark.
SkillOpt: Executive Strategy for Self-Evolving Agent Skills cs.AI · 2026-05-22 · unverdicted · none · ref 41 · 2 links · internal anchor
SkillOpt introduces a controllable text-space optimizer that evolves agent skills via add/delete/replace edits accepted only on strict held-out validation improvement, reporting consistent gains across 52 model-benchmark-harness combinations.
PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement cs.AI · 2026-05-11 · unverdicted · none · ref 23 · internal anchor
PIVOT refines LLM agent trajectories through plan-inspect-evolve-verify stages using environment feedback, yielding up to 94% relative gains in constraint satisfaction and 3-5x token efficiency over prior refinement methods.
Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization cs.AI · 2026-04-22 · unverdicted · none · ref 3 · internal anchor
TPGO represents multi-agent systems as graphs of textual parameters and applies group relative optimization to enable self-improvement from execution history.
ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis cs.AI · 2026-04-20 · unverdicted · none · ref 9 · internal anchor
ContraPrompt extracts optimization rules from dyadic differences in reasoning traces on identical inputs and organizes them into input-aware decision trees, outperforming GEPA on four benchmarks with gains up to 8.29 pp.
SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization cs.AI · 2026-04-19 · unverdicted · none · ref 114 · internal anchor
SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.
Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees cs.AI · 2026-04-13 · unverdicted · none · ref 12 · internal anchor
POES frames prompt evaluation as online adaptive testing and uses a provably submodular objective to pick informative examples, delivering 6.2% higher average accuracy and 35-60% token savings versus naive full-set scoring.
Pioneer Agent: Continual Improvement of Small Language Models in Production cs.AI · 2026-04-10 · unverdicted · none · ref 103 · internal anchor
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory cs.AI · 2025-11-26 · unverdicted · none · ref 44 · internal anchor
ViLoMem is a dual-stream grow-and-refine memory system that separates visual and logical error patterns in MLLMs to improve pass@1 accuracy and reduce repeated mistakes across six multimodal benchmarks.
Self-Evolving World Models for LLM Agent Planning cs.AI · 2026-06-29 · unverdicted · none · ref 49 · internal anchor
WorldEvolver uses episodic memory, semantic memory, and selective foresight to self-evolve world models at test time, achieving top prediction accuracy and agent success on ALFWorld and ScienceWorld benchmarks.
FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast cs.AI · 2026-05-15 · unverdicted · none · ref 24 · internal anchor
FORGE is a staged population protocol that evolves prompt-injected memory (Rules, Examples, or Mixed) for ReAct agents via reflection and broadcast, yielding 1.7-7.7× gains over zero-shot and 29-72% over Reflexion on CybORG CAGE-2.
SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation cs.AI · 2026-03-23 · unverdicted · none · ref 56 · internal anchor
SOLAR introduces a self-optimizing agent using meta-learning on model weights and RL-driven strategy discovery for lifelong adaptation in LLMs, claiming superior performance on reasoning tasks across domains.
RIZZ: Routing Interactions to Near Zero-Interference Zones for Continual Adaptation of Black-Box Agents cs.AI · 2026-06-02 · unverdicted · none · ref 45 · internal anchor
RIZZ is a continual adaptation framework for black-box LLM agents that uses dynamically spawned memory branches, context-aware routing, verifier-gated updates, and prompt compilation to control interference across nonstationary inputs.

TextGrad: Automatic "Differentiation" via Text

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer