pith. sign in

hub Mixed citations

TextGrad: Automatic "Differentiation" via Text

Mixed citation behavior. Most common role is background (62%).

67 Pith papers citing it
Background 62% of classified citations
abstract

AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in its early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic ``differentiation'' via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy-to-use. It works out-of-the-box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from $51\%$ to $55\%$, yields $20\%$ relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next-generation of AI systems.

hub tools

citation-role summary

background 5 method 2 baseline 1

citation-polarity summary

clear filters

representative citing papers

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

Meta-Harness: End-to-End Optimization of Model Harnesses

cs.AI · 2026-03-30 · unverdicted · novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.

Gen-n-Val: Agentic Image Data Generation and Validation

cs.CV · 2025-06-05 · conditional · novelty 7.0

Gen-n-Val uses LLM and VLLM agents with Layer Diffusion and TextGrad to generate and validate synthetic instance data, cutting invalid samples from 50% to 7% and improving rare-class performance on LVIS and COCO benchmarks.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Automating the Design of Embodied AgentArchitectures

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Automated architecture search for embodied agents produces directional success-rate gains on vision-language and manipulation tasks while exposing limits from simulation noise and incomplete credit assignment.

MUSE: A Unified Agentic Harness for MLLMs

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

MUSE is a unified agentic harness that improves off-the-shelf MLLMs on visual spatial planning, perception, multimodal reasoning, and fine-grained discrimination benchmarks through structured execution modules and verifier-guided repair without model retraining.

citing papers explorer

Showing 10 of 10 citing papers after filters.

  • When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks? cs.LG · 2026-05-29 · unverdicted · none · ref 22 · 2 links · internal anchor

    PromptPO shows LLMs can act as black-box policy optimizers for sequential RL when leveraging prior knowledge, matching baselines in exploration and robotics but underperforming in MuJoCo.

  • Learning, Fast and Slow: Towards LLMs That Adapt Continually cs.LG · 2026-05-12 · unverdicted · none · ref 67 · 2 links · internal anchor

    Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

  • RosettaSearch: Multi-Objective Inference-Time Search for Protein Sequence Design cs.LG · 2026-04-19 · unverdicted · none · ref 17 · internal anchor

    RosettaSearch applies LLM-driven multi-objective search at inference time to improve backbone-conditioned protein sequences, recovering designs with 18-68% better structural fidelity and 2.5x higher success rates than single-pass models like LigandMPNN.

  • Teaching LLMs to Recommend and Defer in Underrepresented Epilepsy Care cs.LG · 2026-06-30 · unverdicted · none · ref 4 · internal anchor

    MANANA prompt-learning framework improves LLM prescription prediction accuracy by 4-8 points on Ugandan epilepsy cohorts and supports selective deferral at 95-99% precision on high-confidence cases.

  • Harnesses for Inference-Time Alignment over Execution Trajectories cs.LG · 2026-05-15 · unverdicted · none · ref 35 · internal anchor

    Partial harnesses for LLM agents, specifying only initial execution steps, achieve higher pass rates than fully decomposed workflows, as analyzed through trajectory alignment and validated in synthetic and terminal benchmarks.

  • AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems cs.LG · 2026-05-09 · unverdicted · none · ref 7 · internal anchor

    AgentSlimming compresses graph-structured multi-agent systems by estimating agent importance and removing or replacing low-value agents, cutting token costs by up to 78.9% with negligible performance loss.

  • FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration cs.LG · 2026-05-08 · unverdicted · none · ref 32 · internal anchor

    FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

  • Reflective Context Learning: Studying the Optimization Primitives of Context Space cs.LG · 2026-04-03 · unverdicted · none · ref 13 · internal anchor

    Reflective Context Learning unifies context optimization for agents by recasting prior methods as instances of a shared learning problem and extending them with classical primitives such as batching, failure replay, and grouped rollouts, yielding improvements on AppWorld, BrowseComp+, and RewardBene

  • Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search cs.LG · 2026-03-02 · unverdicted · none · ref 34 · internal anchor

    Gome reaches 35.1% any-medal rate on MLE-Bench by mapping reasoning to gradient-based updates, outperforming tree search once models are sufficiently capable.

  • Supplement Generation Training for Enhancing Agentic Task Performance cs.LG · 2026-04-22 · unverdicted · none · ref 39 · internal anchor

    SGT trains a lightweight model to generate task-specific supplemental text that improves performance of a larger frozen LLM on agentic tasks without modifying the large model.