pith. sign in

hub Mixed citations

TextGrad: Automatic "Differentiation" via Text

Mixed citation behavior. Most common role is background (62%).

48 Pith papers citing it
Background 62% of classified citations
abstract

AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in its early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic ``differentiation'' via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy-to-use. It works out-of-the-box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from $51\%$ to $55\%$, yields $20\%$ relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next-generation of AI systems.

hub tools

citation-role summary

background 5 method 2 baseline 1

citation-polarity summary

clear filters

representative citing papers

Harnessing Agentic Evolution

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

Meta-Harness: End-to-End Optimization of Model Harnesses

cs.AI · 2026-03-30 · unverdicted · novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across held-out models.

Gen-n-Val: Agentic Image Data Generation and Validation

cs.CV · 2025-06-05 · conditional · novelty 7.0

Gen-n-Val uses LLM and VLLM agents with Layer Diffusion and TextGrad to generate and validate synthetic instance data, cutting invalid samples from 50% to 7% and improving rare-class performance on LVIS and COCO benchmarks.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents

cs.AI · 2026-05-21 · conditional · novelty 6.0

Ratchet provides a minimal hygiene recipe for self-managing skill libraries in frozen LLM agents, delivering +0.328 rolling-mean pass@1 gain on MBPP+ hard-100 and +0.22 peak lift on SWE-bench Verified.

Harnesses for Inference-Time Alignment over Execution Trajectories

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Partial harnesses for LLM agents, specifying only initial execution steps, achieve higher pass rates than fully decomposed workflows, as analyzed through trajectory alignment and validated in synthetic and terminal benchmarks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement cs.AI · 2026-05-11 · unverdicted · none · ref 23 · internal anchor

    PIVOT refines LLM agent trajectories through plan-inspect-evolve-verify stages using environment feedback, yielding up to 94% relative gains in constraint satisfaction and 3-5x token efficiency over prior refinement methods.

  • How Far Are Video Models from True Multimodal Reasoning? cs.CV · 2026-04-21 · unverdicted · none · ref 89 · internal anchor

    Current video models succeed on basic understanding but achieve under 25% success on logically grounded generation and near 0% on interactive generation, exposing gaps in multimodal reasoning.