ADOPT: Adaptive Dependency-Guided Joint Prompt Optimization for Multi-Step LLM Pipelines

Deyang Li; Minjun Zhao; Ruifeng Shi; Shuai Zhang; Xinyu Zhang

arxiv: 2512.24933 · v2 · submitted 2025-12-31 · 💻 cs.CL · cs.LG

ADOPT: Adaptive Dependency-Guided Joint Prompt Optimization for Multi-Step LLM Pipelines

Minjun Zhao , Xinyu Zhang , Shuai Zhang , Deyang Li , Ruifeng Shi This is my paper

Pith reviewed 2026-05-16 18:42 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords prompt optimizationmulti-step LLM pipelinestextual gradientsdependency analysisShapley value allocationjoint prompt tuningLLM chaining

0 comments

The pith

ADOPT decomposes final-task errors into per-step textual gradients using dependency analysis for joint prompt optimization in multi-step LLM pipelines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-step LLM pipelines can be jointly optimized by first analyzing how each step depends on the final output, then building a single global textual gradient from task errors and breaking it down into targeted local signals for each prompt. This decomposition supplies the missing step-level supervision while a Shapley-based allocator directs updates toward the steps that most influence the outcome. The method also separates gradient estimation from the actual prompt changes so that any single-step optimizer can be plugged in. A reader would care because chained LLM applications are common yet hard to tune reliably; better joint optimization could improve accuracy without extra labels or model retraining.

Core claim

ADOPT analyzes the dependency between each LLM step and the final output, constructs a global textual gradient from final-task errors, and decomposes it into step-level local textual gradients, providing more precise optimization signals for local prompt updates. It further decouples signal estimation from prompt updating, enabling flexible integration of single-prompt optimizers, and uses a Shapley-based strategy to adaptively allocate optimization resources to high-impact steps.

What carries the argument

Dependency-guided decomposition of a global textual gradient into local step-level signals, paired with Shapley-value adaptive resource allocation.

If this is right

Multi-step pipelines receive end-to-end prompt updates without needing explicit labels at intermediate steps.
Prompt changes are tailored to each step's measured contribution to the final result.
Existing single-prompt optimizers can be reused inside the joint framework without modification.
Optimization effort concentrates automatically on high-impact steps, improving sample efficiency.
The approach works across structurally different pipelines and real-world datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar decomposition of global feedback could extend to tool-use chains or multi-agent LLM systems where local rewards are also absent.
If dependency estimates prove noisy in practice, the Shapley allocator may still protect overall performance by down-weighting low-impact steps.
The decoupling of gradient computation from prompt editing opens a route for hybrid optimization that mixes gradient and non-gradient methods per step.
Users could monitor the allocated Shapley values over iterations to discover which pipeline stages are bottlenecks.

Load-bearing premise

Dependency analysis between steps and the final output can be performed accurately enough to turn the global error into useful, low-noise optimization directions for each prompt.

What would settle it

On a pipeline whose steps have weak or cyclic dependencies, ADOPT would show no gain over strong single-step baselines when the dependency analysis is replaced by uniform gradient allocation.

read the original abstract

Multi-step LLM pipelines can solve complex tasks, but jointly optimizing prompts across steps remains challenging due to missing step-level supervision and inter-step dependency. We propose ADOPT, an adaptive dependency-guided joint prompt optimization framework for multi-step LLM pipelines. ADOPT analyzes the dependency between each LLM step and the final output, constructs a global textual gradient from final-task errors, and decomposes it into step-level local textual gradients, providing more precise optimization signals for local prompt updates. It further decouples signal estimation from prompt updating, enabling flexible integration of single-prompt optimizers, and uses a Shapley-based strategy to adaptively allocate optimization resources to high-impact steps. Experiments on real-world datasets and structurally diverse pipelines demonstrate that ADOPT is effective and robust, consistently outperforming strong prompt optimization baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ADOPT combines dependency analysis with Shapley allocation to jointly tune prompts across LLM steps, showing gains over baselines but without enough controls to credit the decomposition step specifically.

read the letter

ADOPT tries to fix prompt tuning for chains of LLMs by first mapping how each step affects the final output, then turning the end-task error into per-step update signals and using Shapley values to decide which steps get more optimization work. The framework keeps the signal estimation separate from the actual prompt changes so it can work with whatever single-step optimizer is already in use. That flexibility and the adaptive allocation are the concrete additions over prior single-prompt methods. The experiments cover real datasets and pipelines with different structures, and they report steady improvements against the baselines that were tested. Those results line up with the practical need for more reliable multi-step systems. The soft spot is the dependency step itself. The paper treats the analysis as accurate enough to produce useful local gradients, yet there is no ablation that disables or randomizes the dependency guidance while holding the Shapley allocation and base optimizer fixed. Without that isolation it is hard to tell how much of the reported lift comes from better signals versus simply allocating effort differently. The exact procedure for building the dependencies is also thin on detail, which makes it difficult to judge noise or bias in new pipelines. This work is aimed at people who already run chained LLM applications and want a drop-in way to improve prompt quality without retraining models. A reader focused on agent workflows or production pipelines would find the framework and the empirical comparisons useful. It has enough grounding and experimental coverage to merit a full referee process, even though the reviewers will probably press for the missing controls on the dependency component.

Referee Report

3 major / 1 minor

Summary. The paper proposes ADOPT, a framework for joint prompt optimization in multi-step LLM pipelines. It analyzes inter-step dependencies to construct a global textual gradient from final-task errors, decomposes this into per-step local textual gradients for more precise updates, decouples gradient estimation from prompt updating to allow integration with existing single-prompt optimizers, and applies a Shapley-value strategy to adaptively allocate optimization resources to high-impact steps. Experiments on real-world datasets and structurally diverse pipelines are reported to show consistent outperformance over strong prompt optimization baselines.

Significance. If the dependency-guided decomposition and adaptive allocation deliver meaningfully more precise per-step signals than baselines, ADOPT could advance prompt optimization for complex multi-step LLM systems by reducing the need for step-level supervision. The decoupling mechanism and Shapley allocation are potentially reusable contributions. However, the absence of detailed derivations, quantitative validation of the dependency analysis, and isolating ablations limits assessment of whether the reported gains are attributable to the proposed components rather than incidental factors.

major comments (3)

[Abstract, §3] Abstract and §3 (method description): The dependency analysis procedure used to decompose the global textual gradient into local per-step gradients is not described with sufficient algorithmic detail or pseudocode. Without this, it is impossible to verify how inter-step dependencies are quantified or whether the decomposition avoids substantial noise or bias, which is load-bearing for the central claim that ADOPT provides 'more precise optimization signals'.
[§4] §4 (experiments): No ablation is reported that disables or randomizes the dependency guidance while holding the underlying optimizer and Shapley allocation fixed. The reported outperformance on diverse pipelines could therefore arise from resource allocation alone or from the base optimizer rather than the dependency-guided decomposition, undermining attribution of gains to ADOPT.
[§4] §4 and abstract: The manuscript provides no quantitative metrics, error bars, or statistical significance tests for the claimed consistent outperformance. This makes it difficult to evaluate the robustness of results across the tested pipelines and datasets.

minor comments (1)

[§3] Notation for 'textual gradient' and 'global/local' decomposition should be formalized with equations in §3 to clarify the decomposition step.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and agree that the suggested additions will improve clarity and rigor. We will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract, §3] Abstract and §3 (method description): The dependency analysis procedure used to decompose the global textual gradient into local per-step gradients is not described with sufficient algorithmic detail or pseudocode. Without this, it is impossible to verify how inter-step dependencies are quantified or whether the decomposition avoids substantial noise or bias, which is load-bearing for the central claim that ADOPT provides 'more precise optimization signals'.

Authors: We agree that the current description of the dependency analysis in §3 is insufficiently detailed. In the revised manuscript we will expand this section with a complete algorithmic description of how inter-step dependencies are quantified (via per-step contribution scores derived from the final-task error signal) and how the global textual gradient is decomposed into local gradients. We will also add pseudocode for the full procedure, including the decomposition step, to allow verification that noise and bias are controlled by prioritizing high-dependency paths. revision: yes
Referee: [§4] §4 (experiments): No ablation is reported that disables or randomizes the dependency guidance while holding the underlying optimizer and Shapley allocation fixed. The reported outperformance on diverse pipelines could therefore arise from resource allocation alone or from the base optimizer rather than the dependency-guided decomposition, undermining attribution of gains to ADOPT.

Authors: We acknowledge that the current experiments do not isolate the dependency-guided decomposition from the Shapley allocation and base optimizer. In the revised §4 we will add a targeted ablation that disables or randomizes the dependency guidance (replacing it with uniform or random decomposition) while keeping the optimizer and Shapley allocation unchanged. The results of this ablation will be reported to quantify the incremental benefit attributable to the dependency component. revision: yes
Referee: [§4] §4 and abstract: The manuscript provides no quantitative metrics, error bars, or statistical significance tests for the claimed consistent outperformance. This makes it difficult to evaluate the robustness of results across the tested pipelines and datasets.

Authors: We agree that the absence of error bars and statistical tests limits assessment of robustness. In the revised manuscript we will report standard deviations across multiple independent runs as error bars on all tables and figures in §4. We will also add statistical significance tests (paired t-tests or Wilcoxon signed-rank tests, as appropriate) comparing ADOPT against each baseline, with p-values reported alongside the performance numbers. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain; method integrates external components

full rationale

The paper presents ADOPT as a framework that analyzes inter-step dependencies to decompose a global textual gradient (from final-task error) into local per-step signals, then integrates this with existing single-prompt optimizers via decoupling and Shapley allocation. No equations, self-referential derivations, or fitted parameters are shown that reduce the claimed outperformance to quantities defined by the method itself. The approach explicitly builds on prior optimizers rather than deriving results from its own outputs, and experiments on real-world datasets serve as external validation, keeping the chain self-contained without circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unverified domain assumption that inter-step dependencies can be measured and used to decompose final-task errors into actionable local signals. No free parameters or invented entities with independent evidence are described in the abstract.

axioms (1)

domain assumption Dependency between each LLM step and the final output can be analyzed to produce useful decomposition of global errors
Invoked as the basis for constructing local textual gradients from final-task errors

invented entities (1)

textual gradient no independent evidence
purpose: To serve as an optimization signal that can be decomposed across pipeline steps
Introduced in the abstract as the mechanism for propagating final errors to individual prompts

pith-pipeline@v0.9.0 · 5441 in / 1281 out tokens · 34588 ms · 2026-05-16T18:42:49.803815+00:00 · methodology

ADOPT: Adaptive Dependency-Guided Joint Prompt Optimization for Multi-Step LLM Pipelines

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)