Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning

Jun Wang; Weinan Zhang; Weiwen Liu; Xinbei Ma; Xingyu Lou; Yansi Li; Zheng Wu; Zhuosheng Zhang

arxiv: 2601.03641 · v4 · submitted 2026-01-07 · 💻 cs.CL

Zheng Wu, Xingyu Lou, Xinbei Ma, Yansi Li, Weiwen Liu, Weinan Zhang, Jun Wang, Zhuosheng Zhang

Pith reviewed 2026-05-16 17:10 UTC · model grok-4.3

classification 💻 cs.CL

keywords continual learning · LLM agents · gradient consensus · stability-plasticity dilemma · parameter fusion · knowledge disentanglement · catastrophic forgetting

The pith

Agent-Dice separates shared knowledge from task-specific gradient conflicts to let LLM agents learn new tasks without forgetting prior ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims the stability-plasticity dilemma in continual agent learning occurs because models cannot distinguish knowledge common to many tasks from updates that interfere with earlier ones. Agent-Dice addresses this by first filtering gradients through geometric consensus to drop those that point in opposing directions, then reweighting the survivors by curvature to strengthen the shared semantic components. A theoretical analysis is given to show why this fusion preserves performance across tasks. Experiments on GUI and tool-use agents report strong retention of old skills alongside acquisition of new ones while using little extra compute or parameter storage. Readers would care if the separation works because it removes the need for heavy replay or regularization tricks in sequential agent training.

Core claim

Agent-Dice performs a two-stage fusion of parameter updates. Geometric consensus filtering first prunes updates whose directions conflict with the consensus; curvature-based importance weighting then amplifies the remaining updates that encode semantics shared across tasks. The paper argues theoretically that this explicit disentanglement of common knowledge from task-specific interference is valid, and reports empirically that it mitigates catastrophic forgetting in LLM-based agents on GUI and tool-use task sequences.
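A minimal sketch of the two-stage idea, under stated assumptions rather than the paper's actual procedure: consensus is taken here as per-coordinate majority sign agreement across task updates, and the curvature-style importance scores are a hypothetical input supplied by the caller. The function name `fuse_updates` and both parameters are illustrative.

```python
def fuse_updates(updates, importances):
    """Fuse per-task update vectors, one coordinate at a time.

    updates:     list of equal-length lists of floats, one per task
    importances: same shape, non-negative curvature-style scores
                 (hypothetical input; how to estimate them is not specified here)
    """
    dim = len(updates[0])
    fused = []
    for j in range(dim):
        column = [u[j] for u in updates]
        weights = [imp[j] for imp in importances]
        # Stage 1 (assumed form): the majority sign across tasks is the consensus.
        vote = sum((v > 0) - (v < 0) for v in column)
        if vote == 0:
            fused.append(0.0)  # tie: no consensus, so the coordinate is dropped
            continue
        sign = 1 if vote > 0 else -1
        kept = [(v, w) for v, w in zip(column, weights) if v * sign > 0]
        # Stage 2 (assumed form): importance-weighted mean of the survivors.
        total = sum(w for _, w in kept)
        if total > 0:
            fused.append(sum(v * w for v, w in kept) / total)
        else:
            # All surviving weights are zero: fall back to a plain mean.
            fused.append(sum(v for v, _ in kept) / len(kept))
    return fused
```

Coordinates where the tasks split evenly are zeroed, while coordinates with a clear majority keep only the agreeing updates, so the importance scores influence magnitude but never direction.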

What carries the argument

Directional consensus evaluation: each incoming update's direction is compared against the consensus direction, conflicting updates are pruned, and the shared content that survives is reweighted.
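One way to read a directional test is as a cosine check against a reference direction. The sketch below assumes the reference is the mean of the candidate updates and that `min_cos` is a free threshold; both are assumptions for illustration, as are the names `cosine` and `cone_filter`.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def cone_filter(updates, min_cos=0.0):
    """Keep updates whose cosine with the mean update is at least min_cos."""
    dim = len(updates[0])
    mean = [sum(u[j] for u in updates) / len(updates) for j in range(dim)]
    scores = [cosine(u, mean) for u in updates]
    kept = [u for u, s in zip(updates, scores) if s >= min_cos]
    return kept, scores
```

With `min_cos=0.0` the filter only drops updates that point into the opposite half-space from the mean; raising the threshold narrows the cone.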

If this is right

Agents trained sequentially on new interfaces or tools retain high accuracy on earlier ones without replay buffers.
Only small additional computation is needed because the method operates on gradients already computed during normal fine-tuning.
The stability-plasticity trade-off is reframed as a geometric separation problem rather than an optimization trade-off.
Parameter counts stay close to the base model size because no task-specific adapters or heads are stored.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same consensus filter could be tested on non-agent continual-learning benchmarks such as sequential image classification to check domain generality.
If curvature weighting proves robust, the approach might combine with low-rank adapters to further cut storage while preserving the separation effect.
Extending the directional test from gradients to attention-key or value updates could address forgetting in transformer internals rather than just final-layer weights.

Load-bearing premise

Measuring agreement in gradient direction reliably isolates common knowledge from interfering task-specific signals without introducing systematic bias.

What would settle it

Run the same sequence of GUI and tool-use tasks while replacing the directional-consensus filter with random gradient pruning of equal size; if prior-task accuracy collapses at the same rate as the unfiltered baseline, the separation claim is falsified.
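The control described above needs a random pruning mask that removes exactly as many coordinates as the consensus filter does. A minimal sketch, assuming the filter's decision is available as a boolean keep-mask; the helper names `random_mask_like` and `apply_mask` are hypothetical.

```python
import random

def random_mask_like(consensus_mask, seed=0):
    """Return a random keep-mask with the same number of kept coordinates."""
    n = len(consensus_mask)
    k = sum(1 for m in consensus_mask if m)
    rng = random.Random(seed)  # seeded so the control run is repeatable
    keep = set(rng.sample(range(n), k))
    return [i in keep for i in range(n)]

def apply_mask(update, mask):
    """Zero out every coordinate of the update that the mask does not keep."""
    return [v if m else 0.0 for v, m in zip(update, mask)]
```

Matching the kept count keeps sparsity identical between the two arms, so any difference in retention can be attributed to which coordinates are pruned rather than how many.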

Original abstract

Large Language Model (LLM)-based agents significantly extend the utility of LLMs by interacting with dynamic environments. However, enabling agents to continually learn new tasks without catastrophic forgetting remains a critical challenge, known as the stability-plasticity dilemma. In this work, we argue that this dilemma fundamentally arises from the failure to explicitly distinguish between common knowledge shared across tasks and conflicting knowledge introduced by task-specific interference. To address this, we propose Agent-Dice, a parameter fusion framework based on directional consensus evaluation. Concretely, Agent-Dice disentangles knowledge updates through a two-stage process: geometric consensus filtering to prune conflicting gradients, and curvature-based importance weighting to amplify shared semantics. We provide a rigorous theoretical analysis that establishes the validity of the proposed fusion scheme and offers insight into the origins of the stability-plasticity dilemma. Extensive experiments on GUI agents and tool-use agent domains demonstrate that Agent-Dice exhibits outstanding continual learning performance with minimal computational overhead and parameter updates. The codes are available at https://github.com/Wuzheng02/Agent-Dice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Agent-Dice's two-stage gradient filter claims to separate shared and conflicting knowledge cleanly, but the directional consensus step rests on an assumption that often fails when LLM representations overlap.

The paper's main move is a parameter fusion method called Agent-Dice that first prunes gradients by checking directional consensus around the mean and then reweights the survivors by curvature to favor shared semantics. This is pitched as a direct attack on the stability-plasticity dilemma in LLM agents that keep learning new tasks in changing environments like GUIs or tool use. The authors supply code and run experiments that reportedly show solid retention with low overhead, which is the practical part worth noting. They also sketch a theoretical bound that ties the dilemma to failure to disentangle the two kinds of updates. That combination of a concrete filter plus claimed analysis is what is new here compared with standard replay or regularization baselines in continual learning. The experiments on agent domains give it some grounding that pure theory papers lack.

The soft spot is exactly the one the stress-test note flags. The pruning step assumes conflicting gradients sit outside a cone around the consensus direction. In practice, fine-tuning on overlapping tasks often produces gradients that share directional components, so the filter can either retain interference or discard useful shared signal. The paper does not appear to test robustness against partial overlap or supply counter-examples where the cone condition breaks. Without that, the theoretical bounds rest on an ideal case that may not match the data. The curvature weighting then inherits whatever the filter passes through. This is a real limitation rather than a minor detail, because the whole disentanglement claim depends on it.

The work is aimed at people already working on continual learning for interactive agents. A reader who wants to try gradient-space fusion tricks could extract the two-stage idea and test it themselves, especially since the code is public. It is coherent enough on its own terms to deserve referee time, even though the central assumption needs direct checking against the actual gradient distributions in the experiments. I would send it to review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper claims that the stability-plasticity dilemma in continual learning for LLM-based agents stems from failing to distinguish common knowledge from task-specific conflicting knowledge. It proposes Agent-Dice, a two-stage parameter fusion method consisting of geometric consensus filtering to prune conflicting gradients followed by curvature-based importance weighting to amplify shared semantics. The authors provide a rigorous theoretical analysis establishing the validity of the fusion scheme and report strong experimental results on GUI agents and tool-use agent domains, with minimal computational overhead and parameter updates.

Significance. If the geometric consensus reliably separates shared and conflicting updates without bias, the work would provide both a practical low-overhead method for agent continual learning and theoretical insight into the origins of catastrophic forgetting. The public code release supports reproducibility and potential follow-up work.

major comments (2)

Section 3: The stability-plasticity bounds are derived under the assumption that task-specific updates lie outside a cone around the mean gradient direction. This cone condition is load-bearing for the pruning step; when violated by partial directional overlap (common in LLM fine-tuning where representations share components), the filtering may retain interference or discard useful shared updates. No explicit robustness analysis or counter-example handling is provided.
Section 4 (experiments): The claim of 'outstanding' performance relies on the filtering and weighting steps being the causal factor, yet the manuscript lacks ablation results isolating the effect of the directional consensus threshold or curvature weighting under controlled interference levels. Without these, it is unclear whether gains exceed standard regularization baselines.
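A small worked case illustrates the partial-overlap concern in the first major comment. It is a sketch under assumptions that are not taken from the paper: two task updates $u_1, u_2$ share a component $s \neq 0$ and differ by an orthogonal conflicting component $\pm c$, and the filter scores each whole update by its cosine with the mean $m$.

```latex
\begin{align*}
u_1 &= s + c, \qquad u_2 = s - c, \qquad \langle s, c \rangle = 0, \\
m &= \tfrac{1}{2}(u_1 + u_2) = s, \\
\cos\angle(u_i, m)
  &= \frac{\langle s \pm c,\, s \rangle}{\lVert s \pm c \rVert \, \lVert s \rVert}
   = \frac{\lVert s \rVert}{\sqrt{\lVert s \rVert^2 + \lVert c \rVert^2}},
   \qquad i = 1, 2.
\end{align*}
```

Both updates receive the same score, so any threshold $\tau$ either keeps both (each still carrying its conflicting part $\pm c$) or drops both (discarding the shared part $s$). In this toy setting a vector-level cone test cannot separate the two kinds of content, which is the behavior the comment asks the authors to analyze.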

minor comments (2)

Abstract: The code repository link is given, but the manuscript should specify which scripts reproduce the exact tables and figures.
Section 3 (notation): The definitions of the consensus direction and curvature weight should be stated explicitly with equation numbers in the main text rather than deferred to appendices.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional analysis and experiments as suggested.

Referee (Section 3): The stability-plasticity bounds are derived under the assumption that task-specific updates lie outside a cone around the mean gradient direction. This cone condition is load-bearing for the pruning step; when violated by partial directional overlap (common in LLM fine-tuning where representations share components), the filtering may retain interference or discard useful shared updates. No explicit robustness analysis or counter-example handling is provided.

Authors: We acknowledge that the cone condition is central to the theoretical bounds and pruning step. While the analysis in Section 3 derives the bounds under this assumption to characterize the stability-plasticity trade-off, we agree that robustness to partial directional overlap merits explicit treatment. In the revised version, we will add a dedicated subsection with (i) a theoretical extension bounding the residual interference under mild cone violations and (ii) synthetic counter-examples on low-dimensional gradient vectors that simulate partial overlap. These additions will show that the subsequent curvature-based weighting step mitigates retained interference, preserving the method's practical utility even when the strict cone condition is only approximately satisfied in LLM fine-tuning.
Revision: yes

Referee (Section 4, experiments): The claim of 'outstanding' performance relies on the filtering and weighting steps being the causal factor, yet the manuscript lacks ablation results isolating the effect of the directional consensus threshold or curvature weighting under controlled interference levels. Without these, it is unclear whether gains exceed standard regularization baselines.

Authors: We agree that stronger causal evidence is desirable. The current experiments already compare Agent-Dice against standard regularization baselines (EWC, SI, and MAS) and report consistent gains with low overhead, but they do not isolate the two stages under controlled interference. In the revision we will add targeted ablations that (i) sweep the directional consensus threshold while fixing curvature weighting and (ii) vary the curvature importance factor while disabling filtering, all under synthetic interference levels generated by controlled gradient overlap. These results will be presented alongside the existing baseline comparisons to demonstrate that the performance improvements are attributable to the proposed geometric consensus and curvature mechanisms rather than generic regularization.
Revision: yes
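A threshold sweep under controlled overlap, of the kind the response describes, could be prototyped roughly as follows. This is a hypothetical harness, not the authors' ablation: it assumes each synthetic update is a shared direction plus `alpha` times a task-specific conflict direction, and that survival is decided by cosine similarity to the mean update. All function names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0

def synthetic_updates(shared, conflicts, alpha):
    """Each task update = shared direction + alpha * its own conflict direction."""
    return [[s + alpha * c for s, c in zip(shared, conf)] for conf in conflicts]

def survivors(updates, min_cos):
    """Count updates whose cosine with the mean update is at least min_cos."""
    dim = len(updates[0])
    mean = [sum(u[j] for u in updates) / len(updates) for j in range(dim)]
    return sum(1 for u in updates if cosine(u, mean) >= min_cos)

def sweep(shared, conflicts, alphas, thresholds):
    """Map every (interference level, threshold) pair to its survivor count."""
    return {
        (a, t): survivors(synthetic_updates(shared, conflicts, a), t)
        for a in alphas
        for t in thresholds
    }
```

In a real ablation the survivor count would be replaced by prior-task and new-task accuracy after fusion; the grid structure stays the same.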

Circularity Check

0 steps flagged

No significant circularity; derivation remains independent of inputs

full rationale

The paper introduces a two-stage fusion method (directional consensus pruning followed by curvature weighting) and claims a theoretical analysis establishing its validity. No equations or steps in the abstract or description reduce any claimed prediction or bound to a fitted parameter or self-citation by construction. The stability-plasticity insight is presented as arising from the proposed disentanglement rather than being presupposed in the inputs. Self-citations, if any, are not load-bearing for the core claims, and the method is externally falsifiable via experiments on GUI and tool-use agents. This is the standard non-circular case.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that gradient directions encode distinguishable shared versus conflicting knowledge, which is not independently verified outside the method itself.

axioms (1)

Domain assumption: Gradient directions can be used to evaluate consensus and separate common knowledge from task-specific interference.
Invoked as the basis for the geometric consensus filtering stage in the two-stage process.

pith-pipeline@v0.9.0 · 5504 in / 1168 out tokens · 43715 ms · 2026-05-16T17:10:55.761727+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "geometric consensus filtering to prune conflicting gradients, and curvature-based importance weighting to amplify shared semantics"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Theorem 2 (Consensus-Induced Variance Reduction) ... Hoeffding's inequality ... $\exp(-2|S_j|(p-0.5)^2)$"
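The quoted expression has the shape of a standard Hoeffding bound on a majority vote. The sketch below assumes an interpretation the excerpt does not confirm: $|S_j|$ is the number of independent votes on a coordinate and $p > 0.5$ is the probability that each vote has the correct sign. Under that reading, the probability that at most half the votes are correct is bounded by the expression, and the exact binomial tail can be compared against it.

```python
import math

def hoeffding_bound(n, p):
    """Upper bound exp(-2 n (p - 0.5)^2) on a failed majority, for p > 0.5."""
    return math.exp(-2.0 * n * (p - 0.5) ** 2)

def exact_wrong_majority(n, p):
    """Exact probability that at most n/2 of n independent votes are correct."""
    return sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(0, n // 2 + 1)
    )

if __name__ == "__main__":
    for n in (5, 11, 21):
        for p in (0.6, 0.8):
            print(n, p, exact_wrong_majority(n, p), hoeffding_bound(n, p))
```

The bound decays exponentially in the number of votes, which is why larger agreeing sets make the consensus sign more trustworthy; it is loose for small sets, where the exact tail is noticeably smaller.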

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.