How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models

Abigail O'Neill; Alan Zhu; Alexandros G. Dimakis; Joseph E. Gonzalez; Matei Zaharia; Parth Asawa

arXiv: 2510.02453 · v3 · pith:75OYLG6Nnew · submitted 2025-10-02 · 💻 cs.LG · cs.AI · cs.CL

Parth Asawa, Alan Zhu, Abigail O'Neill, Matei Zaharia, Alexandros G. Dimakis, Joseph E. Gonzalez

Pith reviewed 2026-05-21 21:37 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL

keywords Advisor Models · black-box LLMs · natural language advice · model steering · prompt optimization · transferable advisors · personalization · SWE agents

The pith

Small open-weight models can be trained to generate dynamic natural language advice that improves black-box frontier LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Advisor Models as a method for training small open-weight models to produce per-instance natural language advice that steers larger black-box models like GPT-5.2 and Gemini 3 Pro. This approach delivers concrete gains such as a 27.4% performance lift on RuleArena tax tasks, a 24.6% reduction in steps for software engineering agents, and superior personalization results compared to static prompt methods. The advisor remains transferable from cheap student models to frontier systems and shows no degradation on unrelated benchmarks. The core idea is to achieve parametric customization of inaccessible models through lightweight, task-specific guidance rather than direct weight changes or fixed prompts.

Core claim

Advisor Models train small open-weight models to output dynamic, per-instance natural language advice that improves the capabilities of black-box frontier models, achieving reported gains of 27.4% on RuleArena (Taxes) for GPT-5.2, 24.6% fewer steps in SWE agent tasks for Gemini 3 Pro, and 85-100% success in personalizing GPT-5 versus 40-60% for static optimizers, while remaining transferable across model scales and robust across benchmarks.

What carries the argument

Advisor Models: small open-weight models trained to generate instance-specific natural language advice that the black-box frontier model receives and acts upon.
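
As a rough illustration of that flow, the sketch below treats both the advisor and the frontier model as plain text-in, text-out callables. The names `compose_prompt` and `advised_call`, the prompt template, and the stub models are assumptions made for this example; this page does not give the paper's actual prompt format.

```python
from typing import Callable

# Hypothetical aliases: each model is just a function from text to text.
Advisor = Callable[[str], str]
BlackBox = Callable[[str], str]


def compose_prompt(task_input: str, advice: str) -> str:
    """Place per-instance advice in context next to the task (assumed template)."""
    return f"Task:\n{task_input}\n\nAdvice:\n{advice}\n\nAnswer:"


def advised_call(advisor: Advisor, black_box: BlackBox, task_input: str) -> str:
    """Ask the small advisor for advice, then query the black-box model with it."""
    advice = advisor(task_input)
    prompt = compose_prompt(task_input, advice)
    return black_box(prompt)


def toy_advisor(task_input: str) -> str:
    """Stub standing in for a trained small open-weight model."""
    if "tax" in task_input.lower():
        return "List every applicable rule before computing the total."
    return "Answer concisely."


def echo_black_box(prompt: str) -> str:
    """Stub standing in for a frontier API call; it returns the prompt it received."""
    return prompt
```

Only the advisor would have trainable weights in this picture; the black-box model is reached through its text interface alone.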

If this is right

Black-box frontier models can receive parametric optimization through advice instead of prompt engineering or weight access.
An advisor trained on a low-cost student model still transfers measurable improvements to larger frontier models.
The method avoids degradation on benchmarks outside the training pipeline.
Dynamic per-instance advice outperforms static prompt optimizers for user preference personalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Users could maintain a library of lightweight advisors for different domains without retraining the main model.
The approach might extend to chaining multiple advisors for compound tasks where single prompts fall short.
Cost savings could arise by routing routine decisions through the small advisor before invoking the expensive frontier call.

Load-bearing premise

The natural language advice from the trained small model will be reliably interpreted and acted on by the black-box frontier model to produce the measured gains without task-specific biases or per-model retuning.

What would settle it

Running the same RuleArena and SWE tasks with the frontier model given no advice, or only a generic prompt, and finding that the advisor condition shows no gain (or a loss) relative to those baselines.
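
A minimal sketch of that comparison follows, assuming per-task scores have already been collected for each condition. The function name `condition_gaps`, the condition labels, and the scores are placeholders invented for the example, not results from the paper.

```python
from statistics import mean


def condition_gaps(scores: dict[str, list[float]], treatment: str = "advisor") -> dict[str, float]:
    """Return mean(treatment) minus mean(baseline) for every other condition."""
    treated = mean(scores[treatment])
    return {
        name: treated - mean(values)
        for name, values in scores.items()
        if name != treatment
    }


# Placeholder per-task scores (1.0 = solved, 0.0 = failed); not real data.
toy_scores = {
    "no_advice": [0.0, 1.0, 0.0, 1.0],
    "generic_prompt": [1.0, 1.0, 0.0, 0.0],
    "advisor": [1.0, 1.0, 1.0, 0.0],
}

gaps = condition_gaps(toy_scores)
# A gap at or below zero against both baselines would count against the claim.
refuted = all(gap <= 0 for gap in gaps.values())
```

In practice the gaps would need confidence intervals over many tasks; the sketch only shows which differences are being compared.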

Original abstract

Frontier language models are deployed as black-box services, where model weights cannot be modified and customization is limited to prompting. We introduce Advisor Models, a method to train small open-weight models to generate dynamic, per-instance natural language advice that improves the capabilities of black-box frontier models. Advisor Models improve GPT-5.2's performance on RuleArena (Taxes) by 27.4%, reduce Gemini 3 Pro's steps taken in SWE agent tasks by 24.6%, and outperform static prompt optimizers in personalizing GPT-5 to user preferences (85-100% vs. 40-60%). We also find that advisors are transferable: an advisor trained with a low-cost student model still transfers improvements to a frontier model. Moreover, Advisor Models are robust: we observe no degradation on other benchmarks than the pipeline is trained on. Our method shows how to perform parametric optimization for black-box frontier models in a practical and cost-effective way.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Advisor Models: small open-weight models trained to output dynamic, per-instance natural language advice that is concatenated into prompts for black-box frontier LLMs. It reports a 27.4% performance gain for GPT-5.2 on RuleArena (Taxes), a 24.6% reduction in steps taken by Gemini 3 Pro on SWE-agent tasks, and superior personalization to user preferences (85-100% vs. 40-60% for static prompt optimizers). Additional claims include transferability of advisors trained on low-cost student models to frontier models and robustness with no degradation on unrelated benchmarks.

Significance. If the core mechanism holds, the work offers a practical, cost-effective route to parametric optimization of inaccessible black-box models. The transferability result would be especially useful for deployment, and the robustness observation supports broader applicability without per-task retuning.

major comments (2)

[Results on RuleArena (Taxes) and SWE-agent tasks] The headline gains (27.4% on RuleArena Taxes, 24.6% step reduction on SWE tasks) are reported as end-to-end accuracy or efficiency metrics. No ablation or verification is described that isolates whether the frontier model actually attends to and follows the specific advice content versus benefiting from any additional structured text of comparable length or phrasing. This directly affects the central claim that the advisor's natural-language output is the causal driver.
[Transferability experiments] Transferability is asserted on the basis that an advisor trained with a student model still improves a frontier model. This rests on the untested assumption that interpretation behavior remains stable across model families and tasks without per-model retuning; the manuscript provides no cross-model advice-following diagnostics or controlled tests to support this.

minor comments (2)

[Method] Clarify the exact training objective, data-generation procedure, and any filtering rules used for the advisor training set so that the parametric optimization claim can be reproduced.
[Abstract and Evaluation] The abstract references specific model versions (GPT-5.2, Gemini 3 Pro); the main text should state whether these are real releases or placeholders and include the precise task definitions and evaluation protocols.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below and describe the changes we will make to strengthen the manuscript.

Point-by-point responses

Referee: [Results on RuleArena (Taxes) and SWE-agent tasks] The headline gains (27.4% on RuleArena Taxes, 24.6% step reduction on SWE tasks) are reported as end-to-end accuracy or efficiency metrics. No ablation or verification is described that isolates whether the frontier model actually attends to and follows the specific advice content versus benefiting from any additional structured text of comparable length or phrasing. This directly affects the central claim that the advisor's natural-language output is the causal driver.

Authors: We agree that a more direct isolation of the causal role of the specific advice content would strengthen the central claim. The current manuscript includes comparisons against static prompt optimizers and standard zero-shot prompting, which control for the presence of additional text to some degree. However, these do not fully rule out benefits from any structured addition of comparable length. We will add a new ablation in the revised version that supplies the frontier models with non-specific or randomly sampled advice of matched length and syntactic structure. The results of this ablation will be reported alongside the main results to confirm that gains are driven by the instance-specific advice.
revision: yes
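
One way such a control could be built is sketched below: each instance receives advice written for a different instance, trimmed or repeated to the same word count. The function `mismatched_advice` and its cycling rule are assumptions for illustration. It matches length only, not syntactic structure, and it is not the ablation the authors describe running.

```python
import random


def mismatched_advice(advice_list: list[str], seed: int = 0) -> list[str]:
    """For each instance, borrow advice from another instance at matched word count."""
    if len(advice_list) < 2:
        raise ValueError("need at least two instances to swap advice")
    rng = random.Random(seed)
    controls = []
    for i, original in enumerate(advice_list):
        # Pick a donor instance other than the current one.
        donor_index = rng.choice([k for k in range(len(advice_list)) if k != i])
        donor_words = advice_list[donor_index].split()
        target_len = len(original.split())
        if not donor_words:
            controls.append("")
            continue
        # Cycle through the donor's words until the original length is reached.
        words = [donor_words[k % len(donor_words)] for k in range(target_len)]
        controls.append(" ".join(words))
    return controls
```

If the frontier model scores as well with these mismatched strings as with the real advice, the gain would be attributable to extra text rather than to instance-specific content.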
Referee: [Transferability experiments] Transferability is asserted on the basis that an advisor trained with a student model still improves a frontier model. This rests on the untested assumption that interpretation behavior remains stable across model families and tasks without per-model retuning; the manuscript provides no cross-model advice-following diagnostics or controlled tests to support this.

Authors: Our transferability results are empirical: advisors trained on student models produce measurable gains when the same advice is fed to frontier models on the target tasks. This provides practical evidence that the advice is usable across scales. We acknowledge that we did not include explicit diagnostics such as attention analysis or controlled interpretation tests. To address this, we will add qualitative examples of transferred advice and a small diagnostic analysis in the appendix showing how frontier models respond to the same advice strings, without claiming a full mechanistic study.
revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical training and held-out evaluation

Full rationale

The paper introduces Advisor Models by training small open-weight models with reinforcement learning on reward signals from the task environment, then measures downstream gains on held-out benchmarks (RuleArena Taxes, SWE agent tasks, preference personalization). All reported improvements (27.4%, 24.6%, 85-100%) are direct experimental outcomes rather than quantities derived from the paper's own equations or fitted parameters. No self-definitional loops, fitted-input-as-prediction, or load-bearing self-citation chains appear; transferability and robustness are presented as observed results on separate evaluations. The method is evaluated against external black-box models and does not reduce any central claim to its inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the empirical effectiveness of learned natural-language advice as a steering signal; the abstract provides no explicit free parameters or invented physical entities but implicitly relies on standard LLM training assumptions.

free parameters (1)

Advisor training data and objective
The small model must be trained on task-specific examples of useful advice; exact datasets and loss weighting are not specified in the abstract.
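
The paper passage quoted elsewhere on this page describes the advisor as trained with reinforcement learning on reward signals from the environment. As a toy picture of what a reward-driven objective means, the sketch below runs a REINFORCE-style update over a fixed menu of candidate advice strings. Everything here is an assumption for illustration: the real advisor is a language model that generates free-form text, and `train_toy_advisor`, the candidate list, the learning rate, and the baseline rule are not taken from the paper.

```python
import math
import random
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_advisor(
    candidates: list[str],
    reward_fn: Callable[[str], float],
    steps: int = 500,
    lr: float = 0.5,
    seed: int = 0,
) -> list[float]:
    """Learn a distribution over candidate advice strings from scalar rewards."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        idx = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = reward_fn(candidates[idx])
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(idx) with respect to logit k is 1[k == idx] - probs[k].
        for k in range(len(logits)):
            indicator = 1.0 if k == idx else 0.0
            logits[k] += lr * advantage * (indicator - probs[k])
    return softmax(logits)
```

Here `reward_fn` stands in for running the black-box model with the chosen advice and scoring its output, which is the only feedback the advisor would see.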

axioms (1)

domain assumption: Natural language advice generated by a smaller model can be effectively consumed by a larger black-box model to improve task performance
This premise is invoked throughout the method description and result claims in the abstract.

invented entities (1)

Advisor Model · no independent evidence
purpose: Small open-weight model that generates dynamic per-instance natural language advice
Newly introduced construct whose utility is demonstrated via the reported experiments.

pith-pipeline@v0.9.0 · 5716 in / 1443 out tokens · 133662 ms · 2026-05-21T21:37:48.117532+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We introduce ADVISOR MODELS, lightweight parametric policies trained with reinforcement learning to reactively issue natural language steering instructions in-context to black-box models.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The advisor is a second small model that sits between the input and the model, shaping behavior on a per-instance basis using reward signals from the environment.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OpenJarvis: Personal AI, On Personal Devices
cs.LG 2026-05 unverdicted novelty 6.0

OpenJarvis decomposes personal AI into Intelligence, Engine, Agents, Tools & Memory, and Learning primitives and applies LLM-guided spec search to produce on-device configurations that reach within 3.2 pp of cloud bas...

ExecTune: Effective Steering of Black-Box LLMs with Guide Models
cs.LG 2026-04 unverdicted novelty 6.0

ExecTune trains guide models via acceptance sampling, supervised fine-tuning, and structure-aware RL to boost executability of strategies for black-box LLMs, yielding up to 9.2% higher accuracy and 22.4% lower cost on...

Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace
cs.AI 2026-05 unverdicted novelty 5.0 partial

Shepherd is a runtime system that formalizes meta-agent operations via typed execution traces, enabling fast forking and demonstrated improvements in agent intervention, optimization, and training on benchmarks.

PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
cs.LG 2026-05 unverdicted novelty 5.0

PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing...