QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization

Florian A. D. Burnat

arXiv: 2605.04267 · v2 · pith:QZAS27HMnew · submitted 2026-05-05 · 💻 cs.LG · cs.NE · math.OC


Pith reviewed 2026-06-30 23:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.NE · math.OC

keywords multi-objective optimization · preference elicitation · surrogate-assisted evolutionary algorithms · cost-aware querying · interactive optimization · utility regret · DTLZ WFG benchmarks


The pith

QUIVER adaptively allocates budget between objective evaluations and heterogeneous preference queries to minimize final utility regret in multi-objective optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents QUIVER, an algorithm for interactive multi-objective optimization that decides at each step whether to evaluate an objective or elicit a preference query, choosing the option that gives the most expected improvement in decision quality per unit cost. Preference queries come in two modalities: cheap but noisy pairwise statements and more informative but expensive indifference adjustments. On benchmark problems, this cost-aware selection leads to lower utility regret than methods that use only one type of query or ignore costs. The method shows that on difficult problems the optimal strategy shifts toward using more of the richer queries.

Core claim

QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost, achieving utility regrets of 2.14 on WFG4 and 2.82 on WFG9, a 25% improvement over baselines, while adapting the proportion of pairwise preference and indifference adjustment queries based on problem difficulty.

What carries the argument

The action selection rule that computes expected regret reduction per unit cost for each possible next query or evaluation, using a surrogate model to estimate the effect on the decision quality.
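A minimal sketch of such a gain-per-cost rule, for illustration only. The `Action` record, the `select_action` helper, and every number below are hypothetical; the summary does not say how QUIVER estimates expected gain or what costs it assigns, so the estimates are simply passed in as inputs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    """A candidate next step; all fields are illustrative assumptions."""
    kind: str             # "eval", "PS", or "IA"
    expected_gain: float  # surrogate estimate of decision-quality improvement
    cost: float           # total cost charged against the remaining budget


def select_action(candidates: Sequence[Action],
                  remaining_budget: float) -> Optional[Action]:
    """Return the affordable action with the highest expected gain per unit cost."""
    affordable = [a for a in candidates if 0 < a.cost <= remaining_budget]
    if not affordable:
        return None  # nothing fits in the remaining budget
    return max(affordable, key=lambda a: a.expected_gain / a.cost)


# Made-up estimates: one evaluation, one pairwise statement, one indifference adjustment.
candidates = [
    Action("eval", expected_gain=0.30, cost=1.0),  # ratio 0.3
    Action("PS", expected_gain=0.05, cost=0.1),    # ratio 0.5
    Action("IA", expected_gain=0.20, cost=0.5),    # ratio 0.4
]
best = select_action(candidates, remaining_budget=2.0)
print(best.kind)  # prints PS
```

With these invented numbers the cheap pairwise statement wins despite its small absolute gain; raising its cost to 0.5 would drop its ratio to 0.1 and hand the choice to the indifference adjustment.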

If this is right

The proportion of indifference adjustment queries increases with problem difficulty, reaching 35% on the hardest WFG9 instance.
Single-modality approaches are suboptimal because they cannot adjust to the varying value of different query types.
The total budget is better allocated when both information content and cognitive cost are considered in the selection.
Surrogate models enable this by predicting the impact of each possible action without actually performing it.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-world deployment would require validating the synthetic DM models against actual human responses to see if the cost structures match.
This approach could extend to other interactive optimization settings where queries have heterogeneous costs.
Future systems should model the expected value of information from each query type explicitly rather than fixing one modality.

Load-bearing premise

The synthetic decision-maker models used in the experiments accurately capture the information content, noise, and cost structure of real human preference statements and indifference adjustments.
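To make the premise concrete, here is one way a synthetic decision maker could be simulated. This is a sketch under stated assumptions, not the paper's model: it assumes a linear scalarization with hidden weights `w` over minimized objectives, a logistic (Bradley-Terry style) response for pairwise statements, and additive Gaussian noise on indifference adjustments. All function and parameter names are hypothetical.

```python
import math
import random


def utility(f, w):
    """Assumed linear scalarization: a lower weighted objective sum is better."""
    return -sum(wi * fi for wi, fi in zip(w, f))


def pairwise_statement(f_a, f_b, w, noise_scale, rng):
    """Noisy PS query: True if the simulated DM states that a is preferred to b.

    Logistic response: P(a preferred) = sigmoid((u_a - u_b) / noise_scale).
    """
    diff = (utility(f_a, w) - utility(f_b, w)) / noise_scale
    if diff >= 0:  # numerically stable sigmoid
        p_a = 1.0 / (1.0 + math.exp(-diff))
    else:
        p_a = math.exp(diff) / (1.0 + math.exp(diff))
    return rng.random() < p_a


def indifference_adjustment(f_a, f_b, w, k, noise_sd, rng):
    """IA query: value of objective k that makes b exactly as good as a, plus noise.

    Solves w . f_b' = w . f_a for f_b'[k]; requires w[k] != 0.
    """
    target = sum(wi * fi for wi, fi in zip(w, f_a))
    rest = sum(wi * fi for i, (wi, fi) in enumerate(zip(w, f_b)) if i != k)
    exact = (target - rest) / w[k]
    return exact + rng.gauss(0.0, noise_sd)


rng = random.Random(0)
w_hidden = [0.5, 0.5]  # hypothetical hidden preference weights
answer = indifference_adjustment([1.0, 1.0], [0.5, 2.0], w_hidden, k=1,
                                 noise_sd=0.0, rng=rng)
```

In this toy model a pairwise answer yields one noisy bit about the weights, while an indifference answer pins down an equation in `w` up to noise. Whether real decision makers behave this way is exactly the open question the premise raises.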

What would settle it

Running the optimizer with actual human decision makers providing preferences and measuring if the adaptive selection still reduces regret compared to fixed strategies.

Figures

Figures reproduced from arXiv: 2605.04267 by Florian A. D. Burnat.

**Figure 1.** Utility regret comparison across benchmarks. Preference-learning methods...

**Figure 2.** Budget allocation for QUIVER across benchmarks. Evaluations dominate the...

**Figure 3.** Cost sensitivity analysis on DTLZ2 (𝑚 = 3). QUIVER gradually reduces IA usage as the cost ratio increases, demonstrating cost-aware modality selection.

Original abstract

Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA). We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

QUIVER adds a cost-per-gain scheduling rule for mixing objective evaluations with cheap noisy pairwise statements and richer indifference adjustments, but the reported regret drops live only inside synthetic decision-maker simulations.


QUIVER gives a scheduling rule that picks, at each step, whether to evaluate an objective or issue a preference query, and if a query then whether the cheap noisy pairwise statement or the costlier indifference adjustment. It does this by maximizing expected improvement in final utility regret per unit total cost inside a surrogate-assisted evolutionary loop.

The paper is clear on the practical tension it targets: limited budget has to cover both expensive function calls and preference elicitation, and the two query types differ in both cost and information value. It shows the rule producing an adaptive mix that shifts with problem difficulty, from roughly 80% cheap queries on easy DTLZ2 to 35% richer ones on hard WFG9. The concrete claim is a 25% regret reduction on WFG4 and WFG9 relative to single-modality baselines.

The soft spot is the exclusive reliance on synthetic decision-maker models. Those models define the noise, information content, and per-query costs; the regret numbers and the observed modality shift are therefore only as good as the models. The abstract supplies no comparison to real human responses, so it is unclear whether the gains would survive outside the simulation. Details on how expected improvement is computed and whether the reported differences carry error bars or statistical tests are also missing from the summary.

The work is aimed at people already running surrogate-assisted interactive EMO who need a concrete way to ration queries. It deserves a serious referee because the scheduling idea is externally grounded and falsifiable even if the current experiments rest on unvalidated user models.

Referee Report

1 major / 2 minor

Summary. The paper introduces QUIVER, a surrogate-assisted evolutionary multi-objective optimizer that adaptively allocates budget between objective evaluations and heterogeneous preference queries (pairwise statements PS and indifference adjustments IA) by selecting the action that maximizes expected decision-quality improvement per unit total cost. Under synthetic decision-maker models on DTLZ and WFG benchmarks, QUIVER reports the lowest final utility regret on challenging WFG instances (2.14 on WFG4, 2.82 on WFG9, a 25% improvement over single-modality baselines) and shows an adaptive shift in query mix (80% PS on easy problems to 35% IA on hard problems).

Significance. If the synthetic models prove representative, the work supplies a practical, cost-aware mechanism for balancing expensive evaluations against preference elicitation in interactive MO optimization. The explicit per-cost expected-improvement selection rule and the observed modality adaptation constitute concrete algorithmic contributions. The provision of specific numerical regret values on standard benchmarks is a positive feature for assessing the magnitude of the claimed gains.

major comments (1)

[Abstract / experimental evaluation] The headline utility-regret reductions (2.14 on WFG4, 2.82 on WFG9, 25% better than baselines) and the reported shift from 80% PS to 35% IA queries are obtained exclusively under synthetic decision-maker models. No validation is supplied that these models reproduce the information content, noise structure, or per-query cognitive costs of real human statements; if the models are misspecified, both the regret improvements and the adaptive-mix observations become simulation artifacts rather than robust properties of the algorithm.

minor comments (2)

[Abstract] The abstract states concrete regret numbers but supplies no mention of the number of independent runs, error bars, or statistical tests used to support the 25% improvement claim.
A brief discussion of the limitations of the chosen synthetic DM models and of the conditions under which the observed modality adaptation would be expected to transfer to real users would strengthen the manuscript.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the scope of our experimental validation. We address the concern below and outline targeted revisions to the manuscript.


Referee: [Abstract / experimental evaluation] The headline utility-regret reductions (2.14 on WFG4, 2.82 on WFG9, 25% better than baselines) and the reported shift from 80% PS to 35% IA queries are obtained exclusively under synthetic decision-maker models. No validation is supplied that these models reproduce the information content, noise structure, or per-query cognitive costs of real human statements; if the models are misspecified, both the regret improvements and the adaptive-mix observations become simulation artifacts rather than robust properties of the algorithm.

Authors: We agree that all reported numerical results and the observed adaptive query mix (80% PS to 35% IA) are obtained exclusively under the synthetic decision-maker models described in the paper. These models follow standard practice in the interactive MO literature to enable controlled, reproducible isolation of the cost-aware selection rule and its effect on regret. We do not claim or provide evidence that the models exactly reproduce human information content, noise, or cognitive costs. In the revised manuscript we will (i) add an explicit limitations paragraph in the discussion section stating that the reported gains and modality adaptations are conditional on the synthetic models, (ii) qualify the abstract and experimental claims to read “under synthetic decision-maker models,” and (iii) include a short future-work paragraph on the value of human-subject studies. These changes will prevent over-interpretation while preserving the algorithmic contribution of the per-cost expected-improvement selection mechanism.

revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation or results

full rationale

The paper defines QUIVER via an explicit selection rule of maximizing expected decision-quality improvement per unit total cost, then evaluates the resulting algorithm on standard DTLZ/WFG benchmarks under separately specified synthetic DM models for PS and IA queries. No quoted equations, parameter fits, or self-citations reduce the reported utility regret values, modality mix, or performance claims to the inputs by construction; the regret metric is computed from the known true utility in the simulation and is not used to fit the selection rule itself. The synthetic models constitute an evaluation assumption rather than a self-definitional loop, leaving the derivation self-contained against external benchmarks.
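The regret computation described here can be sketched as follows. This is an illustration under assumptions, not the paper's evaluation code: it assumes the optimizer recommends the found solution that maximizes its learned utility, and that regret is the true-utility gap to the best point on a known reference front. The names `utility_regret` and `linear_utility`, and the weights and points used, are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def utility_regret(found: Sequence[Vector],
                   reference_front: Sequence[Vector],
                   true_utility: Callable[[Vector], float],
                   learned_utility: Callable[[Vector], float]) -> float:
    """True-utility gap between the best reference point and the recommendation."""
    # The optimizer only sees its learned utility when picking a recommendation.
    recommended = max(found, key=learned_utility)
    # The simulation knows the true utility, so it can score the recommendation.
    best_value = max(true_utility(f) for f in reference_front)
    return best_value - true_utility(recommended)


def linear_utility(w: Vector) -> Callable[[Vector], float]:
    """Assumed scalarization for minimized objectives: higher utility is better."""
    return lambda f: -sum(wi * fi for wi, fi in zip(w, f))


true_u = linear_utility([0.7, 0.3])     # hidden DM weights (made up)
learned_u = linear_utility([0.5, 0.5])  # optimizer's current estimate (made up)
front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
found = [(0.2, 1.0), (0.6, 0.5)]
regret = utility_regret(found, front, true_u, learned_u)
```

Because the true utility enters only when the result is scored, never when the recommendation is chosen, a metric of this shape cannot feed back into the selection rule, which is the separation the circularity check relies on.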

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies insufficient detail to enumerate free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about how expected decision-quality improvement is estimated and how query costs are modeled.

pith-pipeline@v0.9.1-grok · 5776 in / 1058 out tokens · 25754 ms · 2026-06-30T23:48:07.983673+00:00 · methodology


Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

Scalable test problems for evolutionary multiobjective optimization

Bradley, R. A., and Terry, M. E. (1952) Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4), pp. 324–345. Available at: https://doi.org/10.2307/2334029. Branke, J., Greco, S., Slowinski, R., and Zielniewicz, P. (2015) Learning value functions in interactive evolutionary multiobjective optimization. IEEE Tra...

work page doi:10.2307/2334029 1952
[2]

Operations Research, 66(1), pp

Learning to optimize via information-directed sampling. Operations Research, 66(1), pp. 230–252. Available at: https://doi.org/10.1287/opre.2017.1663. Settles, B. (2009) Active Learning Literature Survey. University of Wisconsin-Madison. Zhang, Q., Liu, W., Tsang, E., and Virginas, B. (2010) Expensive multiobjective optimization by MOEA/D with Gaussi...

work page doi:10.1109/tevc.2009 2017
