Sequential Resource Trading Using Comparison-Based Gradient Estimation

Mustafa O. Karabag; Surya Murthy; Ufuk Topcu

arxiv: 2408.11186 · v4 · pith:PJCTEQ6Rnew · submitted 2024-08-20 · 💻 cs.MA · cs.AI· math.OC

Sequential Resource Trading Using Comparison-Based Gradient Estimation

Surya Murthy , Mustafa O. Karabag , Ufuk Topcu This is my paper

Pith reviewed 2026-05-23 21:55 UTC · model grok-4.3

classification 💻 cs.MA cs.AImath.OC

keywords sequential resource tradingcomparison-based gradient estimationPareto optimalitymulti-agent negotiationaccept-reject feedbackutility functionsgreedily rational agents

0 comments

The pith

A comparison-based algorithm lets two agents find mutually beneficial trades using only accept or reject feedback on proposed resource exchanges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an algorithm for two greedily rational agents trading resources across multiple categories without knowing each other's utility functions. The offering agent uses accept and reject responses to estimate the direction of the responder's utility gradient by pruning infeasible directions. This process ensures every accepted trade benefits both parties and eventually either finds a beneficial trade or confirms the allocation is weakly Pareto optimal. The sequence of trades converges to the Pareto front under mild conditions on the utilities. Such a method matters because it enables efficient negotiation in settings where full utility information is private or hard to communicate.

Core claim

The algorithm guarantees that each accepted trade strictly improves both agents' utilities and, after finitely many rejected offers, either identifies a mutually beneficial trade or certifies that the current allocation is weakly Pareto optimal. The sequence of accepted trades asymptotically converges to the Pareto front under mild assumptions.

What carries the argument

Comparison-based gradient estimation, which interprets accept/reject feedback as pairwise comparisons to iteratively refine estimates of the responding agent's utility gradient and prune the space of feasible directions.

If this is right

Each accepted trade strictly improves both agents' utilities.
After a finite number of rejected offers, the algorithm either identifies a mutually beneficial trade or certifies weak Pareto optimality of the current allocation.
The sequence of accepted trades converges asymptotically to the Pareto front under mild assumptions on the utility functions.
The method achieves higher societal benefit with fewer offers compared to standard baselines in multiple trading settings.
Validation in a user study shows strong performance in scenarios with substantial resource conflict.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This mechanism could be adapted for multi-agent settings beyond two participants if a suitable feedback protocol is defined.
The binary feedback approach suggests potential for use in automated systems where agents have limited communication bandwidth.
Extending the convergence result to cases with noisy or inconsistent responses would broaden applicability to real-world negotiations.

Load-bearing premise

The responding agent accepts an offer only when it improves their own utility, and the utility functions meet the mild conditions needed for the convergence result.

What would settle it

A counterexample where the algorithm accepts a trade that does not improve the responder's utility, or fails to certify Pareto optimality despite no mutually beneficial trades existing, or where the trade sequence does not approach the Pareto front when utilities are linear and strictly concave.

Figures

Figures reproduced from arXiv: 2408.11186 by Mustafa O. Karabag, Surya Murthy, Ufuk Topcu.

**Figure 2.** Figure 2: 2D top-down view of 3D cone refinement using the [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Top-down view of cone refinement. Discrete offers lead [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Offer-benefit plots for discrete trading scenarios. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Offer-benefit plots for continuous trading scenarios. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Using the half-distance between the two farthest points on a polytope as the radius [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Prompt for Parsing Alice’s Counteroffer I. Example ST-CR Transcript (Stay) In [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗

**Figure 8.** Figure 8: Transcript of ST-CR interacting with a human negotiator. The human negotiator has a target state of 60 apples, 70 [PITH_FULL_IMAGE:figures/full_fig_p026_8.png] view at source ↗

**Figure 9.** Figure 9: Prompt Given to GPT Trading Baseline System: GPT Utility Function Q: [[-1,-0,-0],[-0,-1,-0],[-0,-0,-1]] GPT Utility Function u: [50, 100, 150] GPT’s Trade Offer: GPT receives: 10 apples, 5 bananas User receives: 6 oranges User: No, I can give you 5 oranges if you give me 5 apples GPT’s Trade Offer: GPT receives: 5 oranges User receives: 5 apples User: Yes System: Offer Accepted! New User State: 55 apples, … view at source ↗

**Figure 10.** Figure 10: Example transcripts of GPT negotiating with a human agent without an underlying algorithm. In these examples, GPT [PITH_FULL_IMAGE:figures/full_fig_p027_10.png] view at source ↗

**Figure 11.** Figure 11: User Study Recruitment Interface on Prolific [PITH_FULL_IMAGE:figures/full_fig_p028_11.png] view at source ↗

read the original abstract

We study sequential multi-issue trading between two greedily rational agents who exchange resources from a finite set of categories. Each agent's utility depends on its allocation, but the offering agent does not know the responding agent's utility function and receives only accept or reject feedback. We propose a comparison-based algorithm that interprets acceptance and rejection responses as pairwise state comparisons, allowing the offering agent to iteratively estimate the responding agent's gradient. Rejected offers prune the space of feasible gradient directions, enabling systematic refinement of possibly mutually beneficial trades. The algorithm guarantees that each accepted trade strictly improves both agents' utilities and, after finitely many rejected offers, either identifies a mutually beneficial trade or certifies that the current allocation is weakly Pareto optimal. We further show that the sequence of accepted trades asymptotically converges to the Pareto front under mild assumptions. We evaluate the method against standard baselines and show that it achieves higher societal benefit with fewer offers across multiple trading settings. We further validate the approach in a user study, demonstrating strong performance in scenarios with substantial resource conflict.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper offers a workable comparison-based method for bilateral resource trades from accept/reject signals, but the finite-rejection claim for certifying weak Pareto optimality does not obviously survive continuous allocation spaces.

read the letter

The core contribution is an algorithm that lets one agent propose resource trades to another and use the binary accept/reject answers to prune the space of possible improving directions. Each accepted trade is guaranteed to raise both utilities, and the sequence is claimed to reach the Pareto front asymptotically under mild conditions. The new piece is the explicit use of those binary signals as pairwise comparisons to iteratively narrow the gradient estimate without ever learning the full utility function. That framing is not just a restatement of earlier negotiation or gradient-free work in the citations I can see from the abstract. The evaluation section reports better societal benefit and fewer offers than standard baselines, plus a user study that at least checks real-agent behavior under conflict. Those are concrete positives for a methods paper in multi-agent systems. The main soft spot is the finite-termination guarantee. The abstract states that after finitely many rejections the algorithm either finds a mutually beneficial trade or certifies the current point is weakly Pareto optimal. In a continuous resource space the set of feasible directions is uncountable; each rejection removes only a cone. Nothing in the provided abstract supplies a discretization step, a finite candidate set, or a compactness argument that would make the remaining direction set empty after finite steps. If the full paper adds such an argument it would resolve the issue, but the claim as written sits on an assumption that is not spelled out. The asymptotic convergence part is easier to accept under standard continuity conditions. This work is aimed at researchers who build negotiation protocols for agents with private utilities and limited feedback. A reader already working on comparison-based or bandit-style methods in multi-agent settings would find the pruning technique useful to examine. The paper is coherent enough on its own terms to merit peer review; the termination argument is the part that needs the most referee attention.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a comparison-based algorithm for sequential multi-issue resource trading between two greedily rational agents. The offering agent uses only accept/reject feedback to iteratively prune the space of possible gradient directions of the responder's unknown utility function, with the goal of identifying mutually beneficial trades. Central claims include that every accepted trade strictly improves both agents' utilities, that finitely many rejections suffice to either locate such a trade or certify that the current allocation is weakly Pareto optimal, and that the sequence of accepted trades converges asymptotically to the Pareto front under mild assumptions. The approach is compared empirically to baselines on societal benefit and offer count, and validated via a user study.

Significance. If the finite-termination and convergence guarantees can be established rigorously, the work would provide a practical mechanism for private-utility negotiation in multi-agent systems without direct utility disclosure. The empirical results indicate advantages over standard baselines, and the user-study component adds evidence of real-world applicability. The comparison-based pruning idea is a potentially useful primitive for gradient estimation from binary feedback.

major comments (2)

[Abstract] Abstract: The claim that 'after finitely many rejected offers, either identifies a mutually beneficial trade or certifies that the current allocation is weakly Pareto optimal' cannot hold in the standard continuous-allocation setting. Each rejection eliminates at most a half-space (or cone) of directions, yet the set of feasible improving directions is uncountable; an open set of directions can remain unpruned after any finite number of responses. No discretization, compactness argument, or finite-direction assumption is visible in the abstract to close this gap. This directly undermines the central algorithmic guarantee.
[Abstract] Abstract (convergence statement): The assertion that 'the sequence of accepted trades asymptotically converges to the Pareto front under mild assumptions' is stated without identifying the assumptions or sketching the proof technique. Because the gradient estimate is constructed solely from binary comparisons, it is unclear whether the resulting direction sequence is guaranteed to be dense enough in the improving cone to drive convergence; this needs an explicit theorem statement and derivation.

minor comments (1)

The abstract refers to 'mild assumptions' for convergence and 'standard baselines' without naming either; both should be stated explicitly in the introduction or theorem statements for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript to improve clarity on the theoretical guarantees.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that 'after finitely many rejected offers, either identifies a mutually beneficial trade or certifies that the current allocation is weakly Pareto optimal' cannot hold in the standard continuous-allocation setting. Each rejection eliminates at most a half-space (or cone) of directions, yet the set of feasible improving directions is uncountable; an open set of directions can remain unpruned after any finite number of responses. No discretization, compactness argument, or finite-direction assumption is visible in the abstract to close this gap. This directly undermines the central algorithmic guarantee.

Authors: We agree the abstract phrasing is imprecise for a fully continuous setting. The manuscript (Section 3) implements pruning over a finite discretization of candidate directions, which enables the finite-termination claim under that modeling choice. This discretization is not referenced in the abstract. We will revise the abstract to state the discretization assumption explicitly and add a clarifying sentence in the introduction. revision: yes
Referee: [Abstract] Abstract (convergence statement): The assertion that 'the sequence of accepted trades asymptotically converges to the Pareto front under mild assumptions' is stated without identifying the assumptions or sketching the proof technique. Because the gradient estimate is constructed solely from binary comparisons, it is unclear whether the resulting direction sequence is guaranteed to be dense enough in the improving cone to drive convergence; this needs an explicit theorem statement and derivation.

Authors: We agree the abstract should identify the assumptions and proof approach. The mild assumptions are continuous differentiable utilities and compact convex allocation sets; convergence follows from showing that accumulated rejections make the estimated directions dense in the improving cone (detailed in Theorem 4 and its proof). We will update the abstract to name the assumptions and point to the theorem. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain

full rationale

The abstract presents an algorithmic construction whose guarantees are stated directly in terms of the algorithm's accept/reject responses and the external definition of weak Pareto optimality. No equations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed result to an input by construction. The derivation is therefore self-contained against the stated external benchmarks and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that agents are greedily rational and on unspecified mild conditions for convergence; no free parameters or invented entities are identifiable from the abstract.

axioms (2)

domain assumption Agents are greedily rational and accept an offer only when it strictly improves their own utility
Invoked throughout the abstract as the behavioral model that turns accept/reject signals into reliable comparisons.
domain assumption Mild assumptions on utility functions suffice for asymptotic convergence to the Pareto front
Stated explicitly when claiming the sequence of accepted trades converges.

pith-pipeline@v0.9.0 · 5707 in / 1349 out tokens · 47688 ms · 2026-05-23T21:55:20.144157+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ST-CR maintains a cone of potential gradients... refine the cone... after a finite number of consecutively rejected offers... ϵ-weakly Pareto optimal
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We model the space of potential gradients as a cone and treat rejected offers as half-space constraints

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

[1]

Cambridge University Press, 2008

Yoav Shoham and Kevin Leyton-Brown.Multiagent systems: Algorith- mic, game-theoretic, and logical foundations. Cambridge University Press, 2008

work page 2008
[2]

John wiley & sons, 2009

Michael Wooldridge.An introduction to multiagent systems. John wiley & sons, 2009

work page 2009
[3]

Cooperative inverse reinforcement learning.Advances in neural information processing systems, 29, 2016

Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Anca Dragan. Cooperative inverse reinforcement learning.Advances in neural information processing systems, 29, 2016

work page 2016
[4]

A policy-blending formal- ism for shared control.The International Journal of Robotics Research, 32(7):790–805, 2013

Anca D Dragan and Siddhartha S Srinivasa. A policy-blending formal- ism for shared control.The International Journal of Robotics Research, 32(7):790–805, 2013

work page 2013
[5]

Designing conventions for automated negotiation

Jeffrey S Rosenschein and Gilad Zlotkin. Designing conventions for automated negotiation. AI magazine, 15(3):29–29, 1994

work page 1994
[6]

John F. Nash. The bargaining problem.Econometrica, 18(2):155–162, 1950

work page 1950
[7]

Preference elicitation and query learning.Journal of Machine Learning Research, 5(Jun):649–667, 2004

Avrim Blum, Jeffrey Jackson, Tuomas Sandholm, and Martin Zinkevich. Preference elicitation and query learning.Journal of Machine Learning Research, 5(Jun):649–667, 2004

work page 2004
[8]

Smooth convex optimization using sub-zeroth-order oracles

Mustafa O Karabag, Cyrus Neary, and Ufuk Topcu. Smooth convex optimization using sub-zeroth-order oracles. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3815– 3822, 2021

work page 2021
[9]

Optimal negotiation decision functions in time-sensitive domains

Tim Baarslag, Enrico H Gerding, Reyhan Aydogan, and MC Schraefel. Optimal negotiation decision functions in time-sensitive domains. In 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), volume 2, pages 190–197. IEEE, 2015

work page 2015
[10]

Negotiation decision functions for autonomous agents

Peyman Faratin, Carles Sierra, and Nick R Jennings. Negotiation decision functions for autonomous agents. Robotics and Autonomous Systems, 24(3-4):159–182, 1998

work page 1998
[11]

Perfect equilibrium in a bargaining model.Economet- rica: Journal of the Econometric Society, pages 97–109, 1982

Ariel Rubinstein. Perfect equilibrium in a bargaining model.Economet- rica: Journal of the Econometric Society, pages 97–109, 1982

work page 1982
[12]

Decoupling negotiating agents to explore the space of negotiation strategies.Novel insights in agent-based complex automated negotiation, pages 61–83, 2014

Tim Baarslag, Koen Hindriks, Mark Hendrikx, Alexander Dirkzwager, and Catholĳn Jonker. Decoupling negotiating agents to explore the space of negotiation strategies.Novel insights in agent-based complex automated negotiation, pages 61–83, 2014

work page 2014
[13]

Ne- gotiating complex contracts.Group Decision and Negotiation, 12:111– 125, 2003

Mark Klein, Peyman Faratin, Hiroki Sayama, and Yaneer Bar-Yam. Ne- gotiating complex contracts.Group Decision and Negotiation, 12:111– 125, 2003

work page 2003
[14]

A dependency-based mediation mechanism for complex negotiations.Modern Approaches to Agent-based Complex Automated Negotiation, pages 51–66, 2017

Akiyuki Mori, Shota Morii, and Takayuki Ito. A dependency-based mediation mechanism for complex negotiations.Modern Approaches to Agent-based Complex Automated Negotiation, pages 51–66, 2017

work page 2017
[15]

Automed: an automated mediator for multi-issue bilateral negotiations

Michal Chalamish and Sarit Kraus. Automed: an automated mediator for multi-issue bilateral negotiations. Autonomous Agents and Multi-Agent Systems, 24:536–564, 2012

work page 2012
[16]

Efficient issue-grouping approach for multiple interdependent issues negotiation between exag- gerator agents

Katsuhide Fujita, Takayuki Ito, and Mark Klein. Efficient issue-grouping approach for multiple interdependent issues negotiation between exag- gerator agents. Decision Support Systems, 60:10–17, 2014

work page 2014
[17]

A secure and fair negotiation protocol in highly complex utility space based on cone- constraints

Katsuhide Fujita, Takayuki Ito, and Mark Klein. A secure and fair negotiation protocol in highly complex utility space based on cone- constraints. In 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, volume 2, pages 427–430. IEEE, 2009

work page 2009
[18]

Searching for joint gains in multi-party negotiations.European Journal of Operational Research, 130(1):54–69, 2001

Harri Ehtamo, Eero Kettunen, and Raimo P Hämäläinen. Searching for joint gains in multi-party negotiations.European Journal of Operational Research, 130(1):54–69, 2001

work page 2001
[19]

The fawkes agent—the anac 2013 negotiation contest winner.Next Frontier in Agent-Based Complex Automated Negotiation, pages 143–151, 2015

Vincent J Koeman, Kees Boon, Joris Z van den Oever, Madalin Dumitru- Guzu, and Laurentiu Catalin Stanculescu. The fawkes agent—the anac 2013 negotiation contest winner.Next Frontier in Agent-Based Complex Automated Negotiation, pages 143–151, 2015

work page 2013
[20]

Hardheaded

Thĳs Van Krimpen, Daphne Looĳe, and Siamak Hajizadeh. Hardheaded. In Complex Automated Negotiations: Theories, Models, and Software Competitions, pages 223–227. Springer, 2013

work page 2013
[21]

Faratin, C

P. Faratin, C. Sierra, and N.R. Jennings. Using similarity criteria to make issue trade-offs in automated negotiations.Artificial Intelligence, 142(2):205–237, 2002. International Conference on MultiAgent Systems 2000

work page 2002
[22]

Automated multilateral negotiation on multiple issues with private information

Ronghuo Zheng, Tinglong Dai, Katia Sycara, and Nilanjan Chakraborty. Automated multilateral negotiation on multiple issues with private information. INFORMS Journal on Computing, 28(4):612–628, 2016

work page 2016
[23]

An offer-generating strategy for multiple negotiations with mixed types of issues and issue interdependency

Kai Li, Lei Niu, Fenghui Ren, and Xinguo Yu. An offer-generating strategy for multiple negotiations with mixed types of issues and issue interdependency. Engineering Applications of Artificial Intelligence, 136:108891, 2024

work page 2024
[24]

Optimal time-based strategy for automated negoti- ation

Yasser Mohammad. Optimal time-based strategy for automated negoti- ation. Applied Intelligence, 53(6):6710–6735, 2023

work page 2023
[25]

Bayesian-based pref- erence prediction in bilateral multi-issue negotiation between intelligent agents

Jihang Zhang, Fenghui Ren, and Minjie Zhang. Bayesian-based pref- erence prediction in bilateral multi-issue negotiation between intelligent agents. Knowledge-Based Systems, 84:108–120, 2015

work page 2015
[26]

An approach to scalable multi-issue negotiation: Decomposing the contract space based on issue interdependencies

Katsuhide Fujita, Takayuki Ito, and Mark Klein. An approach to scalable multi-issue negotiation: Decomposing the contract space based on issue interdependencies. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, volume 2, pages 399–406. IEEE, 2010

work page 2010
[27]

Popponent: Highly accurate, individually and socially efficient opponent preference model in bilateral multi issue negotiations.Artificial Intelligence, 237:59–91, 2016

Farhad Zafari and Faria Nassiri-Mofakham. Popponent: Highly accurate, individually and socially efficient opponent preference model in bilateral multi issue negotiations.Artificial Intelligence, 237:59–91, 2016

work page 2016
[28]

Learning on opponent’s preferences to make effective multi-issue negotiation trade-offs

Robert M Coehoorn and Nicholas R Jennings. Learning on opponent’s preferences to make effective multi-issue negotiation trade-offs. In Proceedings of the 6th international conference on Electronic commerce, pages 59–68, 2004

work page 2004
[29]

Rethinking frequency opponent modeling in automated negotiation

Okan Tunalı, Reyhan Aydoğan, and Victor Sanchez-Anguix. Rethinking frequency opponent modeling in automated negotiation. In PRIMA 2017:PrinciplesandPracticeofMulti-AgentSystems:20thInternational Conference, Nice, France, October 30–November 3, 2017, Proceedings 20, pages 263–279. Springer, 2017

work page 2017
[30]

Meta-strategy based on multi- armed bandit approach for multi-time negotiation

Ryohei Kawata and Katsuhide Fujita. Meta-strategy based on multi- armed bandit approach for multi-time negotiation. IEICE TRANSAC- TIONS on Information and Systems, 103(12):2540–2548, 2020

work page 2020
[31]

An analysis of the linear bilateral anac domains using the micro benchmark strategy

Dave De Jonge. An analysis of the linear bilateral anac domains using the micro benchmark strategy. InĲCAI, pages 223–229, 2022

work page 2022
[32]

A region-based multi-issue negotiation protocol for nonmonotonic utility spaces.Computational Intelligence, 27(2):166– 217, 2011

Miguel A Lopez-Carmona, Ivan Marsa-Maestre, Enrique De La Hoz, and Juan R Velasco. A region-based multi-issue negotiation protocol for nonmonotonic utility spaces.Computational Intelligence, 27(2):166– 217, 2011

work page 2011
[33]

Nb 3: a multilateral negotiation algorithm for large, non-linear agreement spaces with limited time

Dave De Jonge and Carles Sierra. Nb 3: a multilateral negotiation algorithm for large, non-linear agreement spaces with limited time. Autonomous Agents and Multi-Agent Systems, 29(5):896–942, 2015

work page 2015
[34]

Query complexity of derivative-free optimization.Advances in Neural Information Processing Systems, 25, 2012

Kevin G Jamieson, Robert Nowak, and Ben Recht. Query complexity of derivative-free optimization.Advances in Neural Information Processing Systems, 25, 2012

work page 2012
[35]

Sign-opt: A query-efficient hard-label adversarial attack

Minhao Cheng, Simranjit Singh, Patrick Chen, Pin Yu Chen, Sĳia Liu, and Cho Jui Hsieh. Sign-opt: A query-efficient hard-label adversarial attack. In 8th International Conference on Learning Representations, ICLR 2020, 2020

work page 2020
[36]

Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454, 2024

Chenyi Zhang and Tongyang Li. Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454, 2024

work page arXiv 2024
[37]

Multi-objective optimization

Kalyanmoy Deb, Karthik Sindhya, and Jussi Hakanen. Multi-objective optimization. In Decision sciences, pages 161–200. CRC Press, 2016

work page 2016
[38]

Pardalos, Antanas Žilinskas, and Julius Žilinskas.Scalariza- tion, pages 13–18

Panos M. Pardalos, Antanas Žilinskas, and Julius Žilinskas.Scalariza- tion, pages 13–18. Springer International Publishing, Cham, 2017

work page 2017
[39]

Evolutionary multi-objective optimization: a his- torical view of the field

CA Coello Coello. Evolutionary multi-objective optimization: a his- torical view of the field. IEEE computational intelligence magazine, 1(1):28–36, 2006

work page 2006
[40]

Steepest descent methods for mul- ticriteria optimization

Jörg Fliege and Benar Fux Svaiter. Steepest descent methods for mul- ticriteria optimization. Mathematical methods of operations research, 51:479–494, 2000

work page 2000
[41]

A projected gradient method for vector optimization problems.Computational Optimization and applications, 28:5–29, 2004

LM Grana Drummond and Alfredo N Iusem. A projected gradient method for vector optimization problems.Computational Optimization and applications, 28:5–29, 2004

work page 2004
[42]

Multiple-gradient descent algorithm (mgda) for multiobjective optimization

Jean-Antoine Désidéri. Multiple-gradient descent algorithm (mgda) for multiobjective optimization. Comptes Rendus Mathematique, 350(5- 6):313–318, 2012

work page 2012
[43]

Bargaining and markets

Martin J Osborne and Ariel Rubinstein. Bargaining and markets. Academic Press Limited, 1990

work page 1990
[44]

Automated negotiations protocols for complex utility function as social system

Katsuhide Fujita. Automated negotiations protocols for complex utility function as social system. In Human-Centered Services Computing for Smart Cities: IEICE Monograph, pages 213–258. Springer Nature Singapore Singapore, 2024

work page 2024
[45]

Gpt-4 technical report, 2024

OpenAI. Gpt-4 technical report, 2024. 13 VIII. Appendix In this document, we provide (i) the proof of Theorem 1, (ii) a detailed description of ST-CR in discrete scenarios, (iii) pseudocode for the baselines used in our numerical experiments, (iv) adjustments to ST-CR for numerical examples, (v) details on the reproducibility of numerical experiments, (vi...

work page 2024
[46]

We begin by finding then-dimensional vector corresponding to the hypersphere centervcenter = M c where M ∈ Rn×n−1 is a matrix such thatM = [v1,

Next, we determine the coneC(τ ′, θ′) that encloses the hypersphereB(c, r). We begin by finding then-dimensional vector corresponding to the hypersphere centervcenter = M c where M ∈ Rn×n−1 is a matrix such thatM = [v1, . . . , vn−1] . We then obtainτ ′ by rotating τ in the direction ofvcenter by tan−1(∥c∥) (Algorithm 3, Lines 19 - 22) τ ′ = τ cos(tan−1(∥...

work page
[47]

Since we assume that the responding agent’s utility function is smooth and the offers have a bounded magnitude, the change in the responding agent’s gradient direction is bounded

Cone Initialization in Sequential Trading:When making sequential trades, ST-CR can bypass stage 2.1 by using the previous cone to initialize the cone for the next trade. Since we assume that the responding agent’s utility function is smooth and the offers have a bounded magnitude, the change in the responding agent’s gradient direction is bounded. Given a...

work page
[48]

Ensuring Beneficial Trades:As we discussed in prior sections, ST-CR selects offer directions that align with the offering agent’s gradient direction ensuring that⟨T, ∇f A(SA)⟩ ≥ 0. For non-linear utility functions, this method can lead to non- beneficial trades for the offering agent if∥∇f A T (SA)∥ is small, overshooting the optimal trade in the directio...

work page
[49]

If an offer is accepted, the agents transition to a new state

Improving Self-Interested Behavior:ST-CR usesn − 1 rejected orthogonal offers to refine the gradient cone. If an offer is accepted, the agents transition to a new state. When making then − 1 offers, the offering agent prioritizes offers that are most beneficial to itself, ensuring that accepted offers will have a larger benefit for the offering agent

work page
[50]

This is also a risk in discrete trading scenarios, where offers are constrained to integer values

Cone Update in Discrete Trading:As we have established, a risk of two-point comparisons is potentially making incorrect halfspace cuts. This is also a risk in discrete trading scenarios, where offers are constrained to integer values. In the worst case, incorrect cuts can lead to an empty polyhedron of potential gradients, where the halfspace constraints ...

work page
[51]

User" with

Generating new Trades in Discrete Trading:As we stated in the discrete trading section, the non-orthogonal or off-center halfspace cuts caused by discrete offers can result in ST-CR needing more thann − 1 offers to update the gradient cone. In such cases, we use the halfspace that bisects the farthest corner points of the polyhedron of potential gradients...

work page

[1] [1]

Cambridge University Press, 2008

Yoav Shoham and Kevin Leyton-Brown.Multiagent systems: Algorith- mic, game-theoretic, and logical foundations. Cambridge University Press, 2008

work page 2008

[2] [2]

John wiley & sons, 2009

Michael Wooldridge.An introduction to multiagent systems. John wiley & sons, 2009

work page 2009

[3] [3]

Cooperative inverse reinforcement learning.Advances in neural information processing systems, 29, 2016

Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Anca Dragan. Cooperative inverse reinforcement learning.Advances in neural information processing systems, 29, 2016

work page 2016

[4] [4]

A policy-blending formal- ism for shared control.The International Journal of Robotics Research, 32(7):790–805, 2013

Anca D Dragan and Siddhartha S Srinivasa. A policy-blending formal- ism for shared control.The International Journal of Robotics Research, 32(7):790–805, 2013

work page 2013

[5] [5]

Designing conventions for automated negotiation

Jeffrey S Rosenschein and Gilad Zlotkin. Designing conventions for automated negotiation. AI magazine, 15(3):29–29, 1994

work page 1994

[6] [6]

John F. Nash. The bargaining problem.Econometrica, 18(2):155–162, 1950

work page 1950

[7] [7]

Preference elicitation and query learning.Journal of Machine Learning Research, 5(Jun):649–667, 2004

Avrim Blum, Jeffrey Jackson, Tuomas Sandholm, and Martin Zinkevich. Preference elicitation and query learning.Journal of Machine Learning Research, 5(Jun):649–667, 2004

work page 2004

[8] [8]

Smooth convex optimization using sub-zeroth-order oracles

Mustafa O Karabag, Cyrus Neary, and Ufuk Topcu. Smooth convex optimization using sub-zeroth-order oracles. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3815– 3822, 2021

work page 2021

[9] [9]

Optimal negotiation decision functions in time-sensitive domains

Tim Baarslag, Enrico H Gerding, Reyhan Aydogan, and MC Schraefel. Optimal negotiation decision functions in time-sensitive domains. In 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), volume 2, pages 190–197. IEEE, 2015

work page 2015

[10] [10]

Negotiation decision functions for autonomous agents

Peyman Faratin, Carles Sierra, and Nick R Jennings. Negotiation decision functions for autonomous agents. Robotics and Autonomous Systems, 24(3-4):159–182, 1998

work page 1998

[11] [11]

Perfect equilibrium in a bargaining model.Economet- rica: Journal of the Econometric Society, pages 97–109, 1982

Ariel Rubinstein. Perfect equilibrium in a bargaining model.Economet- rica: Journal of the Econometric Society, pages 97–109, 1982

work page 1982

[12] [12]

Decoupling negotiating agents to explore the space of negotiation strategies.Novel insights in agent-based complex automated negotiation, pages 61–83, 2014

Tim Baarslag, Koen Hindriks, Mark Hendrikx, Alexander Dirkzwager, and Catholĳn Jonker. Decoupling negotiating agents to explore the space of negotiation strategies.Novel insights in agent-based complex automated negotiation, pages 61–83, 2014

work page 2014

[13] [13]

Ne- gotiating complex contracts.Group Decision and Negotiation, 12:111– 125, 2003

Mark Klein, Peyman Faratin, Hiroki Sayama, and Yaneer Bar-Yam. Ne- gotiating complex contracts.Group Decision and Negotiation, 12:111– 125, 2003

work page 2003

[14] [14]

A dependency-based mediation mechanism for complex negotiations.Modern Approaches to Agent-based Complex Automated Negotiation, pages 51–66, 2017

Akiyuki Mori, Shota Morii, and Takayuki Ito. A dependency-based mediation mechanism for complex negotiations.Modern Approaches to Agent-based Complex Automated Negotiation, pages 51–66, 2017

work page 2017

[15] [15]

Automed: an automated mediator for multi-issue bilateral negotiations

Michal Chalamish and Sarit Kraus. Automed: an automated mediator for multi-issue bilateral negotiations. Autonomous Agents and Multi-Agent Systems, 24:536–564, 2012

work page 2012

[16] [16]

Efficient issue-grouping approach for multiple interdependent issues negotiation between exag- gerator agents

Katsuhide Fujita, Takayuki Ito, and Mark Klein. Efficient issue-grouping approach for multiple interdependent issues negotiation between exag- gerator agents. Decision Support Systems, 60:10–17, 2014

work page 2014

[17] [17]

A secure and fair negotiation protocol in highly complex utility space based on cone- constraints

Katsuhide Fujita, Takayuki Ito, and Mark Klein. A secure and fair negotiation protocol in highly complex utility space based on cone- constraints. In 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, volume 2, pages 427–430. IEEE, 2009

work page 2009

[18] [18]

Searching for joint gains in multi-party negotiations.European Journal of Operational Research, 130(1):54–69, 2001

Harri Ehtamo, Eero Kettunen, and Raimo P Hämäläinen. Searching for joint gains in multi-party negotiations.European Journal of Operational Research, 130(1):54–69, 2001

work page 2001

[19] [19]

The fawkes agent—the anac 2013 negotiation contest winner.Next Frontier in Agent-Based Complex Automated Negotiation, pages 143–151, 2015

Vincent J Koeman, Kees Boon, Joris Z van den Oever, Madalin Dumitru- Guzu, and Laurentiu Catalin Stanculescu. The fawkes agent—the anac 2013 negotiation contest winner.Next Frontier in Agent-Based Complex Automated Negotiation, pages 143–151, 2015

work page 2013

[20] [20]

Hardheaded

Thĳs Van Krimpen, Daphne Looĳe, and Siamak Hajizadeh. Hardheaded. In Complex Automated Negotiations: Theories, Models, and Software Competitions, pages 223–227. Springer, 2013

work page 2013

[21] [21]

Faratin, C

P. Faratin, C. Sierra, and N.R. Jennings. Using similarity criteria to make issue trade-offs in automated negotiations.Artificial Intelligence, 142(2):205–237, 2002. International Conference on MultiAgent Systems 2000

work page 2002

[22] [22]

Automated multilateral negotiation on multiple issues with private information

Ronghuo Zheng, Tinglong Dai, Katia Sycara, and Nilanjan Chakraborty. Automated multilateral negotiation on multiple issues with private information. INFORMS Journal on Computing, 28(4):612–628, 2016

work page 2016

[23] [23]

An offer-generating strategy for multiple negotiations with mixed types of issues and issue interdependency

Kai Li, Lei Niu, Fenghui Ren, and Xinguo Yu. An offer-generating strategy for multiple negotiations with mixed types of issues and issue interdependency. Engineering Applications of Artificial Intelligence, 136:108891, 2024

work page 2024

[24] [24]

Optimal time-based strategy for automated negoti- ation

Yasser Mohammad. Optimal time-based strategy for automated negoti- ation. Applied Intelligence, 53(6):6710–6735, 2023

work page 2023

[25] [25]

Bayesian-based pref- erence prediction in bilateral multi-issue negotiation between intelligent agents

Jihang Zhang, Fenghui Ren, and Minjie Zhang. Bayesian-based pref- erence prediction in bilateral multi-issue negotiation between intelligent agents. Knowledge-Based Systems, 84:108–120, 2015

work page 2015

[26] [26]

An approach to scalable multi-issue negotiation: Decomposing the contract space based on issue interdependencies

Katsuhide Fujita, Takayuki Ito, and Mark Klein. An approach to scalable multi-issue negotiation: Decomposing the contract space based on issue interdependencies. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, volume 2, pages 399–406. IEEE, 2010

work page 2010

[27] [27]

Popponent: Highly accurate, individually and socially efficient opponent preference model in bilateral multi issue negotiations.Artificial Intelligence, 237:59–91, 2016

Farhad Zafari and Faria Nassiri-Mofakham. Popponent: Highly accurate, individually and socially efficient opponent preference model in bilateral multi issue negotiations.Artificial Intelligence, 237:59–91, 2016

work page 2016

[28] [28]

Learning on opponent’s preferences to make effective multi-issue negotiation trade-offs

Robert M Coehoorn and Nicholas R Jennings. Learning on opponent’s preferences to make effective multi-issue negotiation trade-offs. In Proceedings of the 6th international conference on Electronic commerce, pages 59–68, 2004

work page 2004

[29] [29]

Rethinking frequency opponent modeling in automated negotiation

Okan Tunalı, Reyhan Aydoğan, and Victor Sanchez-Anguix. Rethinking frequency opponent modeling in automated negotiation. In PRIMA 2017:PrinciplesandPracticeofMulti-AgentSystems:20thInternational Conference, Nice, France, October 30–November 3, 2017, Proceedings 20, pages 263–279. Springer, 2017

work page 2017

[30] [30]

Meta-strategy based on multi- armed bandit approach for multi-time negotiation

Ryohei Kawata and Katsuhide Fujita. Meta-strategy based on multi- armed bandit approach for multi-time negotiation. IEICE TRANSAC- TIONS on Information and Systems, 103(12):2540–2548, 2020

work page 2020

[31] [31]

An analysis of the linear bilateral anac domains using the micro benchmark strategy

Dave De Jonge. An analysis of the linear bilateral anac domains using the micro benchmark strategy. InĲCAI, pages 223–229, 2022

work page 2022

[32] [32]

A region-based multi-issue negotiation protocol for nonmonotonic utility spaces.Computational Intelligence, 27(2):166– 217, 2011

Miguel A Lopez-Carmona, Ivan Marsa-Maestre, Enrique De La Hoz, and Juan R Velasco. A region-based multi-issue negotiation protocol for nonmonotonic utility spaces.Computational Intelligence, 27(2):166– 217, 2011

work page 2011

[33] [33]

Nb 3: a multilateral negotiation algorithm for large, non-linear agreement spaces with limited time

Dave De Jonge and Carles Sierra. Nb 3: a multilateral negotiation algorithm for large, non-linear agreement spaces with limited time. Autonomous Agents and Multi-Agent Systems, 29(5):896–942, 2015

work page 2015

[34] [34]

Query complexity of derivative-free optimization.Advances in Neural Information Processing Systems, 25, 2012

Kevin G Jamieson, Robert Nowak, and Ben Recht. Query complexity of derivative-free optimization.Advances in Neural Information Processing Systems, 25, 2012

work page 2012

[35] [35]

Sign-opt: A query-efficient hard-label adversarial attack

Minhao Cheng, Simranjit Singh, Patrick Chen, Pin Yu Chen, Sĳia Liu, and Cho Jui Hsieh. Sign-opt: A query-efficient hard-label adversarial attack. In 8th International Conference on Learning Representations, ICLR 2020, 2020

work page 2020

[36] [36]

Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454, 2024

Chenyi Zhang and Tongyang Li. Comparisons are all you need for optimizing smooth functions.arXiv preprint arXiv:2405.11454, 2024

work page arXiv 2024

[37] [37]

Multi-objective optimization

Kalyanmoy Deb, Karthik Sindhya, and Jussi Hakanen. Multi-objective optimization. In Decision sciences, pages 161–200. CRC Press, 2016

work page 2016

[38] [38]

Pardalos, Antanas Žilinskas, and Julius Žilinskas.Scalariza- tion, pages 13–18

Panos M. Pardalos, Antanas Žilinskas, and Julius Žilinskas.Scalariza- tion, pages 13–18. Springer International Publishing, Cham, 2017

work page 2017

[39] [39]

Evolutionary multi-objective optimization: a his- torical view of the field

CA Coello Coello. Evolutionary multi-objective optimization: a his- torical view of the field. IEEE computational intelligence magazine, 1(1):28–36, 2006

work page 2006

[40] [40]

Steepest descent methods for mul- ticriteria optimization

Jörg Fliege and Benar Fux Svaiter. Steepest descent methods for mul- ticriteria optimization. Mathematical methods of operations research, 51:479–494, 2000

work page 2000

[41] [41]

A projected gradient method for vector optimization problems.Computational Optimization and applications, 28:5–29, 2004

LM Grana Drummond and Alfredo N Iusem. A projected gradient method for vector optimization problems.Computational Optimization and applications, 28:5–29, 2004

work page 2004

[42] [42]

Multiple-gradient descent algorithm (mgda) for multiobjective optimization

Jean-Antoine Désidéri. Multiple-gradient descent algorithm (mgda) for multiobjective optimization. Comptes Rendus Mathematique, 350(5- 6):313–318, 2012

work page 2012

[43] [43]

Bargaining and markets

Martin J Osborne and Ariel Rubinstein. Bargaining and markets. Academic Press Limited, 1990

work page 1990

[44] [44]

Automated negotiations protocols for complex utility function as social system

Katsuhide Fujita. Automated negotiations protocols for complex utility function as social system. In Human-Centered Services Computing for Smart Cities: IEICE Monograph, pages 213–258. Springer Nature Singapore Singapore, 2024

work page 2024

[45] [45]

Gpt-4 technical report, 2024

OpenAI. Gpt-4 technical report, 2024. 13 VIII. Appendix In this document, we provide (i) the proof of Theorem 1, (ii) a detailed description of ST-CR in discrete scenarios, (iii) pseudocode for the baselines used in our numerical experiments, (iv) adjustments to ST-CR for numerical examples, (v) details on the reproducibility of numerical experiments, (vi...

work page 2024

[46] [46]

We begin by finding then-dimensional vector corresponding to the hypersphere centervcenter = M c where M ∈ Rn×n−1 is a matrix such thatM = [v1,

Next, we determine the coneC(τ ′, θ′) that encloses the hypersphereB(c, r). We begin by finding then-dimensional vector corresponding to the hypersphere centervcenter = M c where M ∈ Rn×n−1 is a matrix such thatM = [v1, . . . , vn−1] . We then obtainτ ′ by rotating τ in the direction ofvcenter by tan−1(∥c∥) (Algorithm 3, Lines 19 - 22) τ ′ = τ cos(tan−1(∥...

work page

[47] [47]

Since we assume that the responding agent’s utility function is smooth and the offers have a bounded magnitude, the change in the responding agent’s gradient direction is bounded

Cone Initialization in Sequential Trading:When making sequential trades, ST-CR can bypass stage 2.1 by using the previous cone to initialize the cone for the next trade. Since we assume that the responding agent’s utility function is smooth and the offers have a bounded magnitude, the change in the responding agent’s gradient direction is bounded. Given a...

work page

[48] [48]

Ensuring Beneficial Trades:As we discussed in prior sections, ST-CR selects offer directions that align with the offering agent’s gradient direction ensuring that⟨T, ∇f A(SA)⟩ ≥ 0. For non-linear utility functions, this method can lead to non- beneficial trades for the offering agent if∥∇f A T (SA)∥ is small, overshooting the optimal trade in the directio...

work page

[49] [49]

If an offer is accepted, the agents transition to a new state

Improving Self-Interested Behavior:ST-CR usesn − 1 rejected orthogonal offers to refine the gradient cone. If an offer is accepted, the agents transition to a new state. When making then − 1 offers, the offering agent prioritizes offers that are most beneficial to itself, ensuring that accepted offers will have a larger benefit for the offering agent

work page

[50] [50]

This is also a risk in discrete trading scenarios, where offers are constrained to integer values

Cone Update in Discrete Trading:As we have established, a risk of two-point comparisons is potentially making incorrect halfspace cuts. This is also a risk in discrete trading scenarios, where offers are constrained to integer values. In the worst case, incorrect cuts can lead to an empty polyhedron of potential gradients, where the halfspace constraints ...

work page

[51] [51]

User" with

Generating new Trades in Discrete Trading:As we stated in the discrete trading section, the non-orthogonal or off-center halfspace cuts caused by discrete offers can result in ST-CR needing more thann − 1 offers to update the gradient cone. In such cases, we use the halfspace that bisects the farthest corner points of the polyhedron of potential gradients...

work page