This is a quantity internal to the Mul algorithm that depends on the actions chosen by Sin; it is not a quantity inherent to the bandit problem.

To compare the above two quantities, an intermediate quantity (introduced below in Equation (26)): R(Sin, B, ·), the ‘regret’ of the black-box decision/action sequence xτs on interacting with the reward buffer B · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Creator Incentives in Recommender Systems: A Cooperative Game-Theoretic Approach for Stable and Fair Collaboration in Multi-Agent Bandits

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

For homogeneous agents in multi-agent linear bandits the regret-based TU game is convex with non-empty core containing the Shapley value; for heterogeneous agents a simple regret-based payout lies in the core and satisfies three Shapley axioms.
