Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Shradha Sharma; Shweta Jain; Swapnil Dhamal

arxiv: 2605.00762 · v1 · submitted 2026-05-01 · 💻 cs.LG · cs.AI· cs.MA

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

Shradha Sharma , Swapnil Dhamal , Shweta Jain This is my paper

Pith reviewed 2026-05-09 20:20 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.MA

keywords K-Shapley valuemeritocratic fairnesscombinatorial multi-armed banditsfull-bandit feedbackfairness regretbudgeted banditsShapley valuescooperative game theory

0 comments

The pith

K-Shapley value uniquely satisfies four axioms and supports an algorithm with O(T^{3/4}) fairness regret in budgeted combinatorial bandits under full-bandit feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a fairness framework for budgeted combinatorial multi-armed bandits where only the total reward of a selected subset is observed. It extends the classical Shapley value to the K-Shapley value that measures an arm's marginal contribution only within coalitions of size at most K. The authors prove this value is the unique solution concept satisfying symmetry, linearity, the null-player property, and efficiency. They introduce the K-SVFair-FBF algorithm that adaptively estimates the unknown valuation function while controlling Monte Carlo approximation noise to enforce meritocratic selection. The resulting method guarantees that fairness regret grows as O(T to the 3/4).

Core claim

We extend the Shapley value to the K-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most K. We show that K-Shapley value is a unique solution concept that satisfies Symmetry, Linearity, Null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates K-Shapley value with unknown valuation function. Unlike standard bandit literature on full bandit feedback, K-SVFair-FBF not only learns the valuation function under full feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves O(T^{3/4}) regret bound

What carries the argument

K-Shapley value, the average marginal contribution of an arm computed only over coalitions of size at most K, which serves as the merit measure for fair arm selection.

If this is right

K-Shapley value is the unique concept satisfying symmetry, linearity, null-player, and efficiency axioms.
The algorithm simultaneously learns the valuation function and keeps Monte Carlo noise from breaking fairness.
Fairness regret is bounded by O(T^{3/4}), which is sublinear.
The method is demonstrated on federated learning and social influence maximization tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same axiomatic construction could be used to define fairness in other online allocation problems that lack per-arm observations.
Noise control techniques developed here may stabilize other cooperative-game allocations when only aggregate rewards are available.
Because the setting uses stricter full-bandit feedback than semi-bandit models, the approach may extend to environments with even less information.

Load-bearing premise

The unknown valuation function can be adaptively estimated from full-bandit observations while the Monte Carlo noise in K-Shapley approximation can be controlled without destroying the fairness guarantee.

What would settle it

A controlled simulation with known valuations where the algorithm's fairness regret grows linearly in T or where the computed K-Shapley values produce selections that violate one of the four axioms.

Figures

Figures reproduced from arXiv: 2605.00762 by Shradha Sharma, Shweta Jain, Swapnil Dhamal.

**Figure 1.** Figure 1: Comparative analysis for Merit-to-selection ratio across all datasets for K-SVFair-FBF and benchmark algorithms. [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: Comparative analysis for Fairness Regret for K-SVFair-FBF and benchmark algorithms. [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Comparative analysis for Synthetic Dataset for different values of [PITH_FULL_IMAGE:figures/full_fig_p024_3.png] view at source ↗

**Figure 4.** Figure 4: Comparative analysis of Federated Learning Dataset and Social Network Dataset on different values of [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗

**Figure 5.** Figure 5: Comparative analysis for Fairness Regret for K-SVFair-FBF and benchmark algorithms. [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗

**Figure 6.** Figure 6: Accuracy Variation for Federated Learning Global Model [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

read the original abstract

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that $K$-Shapley value is a unique solution concept that satisfies Symmetry, Linearity, Null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates $K$-Shapley value with unknown valuation function. Unlike standard bandit literature on full bandit feedback, K-SVFair-FBF not only learns the valuation function under full feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves $O(T^{3/4})$ regret bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines a K-Shapley value for size-limited coalitions and applies it to fairness in full-feedback budgeted combinatorial bandits, but the O(T^{3/4}) regret claim rests on unshown control of Monte Carlo approximation error.

read the letter

The main contribution is the K-Shapley value, which restricts coalitions to size at most K and is shown unique under symmetry, linearity, null player, and efficiency. They then build K-SVFair-FBF to estimate this value on the fly while learning an unknown valuation function from full-bandit observations only, and they claim this yields O(T^{3/4}) fairness regret while handling Monte Carlo sampling noise in the marginal contributions.

Referee Report

3 major / 2 minor

Summary. The paper introduces the K-Shapley value as an extension of the Shapley value to capture marginal contributions restricted to coalitions of size at most K. It proves uniqueness of this value under the axioms of Symmetry, Linearity, Null player, and Efficiency. The authors then propose the K-SVFair-FBF algorithm for budgeted combinatorial multi-armed bandits under full-bandit feedback, which adaptively estimates the unknown valuation function and the K-Shapley values while mitigating Monte Carlo approximation noise. They claim that this yields an O(T^{3/4}) bound on fairness regret and demonstrate the approach on federated learning and social influence maximization tasks.

Significance. If the regret analysis rigorously controls the Monte Carlo estimation error under full-bandit feedback, the result would provide a novel fairness mechanism for combinatorial bandits with limited observability. The axiomatic characterization of K-Shapley and the explicit handling of approximation noise in the algorithm are potentially useful contributions to meritocratic fairness in sequential decision-making.

major comments (3)

[Theoretical analysis / regret theorem] Theoretical analysis section (around the regret theorem): The O(T^{3/4}) fairness regret bound is stated for K-SVFair-FBF, but the manuscript does not provide an explicit derivation or lemma bounding the additional error term induced by finite-sample Monte Carlo estimation of the K-Shapley values. Under full-bandit feedback only the scalar reward of the chosen subset is observed, so each round supplies limited information about individual marginal contributions; without a concrete concentration inequality or sample-complexity argument showing that the MC variance is absorbed into the T^{3/4} rate (e.g., when the number of samples grows only polylogarithmically), the claimed guarantee does not follow.
[K-Shapley definition and uniqueness] Definition and uniqueness of K-Shapley (early theoretical section): The uniqueness claim under the four listed axioms is asserted, yet no proof sketch, inductive argument, or reference to the standard Shapley uniqueness proof adapted to the size-K restriction is supplied. The manuscript should either include the key steps or clarify the precise assumptions on the valuation function v (e.g., whether it is monotone, submodular, or merely any set function) that are required for the axiomatic characterization to hold.
[K-SVFair-FBF algorithm] Algorithm description and estimation procedure: The adaptive estimation of the unknown valuation function from full-bandit observations is described at a high level, but the manuscript does not specify how the Monte Carlo samples for marginal contributions are drawn or how the resulting estimator is plugged into the fairness constraint without violating the O(T^{3/4}) bound. A concrete description of the sampling distribution and the bias-variance trade-off is needed.

minor comments (2)

[Experiments] Experiments section: Quantitative results, confidence intervals, and explicit baseline comparisons (including standard combinatorial bandit algorithms and fairness-agnostic variants) are missing; only qualitative statements are provided.
[Notation throughout] Notation: The distinction between the true K-Shapley value and its Monte Carlo estimate is not consistently denoted throughout the regret analysis, making it difficult to track error propagation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the current manuscript would benefit from expanded details on the regret analysis, uniqueness proof, and algorithm. We will revise accordingly and respond point by point below.

read point-by-point responses

Referee: [Theoretical analysis / regret theorem] Theoretical analysis section (around the regret theorem): The O(T^{3/4}) fairness regret bound is stated for K-SVFair-FBF, but the manuscript does not provide an explicit derivation or lemma bounding the additional error term induced by finite-sample Monte Carlo estimation of the K-Shapley values. Under full-bandit feedback only the scalar reward of the chosen subset is observed, so each round supplies limited information about individual marginal contributions; without a concrete concentration inequality or sample-complexity argument showing that the MC variance is absorbed into the T^{3/4} rate (e.g., when the number of samples grows only polylogarithmically), the claimed guarantee does not follow.

Authors: We acknowledge that an explicit lemma bounding the Monte Carlo error is missing from the current version. In the revision we will add Lemma 5 (or similar) that applies Hoeffding's inequality to the MC estimator for each K-Shapley value. With O(log T) samples per estimation phase and reuse of past observations, the per-round additive error is O(T^{-1/4}), which integrates into the existing O(T^{3/4}) fairness-regret analysis without altering the rate. This accounts for the scalar observation per round under full-bandit feedback. revision: yes
Referee: [K-Shapley definition and uniqueness] Definition and uniqueness of K-Shapley (early theoretical section): The uniqueness claim under the four listed axioms is asserted, yet no proof sketch, inductive argument, or reference to the standard Shapley uniqueness proof adapted to the size-K restriction is supplied. The manuscript should either include the key steps or clarify the precise assumptions on the valuation function v (e.g., whether it is monotone, submodular, or merely any set function) that are required for the axiomatic characterization to hold.

Authors: We will insert a concise proof sketch (new Theorem 2 or appendix) that adapts the classical Shapley uniqueness argument to coalitions of size at most K. The argument proceeds by showing that any value satisfying the four axioms must coincide with the K-Shapley formula, using linearity to decompose v and symmetry/null-player to fix the weights. The characterization holds for arbitrary set functions v; no monotonicity or submodularity is required. We will state this explicitly. revision: yes
Referee: [K-SVFair-FBF algorithm] Algorithm description and estimation procedure: The adaptive estimation of the unknown valuation function from full-bandit observations is described at a high level, but the manuscript does not specify how the Monte Carlo samples for marginal contributions are drawn or how the resulting estimator is plugged into the fairness constraint without violating the O(T^{3/4}) bound. A concrete description of the sampling distribution and the bias-variance trade-off is needed.

Authors: We will expand the algorithm section (and pseudocode) to specify that, at each round t, we draw M = O(log T) random subsets S of size at most K by including each arm independently with probability 1/2, then estimate marginal contributions from the single observed full-bandit reward of the played action. The resulting estimator is inserted into the fairness constraint via a projected update whose step-size is tuned to keep the bias-variance contribution inside the O(T^{3/4}) envelope. A short bias-variance lemma will be added to justify the choice of M. revision: yes

Circularity Check

0 steps flagged

No significant circularity: K-Shapley defined axiomatically; regret bound is independent analysis

full rationale

The derivation begins with an explicit extension of the classical Shapley value to K-Shapley via marginal contributions over coalitions of size at most K, followed by a direct proof that this satisfies the four standard axioms and is unique under them. This is a self-contained axiomatic characterization, not a self-definition or renaming. The K-SVFair-FBF algorithm is then constructed to estimate the unknown valuation function from full-bandit observations while bounding Monte Carlo approximation error; the O(T^{3/4}) fairness-regret guarantee is presented as a new concentration analysis that absorbs the approximation noise rather than reducing to a fitted parameter or prior self-citation. No load-bearing self-citations, ansatzes smuggled via citation, or fitted-input-called-prediction patterns appear in the central claims. The paper is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on extending the classical Shapley axioms to a size-limited version and on the existence of an adaptive estimator whose Monte Carlo error can be bounded independently of the fairness objective.

axioms (1)

domain assumption K-Shapley value satisfies Symmetry, Linearity, Null player, and efficiency
Invoked to establish uniqueness of the solution concept.

invented entities (1)

K-Shapley value no independent evidence
purpose: Captures marginal contribution of an arm when only sets of size at most K are considered
Newly defined extension of Shapley value for the budgeted combinatorial setting.

pith-pipeline@v0.9.0 · 5545 in / 1247 out tokens · 36701 ms · 2026-05-09T20:20:23.173371+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages

[1]

Characterization of a value for games under restricted cooperation.Annals of Operations Re- search, 318(2):773–785,

[Albizuriet al., 2022 ] M Josune Albizuri, Satoshi Masuya, and Jos´e M Zarzuelo. Characterization of a value for games under restricted cooperation.Annals of Operations Re- search, 318(2):773–785,

work page 2022
[2]

Client selection for generalization in accelerated federated learning: A multi-armed bandit approach.IEEE Access, 13:33697–33713,

[Amiet al., 2025 ] Dan Ben Ami, Kobi Cohen, and Qing Zhao. Client selection for generalization in accelerated federated learning: A multi-armed bandit approach.IEEE Access, 13:33697–33713,

work page 2025
[3]

Resource sharing and payoff allocation in a three-stage system: Integrating network dea with the shap- ley value method.Omega, 85:16–25,

[Anet al., 2019 ] Qingxian An, Yao Wen, Tao Ding, and Yongli Li. Resource sharing and payoff allocation in a three-stage system: Integrating network dea with the shap- ley value method.Omega, 85:16–25,

work page 2019
[4]

Regret in online combinatorial optimization.Mathematics of Operations Research, 39(1):31–45,

[Audibertet al., 2014 ] Jean-Yves Audibert, S ´ebastien Bubeck, and G´abor Lugosi. Regret in online combinatorial optimization.Mathematics of Operations Research, 39(1):31–45,

work page 2014
[5]

Combinatorial multi-armed bandit: General framework and applications

[Chenet al., 2013 ] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. InInternational conference on machine learn- ing, pages 151–159. PMLR,

work page 2013
[6]

Merit-based fair combinatorial semi-bandit with unrestricted feedback delays

[Chenet al., 2024 ] Ziqun Chen, Kechao Cai, Zhuoyue Chen, Jinbei Zhang, and John Lui. Merit-based fair combinatorial semi-bandit with unrestricted feedback delays. InECAI 2024, pages 2058–2065. IOS Press,

work page 2024
[7]

A shapley value for games with restricted coalitions.Interna- tional Journal of Game Theory, 21(4):351–360,

[Derks and Peters, 1993] Jean Derks and Hans Peters. A shapley value for games with restricted coalitions.Interna- tional Journal of Game Theory, 21(4):351–360,

work page 1993
[8]

Ran- domized greedy learning for non-monotone stochastic submodular maximization under full-bandit feedback

[Fouratiet al., 2023 ] Fares Fourati, Vaneet Aggarwal, Christopher Quinn, and Mohamed-Slim Alouini. Ran- domized greedy learning for non-monotone stochastic submodular maximization under full-bandit feedback. In International Conference on Artificial Intelligence and Statistics, pages 7455–7471. PMLR,

work page 2023
[9]

Combina- torial stochastic-greedy bandit.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12052–12060, Mar

[Fouratiet al., 2024 ] Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and Vaneet Aggarwal. Combina- torial stochastic-greedy bandit.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12052–12060, Mar

work page 2024
[10]

Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33:1229–1239,

[Fryeet al., 2020 ] Christopher Frye, Colin Rowat, and Ilya Feige. Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33:1229–1239,

work page 2020
[11]

Dependent rounding and its applications to approximation algorithms

[Gandhiet al., 2006 ] Rajiv Gandhi, Samir Khuller, Srini- vasan Parthasarathy, and Aravind Srinivasan. Dependent rounding and its applications to approximation algorithms. Journal of the ACM (JACM), 53(3):324–360,

work page 2006
[12]

Data shapley: Equitable valuation of data for machine learn- ing

[Ghorbani and Zou, 2019] Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learn- ing. InInternational conference on machine learning, pages 2242–2251. PMLR,

work page 2019
[13]

Improving fairness in adaptive social exergames via shapley bandits

[Grayet al., 2023 ] Robert C Gray, Jennifer Villareale, Thomas Boyd Fox, Diane H Dallal, Santiago Onta ˜n´on, Danielle Arigo, Shahin Jabbari, and Jichen Zhu. Improving fairness in adaptive social exergames via shapley bandits. InProceedings of the 28th International Conference on Intelligent User Interfaces, pages 322–336,

work page 2023
[14]

Fast unfolding of commu- nities in large networks.Journal Statistical Mechanics: Theory and Experiment, 10:P1008,

[Guillaume, 2008] L Guillaume. Fast unfolding of commu- nities in large networks.Journal Statistical Mechanics: Theory and Experiment, 10:P1008,

work page 2008
[15]

Thompson sampling for combinatorial semi-bandits with sleeping arms and long- term fairness constraints.arXiv preprint arXiv:2005.06725,

[Huanget al., 2020 ] Zhiming Huang, Yifan Xu, Bingshan Hu, Qipeng Wang, and Jianping Pan. Thompson sampling for combinatorial semi-bandits with sleeping arms and long- term fairness constraints.arXiv preprint arXiv:2005.06725,

work page arXiv 2020
[16]

Resource allocation using shapley value in lte networks

[Iturraldeet al., 2011 ] Mauricio Iturralde, Tara Ali Yahiya, Anne Wei, and Andr ´e-Luc Beylot. Resource allocation using shapley value in lte networks. In2011 IEEE 22nd International Symposium on Personal, Indoor and Mobile Radio Communications, pages 31–35. IEEE,

work page 2011
[17]

Identifying top-k players in cooperative games via shapley bandits

[Kolpaczkiet al., 2021 ] Patrick Kolpaczki, Viktor Bengs, and Eyke H¨ullermeier. Identifying top-k players in cooperative games via shapley bandits. InLWDA, pages 133–144,

work page 2021
[18]

Learning to discover social circles in ego net- works.Advances in neural information processing systems, 25,

[Leskovec and Mcauley, 2012] Jure Leskovec and Julian Mcauley. Learning to discover social circles in ego net- works.Advances in neural information processing systems, 25,

work page 2012
[19]

Combinato- rial sleeping bandits with fairness constraints.IEEE Trans- actions on Network Science and Engineering, 7(3):1799– 1813,

[Liet al., 2019 ] Fengjiao Li, Jia Liu, and Bo Ji. Combinato- rial sleeping bandits with fairness constraints.IEEE Trans- actions on Network Science and Engineering, 7(3):1799– 1813,

work page 2019
[20]

Cambridge University Press,

[Maschleret al., 2020 ] Michael Maschler, Shmuel Zamir, and Eilon Solan.Game theory. Cambridge University Press,

work page 2020
[21]

Enhancing the naviga- bility in a social network of smart objects: A shapley-value based approach.Computer Networks, 103:1–14,

[Militanoet al., 2016 ] Leonardo Militano, Michele Nitti, Luigi Atzori, and Antonio Iera. Enhancing the naviga- bility in a social network of smart objects: A shapley-value based approach.Computer Networks, 103:1–14,

work page 2016
[22]

Game of gradients: Mitigating ir- relevant clients in federated learning

[Nagalapatti and Narayanam, 2021] Lokesh Nagalapatti and Ramasuri Narayanam. Game of gradients: Mitigating ir- relevant clients in federated learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9046–9054,

work page 2021
[23]

[Narahari, 2014] Yadati Narahari.Game theory and mecha- nism design, volume

work page 2014
[24]

A shapley value-based approach to dis- cover influential nodes in social networks.IEEE trans- actions on automation science and engineering, 8(1):130– 147,

[Narayanam and Narahari, 2010] Ramasuri Narayanam and Yadati Narahari. A shapley value-based approach to dis- cover influential nodes in social networks.IEEE trans- actions on automation science and engineering, 8(1):130– 147,

work page 2010
[25]

Trade-off between payoff and model rewards in shapley-fair collabo- rative machine learning.Advances in Neural Information Processing Systems, 35:30542–30553,

[Nguyenet al., 2022 ] Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet. Trade-off between payoff and model rewards in shapley-fair collabo- rative machine learning.Advances in Neural Information Processing Systems, 35:30542–30553,

work page 2022
[26]

Online learning via offline greedy algo- rithms: Applications in market design and optimization

[Niazadehet al., 2021 ] Rad Niazadeh, Negin Golrezaei, Joshua R Wang, Fransisca Susan, and Ashwinkumar Badanidiyuru. Online learning via offline greedy algo- rithms: Applications in market design and optimization. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 737–738,

work page 2021
[27]

An explore-then-commit algorithm for submodular maximization under full-bandit feedback

[Nieet al., 2022 ] Guanyu Nie, Mridul Agarwal, Ab- hishek Kumar Umrawal, Vaneet Aggarwal, and Christo- pher John Quinn. An explore-then-commit algorithm for submodular maximization under full-bandit feedback. In James Cussens and Kun Zhang, editors,Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 ofProceedi...

work page 2022
[28]

Anytime fairness guarantees in stochastic combinatorial mabs: A novel learn- ing framework

[Pokhriyalet al., 2025 ] Subham Pokhriyal, Shweta Jain, Ganesh Ghalme, and Vaneet Aggarwal. Anytime fairness guarantees in stochastic combinatorial mabs: A novel learn- ing framework. InProceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, pages 1660–1669,

work page 2025
[29]

Allocating the gains from resource pooling with the shapley value.Journal of the Operational Research Society, 56(8):997–1000,

[Reinhardt and Dada, 2005] Gilles Reinhardt and Maqbool Dada. Allocating the gains from resource pooling with the shapley value.Journal of the Operational Research Society, 56(8):997–1000,

work page 2005
[30]

Top-k combinatorial bandits with full-bandit feedback

[Rejwan and Mansour, 2020] Idan Rejwan and Yishay Man- sour. Top-k combinatorial bandits with full-bandit feedback. InAlgorithmic Learning Theory, pages 752–776. PMLR,

work page 2020
[31]

Some aspects of the se- quential design of experiments

[Robbins, 1952] Herbert Robbins. Some aspects of the se- quential design of experiments

work page 1952
[32]

A value for n-person games

[Shapley and others, 1953] Lloyd S Shapley et al. A value for n-person games

work page 1953
[33]

Cores of convex games

[Shapley, 1971] Lloyd S Shapley. Cores of convex games. International journal of game theory, 1:11–26,

work page 1971
[34]

Shapleyfl: Robust federated learning based on shapley value

[Sunet al., 2023 ] Qiheng Sun, Xiang Li, Jiayao Zhang, Li Xiong, Weiran Liu, Jinfei Liu, Zhan Qin, and Kui Ren. Shapleyfl: Robust federated learning based on shapley value. InProceedings of the 29th ACM SIGKDD Con- ference on Knowledge Discovery and Data Mining, pages 2096–2108,

work page 2023
[35]

The many shapley values for model expla- nation

[Sundararajan and Najmi, 2020] Mukund Sundararajan and Amir Najmi. The many shapley values for model expla- nation. InInternational conference on machine learning, pages 9269–9278. PMLR,

work page 2020
[36]

2d-oob: Attributing data con- tribution through joint valuation framework.Advances in Neural Information Processing Systems, 37:46764– 46790, 2024b

[Tastanet al., 2024 ] Nurbek Tastan, Samar Fares, Toluwani Aremu, Samuel Horvath, and Karthik Nandakumar. Re- defining contributions: Shapley-driven federated learning. arXiv preprint arXiv:2406.00569,

work page arXiv 2024
[37]

Measure contribution of participants in feder- ated learning

[Wanget al., 2019 ] Guan Wang, Charlie Xiaoqian Dang, and Ziye Zhou. Measure contribution of participants in feder- ated learning. In2019 IEEE international conference on big data (Big Data), pages 2597–2604. IEEE,

work page 2019
[38]

Shapley q-value: A local reward approach to solve global reward games

[Wanget al., 2020 ] Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, and Yunjie Gu. Shapley q-value: A local reward approach to solve global reward games. InProceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7285–7292,

work page 2020
[39]

Fairness of exposure in stochastic ban- dits

[Wanget al., 2021 ] Lequn Wang, Yiwei Bai, Wen Sun, and Thorsten Joachims. Fairness of exposure in stochastic ban- dits. InInternational Conference on Machine Learning, pages 10686–10696. PMLR,

work page 2021
[40]

A value for partially de- fined cooperative games.International Journal of Game Theory, 21(4):371–384,

[Willson, 1993] Stephen J Willson. A value for partially de- fined cooperative games.International Journal of Game Theory, 21(4):371–384,

work page 1993
[41]

Multi- armed bandit-based client scheduling for federated learn- ing.IEEE Transactions on Wireless Communications, 19(11):7108–7123,

[Xiaet al., 2020 ] Wenchao Xia, Tony QS Quek, Kun Guo, Wanli Wen, Howard H Yang, and Hongbo Zhu. Multi- armed bandit-based client scheduling for federated learn- ing.IEEE Transactions on Wireless Communications, 19(11):7108–7123,

work page 2020
[42]

Combinatorial multi-armed bandits with con- cave rewards and fairness constraints

[Xuet al., 2020 ] Huanle Xu, Yang Liu, Wing Cheong Lau, and Rui Li. Combinatorial multi-armed bandits with con- cave rewards and fairness constraints. InIJCAI, pages 2554–2560,

work page 2020
[43]

Federated learning in- centive mechanism design via enhanced shapley value method.Wireless Communications and Mobile Computing, 2022(1):9690657,

[Yanget al., 2022 ] Xun Yang, Weijie Tan, Changgen Peng, Shuwen Xiang, and Kun Niu. Federated learning in- centive mechanism design via enhanced shapley value method.Wireless Communications and Mobile Computing, 2022(1):9690657,

work page 2022
[44]

B.4 Proof of Theorem 3 Proof

Thus, with probability at least1−(δ 1 +δ 2), the true Shapley value of arma j lies within the following confidence interval ϕaj ∈ " ˆϕt,aj − r 1 2Nt,aj R ln 2M δ1 + 2 r 1 2L ln 4Nt,aj RM δ2 , ˆϕt,aj + r 1 2Nt,aj R ln 2M δ1 + 2 r 1 2L ln 4Nt,aj RM δ2 # Lemma 2.For anyδ∈(0,1), with probability at least1−δ/2, it holds that TX t=1 Ea∼πt q 1 Nt,a − TX t=1 X a∈...

work page 2024
[45]

s 1 2R ln 2M δ1 T /LRX t=⌈M/K⌉+1 Ea∼πt 1p Nt,a ! ·(LR)+ 2 r 1 2L T /LRX t=⌈M/K⌉+1 Ea∼πt r ln 4Nt,aRM δ2 ·(LR) # ≤(M+K) + 2KP a ϕa

The fairness regret can be upper bounded as follows: FRT = ⌈M/K⌉X t=1 MX a=1 πt(a)−π ∗(a) + TX t=⌈M/K⌉+1 MX a=1 πt(a)−π ∗(a) FRT ≤ M K + 1)K ! + 2KP a ϕa TX t=⌈M/K⌉+1 Eat∼πt "s 1 2Nt,aR ln 2M δ1 + 2 s 1 2L ln 4Nt,aRM δ2 # In our setting, the algorithm proceeds in discrete rounds, and within each round, it executes multiple timesteps corresponding to Monte...

work page 2020
[46]

Real-World Dataset for Social Influence Maximization: For the influence maximization experiments, we used a subset of the Facebook network introduced by [Leskovec and Mcauley, 2012]. The same dataset has also been used by [Nieet al., 2022; Fouratiet al., 2024 ] in their work, which further demonstrates its suitability for studying influence spread in soci...

work page 2012
[47]

The configurationR= 500 , L= 200 provides the best trade-off between performance and efficiency, yielding stable convergence and fairness consistency across all datasets

The results in Figures 3, 4, and 5 show that increasing R and L improves the stability of the estimated K-Shapley values, and hence meritocratic fairness, but at the cost of higher computational overhead. The configurationR= 500 , L= 200 provides the best trade-off between performance and efficiency, yielding stable convergence and fairness consistency ac...

work page 2022

[1] [1]

Characterization of a value for games under restricted cooperation.Annals of Operations Re- search, 318(2):773–785,

[Albizuriet al., 2022 ] M Josune Albizuri, Satoshi Masuya, and Jos´e M Zarzuelo. Characterization of a value for games under restricted cooperation.Annals of Operations Re- search, 318(2):773–785,

work page 2022

[2] [2]

Client selection for generalization in accelerated federated learning: A multi-armed bandit approach.IEEE Access, 13:33697–33713,

[Amiet al., 2025 ] Dan Ben Ami, Kobi Cohen, and Qing Zhao. Client selection for generalization in accelerated federated learning: A multi-armed bandit approach.IEEE Access, 13:33697–33713,

work page 2025

[3] [3]

Resource sharing and payoff allocation in a three-stage system: Integrating network dea with the shap- ley value method.Omega, 85:16–25,

[Anet al., 2019 ] Qingxian An, Yao Wen, Tao Ding, and Yongli Li. Resource sharing and payoff allocation in a three-stage system: Integrating network dea with the shap- ley value method.Omega, 85:16–25,

work page 2019

[4] [4]

Regret in online combinatorial optimization.Mathematics of Operations Research, 39(1):31–45,

[Audibertet al., 2014 ] Jean-Yves Audibert, S ´ebastien Bubeck, and G´abor Lugosi. Regret in online combinatorial optimization.Mathematics of Operations Research, 39(1):31–45,

work page 2014

[5] [5]

Combinatorial multi-armed bandit: General framework and applications

[Chenet al., 2013 ] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework and applications. InInternational conference on machine learn- ing, pages 151–159. PMLR,

work page 2013

[6] [6]

Merit-based fair combinatorial semi-bandit with unrestricted feedback delays

[Chenet al., 2024 ] Ziqun Chen, Kechao Cai, Zhuoyue Chen, Jinbei Zhang, and John Lui. Merit-based fair combinatorial semi-bandit with unrestricted feedback delays. InECAI 2024, pages 2058–2065. IOS Press,

work page 2024

[7] [7]

A shapley value for games with restricted coalitions.Interna- tional Journal of Game Theory, 21(4):351–360,

[Derks and Peters, 1993] Jean Derks and Hans Peters. A shapley value for games with restricted coalitions.Interna- tional Journal of Game Theory, 21(4):351–360,

work page 1993

[8] [8]

Ran- domized greedy learning for non-monotone stochastic submodular maximization under full-bandit feedback

[Fouratiet al., 2023 ] Fares Fourati, Vaneet Aggarwal, Christopher Quinn, and Mohamed-Slim Alouini. Ran- domized greedy learning for non-monotone stochastic submodular maximization under full-bandit feedback. In International Conference on Artificial Intelligence and Statistics, pages 7455–7471. PMLR,

work page 2023

[9] [9]

Combina- torial stochastic-greedy bandit.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12052–12060, Mar

[Fouratiet al., 2024 ] Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and Vaneet Aggarwal. Combina- torial stochastic-greedy bandit.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12052–12060, Mar

work page 2024

[10] [10]

Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33:1229–1239,

[Fryeet al., 2020 ] Christopher Frye, Colin Rowat, and Ilya Feige. Asymmetric shapley values: incorporating causal knowledge into model-agnostic explainability.Advances in neural information processing systems, 33:1229–1239,

work page 2020

[11] [11]

Dependent rounding and its applications to approximation algorithms

[Gandhiet al., 2006 ] Rajiv Gandhi, Samir Khuller, Srini- vasan Parthasarathy, and Aravind Srinivasan. Dependent rounding and its applications to approximation algorithms. Journal of the ACM (JACM), 53(3):324–360,

work page 2006

[12] [12]

Data shapley: Equitable valuation of data for machine learn- ing

[Ghorbani and Zou, 2019] Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learn- ing. InInternational conference on machine learning, pages 2242–2251. PMLR,

work page 2019

[13] [13]

Improving fairness in adaptive social exergames via shapley bandits

[Grayet al., 2023 ] Robert C Gray, Jennifer Villareale, Thomas Boyd Fox, Diane H Dallal, Santiago Onta ˜n´on, Danielle Arigo, Shahin Jabbari, and Jichen Zhu. Improving fairness in adaptive social exergames via shapley bandits. InProceedings of the 28th International Conference on Intelligent User Interfaces, pages 322–336,

work page 2023

[14] [14]

Fast unfolding of commu- nities in large networks.Journal Statistical Mechanics: Theory and Experiment, 10:P1008,

[Guillaume, 2008] L Guillaume. Fast unfolding of commu- nities in large networks.Journal Statistical Mechanics: Theory and Experiment, 10:P1008,

work page 2008

[15] [15]

Thompson sampling for combinatorial semi-bandits with sleeping arms and long- term fairness constraints.arXiv preprint arXiv:2005.06725,

[Huanget al., 2020 ] Zhiming Huang, Yifan Xu, Bingshan Hu, Qipeng Wang, and Jianping Pan. Thompson sampling for combinatorial semi-bandits with sleeping arms and long- term fairness constraints.arXiv preprint arXiv:2005.06725,

work page arXiv 2020

[16] [16]

Resource allocation using shapley value in lte networks

[Iturraldeet al., 2011 ] Mauricio Iturralde, Tara Ali Yahiya, Anne Wei, and Andr ´e-Luc Beylot. Resource allocation using shapley value in lte networks. In2011 IEEE 22nd International Symposium on Personal, Indoor and Mobile Radio Communications, pages 31–35. IEEE,

work page 2011

[17] [17]

Identifying top-k players in cooperative games via shapley bandits

[Kolpaczkiet al., 2021 ] Patrick Kolpaczki, Viktor Bengs, and Eyke H¨ullermeier. Identifying top-k players in cooperative games via shapley bandits. InLWDA, pages 133–144,

work page 2021

[18] [18]

Learning to discover social circles in ego net- works.Advances in neural information processing systems, 25,

[Leskovec and Mcauley, 2012] Jure Leskovec and Julian Mcauley. Learning to discover social circles in ego net- works.Advances in neural information processing systems, 25,

work page 2012

[19] [19]

Combinato- rial sleeping bandits with fairness constraints.IEEE Trans- actions on Network Science and Engineering, 7(3):1799– 1813,

[Liet al., 2019 ] Fengjiao Li, Jia Liu, and Bo Ji. Combinato- rial sleeping bandits with fairness constraints.IEEE Trans- actions on Network Science and Engineering, 7(3):1799– 1813,

work page 2019

[20] [20]

Cambridge University Press,

[Maschleret al., 2020 ] Michael Maschler, Shmuel Zamir, and Eilon Solan.Game theory. Cambridge University Press,

work page 2020

[21] [21]

Enhancing the naviga- bility in a social network of smart objects: A shapley-value based approach.Computer Networks, 103:1–14,

[Militanoet al., 2016 ] Leonardo Militano, Michele Nitti, Luigi Atzori, and Antonio Iera. Enhancing the naviga- bility in a social network of smart objects: A shapley-value based approach.Computer Networks, 103:1–14,

work page 2016

[22] [22]

Game of gradients: Mitigating ir- relevant clients in federated learning

[Nagalapatti and Narayanam, 2021] Lokesh Nagalapatti and Ramasuri Narayanam. Game of gradients: Mitigating ir- relevant clients in federated learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9046–9054,

work page 2021

[23] [23]

[Narahari, 2014] Yadati Narahari.Game theory and mecha- nism design, volume

work page 2014

[24] [24]

A shapley value-based approach to dis- cover influential nodes in social networks.IEEE trans- actions on automation science and engineering, 8(1):130– 147,

[Narayanam and Narahari, 2010] Ramasuri Narayanam and Yadati Narahari. A shapley value-based approach to dis- cover influential nodes in social networks.IEEE trans- actions on automation science and engineering, 8(1):130– 147,

work page 2010

[25] [25]

Trade-off between payoff and model rewards in shapley-fair collabo- rative machine learning.Advances in Neural Information Processing Systems, 35:30542–30553,

[Nguyenet al., 2022 ] Quoc Phong Nguyen, Bryan Kian Hsiang Low, and Patrick Jaillet. Trade-off between payoff and model rewards in shapley-fair collabo- rative machine learning.Advances in Neural Information Processing Systems, 35:30542–30553,

work page 2022

[26] [26]

Online learning via offline greedy algo- rithms: Applications in market design and optimization

[Niazadehet al., 2021 ] Rad Niazadeh, Negin Golrezaei, Joshua R Wang, Fransisca Susan, and Ashwinkumar Badanidiyuru. Online learning via offline greedy algo- rithms: Applications in market design and optimization. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 737–738,

work page 2021

[27] [27]

An explore-then-commit algorithm for submodular maximization under full-bandit feedback

[Nieet al., 2022 ] Guanyu Nie, Mridul Agarwal, Ab- hishek Kumar Umrawal, Vaneet Aggarwal, and Christo- pher John Quinn. An explore-then-commit algorithm for submodular maximization under full-bandit feedback. In James Cussens and Kun Zhang, editors,Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 ofProceedi...

work page 2022

[28] [28]

Anytime fairness guarantees in stochastic combinatorial mabs: A novel learn- ing framework

[Pokhriyalet al., 2025 ] Subham Pokhriyal, Shweta Jain, Ganesh Ghalme, and Vaneet Aggarwal. Anytime fairness guarantees in stochastic combinatorial mabs: A novel learn- ing framework. InProceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, pages 1660–1669,

work page 2025

[29] [29]

Allocating the gains from resource pooling with the shapley value.Journal of the Operational Research Society, 56(8):997–1000,

[Reinhardt and Dada, 2005] Gilles Reinhardt and Maqbool Dada. Allocating the gains from resource pooling with the shapley value.Journal of the Operational Research Society, 56(8):997–1000,

work page 2005

[30] [30]

Top-k combinatorial bandits with full-bandit feedback

[Rejwan and Mansour, 2020] Idan Rejwan and Yishay Man- sour. Top-k combinatorial bandits with full-bandit feedback. InAlgorithmic Learning Theory, pages 752–776. PMLR,

work page 2020

[31] [31]

Some aspects of the se- quential design of experiments

[Robbins, 1952] Herbert Robbins. Some aspects of the se- quential design of experiments

work page 1952

[32] [32]

A value for n-person games

[Shapley and others, 1953] Lloyd S Shapley et al. A value for n-person games

work page 1953

[33] [33]

Cores of convex games

[Shapley, 1971] Lloyd S Shapley. Cores of convex games. International journal of game theory, 1:11–26,

work page 1971

[34] [34]

Shapleyfl: Robust federated learning based on shapley value

[Sunet al., 2023 ] Qiheng Sun, Xiang Li, Jiayao Zhang, Li Xiong, Weiran Liu, Jinfei Liu, Zhan Qin, and Kui Ren. Shapleyfl: Robust federated learning based on shapley value. InProceedings of the 29th ACM SIGKDD Con- ference on Knowledge Discovery and Data Mining, pages 2096–2108,

work page 2023

[35] [35]

The many shapley values for model expla- nation

[Sundararajan and Najmi, 2020] Mukund Sundararajan and Amir Najmi. The many shapley values for model expla- nation. InInternational conference on machine learning, pages 9269–9278. PMLR,

work page 2020

[36] [36]

2d-oob: Attributing data con- tribution through joint valuation framework.Advances in Neural Information Processing Systems, 37:46764– 46790, 2024b

[Tastanet al., 2024 ] Nurbek Tastan, Samar Fares, Toluwani Aremu, Samuel Horvath, and Karthik Nandakumar. Re- defining contributions: Shapley-driven federated learning. arXiv preprint arXiv:2406.00569,

work page arXiv 2024

[37] [37]

Measure contribution of participants in feder- ated learning

[Wanget al., 2019 ] Guan Wang, Charlie Xiaoqian Dang, and Ziye Zhou. Measure contribution of participants in feder- ated learning. In2019 IEEE international conference on big data (Big Data), pages 2597–2604. IEEE,

work page 2019

[38] [38]

Shapley q-value: A local reward approach to solve global reward games

[Wanget al., 2020 ] Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, and Yunjie Gu. Shapley q-value: A local reward approach to solve global reward games. InProceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 7285–7292,

work page 2020

[39] [39]

Fairness of exposure in stochastic ban- dits

[Wanget al., 2021 ] Lequn Wang, Yiwei Bai, Wen Sun, and Thorsten Joachims. Fairness of exposure in stochastic ban- dits. InInternational Conference on Machine Learning, pages 10686–10696. PMLR,

work page 2021

[40] [40]

A value for partially de- fined cooperative games.International Journal of Game Theory, 21(4):371–384,

[Willson, 1993] Stephen J Willson. A value for partially de- fined cooperative games.International Journal of Game Theory, 21(4):371–384,

work page 1993

[41] [41]

Multi- armed bandit-based client scheduling for federated learn- ing.IEEE Transactions on Wireless Communications, 19(11):7108–7123,

[Xiaet al., 2020 ] Wenchao Xia, Tony QS Quek, Kun Guo, Wanli Wen, Howard H Yang, and Hongbo Zhu. Multi- armed bandit-based client scheduling for federated learn- ing.IEEE Transactions on Wireless Communications, 19(11):7108–7123,

work page 2020

[42] [42]

Combinatorial multi-armed bandits with con- cave rewards and fairness constraints

[Xuet al., 2020 ] Huanle Xu, Yang Liu, Wing Cheong Lau, and Rui Li. Combinatorial multi-armed bandits with con- cave rewards and fairness constraints. InIJCAI, pages 2554–2560,

work page 2020

[43] [43]

Federated learning in- centive mechanism design via enhanced shapley value method.Wireless Communications and Mobile Computing, 2022(1):9690657,

[Yanget al., 2022 ] Xun Yang, Weijie Tan, Changgen Peng, Shuwen Xiang, and Kun Niu. Federated learning in- centive mechanism design via enhanced shapley value method.Wireless Communications and Mobile Computing, 2022(1):9690657,

work page 2022

[44] [44]

B.4 Proof of Theorem 3 Proof

Thus, with probability at least1−(δ 1 +δ 2), the true Shapley value of arma j lies within the following confidence interval ϕaj ∈ " ˆϕt,aj − r 1 2Nt,aj R ln 2M δ1 + 2 r 1 2L ln 4Nt,aj RM δ2 , ˆϕt,aj + r 1 2Nt,aj R ln 2M δ1 + 2 r 1 2L ln 4Nt,aj RM δ2 # Lemma 2.For anyδ∈(0,1), with probability at least1−δ/2, it holds that TX t=1 Ea∼πt q 1 Nt,a − TX t=1 X a∈...

work page 2024

[45] [45]

s 1 2R ln 2M δ1 T /LRX t=⌈M/K⌉+1 Ea∼πt 1p Nt,a ! ·(LR)+ 2 r 1 2L T /LRX t=⌈M/K⌉+1 Ea∼πt r ln 4Nt,aRM δ2 ·(LR) # ≤(M+K) + 2KP a ϕa

The fairness regret can be upper bounded as follows: FRT = ⌈M/K⌉X t=1 MX a=1 πt(a)−π ∗(a) + TX t=⌈M/K⌉+1 MX a=1 πt(a)−π ∗(a) FRT ≤ M K + 1)K ! + 2KP a ϕa TX t=⌈M/K⌉+1 Eat∼πt "s 1 2Nt,aR ln 2M δ1 + 2 s 1 2L ln 4Nt,aRM δ2 # In our setting, the algorithm proceeds in discrete rounds, and within each round, it executes multiple timesteps corresponding to Monte...

work page 2020

[46] [46]

Real-World Dataset for Social Influence Maximization: For the influence maximization experiments, we used a subset of the Facebook network introduced by [Leskovec and Mcauley, 2012]. The same dataset has also been used by [Nieet al., 2022; Fouratiet al., 2024 ] in their work, which further demonstrates its suitability for studying influence spread in soci...

work page 2012

[47] [47]

The configurationR= 500 , L= 200 provides the best trade-off between performance and efficiency, yielding stable convergence and fairness consistency across all datasets

The results in Figures 3, 4, and 5 show that increasing R and L improves the stability of the estimated K-Shapley values, and hence meritocratic fairness, but at the cost of higher computational overhead. The configurationR= 500 , L= 200 provides the best trade-off between performance and efficiency, yielding stable convergence and fairness consistency ac...

work page 2022