A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback

Christopher John Quinn; Shweta Jain; Subham Pokhriyal; Vaneet Aggarwal

arxiv: 2503.12285 · v2 · submitted 2025-03-15 · 💻 cs.LG · cs.AI· cs.GT· cs.SY· eess.SY· stat.ML

A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback

Vaneet Aggarwal , Shweta Jain , Subham Pokhriyal , Christopher John Quinn This is my paper

Pith reviewed 2026-05-22 23:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.GTcs.SYeess.SYstat.ML

keywords bi-criteria combinatorial optimizationresilience frameworkblack-box reductioncombinatorial multi-armed banditsbandit feedbackregret boundsconstraint violation

0 comments

The pith

A black-box framework converts any resilient offline bi-criteria algorithm into an online bandit algorithm achieving sublinear regret and constraint violation of order Õ(δ^{2/3} N^{1/3} T^{2/3}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a notion of resilience for bi-criteria approximation algorithms that measures how their guarantees for objectives and constraints hold up under noisy evaluations. It then gives a general reduction that turns any such resilient offline algorithm into an online algorithm for combinatorial multi-armed bandits with only bandit feedback. The online version attains sublinear regret and cumulative constraint violation without assuming linearity, submodularity, or semi-bandit feedback. This matters because it extends single-criterion resilience techniques to coupled multi-objective settings common in practice.

Core claim

The paper establishes a general black-box framework that converts any (α,β,δ,N)-resilient offline bi-criteria approximation algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits with bandit feedback, yielding regret and constraint violation bounds of Õ(δ^{2/3} N^{1/3} T^{2/3}). It demonstrates this by showing resilience for several classical greedy algorithms in submodular optimization.

What carries the argument

The (α,β,δ,N)-resilience property, which captures controlled degradation of joint approximation guarantees for objectives and constraints under worst-case oracle noise of level δ, serving as the condition for the black-box offline-to-online reduction.

If this is right

The resulting online algorithm requires no structural assumptions on the noisy functions.
The bounds hold for any resilient offline algorithm to which the framework is applied.
Resilience holds for classical greedy algorithms in submodular optimization, enabling their online use.
The approach works with bandit feedback only.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If resilience can be verified for other bi-criteria problems, the framework would immediately yield online algorithms for them.
The noise dependence of δ^{2/3} suggests that improving resilience properties could tighten the online bounds.
Connections to other bandit settings might allow similar reductions if the resilience definition generalizes.

Load-bearing premise

The offline bi-criteria approximation algorithm must satisfy the (α,β,δ,N)-resilience property so that its guarantees degrade in a controlled manner under worst-case noise of level δ.

What would settle it

Finding a bi-criteria combinatorial problem and offline algorithm where the joint approximation guarantees degrade faster than allowed by any (α,β,δ,N) bound under δ-level noise would falsify the applicability of the framework, or observing that the constructed online algorithm exceeds the stated regret order in a concrete instance.

read the original abstract

We study bi-criteria combinatorial optimization under noisy function evaluations. While resilience and black-box offline-to-online reductions have been studied in single-objective settings, extending these ideas to bi-criteria problems introduces new challenges due to the coupled degradation of approximation guarantees for objectives and constraints. We introduce a notion of $(\alpha,\beta,\delta,\texttt{N})$-resilience for bi-criteria approximation algorithms, capturing how joint approximation guarantees degrade under bounded (possibly worst-case) oracle noise, and develop a general black-box framework that converts any resilient offline algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits with bandit feedback. The resulting online guarantees achieve sublinear regret and cumulative constraint violation of order $\tilde{O}(\delta^{2/3}\texttt{N}^{1/3}T^{2/3})$ without requiring structural assumptions such as linearity, submodularity, or semi-bandit feedback on the noisy functions. We demonstrate the applicability of the framework by establishing resilience for several classical greedy algorithms in submodular optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines (α,β,δ,N)-resilience for bi-criteria problems and supplies a black-box reduction to bandit feedback that yields the stated regret order without structural assumptions.

read the letter

The paper's key move is defining an (α,β,δ,N)-resilience property for bi-criteria algorithms and then giving a black-box reduction that turns any offline algorithm meeting that property into an online bandit algorithm with sublinear regret and constraint violation of order Õ(δ^{2/3} N^{1/3} T^{2/3}). This is new because it handles the coupled degradation of the two criteria under noise, and the reduction works without needing linearity, submodularity, or semi-bandit feedback. They back it up by showing that several greedy algorithms are resilient in the submodular case. The bound they get comes directly from the reduction applied to the resilience parameters. The framework does a good job extending the single-objective resilience results to this setting. The stress-test note confirms there is no internal inconsistency in the reduction or noise model. The soft spots are that the actual proof steps for the reduction and the bound are not visible from the abstract, so the tightness is hard to judge without the full text. Also, the examples are limited to submodular optimization, which means the broad applicability claim rests on whether resilient offline algorithms exist more generally. This paper is for researchers in online learning and combinatorial optimization who deal with multiple objectives. It shows clear thinking on generalizing resilience, so it deserves a serious referee to verify the details.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces an (α,β,δ,N)-resilience property for offline bi-criteria approximation algorithms that quantifies the coupled degradation of objective and constraint guarantees under worst-case oracle noise of level δ. It then presents a black-box reduction that converts any offline algorithm satisfying this property into an online algorithm for bi-criteria combinatorial multi-armed bandits under bandit feedback, yielding sublinear regret and cumulative constraint violation of order Õ(δ^{2/3} N^{1/3} T^{2/3}). The framework requires no linearity, submodularity, or semi-bandit assumptions on the underlying functions. Applicability is demonstrated by verifying the resilience property for several classical greedy algorithms in the submodular case.

Significance. If the reduction holds, the work supplies a general method for obtaining online guarantees in bi-criteria combinatorial settings from offline resilient algorithms, extending prior single-objective resilience results without imposing structural assumptions on the noisy functions. The explicit verification that classical greedy algorithms meet the resilience condition provides concrete applicability.

minor comments (3)

[Definition 2.1] The definition of (α,β,δ,N)-resilience in the main text would benefit from an explicit statement of how the parameters α and β interact with the noise level δ when both objective and constraint approximations are considered simultaneously.
[Theorem 4.1] In the reduction theorem, the dependence on N in the final bound should be accompanied by a brief discussion of whether N is treated as a constant or scales with the problem size.
[Section 5] The experimental section (if present) or the submodular verification could include a short table comparing the observed degradation under noise to the theoretical resilience bounds.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and the positive recommendation of minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines a new (α,β,δ,N)-resilience property for bi-criteria offline algorithms under oracle noise and supplies an explicit black-box reduction that converts any algorithm meeting this property into an online bandit algorithm whose regret and violation bounds follow directly from the definition and standard analysis of the reduction; the target Õ(δ^{2/3} N^{1/3} T^{2/3}) expression is obtained by applying the reduction rather than by fitting parameters or by re-using an unverified self-citation chain. The derivation is self-contained: resilience is stated as an assumption, the reduction is proved from it, and applicability is shown by verifying the property on classical greedy algorithms, with no load-bearing step that collapses to a tautology or to prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The framework rests on the new resilience definition and the assumption of bounded worst-case noise; no free parameters are fitted in the abstract statement of the result.

axioms (2)

domain assumption Oracle noise is bounded (worst-case) by a parameter δ.
Invoked to define (α,β,δ,N)-resilience and to obtain the regret scaling.
domain assumption There exist offline bi-criteria algorithms that satisfy the resilience property.
Required for the black-box conversion to produce the online guarantees.

invented entities (1)

(α,β,δ,N)-resilience no independent evidence
purpose: Quantifies controlled degradation of bi-criteria approximation guarantees under bounded noise.
Newly introduced definition that enables the black-box reduction.

pith-pipeline@v0.9.0 · 5735 in / 1513 out tokens · 35081 ms · 2026-05-22T23:49:55.091546+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

Fares Fourati, Mohamed-Slim Alouini, and V aneet Aggar- wal

PMLR, 2023. Fares Fourati, Mohamed-Slim Alouini, and V aneet Aggar- wal. Federated combinatorial multi-agent multi-armed bandits. In F orty-ﬁrst International Conference on Ma- chine Learning, 2024a. Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and V aneet Aggarwal. Combinatorial stochastic-greedy bandit. In Proceedings of the AAAI Confere...

work page arXiv 2023
[2]

Expected α -regret: E[Rf (T )] = O ( δ2/ 3hN1/ 3T 2/ 3 log1/ 3 T ) ,

work page
[3]

Proof Sketch

Expected cumulative β -constraint violation: E[Vg(T )] = O ( δ2/ 3hN1/ 3T 2/ 3 log1/ 3 T ) . Proof Sketch. The proof follows the same structure as Theorem 1, with adjus tments for minimization:

work page
[4]

Clean Event: Concentration bounds hold as in Lemma 5

work page
[5]

Resilience Guarantees: Under clean event E, E[f (S)] ≤ αf (OPT) + δhrad, E[g(S)] ≥ βκ − δhrad

work page
[6]

Regret and CCV Decomposition : E[Rf (T )|E] = N∑ i=1 m(E[f (Si)] − αf (OPT)) + T∑ t=Nm+1 (E[f (S)] − αf (OPT)), E[Vg(T )|E] = N∑ i=1 m(βκ − E[g(Si)]) + T∑ t=Nm+1 (βκ − E[g(S)])

work page
[7]

Similar bounds apply to CCV , where exploration CCV is ≤ Nmβκ and exploitation CCV ≤ T δhrad

Bounding T erms: Exploration regret ≤ Nmfmax, exploitation regret ≤ T δhrad. Similar bounds apply to CCV , where exploration CCV is ≤ Nmβκ and exploitation CCV ≤ T δhrad

work page
[8]

The full proof mirrors Theorem 1, with inequalities modiﬁed as above to reﬂect minimization

Hyperparameter Substitution: Substituting m = O ( δ2/ 3T 2/ 3 log1/ 3 T N2/ 3 ) balances exploration and exploitation terms. The full proof mirrors Theorem 1, with inequalities modiﬁed as above to reﬂect minimization. B.4 KEY DIFFERENCES FROM MAXIMIZA TION FRAMEWORK

work page
[9]

Objective and Constraint Swap: Cost minimization replaces reward maximization; utility lower-bound replaces cost upper-bound

work page
[10]

Regret/CCV Deﬁnitions: Regret measures excess cost, while CCV measures utility sh ortfall

work page
[11]

This conversion demonstrates the framework’s ﬂexibility i n handling dual objectives across maximization and minimization problems under bandit feedback

Resilience Inequalities: Additive errors δǫ increase cost bounds and decrease utility bounds. This conversion demonstrates the framework’s ﬂexibility i n handling dual objectives across maximization and minimization problems under bandit feedback. C ALGORITHM AND PROOF FOR SUBMODULAR COVER PROBLEM Algorithm Setup: • Input: Ground set Ω, deterministic cost...

work page 2013
[12]

Given |ˆg(S) − g(S)| ≤ ǫ, g(S) ≥ ˆg(S) − ǫ ≥ (κ − ω ) − ǫ

Noisy Utility Propagation: The algorithm terminates when ˆg(S) ≥ κ − ω . Given |ˆg(S) − g(S)| ≤ ǫ, g(S) ≥ ˆg(S) − ǫ ≥ (κ − ω ) − ǫ. Rewriting for β : g(S) ≥ ( 1 − ω κ ) κ − ǫ = βκ − ǫ. Thus, β = 1 − ω κ and the utility error term is δgǫ = ǫ =⇒ δg = 1

work page
[13]

Let us denote x1, x 2,

Cost Error Analysis: Let OPT = arg min S′⊆ Ω{f (S′) | g(S′) ≥ κ}. Let us denote x1, x 2, . . . , x ℓ to be the elements added (in order) by the algorithm. Deﬁne the set Si = {x1, x 2, . . . , x i}. Thus, Sℓ denotes the ﬁnal set outputed by the algorithm. We want to bound f (Sℓ). We ﬁrst make two basic observations. For i ≤ ℓ, ˆg(Si− 1) < κ since the algor...

work page
[14]

Recursive Cost Bound: Let us deﬁne the utility gap κ i = κ − min(ˆg(Si), κ ). Then from the above inequality we get: ˆρxi (Si− 1) = κ i− 1 − κ i cxi ≥ κ − min(ˆg(Si− 1), κ ) f (OPT) − 3 ǫ cmin = κ i− 1 f (OPT) − 3 ǫ cmin =⇒ κ i ≤ κ i− 1 ( 1 − cxi f (OPT) ) + 3ǫcmax cmin =⇒ κ i ≤ κ i− 1e − cxi f (OPT) + 3ǫcmax cmin

work page
[15]

(9) At termination it follows that κ − min(ˆg(Sℓ), κ ) ≤ ω where ω is the threshold parameter

T elescoping Sum: Unrolling the recursion over ℓ − 1 iterations, we obtain: κ ℓ− 1 ≤ κe − ∑ℓ − 1 i=1 cxi f (OPT) + 3ǫcmax cmin (ℓ − 1) ≤ κe − f (Sℓ − 1 ) f (OPT) + 3ǫcmax cmin ℓ. (9) At termination it follows that κ − min(ˆg(Sℓ), κ ) ≤ ω where ω is the threshold parameter. Also, because ℓ is the last iteration, we have κ ℓ− 1 > ω and κ ℓ ≤ ω . Thus, ω ≤ κ...

work page
[16]

Thus, we get α = 1 + ln ( κ ω ) and δ = cmax ωc min fmax(3 + 6n)

Resilience Parameter δ and Oracle Calls N: Combining (11), (12), and (13), f (Sℓ) = cxℓ + f (Sℓ− 1) ≤ f (OPT) ( 1 + 3ǫ cmin cmax ω ) + f (OPT) ln( κ ω ) + f (OPT)6ǫ cmax ωc min ℓ = f (OPT) ( 1 + ln ( κ ω )) + ǫ cmax ωc min f (OPT)(3 + 6ℓ) (rearranging) ≤ f (OPT) ( 1 + ln ( κ ω )) + ǫ cmax ωc min fmax(3 + 6n). Thus, we get α = 1 + ln ( κ ω ) and δ = cmax ω...

work page

[1] [1]

Fares Fourati, Mohamed-Slim Alouini, and V aneet Aggar- wal

PMLR, 2023. Fares Fourati, Mohamed-Slim Alouini, and V aneet Aggar- wal. Federated combinatorial multi-agent multi-armed bandits. In F orty-ﬁrst International Conference on Ma- chine Learning, 2024a. Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and V aneet Aggarwal. Combinatorial stochastic-greedy bandit. In Proceedings of the AAAI Confere...

work page arXiv 2023

[2] [2]

Expected α -regret: E[Rf (T )] = O ( δ2/ 3hN1/ 3T 2/ 3 log1/ 3 T ) ,

work page

[3] [3]

Proof Sketch

Expected cumulative β -constraint violation: E[Vg(T )] = O ( δ2/ 3hN1/ 3T 2/ 3 log1/ 3 T ) . Proof Sketch. The proof follows the same structure as Theorem 1, with adjus tments for minimization:

work page

[4] [4]

Clean Event: Concentration bounds hold as in Lemma 5

work page

[5] [5]

Resilience Guarantees: Under clean event E, E[f (S)] ≤ αf (OPT) + δhrad, E[g(S)] ≥ βκ − δhrad

work page

[6] [6]

Regret and CCV Decomposition : E[Rf (T )|E] = N∑ i=1 m(E[f (Si)] − αf (OPT)) + T∑ t=Nm+1 (E[f (S)] − αf (OPT)), E[Vg(T )|E] = N∑ i=1 m(βκ − E[g(Si)]) + T∑ t=Nm+1 (βκ − E[g(S)])

work page

[7] [7]

Similar bounds apply to CCV , where exploration CCV is ≤ Nmβκ and exploitation CCV ≤ T δhrad

Bounding T erms: Exploration regret ≤ Nmfmax, exploitation regret ≤ T δhrad. Similar bounds apply to CCV , where exploration CCV is ≤ Nmβκ and exploitation CCV ≤ T δhrad

work page

[8] [8]

The full proof mirrors Theorem 1, with inequalities modiﬁed as above to reﬂect minimization

Hyperparameter Substitution: Substituting m = O ( δ2/ 3T 2/ 3 log1/ 3 T N2/ 3 ) balances exploration and exploitation terms. The full proof mirrors Theorem 1, with inequalities modiﬁed as above to reﬂect minimization. B.4 KEY DIFFERENCES FROM MAXIMIZA TION FRAMEWORK

work page

[9] [9]

Objective and Constraint Swap: Cost minimization replaces reward maximization; utility lower-bound replaces cost upper-bound

work page

[10] [10]

Regret/CCV Deﬁnitions: Regret measures excess cost, while CCV measures utility sh ortfall

work page

[11] [11]

This conversion demonstrates the framework’s ﬂexibility i n handling dual objectives across maximization and minimization problems under bandit feedback

Resilience Inequalities: Additive errors δǫ increase cost bounds and decrease utility bounds. This conversion demonstrates the framework’s ﬂexibility i n handling dual objectives across maximization and minimization problems under bandit feedback. C ALGORITHM AND PROOF FOR SUBMODULAR COVER PROBLEM Algorithm Setup: • Input: Ground set Ω, deterministic cost...

work page 2013

[12] [12]

Given |ˆg(S) − g(S)| ≤ ǫ, g(S) ≥ ˆg(S) − ǫ ≥ (κ − ω ) − ǫ

Noisy Utility Propagation: The algorithm terminates when ˆg(S) ≥ κ − ω . Given |ˆg(S) − g(S)| ≤ ǫ, g(S) ≥ ˆg(S) − ǫ ≥ (κ − ω ) − ǫ. Rewriting for β : g(S) ≥ ( 1 − ω κ ) κ − ǫ = βκ − ǫ. Thus, β = 1 − ω κ and the utility error term is δgǫ = ǫ =⇒ δg = 1

work page

[13] [13]

Let us denote x1, x 2,

Cost Error Analysis: Let OPT = arg min S′⊆ Ω{f (S′) | g(S′) ≥ κ}. Let us denote x1, x 2, . . . , x ℓ to be the elements added (in order) by the algorithm. Deﬁne the set Si = {x1, x 2, . . . , x i}. Thus, Sℓ denotes the ﬁnal set outputed by the algorithm. We want to bound f (Sℓ). We ﬁrst make two basic observations. For i ≤ ℓ, ˆg(Si− 1) < κ since the algor...

work page

[14] [14]

Recursive Cost Bound: Let us deﬁne the utility gap κ i = κ − min(ˆg(Si), κ ). Then from the above inequality we get: ˆρxi (Si− 1) = κ i− 1 − κ i cxi ≥ κ − min(ˆg(Si− 1), κ ) f (OPT) − 3 ǫ cmin = κ i− 1 f (OPT) − 3 ǫ cmin =⇒ κ i ≤ κ i− 1 ( 1 − cxi f (OPT) ) + 3ǫcmax cmin =⇒ κ i ≤ κ i− 1e − cxi f (OPT) + 3ǫcmax cmin

work page

[15] [15]

(9) At termination it follows that κ − min(ˆg(Sℓ), κ ) ≤ ω where ω is the threshold parameter

T elescoping Sum: Unrolling the recursion over ℓ − 1 iterations, we obtain: κ ℓ− 1 ≤ κe − ∑ℓ − 1 i=1 cxi f (OPT) + 3ǫcmax cmin (ℓ − 1) ≤ κe − f (Sℓ − 1 ) f (OPT) + 3ǫcmax cmin ℓ. (9) At termination it follows that κ − min(ˆg(Sℓ), κ ) ≤ ω where ω is the threshold parameter. Also, because ℓ is the last iteration, we have κ ℓ− 1 > ω and κ ℓ ≤ ω . Thus, ω ≤ κ...

work page

[16] [16]

Thus, we get α = 1 + ln ( κ ω ) and δ = cmax ωc min fmax(3 + 6n)

Resilience Parameter δ and Oracle Calls N: Combining (11), (12), and (13), f (Sℓ) = cxℓ + f (Sℓ− 1) ≤ f (OPT) ( 1 + 3ǫ cmin cmax ω ) + f (OPT) ln( κ ω ) + f (OPT)6ǫ cmax ωc min ℓ = f (OPT) ( 1 + ln ( κ ω )) + ǫ cmax ωc min f (OPT)(3 + 6ℓ) (rearranging) ≤ f (OPT) ( 1 + ln ( κ ω )) + ǫ cmax ωc min fmax(3 + 6n). Thus, we get α = 1 + ln ( κ ω ) and δ = cmax ω...

work page