pith. sign in

arxiv: 2503.12285 · v2 · submitted 2025-03-15 · 💻 cs.LG · cs.AI· cs.GT· cs.SY· eess.SY· stat.ML

A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback

Pith reviewed 2026-05-22 23:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.GTcs.SYeess.SYstat.ML
keywords bi-criteria combinatorial optimizationresilience frameworkblack-box reductioncombinatorial multi-armed banditsbandit feedbackregret boundsconstraint violation
0
0 comments X

The pith

A black-box framework converts any resilient offline bi-criteria algorithm into an online bandit algorithm achieving sublinear regret and constraint violation of order Õ(δ^{2/3} N^{1/3} T^{2/3}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a notion of resilience for bi-criteria approximation algorithms that measures how their guarantees for objectives and constraints hold up under noisy evaluations. It then gives a general reduction that turns any such resilient offline algorithm into an online algorithm for combinatorial multi-armed bandits with only bandit feedback. The online version attains sublinear regret and cumulative constraint violation without assuming linearity, submodularity, or semi-bandit feedback. This matters because it extends single-criterion resilience techniques to coupled multi-objective settings common in practice.

Core claim

The paper establishes a general black-box framework that converts any (α,β,δ,N)-resilient offline bi-criteria approximation algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits with bandit feedback, yielding regret and constraint violation bounds of Õ(δ^{2/3} N^{1/3} T^{2/3}). It demonstrates this by showing resilience for several classical greedy algorithms in submodular optimization.

What carries the argument

The (α,β,δ,N)-resilience property, which captures controlled degradation of joint approximation guarantees for objectives and constraints under worst-case oracle noise of level δ, serving as the condition for the black-box offline-to-online reduction.

If this is right

  • The resulting online algorithm requires no structural assumptions on the noisy functions.
  • The bounds hold for any resilient offline algorithm to which the framework is applied.
  • Resilience holds for classical greedy algorithms in submodular optimization, enabling their online use.
  • The approach works with bandit feedback only.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If resilience can be verified for other bi-criteria problems, the framework would immediately yield online algorithms for them.
  • The noise dependence of δ^{2/3} suggests that improving resilience properties could tighten the online bounds.
  • Connections to other bandit settings might allow similar reductions if the resilience definition generalizes.

Load-bearing premise

The offline bi-criteria approximation algorithm must satisfy the (α,β,δ,N)-resilience property so that its guarantees degrade in a controlled manner under worst-case noise of level δ.

What would settle it

Finding a bi-criteria combinatorial problem and offline algorithm where the joint approximation guarantees degrade faster than allowed by any (α,β,δ,N) bound under δ-level noise would falsify the applicability of the framework, or observing that the constructed online algorithm exceeds the stated regret order in a concrete instance.

read the original abstract

We study bi-criteria combinatorial optimization under noisy function evaluations. While resilience and black-box offline-to-online reductions have been studied in single-objective settings, extending these ideas to bi-criteria problems introduces new challenges due to the coupled degradation of approximation guarantees for objectives and constraints. We introduce a notion of $(\alpha,\beta,\delta,\texttt{N})$-resilience for bi-criteria approximation algorithms, capturing how joint approximation guarantees degrade under bounded (possibly worst-case) oracle noise, and develop a general black-box framework that converts any resilient offline algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits with bandit feedback. The resulting online guarantees achieve sublinear regret and cumulative constraint violation of order $\tilde{O}(\delta^{2/3}\texttt{N}^{1/3}T^{2/3})$ without requiring structural assumptions such as linearity, submodularity, or semi-bandit feedback on the noisy functions. We demonstrate the applicability of the framework by establishing resilience for several classical greedy algorithms in submodular optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces an (α,β,δ,N)-resilience property for offline bi-criteria approximation algorithms that quantifies the coupled degradation of objective and constraint guarantees under worst-case oracle noise of level δ. It then presents a black-box reduction that converts any offline algorithm satisfying this property into an online algorithm for bi-criteria combinatorial multi-armed bandits under bandit feedback, yielding sublinear regret and cumulative constraint violation of order Õ(δ^{2/3} N^{1/3} T^{2/3}). The framework requires no linearity, submodularity, or semi-bandit assumptions on the underlying functions. Applicability is demonstrated by verifying the resilience property for several classical greedy algorithms in the submodular case.

Significance. If the reduction holds, the work supplies a general method for obtaining online guarantees in bi-criteria combinatorial settings from offline resilient algorithms, extending prior single-objective resilience results without imposing structural assumptions on the noisy functions. The explicit verification that classical greedy algorithms meet the resilience condition provides concrete applicability.

minor comments (3)
  1. [Definition 2.1] The definition of (α,β,δ,N)-resilience in the main text would benefit from an explicit statement of how the parameters α and β interact with the noise level δ when both objective and constraint approximations are considered simultaneously.
  2. [Theorem 4.1] In the reduction theorem, the dependence on N in the final bound should be accompanied by a brief discussion of whether N is treated as a constant or scales with the problem size.
  3. [Section 5] The experimental section (if present) or the submodular verification could include a short table comparing the observed degradation under noise to the theoretical resilience bounds.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and the positive recommendation of minor revision. No major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines a new (α,β,δ,N)-resilience property for bi-criteria offline algorithms under oracle noise and supplies an explicit black-box reduction that converts any algorithm meeting this property into an online bandit algorithm whose regret and violation bounds follow directly from the definition and standard analysis of the reduction; the target Õ(δ^{2/3} N^{1/3} T^{2/3}) expression is obtained by applying the reduction rather than by fitting parameters or by re-using an unverified self-citation chain. The derivation is self-contained: resilience is stated as an assumption, the reduction is proved from it, and applicability is shown by verifying the property on classical greedy algorithms, with no load-bearing step that collapses to a tautology or to prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The framework rests on the new resilience definition and the assumption of bounded worst-case noise; no free parameters are fitted in the abstract statement of the result.

axioms (2)
  • domain assumption Oracle noise is bounded (worst-case) by a parameter δ.
    Invoked to define (α,β,δ,N)-resilience and to obtain the regret scaling.
  • domain assumption There exist offline bi-criteria algorithms that satisfy the resilience property.
    Required for the black-box conversion to produce the online guarantees.
invented entities (1)
  • (α,β,δ,N)-resilience no independent evidence
    purpose: Quantifies controlled degradation of bi-criteria approximation guarantees under bounded noise.
    Newly introduced definition that enables the black-box reduction.

pith-pipeline@v0.9.0 · 5735 in / 1513 out tokens · 35081 ms · 2026-05-22T23:49:55.091546+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. [1]

    Fares Fourati, Mohamed-Slim Alouini, and V aneet Aggar- wal

    PMLR, 2023. Fares Fourati, Mohamed-Slim Alouini, and V aneet Aggar- wal. Federated combinatorial multi-agent multi-armed bandits. In F orty-first International Conference on Ma- chine Learning, 2024a. Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, and V aneet Aggarwal. Combinatorial stochastic-greedy bandit. In Proceedings of the AAAI Confere...

  2. [2]

    Expected α -regret: E[Rf (T )] = O ( δ2/ 3hN1/ 3T 2/ 3 log1/ 3 T ) ,

  3. [3]

    Proof Sketch

    Expected cumulative β -constraint violation: E[Vg(T )] = O ( δ2/ 3hN1/ 3T 2/ 3 log1/ 3 T ) . Proof Sketch. The proof follows the same structure as Theorem 1, with adjus tments for minimization:

  4. [4]

    Clean Event: Concentration bounds hold as in Lemma 5

  5. [5]

    Resilience Guarantees: Under clean event E, E[f (S)] ≤ αf (OPT) + δhrad, E[g(S)] ≥ βκ − δhrad

  6. [6]

    Regret and CCV Decomposition : E[Rf (T )|E] = N∑ i=1 m(E[f (Si)] − αf (OPT)) + T∑ t=Nm+1 (E[f (S)] − αf (OPT)), E[Vg(T )|E] = N∑ i=1 m(βκ − E[g(Si)]) + T∑ t=Nm+1 (βκ − E[g(S)])

  7. [7]

    Similar bounds apply to CCV , where exploration CCV is ≤ Nmβκ and exploitation CCV ≤ T δhrad

    Bounding T erms: Exploration regret ≤ Nmfmax, exploitation regret ≤ T δhrad. Similar bounds apply to CCV , where exploration CCV is ≤ Nmβκ and exploitation CCV ≤ T δhrad

  8. [8]

    The full proof mirrors Theorem 1, with inequalities modified as above to reflect minimization

    Hyperparameter Substitution: Substituting m = O ( δ2/ 3T 2/ 3 log1/ 3 T N2/ 3 ) balances exploration and exploitation terms. The full proof mirrors Theorem 1, with inequalities modified as above to reflect minimization. B.4 KEY DIFFERENCES FROM MAXIMIZA TION FRAMEWORK

  9. [9]

    Objective and Constraint Swap: Cost minimization replaces reward maximization; utility lower-bound replaces cost upper-bound

  10. [10]

    Regret/CCV Definitions: Regret measures excess cost, while CCV measures utility sh ortfall

  11. [11]

    This conversion demonstrates the framework’s flexibility i n handling dual objectives across maximization and minimization problems under bandit feedback

    Resilience Inequalities: Additive errors δǫ increase cost bounds and decrease utility bounds. This conversion demonstrates the framework’s flexibility i n handling dual objectives across maximization and minimization problems under bandit feedback. C ALGORITHM AND PROOF FOR SUBMODULAR COVER PROBLEM Algorithm Setup: • Input: Ground set Ω, deterministic cost...

  12. [12]

    Given |ˆg(S) − g(S)| ≤ ǫ, g(S) ≥ ˆg(S) − ǫ ≥ (κ − ω ) − ǫ

    Noisy Utility Propagation: The algorithm terminates when ˆg(S) ≥ κ − ω . Given |ˆg(S) − g(S)| ≤ ǫ, g(S) ≥ ˆg(S) − ǫ ≥ (κ − ω ) − ǫ. Rewriting for β : g(S) ≥ ( 1 − ω κ ) κ − ǫ = βκ − ǫ. Thus, β = 1 − ω κ and the utility error term is δgǫ = ǫ =⇒ δg = 1

  13. [13]

    Let us denote x1, x 2,

    Cost Error Analysis: Let OPT = arg min S′⊆ Ω{f (S′) | g(S′) ≥ κ}. Let us denote x1, x 2, . . . , x ℓ to be the elements added (in order) by the algorithm. Define the set Si = {x1, x 2, . . . , x i}. Thus, Sℓ denotes the final set outputed by the algorithm. We want to bound f (Sℓ). We first make two basic observations. For i ≤ ℓ, ˆg(Si− 1) < κ since the algor...

  14. [14]

    Recursive Cost Bound: Let us define the utility gap κ i = κ − min(ˆg(Si), κ ). Then from the above inequality we get: ˆρxi (Si− 1) = κ i− 1 − κ i cxi ≥ κ − min(ˆg(Si− 1), κ ) f (OPT) − 3 ǫ cmin = κ i− 1 f (OPT) − 3 ǫ cmin =⇒ κ i ≤ κ i− 1 ( 1 − cxi f (OPT) ) + 3ǫcmax cmin =⇒ κ i ≤ κ i− 1e − cxi f (OPT) + 3ǫcmax cmin

  15. [15]

    (9) At termination it follows that κ − min(ˆg(Sℓ), κ ) ≤ ω where ω is the threshold parameter

    T elescoping Sum: Unrolling the recursion over ℓ − 1 iterations, we obtain: κ ℓ− 1 ≤ κe − ∑ℓ − 1 i=1 cxi f (OPT) + 3ǫcmax cmin (ℓ − 1) ≤ κe − f (Sℓ − 1 ) f (OPT) + 3ǫcmax cmin ℓ. (9) At termination it follows that κ − min(ˆg(Sℓ), κ ) ≤ ω where ω is the threshold parameter. Also, because ℓ is the last iteration, we have κ ℓ− 1 > ω and κ ℓ ≤ ω . Thus, ω ≤ κ...

  16. [16]

    Thus, we get α = 1 + ln ( κ ω ) and δ = cmax ωc min fmax(3 + 6n)

    Resilience Parameter δ and Oracle Calls N: Combining (11), (12), and (13), f (Sℓ) = cxℓ + f (Sℓ− 1) ≤ f (OPT) ( 1 + 3ǫ cmin cmax ω ) + f (OPT) ln( κ ω ) + f (OPT)6ǫ cmax ωc min ℓ = f (OPT) ( 1 + ln ( κ ω )) + ǫ cmax ωc min f (OPT)(3 + 6ℓ) (rearranging) ≤ f (OPT) ( 1 + ln ( κ ω )) + ǫ cmax ωc min fmax(3 + 6n). Thus, we get α = 1 + ln ( κ ω ) and δ = cmax ω...