arxiv: 2604.20031 · v1 · submitted 2026-04-21 · 🧮 math.OC · cs.LG· stat.ML

Recognition: unknown

Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

Konstantinos Ziliaskopoulos , Alexander Vinel

Authors on Pith no claims yet

Pith reviewed 2026-05-10 01:24 UTC · model grok-4.3

classification 🧮 math.OC cs.LGstat.ML

keywords decision-focused learningfederated learningSPO+heterogeneous objectivesheterogeneous constraintspredict-then-optimizesupport functionsFedAvg

0 comments

The pith

A direct FedAvg and SPO+ combination yields promising performance in decision-focused federated learning even when clients have different objectives and constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines a setting where multiple agents each train a predictive model to feed into their own downstream linear optimization problem while keeping raw data private. It derives bounds on the SPO+ surrogate loss that separate the effects of differing cost vectors from shifts in feasible sets, using a support-function description of constraints. These bounds lead to a simple rule comparing the cost of heterogeneity against the benefit of pooled data. Experiments on both polyhedral and strongly convex problems show that federation remains effective, especially when feasible regions are strongly convex.

Core claim

Heterogeneity in objectives and constraints can be quantified through support-function distances that isolate objective shifts via norm differences and feasible-set shifts via shape measures. For strongly convex regions, optimizer stability yields tighter bounds. A heuristic excess-risk rule then identifies when the statistical gain from federation exceeds the heterogeneity penalty. Direct application of FedAvg to the SPO+ objective produces robust decision quality across clients whose linear programs differ noticeably.

What carries the argument

Support-function representation of the feasible region, separating objective shifts by norm distances and feasible-set shifts by shape distances to produce heterogeneity bounds on the SPO+ loss.

Load-bearing premise

The support-function distances accurately reflect how objective and constraint differences affect the downstream optimizer and remain stable throughout federated training.

What would settle it

An empirical run in which measured excess decision risk exceeds the derived heterogeneity bound by more than a small additive constant when objective and constraint distances are fixed and known.

Figures

Figures reproduced from arXiv: 2604.20031 by Alexander Vinel, Konstantinos Ziliaskopoulos.

**Figure 1.** Figure 1: Example configurations of two clients with slightly different objective vectors and the impact on the [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗

**Figure 2.** Figure 2: Boxplots of the relative regret for the two synthetic experimental settings. The orange boxplots correspond [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 3.** Figure 3: Scatter plot of the computed FG Difference on the y-axis versus the empirical regret difference from the [PITH_FULL_IMAGE:figures/full_fig_p017_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization of the data distribution of the energy pricing, and the volatility of the pricing among clients. [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗

**Figure 5.** Figure 5: The % improvement in downstream regret across the different agents in the PJM experiments. [PITH_FULL_IMAGE:figures/full_fig_p019_5.png] view at source ↗

**Figure 6.** Figure 6: FG Ratios for both the polyhedral and strongly convex synthetic experiments. FG Ratio = 1 is marked with [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗

**Figure 7.** Figure 7: Local versus global regret per client in the [PITH_FULL_IMAGE:figures/full_fig_p032_7.png] view at source ↗

**Figure 8.** Figure 8: Local versus global regret per client in the [PITH_FULL_IMAGE:figures/full_fig_p033_8.png] view at source ↗

**Figure 9.** Figure 9: Local versus global regret per client in the [PITH_FULL_IMAGE:figures/full_fig_p034_9.png] view at source ↗

**Figure 10.** Figure 10: Local versus global regret per client in the [PITH_FULL_IMAGE:figures/full_fig_p035_10.png] view at source ↗

**Figure 11.** Figure 11: FG Difference versus Real Regret Difference for clients [PITH_FULL_IMAGE:figures/full_fig_p036_11.png] view at source ↗

read the original abstract

We consider what we refer to as {Decision-Focused Federated Learning (DFFL)} framework, i.e., a predict-then-optimize approach employed by a collection of agents, where each agent's predictive model is an input to a downstream linear optimization problem, and no direct exchange of raw data is allowed. Importantly, clients can differ both in objective functions and in feasibility constraints. We build on the well-known SPO+ approach and develop heterogeneity bounds for the SPO+ surrogate loss in this case. This is accomplished by employing a support function representation of the feasible region, separating (i) objective shift via norm distances between the cost vectors and (ii) feasible-set shift via shape distances between the constraint sets. In the case of strongly convex feasible regions, sharper bounds are derived due to the optimizer stability. Building on these results, we define a heuristic local-versus-federated excess risk decision rule which, under SPO+ risk, gives a condition for when federation can be expected to improve decision quality: the heterogeneity penalty must be smaller than the statistical advantage of pooling data. We implement a FedAvg-style DFFL set of experiments on both polyhedral and strongly convex problems and show that federation is broadly robust in the strongly convex setting, while performance in the polyhedral setting degrades primarily with constraint heterogeneity, especially for clients with many samples. In other words, especially for the strongly convex case, an approach following a direct implementation of FedAvg and SPO+ can still yield promising performance even when the downstream optimization problems are noticeably different.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This extends SPO+ heterogeneity bounds to federated settings with differing objectives and constraints, showing robustness for strongly convex cases via FedAvg, though static bounds may not fully capture the training dynamics.

read the letter

The main thing to know is that the paper derives heterogeneity bounds for the SPO+ surrogate loss when clients have different linear objectives and different feasible sets. It splits the shifts into objective differences (norm distances on cost vectors) and feasible-set differences (shape distances via support functions), then tightens the bounds for strongly convex regions using optimizer stability. From those bounds it gives a heuristic rule for when to federate: pool data only if the heterogeneity penalty is smaller than the statistical gain from more samples. Experiments with a direct FedAvg implementation suggest the approach stays useful in the strongly convex case even when the downstream problems differ noticeably.

Referee Report

2 major / 2 minor

Summary. The paper introduces Decision-Focused Federated Learning (DFFL), a predict-then-optimize framework in which clients maintain local predictive models that feed into heterogeneous downstream linear optimization problems (differing objectives and feasible sets) without sharing raw data. It extends the SPO+ surrogate by deriving heterogeneity bounds that separate objective shifts (via norm distances on cost vectors) from feasible-set shifts (via shape distances using support-function representations of the constraint sets), obtains sharper stability-based bounds when feasible regions are strongly convex, proposes a heuristic local-versus-federated excess-risk rule comparing heterogeneity penalty to statistical gain from pooling, and reports FedAvg-style experiments indicating that direct federation remains robust, especially under strong convexity, even when downstream problems differ substantially.

Significance. If the derived bounds are shown to hold under the actual FedAvg dynamics and the experimental claims are quantified, the work would offer a principled way to quantify and mitigate heterogeneity in decision-focused federated settings, potentially guiding when federation improves decision quality. The support-function separation of objective and constraint heterogeneity is a clean technical device, and the emphasis on optimizer stability for convex cases provides a concrete path to sharper guarantees. The practical message that a straightforward FedAvg+SPO+ implementation can still perform well under noticeable heterogeneity is useful for practitioners.

major comments (2)

[Heterogeneity bounds and decision-rule sections] The heterogeneity bounds (derived via support-function representation of feasible sets) are stated for a fixed predictive model; the manuscript does not establish that these static bounds continue to control regret or excess risk once the model is replaced by the FedAvg iterate that is evaluated on each client’s distinct feasible set at every round. This transfer is load-bearing for the claim that the bounds justify the local-versus-federated decision rule.
[Experiments] The experimental section reports only high-level robustness statements for polyhedral and strongly convex instances without quantitative tables of regret, bound tightness across communication rounds, controls that vary the degree of objective versus constraint heterogeneity, or statistical significance; consequently it is impossible to determine whether the observed performance validates the derived bounds or simply reflects the chosen synthetic regimes.

minor comments (2)

[Abstract] The abstract and introduction use the phrase “promising performance” without reference to any concrete metric or baseline comparison.
[Preliminaries / notation] Support-function notation and the definition of shape distance are introduced without a short preliminary subsection recalling the relevant convex-analysis facts, which may hinder readers outside optimization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. The feedback highlights key points for clarification and improvement, particularly regarding the applicability of the bounds and the presentation of experimental results. We address each major comment below and describe the revisions we will implement.

read point-by-point responses

Referee: The heterogeneity bounds (derived via support-function representation of feasible sets) are stated for a fixed predictive model; the manuscript does not establish that these static bounds continue to control regret or excess risk once the model is replaced by the FedAvg iterate that is evaluated on each client’s distinct feasible set at every round. This transfer is load-bearing for the claim that the bounds justify the local-versus-federated decision rule.

Authors: The heterogeneity bounds (Theorem 3.1 and Corollary 3.2) are derived for an arbitrary fixed predictive model θ and make no assumption on the training procedure used to obtain θ. Each FedAvg iterate θ^t is therefore a fixed model when the local SPO+ loss is evaluated on a client’s distinct feasible set, so the bounds apply directly to the sequence of models produced by federated training. The local-versus-federated decision rule in Section 4 is presented explicitly as a heuristic that compares the heterogeneity penalty (bounded via the support-function distances) against the statistical gain from pooling; it is not claimed to be a rigorous guarantee for the full dynamic process. A complete convergence analysis of FedAvg under heterogeneous objectives and constraints would require additional assumptions on step sizes and landscape properties that lie outside the paper’s scope. In the revision we will insert a clarifying paragraph after Theorem 3.2 stating that the static bounds hold for any model, including FedAvg iterates, and we will emphasize the heuristic character of the decision rule in Section 4. revision: partial
Referee: The experimental section reports only high-level robustness statements for polyhedral and strongly convex instances without quantitative tables of regret, bound tightness across communication rounds, controls that vary the degree of objective versus constraint heterogeneity, or statistical significance; consequently it is impossible to determine whether the observed performance validates the derived bounds or simply reflects the chosen synthetic regimes.

Authors: We agree that the experimental section would be strengthened by more detailed quantitative reporting. In the revised manuscript we will expand Section 5 to include: (i) tables of average regret and excess SPO+ risk for local versus federated models; (ii) numerical comparisons of empirical heterogeneity against the derived bounds to assess tightness; (iii) additional runs that separately vary the magnitude of objective heterogeneity (cost-vector distances) and constraint heterogeneity (shape distances); and (iv) error bars together with statistical significance tests for the key comparisons. These additions will allow readers to evaluate how closely the observed performance tracks the theoretical predictions and the proposed decision rule. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bounds and decision rule derived from first-principles convex analysis

full rationale

The paper's core derivation separates objective shifts (norm distances on cost vectors) from feasible-set shifts (shape distances via support functions) to bound SPO+ surrogate loss under heterogeneity, then obtains sharper stability-based bounds for strongly convex feasible regions. These steps rely on standard properties of support functions and Lipschitz continuity of the optimizer map, without defining any quantity in terms of itself or renaming a fitted input as a prediction. The local-versus-federated heuristic is explicitly presented as a condition derived from the preceding bounds rather than an independent claim. No load-bearing self-citations appear in the derivation chain, and the FedAvg experiments serve only as empirical illustration, not as input to the analytic results. The overall chain remains self-contained against external convex-optimization benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work relies on standard assumptions from convex optimization and learning theory for deriving bounds and stability results.

axioms (2)

domain assumption Feasible regions admit a support function representation that separates objective and constraint shifts
Invoked to derive heterogeneity bounds for SPO+ loss.
domain assumption Optimizer stability holds for strongly convex feasible regions
Used to obtain sharper bounds in the strongly convex case.

pith-pipeline@v0.9.0 · 5576 in / 1101 out tokens · 36235 ms · 2026-05-10T01:24:44.221077+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 5 canonical work pages

[1]

A survey on heterogeneous federated learning.arXiv preprint arXiv:2210.04505,

Dashan Gao, Xin Yao, and Qiang Yang. A survey on heterogeneous federated learning.arXiv preprint arXiv:2210.04505,

work page arXiv
[2]

A differentiable programming system to bridge machine learning and scientific computing.arXiv preprint arXiv:1907.07587,

Mike Innes, Alan Edelman, Keno Fischer, Chris Rackauckas, Elliot Saba, Viral B Shah, and Will Tebbutt. A differentiable programming system to bridge machine learning and scientific computing.arXiv preprint arXiv:1907.07587,

work page arXiv 1907
[3]

From centralized to decentralized federated learning: Theoretical insights, privacy preservation, and robustness challenges.arXiv preprint arXiv:2503.07505,

Qiongxiu Li, Wenrui Yu, Yufei Xia, and Jun Pang. From centralized to decentralized federated learning: Theoretical insights, privacy preservation, and robustness challenges.arXiv preprint arXiv:2503.07505,

work page arXiv
[4]

Marin Vlastelica Poganˇci´c, Anselm Paulus, Vit Musil, Georg Martius, and Michal Rolinek

Accessed: 2026-02-14. Marin Vlastelica Poganˇci´c, Anselm Paulus, Vit Musil, Georg Martius, and Michal Rolinek. Differentiation of blackbox combinatorial solvers. InInternational Conference on Learning Representations,

2026
[5]

Federated learning with domain generalization.arXiv preprint arXiv:2111.10487,

Liling Zhang, Xinyu Lei, Yichun Shi, Hongyu Huang, and Chao Chen. Federated learning with domain generalization.arXiv preprint arXiv:2111.10487,

work page arXiv
[6]

Federated deep reinforcement learning,

Hankz Hankui Zhuo, Wenfeng Feng, Yufeng Lin, Qian Xu, and Qiang Yang. Federated deep reinforcement learning.arXiv preprint arXiv:1901.08277,

work page arXiv 1901
[7]

URLhttps://open-meteo.com/. A Implementation Details A.1 Synthetic Data-Generation Process Each synthetic instance is a pair(𝑥 𝑖, 𝑐𝑖)with features𝑥 𝑖 ∈R 𝑝 and product-level costs𝑐 𝑖 ∈R 𝑑, with defaults𝑝=8 and 𝑑=50. For each run, the random seed is fixed. Feature generation.Features are sampled i.i.d. as 𝑥𝑖 ∼ N (0, 𝐼 𝑝), 𝑖=1, . . . , 𝑛. To couple features ...

2021
[8]

1:Initialize global parameters𝑤 0 2:for𝑡=0,1,

24 DFFL Under Heterogeneity Algorithm 1FedAvg-DFFL (this work) Require:Total clients𝐾; rounds𝑇; client fraction𝐶; local epochs𝐸; learning rate𝜂; local optimizer Opt. 1:Initialize global parameters𝑤 0 2:for𝑡=0,1, . . . , 𝑇−1do 3:𝑚←max( ⌊𝐶𝐾⌋,1)[FedAvg] 4:𝑆 𝑡 ←sample𝑚clients uniformly without replacement[FedAvg] 5:for all𝑘∈𝑆 𝑡 in parallel do 6:𝑤 𝑡+1 𝑘 ←Clien...

2004
[9]

See (Schneider 2013). Fix𝑢. For any𝑤∈𝑆 1 there exists ˜𝑤∈𝑆 2 with∥𝑤−˜𝑤∥ ≤𝛿. Then 𝑢⊤𝑤≤𝑢 ⊤ ˜𝑤+ ∥𝑢∥ ∥𝑤−˜𝑤∥ ≤𝜉 𝑆2 (𝑢) + ∥𝑢∥𝛿. Taking max 𝑤∈𝑆 1 gives𝜉 𝑆1 (𝑢) ≤𝜉 𝑆2 (𝑢) + ∥𝑢∥𝛿. Repeat with(𝑆 1, 𝑆2) swapped. Proof.Proof of Lemma

2013
[10]

The SPO+ loss is made up of three terms, which we can observe under set translation: •Optimizer shifts:𝑤 ★ 𝑆′ =𝑤 ★ 𝑆 +𝑣(Boyd and Vandenberghe 2004)

Let𝑆 ′ =𝑆+𝑣, 𝑣∈R 𝑑, the translated set. The SPO+ loss is made up of three terms, which we can observe under set translation: •Optimizer shifts:𝑤 ★ 𝑆′ =𝑤 ★ 𝑆 +𝑣(Boyd and Vandenberghe 2004). •Optimal value shifts:𝑧 ★ 𝑆′ (𝑐)=𝑧 ★ 𝑆 (𝑐) +𝑐 𝑇 𝑣. •The max term shifts similarly: max 𝑤∈𝑆 ′ (𝑐−2 ˆ𝑐)𝑇 𝑤=max 𝑤∈𝑆 (𝑐−2 ˆ𝑐)𝑇 𝑤+ (𝑐−2 ˆ𝑐) 𝑇 𝑣. Substituting the terms intoℓ...

2004
[11]

Choose ˜𝑤2 ∈𝑆 2 such that∥𝑤 1 −˜𝑤2∥ ≤𝛿

Let𝛿=𝑑 𝐻 (𝑆1, 𝑆2). Choose ˜𝑤2 ∈𝑆 2 such that∥𝑤 1 −˜𝑤2∥ ≤𝛿. Then |ˆ𝑐⊤𝑤1 −ˆ𝑐⊤𝑤2| ≤ |ˆ𝑐⊤ (𝑤 1 −˜𝑤2)| + |ˆ𝑐⊤(˜𝑤2 −𝑤 2)| ≤ ∥ˆ𝑐∥ ∥𝑤1 −˜𝑤2∥ + ∥ˆ𝑐∥ ∥˜𝑤2 −𝑤 2∥ ≤ ∥ˆ𝑐∥ (𝛿+𝐷(𝑆 2)). By symmetry we also have the bound∥ˆ𝑐∥ (𝛿+𝐷(𝑆 1)), taking the minimum completes the bound. Proof.Proof of Theorem 1 Using (2), ℓ𝑆 (ˆ𝑐, 𝑐)= 𝜉𝑆 (𝑐−2 ˆ𝑐) −𝑧 ★ 𝑆 (𝑐) +2 ˆ𝑐⊤𝑤★ 𝑆 (𝑐). Hence, ℓ𝑆...

1996
[12]

Proof.Proof of Lemma 8 Fix𝑝∈R 𝑑,∥𝑝∥=1,and let𝑥 1 :=𝑥 𝑆1 (𝑝),𝑥 2 :=𝑥 𝑆2 (𝑝), where𝑥 𝑆 :=arg max 𝑥∈𝑆 𝑝𝑇 𝑥, and 𝑝𝑇 𝑥𝑆 =𝜉 𝑆 (𝑝)

This is a standard consequence of the supporting-ball characterization of𝜌-strong convexity: the supporting ball at direction𝑝yields the inequality∥𝑥−𝑥 𝑆 (𝑝) ∥ 2 +2𝜌(𝑥− 𝑥𝑆 (𝑝)) ⊤ 𝑝≤0 for all𝑥∈𝑆. Proof.Proof of Lemma 8 Fix𝑝∈R 𝑑,∥𝑝∥=1,and let𝑥 1 :=𝑥 𝑆1 (𝑝),𝑥 2 :=𝑥 𝑆2 (𝑝), where𝑥 𝑆 :=arg max 𝑥∈𝑆 𝑝𝑇 𝑥, and 𝑝𝑇 𝑥𝑆 =𝜉 𝑆 (𝑝). By the definition of Hausdorff distan...

2021
[13]

Subtracting𝑅 SPO+(𝑓 ★ 𝑗 )from both sides and noting that the MSE term for𝑓 ★ 𝑗 is non-negative gives the result

Substituting𝑥=𝑐−2 ˆ𝑐,𝑦=−𝑐yieldsℓ SPO+ (ˆ𝑐, 𝑐) ≤ 2 𝜌 ∥ˆ𝑐−𝑐∥ 2.Taking expectations over(𝑋, 𝐶)as in Section 5, 𝑅SPO+ (𝑓) ≤ 2 𝜌 MSE 𝑗 (𝑓). Subtracting𝑅 SPO+(𝑓 ★ 𝑗 )from both sides and noting that the MSE term for𝑓 ★ 𝑗 is non-negative gives the result. Remark4 (Instantiation with Su et al.’s subspace model).In the subspace model of (Su et al. 2023), the minima...

2023