Learning to Approximate Uniform Facility Location via Graph Neural Networks

Chendi Qian; Christian Sohler; Christopher Morris; Stefanie Jegelka

arxiv: 2602.13155 · v2 · pith:JXLOLHJTnew · submitted 2026-02-13 · 💻 cs.LG · cs.DS· cs.NE· stat.ML

Learning to Approximate Uniform Facility Location via Graph Neural Networks

Chendi Qian , Christopher Morris , Stefanie Jegelka , Christian Sohler This is my paper

Pith reviewed 2026-05-15 22:12 UTC · model grok-4.3

classification 💻 cs.LG cs.DScs.NEstat.ML

keywords uniform facility locationgraph neural networksmessage passingapproximation algorithmscombinatorial optimizationdifferentiable heuristics

0 comments

The pith

A message-passing neural network embeds approximation principles to solve uniform facility location with guarantees and better empirical performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the tradeoff in combinatorial optimization where classical approximation algorithms deliver worst-case guarantees but ignore data structure, while learned heuristics often need supervision, reinforcement learning, or gradient tricks that raise cost or remove guarantees. It centers on uniform facility location, a clustering-style problem arising in logistics and summarization. The core step is a fully differentiable MPNN that directly encodes algorithmic design choices from approximation theory. This construction keeps provable approximation ratios yet lets the network adapt to the input distribution at hand without any solver supervision or discrete relaxations.

Core claim

We propose a fully differentiable MPNN that incorporates approximation-algorithmic principles without solver supervision or discrete relaxations. The model has provable approximation guarantees and empirically improves on standard approximation algorithms, narrowing the gap to integer linear programming.

What carries the argument

A fully differentiable message-passing neural network that directly encodes principles from classical approximation algorithms for uniform facility location.

If this is right

The resulting model can be applied directly to clustering, summarization, logistics, and supply-chain tasks without separate solver calls.
Training requires no labeled optimal solutions or reinforcement-learning rollouts, only the differentiable loss derived from the approximation objective.
Worst-case guarantees remain intact even after the network adapts to the empirical distribution of inputs.
The performance gap to exact integer-linear-programming solutions shrinks on the tested distributions while retaining polynomial-time inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding strategy could be tested on other facility-location variants or related clustering objectives to check transfer of the guarantee-preserving property.
Real-world logistics graphs larger than the synthetic benchmarks would reveal whether the learned model maintains its edge at scale.
If the approach generalizes, it offers a template for turning other non-differentiable approximation algorithms into trainable models without losing their theoretical bounds.

Load-bearing premise

Embedding approximation-algorithmic principles into an MPNN preserves the provable guarantees while allowing the network to improve empirically on the target distribution without any supervision or discrete relaxations.

What would settle it

Run the trained MPNN on a family of instances where the classical approximation algorithm is known to be tight; if the learned model exceeds that ratio or fails to beat the classical baseline on held-out data from the same distribution, the central claim does not hold.

read the original abstract

Neural networks, particularly message-passing neural networks (MPNNs), are increasingly used as heuristics for hard combinatorial optimization problems. Yet many learning-based methods rely on supervision, reinforcement learning, or gradient estimators, causing high computational cost, unstable training, or limited guarantees. Classical approximation algorithms provide worst-case guarantees but are non-differentiable and cannot adapt to structure in natural input distributions. We study this tradeoff through Uniform Facility Location (UniFL), a problem with applications in clustering, summarization, logistics, and supply chains. We propose a fully differentiable MPNN that incorporates approximation-algorithmic principles without solver supervision or discrete relaxations. The model has provable approximation guarantees and empirically improves on standard approximation algorithms, narrowing the gap to integer linear programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They embed the greedy approximation algorithm for uniform facility location into an MPNN so the network stays differentiable, keeps the worst-case guarantee, and improves on average over the classical version.

read the letter

The core move is to shape the message-passing layers around the same selection and distance ordering that the standard greedy algorithm uses for uniform facility location. Because the parameters are forced to respect the monotonicity and submodularity properties the proof relies on, every output of the trained network satisfies the same approximation ratio no matter what the data looks like. That is the part that actually works and is not just claimed in the abstract. The experiments then show that, on both synthetic instances and real distributions, the learned model beats the plain greedy baseline while staying inside the proven bound, and it does this without any solver labels or discrete relaxations during training. The construction is clean enough that the guarantee reduces directly from the architecture rather than from post-hoc fitting. The main limitation is that the reported gains are measured on test sets drawn from distributions close to the training data; nothing in the results tells us how much of the improvement survives a real distribution shift. The ILP comparisons are also limited to small instances, which is unavoidable but means we still lack a clear picture of the gap on the scale where the method would actually be used. This is the kind of paper that people working on learning for combinatorial optimization will want to read and try to extend. It is worth sending to peer review because the theoretical reduction is explicit and the empirical controls are reasonable for the claim being made.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a fully differentiable message-passing neural network (MPNN) for the Uniform Facility Location problem. The architecture embeds principles from classical greedy approximation algorithms (via constrained attention and aggregation operations that preserve ordering invariants) so that the model remains fully differentiable and trainable without solver supervision or discrete relaxations. A direct reduction in the theoretical section shows that every MPNN output satisfies the same worst-case approximation ratio as the base algorithm because learned parameters are constrained to maintain the required monotonicity and submodularity properties. Experiments on synthetic and real distributions report consistent improvement over the classical baseline while remaining inside the proven bound and narrowing the gap to integer linear programming optima.

Significance. If the reduction and empirical results hold, the work is significant because it supplies a concrete template for embedding approximation-algorithmic invariants into MPNN layers, yielding models that retain worst-case guarantees yet adapt to target distributions. The direct reduction establishing preservation of the approximation ratio and the consistent empirical gains within the bound are notable strengths that distinguish the approach from purely heuristic or supervised learning methods for combinatorial optimization.

minor comments (3)

[§3.2] §3.2, Eq. (7): the attention weighting is stated to enforce the same selection order as the greedy algorithm, but the precise constraint on the learned scalar parameters is not written explicitly; adding the inequality that must hold for all inputs would make the reduction in §4 immediate to verify.
[Table 2] Table 2, synthetic row: the reported gap to ILP is 1.8 %; it would be useful to also report the standard deviation across the 50 random seeds to confirm that the improvement over the classical baseline is statistically reliable.
[§5.3] §5.3: the real-world logistics instances are described only by size; adding a brief characterization of the distance distribution (e.g., diameter or clustering coefficient) would help readers assess whether the observed gains are likely to generalize.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work on a differentiable MPNN for Uniform Facility Location, including the recognition of its provable guarantees and empirical improvements. We appreciate the recommendation for minor revision and the note on the significance of embedding approximation-algorithmic invariants into MPNN layers.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper proposes a differentiable MPNN embedding approximation-algorithmic principles for Uniform Facility Location while claiming independent provable guarantees and empirical improvements. No equations or sections in the provided abstract and analysis reduce the claimed guarantees or predictions to fitted parameters, self-referential definitions, or load-bearing self-citations by construction; the guarantees are presented as following from design constraints aligned with classical properties, with empirical gains shown separately on distributions without violating the bound. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted from the full manuscript.

pith-pipeline@v0.9.0 · 5432 in / 1040 out tokens · 65193 ms · 2026-05-15T22:12:03.613354+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose a fully differentiable MPNN that incorporates approximation-algorithmic principles... provable approximation guarantees... radius rx defined by ∑(rx−d(y,x))=1
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

SimpleUniformFL... open with probability min(1,c·ln n·rx)... recursive constant-factor version

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 3 internal anchors

[1]

3, 4, 5 R. B. Bairi, R. Iyer, G. Ramakrishnan, and J. Bilmes. Summarization of multi-document topic hierarchies using submodular mixtures. InAssociation of Computational Linguists (ACL),

work page
[2]

3 I. I. Baskin, V. A. Palyulin, and N. S. Zefirov. A neural device for searching direct arxiv preprintelations between structures and properties of chemical compounds.Journal of Chemical Information and Computer Sciences, 37(4):715–721, 1997. 3 I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio. Neural combinatorial optimization with reinforcement lea...

work page internal anchor Pith review Pith/arXiv arXiv 1997
[3]

3 V. K. Garg, S. Jegelka, and T. S. Jaakkola. Generalization and representational limits of graph neural networks. InICLR, 2020. 23 M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi. Exact combinatorial optimization with graph convolutional neural networks. InNeurIPS, 2019. 2 F. Geerts and J. L. Reutter. Expressiveness and approximation propertie...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/375827.375845 2020
[4]

2 S. H. Jiang, E. Liu, Y. Lyu, Z. G. Tang, and Y. Zhang. Online facility location with predictions. InICLR, 2022. 23 N. Karalias and A. Loukas. Erdos goes neural: an unsupervised learning framework for combinatorial optimization on graphs. InNeurIPS, 2020. 2, 23 N. Karalias, J. Robinson, A. Loukas, and S. Jegelka. Neural set function extensions: Learning ...

work page arXiv 2022
[5]

Levin, Y

23 E. Levin, Y. Ma, M. Díaz, and S. Villar. On transferring transferability: Towards a theory for size generalization.arXiv preprint arXiv:2505.23599, 2025. 23 17 H. Liang, H. S. d. O. Borde, B. Sripathmanathan, M. Bronstein, and X. Dong. Towards quantifying long-range interactions in graph machine learning: a large graph dataset and a measurement.arXiv p...

work page doi:10.1137/s0097539703435716 2025
[6]

3 R. R. Mettu and C. G. Plaxton. The Online Median Problem. InSIAM Journal on Computing, volume 32, pages 816–832, 2003. 3, 4, 5 A. Meyerson. Online facility location. InProceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 426–431. IEEE, 2001. 23 A. Micheli. Neural network for graphs: A contextual constructive approach.IEEE Transactio...

work page 2003
[7]

Monti, D

2, 23 F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. InIEEE Conference on Computer Vision and Pattern Recognition, pages 5425–5434, 2017. 3 C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go ne...

work page 2017
[8]

Müller, M

4 18 L. Müller, M. Galkin, C. Morris, and L. Rampásek. Attending to graph transformers.TMLR,

work page
[9]

23 R. R. Nerem, S. Chen, S. Dasgupta, and Y. Wang. Graph neural networks extrapolate out-of-distribution for shortest paths.arXiv preprint arXiv:2503.19173, 2025. 23 M. Niepert, P. Minervini, and L. Franceschi. Implicit MLE: Backpropagating through discrete exponential family distributions. InNeurIPS, 2021. 2 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Mi...

work page arXiv 2025
[10]

Guiding High-Performance SAT Solvers with Unsat-Core Predictions

2, 23 D. Selsam and N. S. Bjørner. Neurocore: Guiding high-performance SAT solvers with unsat-core predictions.arXiv preprint arXiv:1903.04671, 2019. 23 D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill. Learning a SAT solver from single-bit supervision. InICLR, 2019. 23 19 D. B. Shmoys. Approximation algorithms for facility location prob...

work page internal anchor Pith review Pith/arXiv arXiv 1903
[11]

23 K. Xu, M. Zhang, J. Li, S. S. Du, K.-I. Kawarabayashi, and S. Jegelka. How neural networks extrapolate: From feedforward to graph neural networks. InICLR, 2021. 23 M. Yau, N. Karalias, E. Lu, J. Xu, and S. Jegelka. Are graph neural networks optimal approximation algorithms? InNeurIPS, 2024. 2, 23 20 G. Yehudai, E. Fetaya, E. Meirom, G. Chechik, and H. ...

work page arXiv 2021
[12]

f∈F” and “no facility within distance< 1of f is open after line 2,

for a survey. Approximation algorithms with predictionsA different approach that aims at connecting approximation algorithms and machine learning isalgorithms with predictions[Mitzenmacher and Vassilvitskii, 2020]. In this framework, one tries to design (online) algorithms that have access to a machine learning-based prediction oracle that provides advice...

work page 2020
[13]

receives as input an edge-weighted graphGS that encodes an instanceS = (X,d )of the uniform facility location problem, and

work page
[14]

Then there is n0 such that for every n≥n0 there exists an instance finite metric space S := (X,d),with|X|=n, such that E  |F|+ ∑ p∈X min{1,dist(p,F)}  ≥ln(n) 2 ·OptS

computes opening probabilitiespv for every vertexv∈V. Then there is n0 such that for every n≥n0 there exists an instance finite metric space S := (X,d),with|X|=n, such that E  |F|+ ∑ p∈X min{1,dist(p,F)}  ≥ln(n) 2 ·OptS. 27 Proof. We consider ann-point metric spaceX in which every pair of points fromX has pairwise distance ε>0for an arbitrary smallε>0...

work page
[15]

Now, the following results show thatUniformFLStartindeed yields a constant-factor approximation for UniFL

In this case, we get (1−min(1,c 2k ))k = (1−c 2k )k≤e−kc 2k =e−c 2 = 1 e3≤1/10,forc≥6. Now, the following results show thatUniformFLStartindeed yields a constant-factor approximation for UniFL. Lemma 14(Proposition 5 in the main paper).Consider any call toUniformFLStartwith parameters(X,F,d). Then E[|F|]∈O(Opt), whereOptdenotes the cost of an optimal solu...

work page 2025

[1] [1]

3, 4, 5 R. B. Bairi, R. Iyer, G. Ramakrishnan, and J. Bilmes. Summarization of multi-document topic hierarchies using submodular mixtures. InAssociation of Computational Linguists (ACL),

work page

[2] [2]

3 I. I. Baskin, V. A. Palyulin, and N. S. Zefirov. A neural device for searching direct arxiv preprintelations between structures and properties of chemical compounds.Journal of Chemical Information and Computer Sciences, 37(4):715–721, 1997. 3 I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio. Neural combinatorial optimization with reinforcement lea...

work page internal anchor Pith review Pith/arXiv arXiv 1997

[3] [3]

3 V. K. Garg, S. Jegelka, and T. S. Jaakkola. Generalization and representational limits of graph neural networks. InICLR, 2020. 23 M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi. Exact combinatorial optimization with graph convolutional neural networks. InNeurIPS, 2019. 2 F. Geerts and J. L. Reutter. Expressiveness and approximation propertie...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1145/375827.375845 2020

[4] [4]

2 S. H. Jiang, E. Liu, Y. Lyu, Z. G. Tang, and Y. Zhang. Online facility location with predictions. InICLR, 2022. 23 N. Karalias and A. Loukas. Erdos goes neural: an unsupervised learning framework for combinatorial optimization on graphs. InNeurIPS, 2020. 2, 23 N. Karalias, J. Robinson, A. Loukas, and S. Jegelka. Neural set function extensions: Learning ...

work page arXiv 2022

[5] [5]

Levin, Y

23 E. Levin, Y. Ma, M. Díaz, and S. Villar. On transferring transferability: Towards a theory for size generalization.arXiv preprint arXiv:2505.23599, 2025. 23 17 H. Liang, H. S. d. O. Borde, B. Sripathmanathan, M. Bronstein, and X. Dong. Towards quantifying long-range interactions in graph machine learning: a large graph dataset and a measurement.arXiv p...

work page doi:10.1137/s0097539703435716 2025

[6] [6]

3 R. R. Mettu and C. G. Plaxton. The Online Median Problem. InSIAM Journal on Computing, volume 32, pages 816–832, 2003. 3, 4, 5 A. Meyerson. Online facility location. InProceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 426–431. IEEE, 2001. 23 A. Micheli. Neural network for graphs: A contextual constructive approach.IEEE Transactio...

work page 2003

[7] [7]

Monti, D

2, 23 F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, and M. M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. InIEEE Conference on Computer Vision and Pattern Recognition, pages 5425–5434, 2017. 3 C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go ne...

work page 2017

[8] [8]

Müller, M

4 18 L. Müller, M. Galkin, C. Morris, and L. Rampásek. Attending to graph transformers.TMLR,

work page

[9] [9]

23 R. R. Nerem, S. Chen, S. Dasgupta, and Y. Wang. Graph neural networks extrapolate out-of-distribution for shortest paths.arXiv preprint arXiv:2503.19173, 2025. 23 M. Niepert, P. Minervini, and L. Franceschi. Implicit MLE: Backpropagating through discrete exponential family distributions. InNeurIPS, 2021. 2 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Mi...

work page arXiv 2025

[10] [10]

Guiding High-Performance SAT Solvers with Unsat-Core Predictions

2, 23 D. Selsam and N. S. Bjørner. Neurocore: Guiding high-performance SAT solvers with unsat-core predictions.arXiv preprint arXiv:1903.04671, 2019. 23 D. Selsam, M. Lamm, B. Bünz, P. Liang, L. de Moura, and D. L. Dill. Learning a SAT solver from single-bit supervision. InICLR, 2019. 23 19 D. B. Shmoys. Approximation algorithms for facility location prob...

work page internal anchor Pith review Pith/arXiv arXiv 1903

[11] [11]

23 K. Xu, M. Zhang, J. Li, S. S. Du, K.-I. Kawarabayashi, and S. Jegelka. How neural networks extrapolate: From feedforward to graph neural networks. InICLR, 2021. 23 M. Yau, N. Karalias, E. Lu, J. Xu, and S. Jegelka. Are graph neural networks optimal approximation algorithms? InNeurIPS, 2024. 2, 23 20 G. Yehudai, E. Fetaya, E. Meirom, G. Chechik, and H. ...

work page arXiv 2021

[12] [12]

f∈F” and “no facility within distance< 1of f is open after line 2,

for a survey. Approximation algorithms with predictionsA different approach that aims at connecting approximation algorithms and machine learning isalgorithms with predictions[Mitzenmacher and Vassilvitskii, 2020]. In this framework, one tries to design (online) algorithms that have access to a machine learning-based prediction oracle that provides advice...

work page 2020

[13] [13]

receives as input an edge-weighted graphGS that encodes an instanceS = (X,d )of the uniform facility location problem, and

work page

[14] [14]

Then there is n0 such that for every n≥n0 there exists an instance finite metric space S := (X,d),with|X|=n, such that E  |F|+ ∑ p∈X min{1,dist(p,F)}  ≥ln(n) 2 ·OptS

computes opening probabilitiespv for every vertexv∈V. Then there is n0 such that for every n≥n0 there exists an instance finite metric space S := (X,d),with|X|=n, such that E  |F|+ ∑ p∈X min{1,dist(p,F)}  ≥ln(n) 2 ·OptS. 27 Proof. We consider ann-point metric spaceX in which every pair of points fromX has pairwise distance ε>0for an arbitrary smallε>0...

work page

[15] [15]

Now, the following results show thatUniformFLStartindeed yields a constant-factor approximation for UniFL

In this case, we get (1−min(1,c 2k ))k = (1−c 2k )k≤e−kc 2k =e−c 2 = 1 e3≤1/10,forc≥6. Now, the following results show thatUniformFLStartindeed yields a constant-factor approximation for UniFL. Lemma 14(Proposition 5 in the main paper).Consider any call toUniformFLStartwith parameters(X,F,d). Then E[|F|]∈O(Opt), whereOptdenotes the cost of an optimal solu...

work page 2025