Network Learning with Semi-relaxed Gromov-Wasserstein

Charles Dufour; Leonardo V. Santoro; Ulysse Naepels

arxiv: 2606.02223 · v1 · pith:LZBBE7SWnew · submitted 2026-06-01 · 💻 cs.LG · math.ST· stat.ME· stat.TH

Network Learning with Semi-relaxed Gromov-Wasserstein

Charles Dufour , Ulysse Naepels , Leonardo V. Santoro This is my paper

Pith reviewed 2026-06-28 15:35 UTC · model grok-4.3

classification 💻 cs.LG math.STstat.MEstat.TH

keywords network estimationGromov-Wassersteinstochastic block modelgraphonoptimal transportnode alignmentconsistency

0 comments

The pith

Semi-relaxed Gromov-Wasserstein recovers network generative mechanisms with an optimality gap vanishing at rate O(1/n).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Estimating the latent connectivity structure of large networks is combinatorially hard because nodes carry no canonical labels. The paper relaxes the deterministic assignment problem to probabilistic couplings through a semi-relaxed Gromov-Wasserstein objective, which produces a low-dimensional representation of the generative structure. This objective is solved by a block-coordinate conditional gradient algorithm. The central technical result shows that the gap between the relaxed solution and the deterministic assignment shrinks at rate O(1/n). The vanishing gap in turn supports tractable recovery together with consistency and minimax-optimal convergence rates when the network is generated by a stochastic block model or a Hölder-smooth graphon.

Core claim

The semi-relaxed Gromov-Wasserstein objective relaxes the combinatorial node-assignment problem to probabilistic couplings and thereby yields a tractable estimator for the generative mechanism of a network. Despite the relaxation, the resulting solution is typically deterministic because the optimality gap to the true deterministic assignment vanishes at rate O(1/n). This property permits rigorous statistical analysis that establishes consistency and minimax-optimal convergence rates for both stochastic block models and Hölder-smooth graphons.

What carries the argument

semi-relaxed Gromov-Wasserstein objective that relaxes node assignment to probabilistic couplings

If this is right

The block-coordinate conditional gradient algorithm computes the estimator efficiently and scales with the number of nodes.
The method recovers the underlying model consistently for stochastic block models.
Minimax-optimal convergence rates hold for both stochastic block models and Hölder-smooth graphons.
The low-dimensional representation obtained from the objective captures the essential generative structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same relaxation technique could support statistical guarantees for other random-graph families if comparable gap bounds can be established.
The formulation may extend to other graph-matching tasks where exact combinatorial alignment remains intractable.
Empirical checks on networks whose structure deviates from the assumed classes would test how quickly the O(1/n) gap bound degrades.

Load-bearing premise

The network is generated from a stochastic block model or a Hölder-smooth graphon.

What would settle it

A sequence of networks drawn from a stochastic block model for which the estimation error fails to attain the claimed minimax rate or for which the optimality gap does not vanish at rate O(1/n).

Figures

Figures reproduced from arXiv: 2606.02223 by Charles Dufour, Leonardo V. Santoro, Ulysse Naepels.

**Figure 2.** Figure 2: Left: estimator for increasing network size in the ’G4’ example. Right: empirical loss, [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: EEG dataset: inferred block connectivity (left) and co-activation structures (right) for the [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: OpenFlight network: inferred block-level connectivity. Geographic map in Section D. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Left: Empirical performance of the relaxed least-squares estimator across different graphon [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗

**Figure 6.** Figure 6: Left: Empirical performance of the relaxed least-squares estimator across different SBM [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Empirical rates of estimation (GW distance) for graphons for increasing network size. [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗

**Figure 8.** Figure 8: Empirical rates of estimation (GW distance) for SBMs for increasing network size and [PITH_FULL_IMAGE:figures/full_fig_p023_8.png] view at source ↗

**Figure 9.** Figure 9: Empirical rates of estimation (GW distance) for SBMs for increasing network size and [PITH_FULL_IMAGE:figures/full_fig_p023_9.png] view at source ↗

**Figure 10.** Figure 10: Runtime of our method with k-means initialization for graphons in Table 2. Lines represent [PITH_FULL_IMAGE:figures/full_fig_p024_10.png] view at source ↗

**Figure 11.** Figure 11: Average running time (seconds, CPU) for different methods. For Histogram and the [PITH_FULL_IMAGE:figures/full_fig_p025_11.png] view at source ↗

**Figure 12.** Figure 12: Complete results for all graphons (top) and SBMs (bottom) for all methods. [PITH_FULL_IMAGE:figures/full_fig_p025_12.png] view at source ↗

**Figure 13.** Figure 13: Low-dimensional co-activation structures estimated by the srGW barycenter for subjects [PITH_FULL_IMAGE:figures/full_fig_p026_13.png] view at source ↗

**Figure 14.** Figure 14: Low-dimensional co-activation structures estimated by the srGW barycenter for subjects [PITH_FULL_IMAGE:figures/full_fig_p027_14.png] view at source ↗

**Figure 15.** Figure 15: Low-dimensional co-activation structures estimated by the srGW barycenter for subjects [PITH_FULL_IMAGE:figures/full_fig_p027_15.png] view at source ↗

**Figure 16.** Figure 16: Airports location colored by inferred cluster membership, with node size proportional to [PITH_FULL_IMAGE:figures/full_fig_p028_16.png] view at source ↗

read the original abstract

Estimating the generative mechanism of large-scale networks is a fundamental challenge in statistical machine learning. It requires the identification of the latent connectivity structure, which is in general an NP-hard combinatorial problem due to the absence of canonical node labels. We address this challenge by allowing for probabilistic couplings, thereby relaxing the assignment problem. Our estimation framework can be formulated as a semi-relaxed Gromov-Wasserstein objective and provides a low-dimensional representation of the generative structure. We solve this via a block-coordinate conditional gradient algorithm. Despite the relaxation, the resulting solution is typically deterministic: in fact, we show that the optimality gap between the relaxed solution and the deterministic assignment vanishes at rate $O(1/n)$, where $n$ is the number of nodes. This allows for tractable recovery of the underlying model and enables rigorous statistical analysis: we establish consistency and minimax-optimal convergence rates for both stochastic block models and Holder-smooth graphons. Our implementation scales efficiently with $n$, as demonstrated on both synthetic and real-world datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a semi-relaxed GW objective for unlabeled network estimation with an O(1/n) gap to deterministic assignments plus consistency and minimax rates under SBM and Hölder graphons.

read the letter

The core advance is a semi-relaxed Gromov-Wasserstein objective that turns the usual hard assignment problem into something solvable while still recovering the latent structure. They prove the relaxed solution sits within O(1/n) of a deterministic coupling, which then supports consistency and minimax rates for stochastic block models and Hölder-smooth graphons. The solver is a block-coordinate conditional gradient method that runs efficiently on large n, with checks on both synthetic and real data.

This is useful because the relaxation buys tractability without losing the statistical guarantees under the stated model classes. The O(1/n) gap result is the part that stands out; it directly addresses why the probabilistic version can still be used for recovery.

The main limitation is the dependence on the generative model. The rates only hold inside SBM or Hölder graphons, and the abstract gives no indication of what happens when the network deviates from those assumptions. Without the full derivations it is also difficult to judge how much the semi-relaxed form overlaps with earlier GW relaxations already in the literature.

The work is aimed at people doing statistical inference on unlabeled networks or graphons. Anyone already using optimal transport on graphs would find the rates and the algorithm concrete enough to examine. The quantitative claims are sharp enough that a serious referee should see the paper, even if revisions are needed on the assumption list and comparisons.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a semi-relaxed Gromov-Wasserstein objective to estimate the generative mechanism of large networks by relaxing the combinatorial assignment problem to probabilistic couplings. It is solved via a block-coordinate conditional gradient algorithm. The central theoretical claims are that the optimality gap to a deterministic assignment vanishes at rate O(1/n) and that this enables consistency plus minimax-optimal rates under stochastic block models and Hölder-smooth graphons. Experiments on synthetic and real data are mentioned to illustrate scalability.

Significance. If the O(1/n) gap result and the accompanying consistency/minimax statements hold, the work would supply a computationally attractive relaxation of network alignment that still recovers deterministic structure asymptotically, together with nonparametric rates under standard generative assumptions. This would be a useful bridge between optimal transport relaxations and statistical graphon estimation.

major comments (2)

[Abstract] Abstract: the existence of proofs for the O(1/n) optimality gap and for minimax rates is asserted, yet no derivation steps, explicit assumption list, or equation references are supplied; the central claims therefore cannot be verified from the available text.
[Abstract] Abstract: the consistency and minimax statements are conditioned on the generative mechanism belonging to the class of stochastic block models or Hölder-smooth graphons; no discussion is given of what occurs when this modeling assumption is violated, which is load-bearing for the recovery guarantees.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback on the abstract. We address the two major comments point by point below.

read point-by-point responses

Referee: [Abstract] Abstract: the existence of proofs for the O(1/n) optimality gap and for minimax rates is asserted, yet no derivation steps, explicit assumption list, or equation references are supplied; the central claims therefore cannot be verified from the available text.

Authors: The abstract is a concise summary and therefore omits detailed derivations, assumption lists, and equation references. The full manuscript supplies these elements in the theoretical development (proof of the O(1/n) gap, explicit assumptions, and minimax statements with equation references). To improve verifiability directly from the abstract, we will add brief references to the relevant theorems. revision: yes
Referee: [Abstract] Abstract: the consistency and minimax statements are conditioned on the generative mechanism belonging to the class of stochastic block models or Hölder-smooth graphons; no discussion is given of what occurs when this modeling assumption is violated, which is load-bearing for the recovery guarantees.

Authors: The consistency and minimax results are derived under the stated generative assumptions, which are standard for obtaining sharp rates. The manuscript does not analyze behavior under misspecification. We agree that a discussion of this limitation is warranted and will add a short paragraph in the discussion section noting that the guarantees may not hold, or may degrade, when the network is not generated from an SBM or Hölder-smooth graphon. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation chain begins from the semi-relaxed GW objective, proceeds to a block-coordinate conditional gradient solver, then establishes an O(1/n) optimality gap to deterministic assignments via direct analysis of the relaxation, and finally invokes standard consistency arguments under explicit SBM or Hölder graphon assumptions. No equation reduces to a fitted parameter renamed as prediction, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled via prior work by the same authors. The statistical rates are conditioned on the model class in the usual nonparametric manner and do not collapse to the method's own inputs by construction. The provided abstract and claims contain no self-definitional or renaming steps, confirming the central results are independent of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; ledger is therefore minimal and provisional.

axioms (1)

domain assumption Network observations are generated exactly from an SBM or Holder-smooth graphon
Consistency and rate claims are stated only for these two model classes.

pith-pipeline@v0.9.1-grok · 5717 in / 1188 out tokens · 24391 ms · 2026-06-28T15:35:18.582468+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

55 extracted references · 17 canonical work pages

[1]

Social structure of facebook networks

Amanda L Traud, Peter J Mucha, and Mason A Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012

2012
[2]

Modeling polypharmacy side effects with graph convolutional networks.Bioinformatics, 34(13):i457–i466, 2018

Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks.Bioinformatics, 34(13):i457–i466, 2018

2018
[3]

Large-scale analysis of disease pathways in the human interactome

Monica Agrawal, Marinka Zitnik, and Jure Leskovec. Large-scale analysis of disease pathways in the human interactome. InPacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, volume 23, page 111, 2018

2018
[4]

Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

2017
[5]

Hierarchical graph learning for protein–protein interaction

Ziqi Gao, Chenran Jiang, Jiawen Zhang, Xiaosen Jiang, Lanqing Li, Peilin Zhao, Huanming Yang, Yong Huang, and Jia Li. Hierarchical graph learning for protein–protein interaction. Nature Communications, 14(1):1093, 2023

2023
[6]

Graph convolutional neural networks for web-scale recommender systems

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pages 974–983, 2018

2018
[7]

Diffusion convolutional recurrent neural network: Data-driven traffic forecasting

Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. InProceedings of the International Conference on Learning Representations (ICLR), 2018

2018
[8]

Chao Gao, Yu Lu, and Harrison H. Zhou. Rate-Optimal Graphon Estimation.The Annals of Statistics, 43(6):2624–2652, 2015. ISSN 0090-5364. doi: 10.1214/15-AOS1354

work page doi:10.1214/15-aos1354 2015
[9]

doi: https://doi.org/10.1016/0378-8733(83)90021-7

Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109–137, 1983. doi: 10.1016/0378-8733(83)90021-7

work page doi:10.1016/0378-8733(83)90021-7 1983
[10]

American Mathematical Soc.,

L´aszl´o Lov´asz.Large networks and graph limits, volume 60. American Mathematical Soc.,
[11]

Gromov–Wasserstein Distances and the Metric Approach to Object Matching

Facundo M´emoli. Gromov–Wasserstein Distances and the Metric Approach to Object Matching. Foundations of Computational Mathematics, 11(4):417–487, August 2011. ISSN 1615-3383. doi: 10.1007/s10208-011-9093-5

work page doi:10.1007/s10208-011-9093-5 2011
[12]

Gromov-Wasserstein Averaging of Kernel and Distance Matrices

Gabriel Peyr´e, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices. InProceedings of The 33rd International Conference on Machine Learning, pages 2664–2672. PMLR, June 2016

2016
[13]

Gromov-Wasserstein Learning for Graph Matching and Node Embedding

Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin Duke. Gromov-Wasserstein Learning for Graph Matching and Node Embedding. InProceedings of the 36th International Conference on Machine Learning, pages 6932–6941. PMLR, May 2019

2019
[14]

Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 13(9):212, August 2020

Titouan Vayer, Laetitia Chapel, Remi Flamary, Romain Tavenard, and Nicolas Courty. Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 13(9):212, August 2020. ISSN 1999-4893. doi: 10.3390/a13090212

work page doi:10.3390/a13090212 2020
[15]

The Z-Gromov-Wasserstein Distance, January 2025

Martin Bauer, Facundo M´emoli, Tom Needham, and Mao Nishino. The Z-Gromov-Wasserstein Distance, January 2025

2025
[16]

Learning Graphons via Structured Gromov-Wasserstein Barycenters.Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):10505–10513, May 2021

Hongteng Xu, Dixin Luo, Lawrence Carin, and Hongyuan Zha. Learning Graphons via Structured Gromov-Wasserstein Barycenters.Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):10505–10513, May 2021. ISSN 2374-3468. doi: 10.1609/aaai.v35i12. 17257

work page doi:10.1609/aaai.v35i12 2021
[17]

G-Mixup: Graph Data Augmentation for Graph Classification

Xiaotian Han, Zhimeng Jiang, Ninghao Liu, and Xia Hu. G-Mixup: Graph Data Augmentation for Graph Classification. InProceedings of the 39th International Conference on Machine Learning, pages 8230–8248. PMLR, June 2022. 10

2022
[18]

Tsybakov, and Nicolas Verzelen

Olga Klopp, Alexandre B. Tsybakov, and Nicolas Verzelen. Oracle inequalities for network models and sparse graphon estimation.The Annals of Statistics, 45(1):316–354, February 2017. ISSN 0090-5364, 2168-8966. doi: 10.1214/16-AOS1454

work page doi:10.1214/16-aos1454 2017
[19]

Charles Dufour and Sofia C. Olhede. Inference for decorated graphs and application to multiplex networks, August 2024

2024
[20]

Olhede and Patrick J

Sofia C. Olhede and Patrick J. Wolfe. Network histograms and universality of blockmodel approximation.Proceedings of the National Academy of Sciences, 111(41):14722–14727, October 2014. doi: 10.1073/pnas.1400374111

work page doi:10.1073/pnas.1400374111 2014
[21]

Matrix estimation by Universal Singular Value Thresholding.The Annals of Statistics, 43(1):177–214, February 2015

Sourav Chatterjee. Matrix estimation by Universal Singular Value Thresholding.The Annals of Statistics, 43(1):177–214, February 2015. ISSN 0090-5364, 2168-8966. doi: 10.1214/ 14-AOS1272

2015
[22]

Amini and Elizaveta Levina

Arash A. Amini and Elizaveta Levina. On semidefinite relaxations for the block model. The Annals of Statistics, 46(1):149–179, February 2018. ISSN 0090-5364, 2168-8966. doi: 10.1214/17-AOS1545

work page doi:10.1214/17-aos1545 2018
[23]

A Consistent Histogram Estimator for Exchangeable Graph Models

Stanley Chan and Edoardo Airoldi. A Consistent Histogram Estimator for Exchangeable Graph Models. InProceedings of the 31st International Conference on Machine Learning, pages 208–216. PMLR, January 2014

2014
[24]

Consistency of maximum-likelihood and variational estimators in the stochastic block model.Electronic Journal of Statistics, 6: 1847–1899, 2012

Alain Celisse, Jean-Jacques Daudin, and Laurent Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model.Electronic Journal of Statistics, 6: 1847–1899, 2012. doi: 10.1214/12-EJS729

work page doi:10.1214/12-ejs729 2012
[25]

Spectral clustering and the high-dimensional stochastic blockmodel.The Annals of Statistics, 39(4):1878–1915, 2011

Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel.The Annals of Statistics, 39(4):1878–1915, 2011. doi: 10.1214/ 11-AOS887

1915
[26]

Consistency of spectral clustering in stochastic block models

Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015. doi: 10.1214/14-AOS1274

work page doi:10.1214/14-aos1274 2015
[27]

Optimal Transport for structured data with application on graphs

Vayer Titouan, Nicolas Courty, Romain Tavenard, Chapel Laetitia, and R´emi Flamary. Optimal Transport for structured data with application on graphs. InProceedings of the 36th International Conference on Machine Learning, pages 6275–6284. PMLR, May 2019

2019
[28]

Semi- relaxed Gromov Wasserstein divergence with applications on graphs

C´edric Vincent-Cuaz, R´emi Flamary, Marco Corneli, Titouan Vayer, and Nicolas Courty. Semi- relaxed Gromov Wasserstein divergence with applications on graphs. InICLR 2022 - 10th International Conference on Learning Representations, pages 1–28, Virtual, France, April 2022

2022
[29]

Online graph dictionary learning

C´edric Vincent-Cuaz, Titouan Vayer, R´emi Flamary, Marco Corneli, and Nicolas Courty. Online graph dictionary learning. InInternational conference on machine learning, pages 10564–10574. PMLR, 2021

2021
[30]

Generalized dimension reduction using semi-relaxed Gromov-Wasserstein distance

Ranthony A Clark, Tom Needham, and Thomas Weighill. Generalized dimension reduction using semi-relaxed Gromov-Wasserstein distance. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 16082–16090, 2025. ISBN 2374-3468

2025
[31]

Distributional reduction: Unifying dimensionality reduction and clustering with gromov-wasserstein.Transactions on Machine Learning Research, 2025

Hugues Van Assel, C´edric Vincent-Cuaz, Nicolas Courty, R´emi Flamary, Pascal Frossard, and Titouan Vayer. Distributional reduction: Unifying dimensionality reduction and clustering with gromov-wasserstein.Transactions on Machine Learning Research, 2025. ISSN 2835-8856

2025
[32]

On probabilistic embeddings in optimal dimension reduction

Ryan Murray and Adam Pickarski. On probabilistic embeddings in optimal dimension reduction. Journal of Machine Learning Research, 26(234):1–39, 2025

2025
[33]

A nonparametric view of network models and Newman– Girvan and other modularities.Proceedings of the National Academy of Sciences, 106(50): 21068–21073, 2009

Peter J Bickel and Aiyou Chen. A nonparametric view of network models and Newman– Girvan and other modularities.Proceedings of the National Academy of Sciences, 106(50): 21068–21073, 2009. doi: 10.1073/pnas.0907096106

work page doi:10.1073/pnas.0907096106 2009
[34]

Peter Orbanz and Daniel M. Roy. Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures.IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 437–461, February 2015. ISSN 1939-3539. doi: 10.1109/TPAMI.2014.2334607. 11

work page doi:10.1109/tpami.2014.2334607 2015
[35]

Bandeira, and Georgina Hall

Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model.IEEE Transactions on Information Theory, 62(1):471–487, 2016. doi: 10.1109/ TIT.2015.2490670

work page arXiv 2016
[36]

Convergence Rate of Frank-Wolfe for Non-Convex Objectives, July 2016

Simon Lacoste-Julien. Convergence Rate of Frank-Wolfe for Non-Convex Objectives, July 2016

2016
[37]

Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L ´eo Gautheron, Nathalie T

R´emi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L ´eo Gautheron, Nathalie T. H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander ...

2021
[38]

Dalalyan, Francis Kramarz, Philippe Chon ´e, and Xavier D’Haultfoeuille

Etienne Donier-Meroz, Arnak S. Dalalyan, Francis Kramarz, Philippe Chon ´e, and Xavier D’Haultfoeuille. Graphon Estimation in bipartite graphs with observable edge labels and unobservable node labels, September 2023

2023
[39]

Estimating network edge probabilities by neigh- bourhood smoothing.Biometrika, 104(4):771–783, December 2017

Yuan Zhang, Elizaveta Levina, and Ji Zhu. Estimating network edge probabilities by neigh- bourhood smoothing.Biometrika, 104(4):771–783, December 2017. ISSN 0006-3444. doi: 10.1093/biomet/asx042

work page doi:10.1093/biomet/asx042 2017
[40]

Classification and estimation in the Stochastic Blockmodel based on the empirical degrees.Electronic Journal of Statistics, 6 (none):2574–2601, January 2012

Antoine Channarond, Jean-Jacques Daudin, and St´ephane Robin. Classification and estimation in the Stochastic Blockmodel based on the empirical degrees.Electronic Journal of Statistics, 6 (none):2574–2601, January 2012. ISSN 1935-7524, 1935-7524. doi: 10.1214/12-EJS753

work page doi:10.1214/12-ejs753 2012
[41]

SparseBM: A Python Module for Handling Sparse Graphs with Block Models

Gabriel Frisch, Jean-Benoist Leger, and Yves Grandvalet. SparseBM: A Python Module for Handling Sparse Graphs with Block Models. 2022

2022
[42]

EEG Database, 1995

Henri Begleiter. EEG Database, 1995

1995
[43]

Similarity of Neural Network Representations Revisited

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of Neural Network Representations Revisited. InProceedings of the 36th International Conference on Machine Learning, pages 3519–3529. PMLR, May 2019

2019
[44]

Remapping the brain to compensate for impairment in recovering alcoholics.Cerebral cortex, 23(1):97–104, 2013

Sandra Chanraud, Anne-Lise Pitel, Eva M M¨uller-Oehring, Adolf Pfefferbaum, and Edith V Sullivan. Remapping the brain to compensate for impairment in recovering alcoholics.Cerebral cortex, 23(1):97–104, 2013

2013
[45]

Disturbed connectivity of eeg functional networks in alcoholism: A graph-theoretic analysis.Bio-Medical Materials and Engineering, 24(6):2927–2936, 2014

Rui Cao, Zheng Wu, Haifang Li, Jie Xiang, and Junjie Chen. Disturbed connectivity of eeg functional networks in alcoholism: A graph-theoretic analysis.Bio-Medical Materials and Engineering, 24(6):2927–2936, 2014. doi: 10.3233/BME-141112

work page doi:10.3233/bme-141112 2014
[46]

Openflights airports database

OpenFlights. Openflights airports database. https://openflights.org/data.html, 2017. Accessed: 2026-01-14

2017
[47]

Consistency of community detection in networks under degree-corrected stochastic block models.The Annals of Statistics, 40(4):2266–2292, August 2012

Yunpeng Zhao, Elizaveta Levina, and Ji Zhu. Consistency of community detection in networks under degree-corrected stochastic block models.The Annals of Statistics, 40(4):2266–2292, August 2012. ISSN 0090-5364, 2168-8966. doi: 10.1214/12-AOS1036

work page doi:10.1214/12-aos1036 2012
[48]

Graphons, cut norm and distance, couplings and rearrangements, June 2011

Svante Janson. Graphons, cut norm and distance, couplings and rearrangements, June 2011

2011
[49]

Limits of compact decorated graphs, October 2010

L ´aszl´o Lov´asz and Bal´azs Szegedy. Limits of compact decorated graphs, October 2010

2010
[50]

Probability-graphons: Limits of large dense weighted graphs, December 2023

Romain Abraham, Jean-Franc ¸ois Delmas, and Julien Weibel. Probability-graphons: Limits of large dense weighted graphs, December 2023

2023
[51]

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

Martin Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In Proceedings of the 30th International Conference on Machine Learning, pages 427–435. PMLR, February 2013. 12 A Background A.1 Generative models for networks Generative models for networks provide a probabilistic framework for describing graph-valued data. In this section ...

2013
[52]

For (j, d) = (i, a), π′ i,a =π i,a +Pki b=2 πi,b ≥0

(Nonnegativity):For (j, d)/∈ {(i, a),(i, b)} we have π′ j,d =π j,d ≥0 . For (j, d) = (i, a), π′ i,a =π i,a +Pki b=2 πi,b ≥0 . For (j, d) = (i, b), π′ i,b =π i,b −π i,b = 0. Hence, π′ j,d ≥0 for allj, d
[53]

Pk d=1 πj,d = 1/n for every j

(Left marginal)By assumption π has left marginal uniform, i.e. Pk d=1 πj,d = 1/n for every j. For any fixed j̸=i we have ψj,d = 0 for all d, henceP d π′ j,d =P d πj,d = 1/n. For j=i ,Pk d=1 π′ i,d =P d πi,d +P d ψi,d = 1 n + 0 = 1 n , since ψ has zero total mass. Therefore, the left marginal ofπ ′ is still uniform on[n]
[54]

15 Since LE, ˜B is quadratic in the coupling, we can split it into what is essentially a Taylor decomposition in the first and second-order variations ofL E, ˜B

(Total mass)Since the left marginal is preserved for every j, the total mass P j,d π′ j,d remainsP j(1/n) = 1, soπ ′ is normalized as a coupling. 15 Since LE, ˜B is quadratic in the coupling, we can split it into what is essentially a Taylor decomposition in the first and second-order variations ofL E, ˜B. Explicitly, one can see that: LE, ˜B(π′) =L E, ˜B...
[55]

0.2 0.8 0.8 0.2 # 2. h 0.5 0.5 i

also showed more results for constrained least square estimators. An interesting question is whether we could get such an estimator using a penalized version of the srGWB, where we add a penalty term to the objective that penalizes small partitions. We leave this question for future work. C Algorithmic details C.1 On implicit regularization toward determi...

[1] [1]

Social structure of facebook networks

Amanda L Traud, Peter J Mucha, and Mason A Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012

2012

[2] [2]

Modeling polypharmacy side effects with graph convolutional networks.Bioinformatics, 34(13):i457–i466, 2018

Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks.Bioinformatics, 34(13):i457–i466, 2018

2018

[3] [3]

Large-scale analysis of disease pathways in the human interactome

Monica Agrawal, Marinka Zitnik, and Jure Leskovec. Large-scale analysis of disease pathways in the human interactome. InPacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, volume 23, page 111, 2018

2018

[4] [4]

Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs.Advances in neural information processing systems, 30, 2017

2017

[5] [5]

Hierarchical graph learning for protein–protein interaction

Ziqi Gao, Chenran Jiang, Jiawen Zhang, Xiaosen Jiang, Lanqing Li, Peilin Zhao, Huanming Yang, Yong Huang, and Jia Li. Hierarchical graph learning for protein–protein interaction. Nature Communications, 14(1):1093, 2023

2023

[6] [6]

Graph convolutional neural networks for web-scale recommender systems

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pages 974–983, 2018

2018

[7] [7]

Diffusion convolutional recurrent neural network: Data-driven traffic forecasting

Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. InProceedings of the International Conference on Learning Representations (ICLR), 2018

2018

[8] [8]

Chao Gao, Yu Lu, and Harrison H. Zhou. Rate-Optimal Graphon Estimation.The Annals of Statistics, 43(6):2624–2652, 2015. ISSN 0090-5364. doi: 10.1214/15-AOS1354

work page doi:10.1214/15-aos1354 2015

[9] [9]

doi: https://doi.org/10.1016/0378-8733(83)90021-7

Paul W Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109–137, 1983. doi: 10.1016/0378-8733(83)90021-7

work page doi:10.1016/0378-8733(83)90021-7 1983

[10] [10]

American Mathematical Soc.,

L´aszl´o Lov´asz.Large networks and graph limits, volume 60. American Mathematical Soc.,

[11] [11]

Gromov–Wasserstein Distances and the Metric Approach to Object Matching

Facundo M´emoli. Gromov–Wasserstein Distances and the Metric Approach to Object Matching. Foundations of Computational Mathematics, 11(4):417–487, August 2011. ISSN 1615-3383. doi: 10.1007/s10208-011-9093-5

work page doi:10.1007/s10208-011-9093-5 2011

[12] [12]

Gromov-Wasserstein Averaging of Kernel and Distance Matrices

Gabriel Peyr´e, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein Averaging of Kernel and Distance Matrices. InProceedings of The 33rd International Conference on Machine Learning, pages 2664–2672. PMLR, June 2016

2016

[13] [13]

Gromov-Wasserstein Learning for Graph Matching and Node Embedding

Hongteng Xu, Dixin Luo, Hongyuan Zha, and Lawrence Carin Duke. Gromov-Wasserstein Learning for Graph Matching and Node Embedding. InProceedings of the 36th International Conference on Machine Learning, pages 6932–6941. PMLR, May 2019

2019

[14] [14]

Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 13(9):212, August 2020

Titouan Vayer, Laetitia Chapel, Remi Flamary, Romain Tavenard, and Nicolas Courty. Fused Gromov-Wasserstein Distance for Structured Objects.Algorithms, 13(9):212, August 2020. ISSN 1999-4893. doi: 10.3390/a13090212

work page doi:10.3390/a13090212 2020

[15] [15]

The Z-Gromov-Wasserstein Distance, January 2025

Martin Bauer, Facundo M´emoli, Tom Needham, and Mao Nishino. The Z-Gromov-Wasserstein Distance, January 2025

2025

[16] [16]

Learning Graphons via Structured Gromov-Wasserstein Barycenters.Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):10505–10513, May 2021

Hongteng Xu, Dixin Luo, Lawrence Carin, and Hongyuan Zha. Learning Graphons via Structured Gromov-Wasserstein Barycenters.Proceedings of the AAAI Conference on Artificial Intelligence, 35(12):10505–10513, May 2021. ISSN 2374-3468. doi: 10.1609/aaai.v35i12. 17257

work page doi:10.1609/aaai.v35i12 2021

[17] [17]

G-Mixup: Graph Data Augmentation for Graph Classification

Xiaotian Han, Zhimeng Jiang, Ninghao Liu, and Xia Hu. G-Mixup: Graph Data Augmentation for Graph Classification. InProceedings of the 39th International Conference on Machine Learning, pages 8230–8248. PMLR, June 2022. 10

2022

[18] [18]

Tsybakov, and Nicolas Verzelen

Olga Klopp, Alexandre B. Tsybakov, and Nicolas Verzelen. Oracle inequalities for network models and sparse graphon estimation.The Annals of Statistics, 45(1):316–354, February 2017. ISSN 0090-5364, 2168-8966. doi: 10.1214/16-AOS1454

work page doi:10.1214/16-aos1454 2017

[19] [19]

Charles Dufour and Sofia C. Olhede. Inference for decorated graphs and application to multiplex networks, August 2024

2024

[20] [20]

Olhede and Patrick J

Sofia C. Olhede and Patrick J. Wolfe. Network histograms and universality of blockmodel approximation.Proceedings of the National Academy of Sciences, 111(41):14722–14727, October 2014. doi: 10.1073/pnas.1400374111

work page doi:10.1073/pnas.1400374111 2014

[21] [21]

Matrix estimation by Universal Singular Value Thresholding.The Annals of Statistics, 43(1):177–214, February 2015

Sourav Chatterjee. Matrix estimation by Universal Singular Value Thresholding.The Annals of Statistics, 43(1):177–214, February 2015. ISSN 0090-5364, 2168-8966. doi: 10.1214/ 14-AOS1272

2015

[22] [22]

Amini and Elizaveta Levina

Arash A. Amini and Elizaveta Levina. On semidefinite relaxations for the block model. The Annals of Statistics, 46(1):149–179, February 2018. ISSN 0090-5364, 2168-8966. doi: 10.1214/17-AOS1545

work page doi:10.1214/17-aos1545 2018

[23] [23]

A Consistent Histogram Estimator for Exchangeable Graph Models

Stanley Chan and Edoardo Airoldi. A Consistent Histogram Estimator for Exchangeable Graph Models. InProceedings of the 31st International Conference on Machine Learning, pages 208–216. PMLR, January 2014

2014

[24] [24]

Consistency of maximum-likelihood and variational estimators in the stochastic block model.Electronic Journal of Statistics, 6: 1847–1899, 2012

Alain Celisse, Jean-Jacques Daudin, and Laurent Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model.Electronic Journal of Statistics, 6: 1847–1899, 2012. doi: 10.1214/12-EJS729

work page doi:10.1214/12-ejs729 2012

[25] [25]

Spectral clustering and the high-dimensional stochastic blockmodel.The Annals of Statistics, 39(4):1878–1915, 2011

Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral clustering and the high-dimensional stochastic blockmodel.The Annals of Statistics, 39(4):1878–1915, 2011. doi: 10.1214/ 11-AOS887

1915

[26] [26]

Consistency of spectral clustering in stochastic block models

Jing Lei and Alessandro Rinaldo. Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237, 2015. doi: 10.1214/14-AOS1274

work page doi:10.1214/14-aos1274 2015

[27] [27]

Optimal Transport for structured data with application on graphs

Vayer Titouan, Nicolas Courty, Romain Tavenard, Chapel Laetitia, and R´emi Flamary. Optimal Transport for structured data with application on graphs. InProceedings of the 36th International Conference on Machine Learning, pages 6275–6284. PMLR, May 2019

2019

[28] [28]

Semi- relaxed Gromov Wasserstein divergence with applications on graphs

C´edric Vincent-Cuaz, R´emi Flamary, Marco Corneli, Titouan Vayer, and Nicolas Courty. Semi- relaxed Gromov Wasserstein divergence with applications on graphs. InICLR 2022 - 10th International Conference on Learning Representations, pages 1–28, Virtual, France, April 2022

2022

[29] [29]

Online graph dictionary learning

C´edric Vincent-Cuaz, Titouan Vayer, R´emi Flamary, Marco Corneli, and Nicolas Courty. Online graph dictionary learning. InInternational conference on machine learning, pages 10564–10574. PMLR, 2021

2021

[30] [30]

Generalized dimension reduction using semi-relaxed Gromov-Wasserstein distance

Ranthony A Clark, Tom Needham, and Thomas Weighill. Generalized dimension reduction using semi-relaxed Gromov-Wasserstein distance. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 16082–16090, 2025. ISBN 2374-3468

2025

[31] [31]

Distributional reduction: Unifying dimensionality reduction and clustering with gromov-wasserstein.Transactions on Machine Learning Research, 2025

Hugues Van Assel, C´edric Vincent-Cuaz, Nicolas Courty, R´emi Flamary, Pascal Frossard, and Titouan Vayer. Distributional reduction: Unifying dimensionality reduction and clustering with gromov-wasserstein.Transactions on Machine Learning Research, 2025. ISSN 2835-8856

2025

[32] [32]

On probabilistic embeddings in optimal dimension reduction

Ryan Murray and Adam Pickarski. On probabilistic embeddings in optimal dimension reduction. Journal of Machine Learning Research, 26(234):1–39, 2025

2025

[33] [33]

A nonparametric view of network models and Newman– Girvan and other modularities.Proceedings of the National Academy of Sciences, 106(50): 21068–21073, 2009

Peter J Bickel and Aiyou Chen. A nonparametric view of network models and Newman– Girvan and other modularities.Proceedings of the National Academy of Sciences, 106(50): 21068–21073, 2009. doi: 10.1073/pnas.0907096106

work page doi:10.1073/pnas.0907096106 2009

[34] [34]

Peter Orbanz and Daniel M. Roy. Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures.IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 437–461, February 2015. ISSN 1939-3539. doi: 10.1109/TPAMI.2014.2334607. 11

work page doi:10.1109/tpami.2014.2334607 2015

[35] [35]

Bandeira, and Georgina Hall

Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall. Exact recovery in the stochastic block model.IEEE Transactions on Information Theory, 62(1):471–487, 2016. doi: 10.1109/ TIT.2015.2490670

work page arXiv 2016

[36] [36]

Convergence Rate of Frank-Wolfe for Non-Convex Objectives, July 2016

Simon Lacoste-Julien. Convergence Rate of Frank-Wolfe for Non-Convex Objectives, July 2016

2016

[37] [37]

Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L ´eo Gautheron, Nathalie T

R´emi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aur´elie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, L ´eo Gautheron, Nathalie T. H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander ...

2021

[38] [38]

Dalalyan, Francis Kramarz, Philippe Chon ´e, and Xavier D’Haultfoeuille

Etienne Donier-Meroz, Arnak S. Dalalyan, Francis Kramarz, Philippe Chon ´e, and Xavier D’Haultfoeuille. Graphon Estimation in bipartite graphs with observable edge labels and unobservable node labels, September 2023

2023

[39] [39]

Estimating network edge probabilities by neigh- bourhood smoothing.Biometrika, 104(4):771–783, December 2017

Yuan Zhang, Elizaveta Levina, and Ji Zhu. Estimating network edge probabilities by neigh- bourhood smoothing.Biometrika, 104(4):771–783, December 2017. ISSN 0006-3444. doi: 10.1093/biomet/asx042

work page doi:10.1093/biomet/asx042 2017

[40] [40]

Classification and estimation in the Stochastic Blockmodel based on the empirical degrees.Electronic Journal of Statistics, 6 (none):2574–2601, January 2012

Antoine Channarond, Jean-Jacques Daudin, and St´ephane Robin. Classification and estimation in the Stochastic Blockmodel based on the empirical degrees.Electronic Journal of Statistics, 6 (none):2574–2601, January 2012. ISSN 1935-7524, 1935-7524. doi: 10.1214/12-EJS753

work page doi:10.1214/12-ejs753 2012

[41] [41]

SparseBM: A Python Module for Handling Sparse Graphs with Block Models

Gabriel Frisch, Jean-Benoist Leger, and Yves Grandvalet. SparseBM: A Python Module for Handling Sparse Graphs with Block Models. 2022

2022

[42] [42]

EEG Database, 1995

Henri Begleiter. EEG Database, 1995

1995

[43] [43]

Similarity of Neural Network Representations Revisited

Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. Similarity of Neural Network Representations Revisited. InProceedings of the 36th International Conference on Machine Learning, pages 3519–3529. PMLR, May 2019

2019

[44] [44]

Remapping the brain to compensate for impairment in recovering alcoholics.Cerebral cortex, 23(1):97–104, 2013

Sandra Chanraud, Anne-Lise Pitel, Eva M M¨uller-Oehring, Adolf Pfefferbaum, and Edith V Sullivan. Remapping the brain to compensate for impairment in recovering alcoholics.Cerebral cortex, 23(1):97–104, 2013

2013

[45] [45]

Disturbed connectivity of eeg functional networks in alcoholism: A graph-theoretic analysis.Bio-Medical Materials and Engineering, 24(6):2927–2936, 2014

Rui Cao, Zheng Wu, Haifang Li, Jie Xiang, and Junjie Chen. Disturbed connectivity of eeg functional networks in alcoholism: A graph-theoretic analysis.Bio-Medical Materials and Engineering, 24(6):2927–2936, 2014. doi: 10.3233/BME-141112

work page doi:10.3233/bme-141112 2014

[46] [46]

Openflights airports database

OpenFlights. Openflights airports database. https://openflights.org/data.html, 2017. Accessed: 2026-01-14

2017

[47] [47]

Consistency of community detection in networks under degree-corrected stochastic block models.The Annals of Statistics, 40(4):2266–2292, August 2012

Yunpeng Zhao, Elizaveta Levina, and Ji Zhu. Consistency of community detection in networks under degree-corrected stochastic block models.The Annals of Statistics, 40(4):2266–2292, August 2012. ISSN 0090-5364, 2168-8966. doi: 10.1214/12-AOS1036

work page doi:10.1214/12-aos1036 2012

[48] [48]

Graphons, cut norm and distance, couplings and rearrangements, June 2011

Svante Janson. Graphons, cut norm and distance, couplings and rearrangements, June 2011

2011

[49] [49]

Limits of compact decorated graphs, October 2010

L ´aszl´o Lov´asz and Bal´azs Szegedy. Limits of compact decorated graphs, October 2010

2010

[50] [50]

Probability-graphons: Limits of large dense weighted graphs, December 2023

Romain Abraham, Jean-Franc ¸ois Delmas, and Julien Weibel. Probability-graphons: Limits of large dense weighted graphs, December 2023

2023

[51] [51]

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

Martin Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In Proceedings of the 30th International Conference on Machine Learning, pages 427–435. PMLR, February 2013. 12 A Background A.1 Generative models for networks Generative models for networks provide a probabilistic framework for describing graph-valued data. In this section ...

2013

[52] [52]

For (j, d) = (i, a), π′ i,a =π i,a +Pki b=2 πi,b ≥0

(Nonnegativity):For (j, d)/∈ {(i, a),(i, b)} we have π′ j,d =π j,d ≥0 . For (j, d) = (i, a), π′ i,a =π i,a +Pki b=2 πi,b ≥0 . For (j, d) = (i, b), π′ i,b =π i,b −π i,b = 0. Hence, π′ j,d ≥0 for allj, d

[53] [53]

Pk d=1 πj,d = 1/n for every j

(Left marginal)By assumption π has left marginal uniform, i.e. Pk d=1 πj,d = 1/n for every j. For any fixed j̸=i we have ψj,d = 0 for all d, henceP d π′ j,d =P d πj,d = 1/n. For j=i ,Pk d=1 π′ i,d =P d πi,d +P d ψi,d = 1 n + 0 = 1 n , since ψ has zero total mass. Therefore, the left marginal ofπ ′ is still uniform on[n]

[54] [54]

15 Since LE, ˜B is quadratic in the coupling, we can split it into what is essentially a Taylor decomposition in the first and second-order variations ofL E, ˜B

(Total mass)Since the left marginal is preserved for every j, the total mass P j,d π′ j,d remainsP j(1/n) = 1, soπ ′ is normalized as a coupling. 15 Since LE, ˜B is quadratic in the coupling, we can split it into what is essentially a Taylor decomposition in the first and second-order variations ofL E, ˜B. Explicitly, one can see that: LE, ˜B(π′) =L E, ˜B...

[55] [55]

0.2 0.8 0.8 0.2 # 2. h 0.5 0.5 i

also showed more results for constrained least square estimators. An interesting question is whether we could get such an estimator using a penalized version of the srGWB, where we add a penalty term to the objective that penalizes small partitions. We leave this question for future work. C Algorithmic details C.1 On implicit regularization toward determi...