Recognition: 2 Lean theorem links
MLGIB: Multi-Label Graph Information Bottleneck for Expressive and Robust Message Passing
Pith reviewed 2026-05-15 05:33 UTC · model grok-4.3
The pith
MLGIB uses constrained information transmission to preserve predictive labels while filtering irrelevant noise during message passing on multi-label graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that multi-label graphs exhibit a distinct over-squashing failure driven by irrelevant label differences, and that this failure can be addressed by formulating message passing as constrained information transmission: a lower variational bound maximizes mutual information with the target labels, while an upper bound suppresses redundant information from the source. The result is a label-aware message-passing architecture that improves both predictive accuracy and robustness on standard benchmarks.
What carries the argument
The Multi-Label Graph Information Bottleneck (MLGIB) mechanism, which builds a Markovian dependence space and derives tractable variational bounds to control information flow between node representations and target labels.
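Read as an information-bottleneck program, this compresses to one objective. A sketch using the notation visible in the quoted passage under "Lean theorems connected to this paper" below (the paper's exact symbols may differ):

```latex
\min_{Z_H^{(L)}} \; \mathrm{MLGIB}\!\left(D_s, Y_t; Z_H^{(L)}\right)
\;\triangleq\; -\, I\!\left(Y_t; Z_H^{(L)}\right) \;+\; \beta\, I\!\left(D_s; Z_H^{(L)}\right),
```

where $Z_H^{(L)}$ is the layer-$L$ node representation, $Y_t$ the target labels, $D_s$ the source data, and $\beta > 0$ trades label expressiveness against compression of irrelevant source information.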
If this is right
- Deeper message-passing layers become viable on multi-label graphs without rapid loss of predictive signal.
- The same variational bounds can be inserted into existing GNN architectures to produce label-aware updates.
- Training remains end-to-end because the lower and upper bounds are jointly optimized with the downstream task loss.
- Irrelevant label noise is explicitly penalized rather than left to implicit regularization.
Where Pith is reading between the lines
- The Markovian dependence construction may extend naturally to graphs with partial or noisy labels beyond the multi-label case.
- If the variational bounds prove stable, they could serve as a drop-in regularizer for any GNN trained on high-dimensional label spaces.
- A direct test would compare performance on graphs where label overlap between neighbors is artificially varied.
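One hedged way to operationalize that test (the generator below is hypothetical, not from the paper): wire synthetic multi-label graphs so that edge probability depends on a tunable overlap strength, then sweep that strength and track prediction accuracy.

```python
# Hypothetical generator: multi-label graphs whose wiring depends on a tunable
# overlap strength alpha. alpha = 0 ignores labels; alpha = 1 makes edge
# probability proportional to the Jaccard overlap of the endpoint label sets.
import numpy as np

def overlap_controlled_graph(n=500, num_labels=20, alpha=0.5, base_p=0.02, seed=0):
    rng = np.random.default_rng(seed)
    y = (rng.random((n, num_labels)) < 0.15).astype(float)   # multi-hot labels
    inter = y @ y.T                                          # shared labels per pair
    deg = y.sum(axis=1)
    union = np.maximum(deg[:, None] + deg[None, :] - inter, 1.0)
    jaccard = inter / union                                  # pairwise label overlap
    p_edge = base_p * ((1 - alpha) + alpha * jaccard)        # overlap-weighted wiring
    adj = rng.random((n, n)) < p_edge
    adj = np.triu(adj, 1)                                    # no self-loops, one triangle
    return (adj | adj.T), y                                  # undirected adjacency, labels
```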
Load-bearing premise
Neighboring nodes share only limited relevant labels and differ across many irrelevant ones, so that predictive signals get diluted by noise in ordinary message passing.
What would settle it
Run the proposed variational bounds on a multi-label graph benchmark; if they yield no gain in label-prediction accuracy or robustness metrics over standard message-passing baselines, the central claim fails.
original abstract
Graph Neural Networks (GNNs) suffer from over-squashing in deep message passing, where information from exponentially growing neighborhoods is compressed into fixed-dimensional representations. We show that this issue becomes a distinct failure mode in multi-label graphs: neighboring nodes often share only limited labels while differing across many irrelevant ones, causing predictive signals to be diluted by noisy label information. To address this challenge, we propose the Multi-Label Graph Information Bottleneck (MLGIB), which formulates multi-label message passing as constrained information transmission under irrelevant label noise. MLGIB balances expressiveness and robustness by preserving predictive label signals while suppressing irrelevant noise. Specifically, it constructs a Markovian dependence space and derives tractable variational bounds, where the lower bound maximizes mutual information with target labels and the upper bound constrains redundant source information. These bounds lead to an end-to-end label-aware message-passing architecture. Extensive experiments on multiple benchmarks demonstrate consistent improvements over existing methods, validating the effectiveness and generality of the proposed framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that over-squashing in GNNs manifests distinctly in multi-label graphs because neighboring nodes share only limited labels while differing on many irrelevant ones, diluting predictive signals. It proposes MLGIB, which formulates multi-label message passing as constrained information transmission in a Markovian dependence space, derives tractable variational lower and upper bounds on mutual information (maximizing target-label MI while upper-bounding redundant source information), and yields an end-to-end label-aware architecture. Experiments on multiple benchmarks are reported to show consistent improvements over existing methods.
Significance. If the variational bounds are rigorously derived and the experiments hold under standard controls, the framework supplies a principled information-theoretic mechanism for separating predictive label signals from irrelevant noise in multi-label settings. The end-to-end architecture and claimed generality across benchmarks would constitute a useful contribution to robust message-passing design.
major comments (2)
- The abstract states that the variational bounds lead to an end-to-end architecture, yet supplies no derivation details, explicit equations, or error bars. Without these, it is impossible to verify whether the final objective reduces to a fitted trade-off hyperparameter or to quantities already present in the input data, which is load-bearing for the central claim of a principled, label-aware message-passing scheme.
- The construction of the Markovian dependence space is presented as the key modeling device for limited label sharing, but the manuscript provides no explicit definition or justification of how this space differs from standard graph neighborhoods or why the limited-sharing assumption holds across the evaluated benchmarks.
minor comments (1)
- Notation for the lower and upper bounds should be introduced with explicit symbols (e.g., I_LB and I_UB) and cross-referenced to the corresponding equations in the method section.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications from the full text and will revise the manuscript to improve presentation of the derivations and modeling assumptions.
point-by-point responses
- Referee: The abstract states that the variational bounds lead to an end-to-end architecture, yet supplies no derivation details, explicit equations, or error bars. Without these, it is impossible to verify whether the final objective reduces to a fitted trade-off hyperparameter or to quantities already present in the input data, which is load-bearing for the central claim of a principled, label-aware message-passing scheme.
Authors: The full derivations appear in Section 3: the lower bound maximizes I(Y;Z) via a variational q(y|z) approximating the predictive posterior, while the upper bound on I(X;Z) follows from the Markov chain factorization in the dependence space. The combined objective is L = -L_lower + β L_upper, where β is a hyperparameter but is directly motivated by the IB principle rather than being arbitrary or reducible to raw input quantities. We will add the key equations to the abstract and report error bars (standard deviations over 5 runs) in all tables of the revised version. revision: yes
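A minimal sketch of that combined objective, instantiated with the standard deep-VIB estimators of [31] rather than the paper's exact Propositions 1-2; every name below is illustrative, not the paper's API:

```python
# Sketch of L = -L_lower + beta * L_upper with standard variational estimators:
# the lower bound on I(Y; Z) becomes a multi-label cross-entropy under q(y|z),
# and the upper bound on I(X; Z) becomes a closed-form Gaussian KL to a prior.
import torch
import torch.nn.functional as F

def mlgib_style_loss(mu, logvar, logits, targets, beta=1e-3):
    """mu, logvar: diagonal-Gaussian parameters of q(z|x) from a GNN encoder (N x d).
    logits: variational decoder q(y|z) outputs (N x num_labels).
    targets: multi-hot label matrix (N x num_labels)."""
    # Lower bound on I(Y; Z): expected log-likelihood of labels under q(y|z).
    lower = -F.binary_cross_entropy_with_logits(logits, targets)
    # Upper bound on I(X; Z): KL(q(z|x) || N(0, I)), closed form per node.
    upper = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    return -lower + beta * upper
```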
- Referee: The construction of the Markovian dependence space is presented as the key modeling device for limited label sharing, but the manuscript provides no explicit definition or justification of how this space differs from standard graph neighborhoods or why the limited-sharing assumption holds across the evaluated benchmarks.
Authors: Definition 1 in Section 2.2 defines the Markovian dependence space as the space of label-conditioned Markov chains where p(z_i | z_j, y_i, y_j) factors such that only shared labels transmit information; irrelevant labels are masked via a label-overlap indicator. This differs from standard neighborhoods by replacing uniform aggregation with label-aware gating. The limited-sharing assumption is supported by dataset statistics (average label overlap < 0.3 on all benchmarks); we will add an explicit justification paragraph and a supplementary table of per-dataset overlap values. revision: yes
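A minimal sketch of the gating described here, with a Jaccard-style overlap (cf. [40]) standing in for the paper's label-overlap indicator; all names are hypothetical, and the paper's Definition 1 may differ in detail:

```python
# Label-aware gating: each message from src to dst is scaled by the label
# overlap of the endpoints, so neighbors with disjoint labels are gated off.
import torch

def label_gated_aggregate(z, edge_index, y):
    """z: node features (N x d); edge_index: (2 x E) src/dst indices;
    y: multi-hot label matrix (N x L), observed or estimated."""
    src, dst = edge_index
    inter = (y[src] * y[dst]).sum(dim=1)                      # shared labels per edge
    union = ((y[src] + y[dst]) > 0).float().sum(dim=1).clamp(min=1.0)
    gate = (inter / union).unsqueeze(1)                       # Jaccard gate in [0, 1]
    out = torch.zeros_like(z)
    out.index_add_(0, dst, gate * z[src])                     # gated sum over in-edges
    return out
```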
Circularity Check
No significant circularity in derivation chain
full rationale
The paper constructs a Markovian dependence space for multi-label message passing and applies standard variational bounds to maximize target-label MI while upper-bounding redundant information. These are conventional IB approximations: the objective does not reduce to fitted inputs by construction, no load-bearing self-citation carries the central claim, and no known result is merely renamed. The architecture is end-to-end, and the bounds follow from mutual-information properties without circular equivalence to the input data or assumptions.
Axiom & Free-Parameter Ledger
free parameters (1)
- trade-off parameter β between the lower and upper bounds
axioms (1)
- standard math: mutual information admits tractable variational lower and upper bounds
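For reference, the textbook forms of those bounds (standard IB machinery, e.g. [27, 31], not a reconstruction of the paper's Propositions 1-2):

```latex
I(Y;Z) \;\ge\; \mathbb{E}_{p(y,z)}\!\left[\log q(y \mid z)\right] + H(Y),
\qquad
I(X;Z) \;\le\; \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\!\left(p(z \mid x)\,\middle\|\,r(z)\right)\right],
```

for any variational decoder $q(y \mid z)$ and prior $r(z)$: the first is the Barber-Agakov lower bound, the second the rate term used in deep VIB.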
invented entities (1)
- Markovian dependence space (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
Quoted passage: constructs a Markovian dependence space $\mathcal{M}$ ... $\min \mathrm{MLGIB}(D_s, Y_t; Z_H^{(L)}) \triangleq -I(Y_t; Z_H^{(L)}) + \beta\, I(D_s; Z_H^{(L)})$ ... Proposition 1: lower bound via $Q_1, Q_2$ ... Proposition 2: upper bound $\mathrm{AIB}^{(l)} + \mathrm{HIB}^{(l)}$
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: relation between the paper passage and the cited Recognition theorem.
Quoted passage: label embeddings via skip-gram ... $P(Z_A^{(l)} \mid A, Z_X^{(l)}) = \mathrm{softmax}(z_{p,v}^{\top} z_{p,u})$ ... HIB via reparameterized Gaussian messages
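The quoted "HIB via reparameterized Gaussian messages" points at the standard reparameterization trick of [33]; a minimal sketch of how such a message would be sampled differentiably (function name hypothetical):

```python
# Reparameterized Gaussian message: z = mu + sigma * eps with eps ~ N(0, I).
# Sampling stays differentiable w.r.t. mu and logvar, so the upper-bound (HIB)
# term can be trained end-to-end.
import torch

def gaussian_message(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps
```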
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Niloofar Azizi, Nils M. Kriege, Nicholas J. A. Harvey, and Horst Bischof. Spectral basis learning for expressive graph neural networks in link prediction. In AAAI, pages 19640–19648. AAAI Press, 2026.
- [2]
- [3] Lihui Liu and Yuchen Yan. MORGAN: to bridge mixture of experts and spectral graph neural network. In AAAI, pages 23783–23791. AAAI Press, 2026.
- [4] Richard Bergna, Sergio Calvo-Ordoñez, Felix L. Opolka, Pietro Lio, and José Miguel Hernández-Lobato. Uncertainty modeling in graph neural networks via stochastic differential equations. In ICLR. OpenReview.net, 2025.
- [5] Shouheng Li, Floris Geerts, Dongwoo Kim, and Qing Wang. Towards bridging generalization and expressivity of graph neural networks. In ICLR. OpenReview.net, 2025.
- [6] Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
- [7]
- [8] Adarsh Jamadandi, Celia Rubio-Madrigal, and Rebekka Burkholz. Spectral graph pruning against over-squashing and over-smoothing. In NeurIPS, 2024.
- [9] Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, and Danai Koutra. Beyond homophily in graph neural networks: Current limitations and effective designs. In NeurIPS, 2020.
- [10] Zengyi Wo, Minglai Shao, Shiyu Zhang, and Ruijie Wang. Local homophily-aware graph neural network with adaptive polynomial filters for scalable graph anomaly detection. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 3180–3191, 2025.
- [11] Donald Loveland and Danai Koutra. Unveiling the impact of local homophily on GNN fairness: In-depth analysis and new benchmarks. In Proceedings of the 2025 SIAM International Conference on Data Mining (SDM), pages 608–617. SIAM, 2025.
- [12] Yan Jiang, Ruihong Qiu, and Zi Huang. Does homophily help in robust test-time node classification? In Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining, pages 271–281, 2026.
- [13] Hsiang-Fu Yu, Prateek Jain, Purushottam Kar, and Inderjit Dhillon. Large-scale multi-label learning with missing labels. In International Conference on Machine Learning, pages 593–601. PMLR, 2014.
- [14] Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. Sparse local embeddings for extreme multi-label classification. Advances in Neural Information Processing Systems, 28, 2015.
- [15] Yuanchen Bei, Weizhi Chen, Hao Chen, Sheng Zhou, Carl Yang, Jiapei Fan, Longtao Huang, and Jiajun Bu. Correlation-aware graph convolutional networks for multi-label node classification. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, pages 37–48, 2025.
- [16] Jiayang Wu, Wensheng Gan, Huashen Lu, and Philip S. Yu. Graph contrastive learning on multi-label classification for recommendations. ACM Transactions on Intelligent Systems and Technology, 16(4):1–19, 2025.
- [17] Yifei Sun, Zemin Liu, Bryan Hooi, Yang Yang, Rizal Fathony, Jia Chen, and Bingsheng He. Multi-label node classification with label influence propagation. In The Thirteenth International Conference on Learning Representations, 2025.
- [18] Yiyuan Chen, Donghai Guan, Weiwei Yuan, and Tianzi Zang. Beyond homophily: Graph contrastive learning with macro-micro message passing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39(15), pages 15948–15956, 2025.
- [19] Przemysław Andrzej Wałęga and Michael Rawson. Expressive power of temporal message passing. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39(20), pages 21000–21008, 2025.
- [20] Langzhang Liang, Fanchen Bu, Zixing Song, Zenglin Xu, Shirui Pan, and Kijung Shin. Mitigating over-squashing in graph neural networks by spectrum-preserving sparsification. In ICML, Proceedings of Machine Learning Research. PMLR / OpenReview.net, 2025.
- [21] Danial Saber and Amirali Salehi-Abari. Empirical study of over-squashing in GNNs and causal estimation of rewiring strategies. In CIKM, pages 2525–2534. ACM, 2025.
- [22] Cangqi Zhou, Hui Chen, Jing Zhang, Qianmu Li, Dianming Hu, and Victor S. Sheng. Multi-label graph node classification with label attentive neighborhood convolution. Expert Systems with Applications, 180:115063, 2021.
- [23] Lin Xiao, Pengyu Xu, Liping Jing, Uchenna Akujuobi, and Xiangliang Zhang. Semantic guide for semi-supervised few-shot multi-label node classification. Information Sciences, 591:235–250, 2022.
- [24] Kaisheng Gao, Jing Zhang, and Cangqi Zhou. Semi-supervised graph embedding for multi-label graph node classification. In International Conference on Web Information Systems Engineering, pages 555–567. Springer, 2019.
- [25] Kedar Karhadkar, Pradeep Banerjee, and Guido Montufar. FoSR: First-order spectral rewiring for addressing oversquashing in GNNs. In International Conference on Learning Representations, 2023.
- [26] Khang Nguyen, Nong Minh Hieu, Vinh Duc Nguyen, Nhat Ho, Stanley Osher, and Tan Minh Nguyen. Revisiting over-smoothing and over-squashing using Ollivier-Ricci curvature. In International Conference on Machine Learning, pages 25956–25979. PMLR, 2023.
- [27] Naftali Tishby, Fernando C. Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000.
- [28] Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2015.
- [29] Andrew M. Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan D. Tracey, and David D. Cox. On the information bottleneck theory of deep learning. Journal of Statistical Mechanics: Theory and Experiment, 2019(12):124020, 2019.
- [30] Tailin Wu, Hongyu Ren, Pan Li, and Jure Leskovec. Graph information bottleneck. Advances in Neural Information Processing Systems, 33:20437–20448, 2020.
- [31] Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. In International Conference on Learning Representations, 2017.
- [32] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 2013.
- [33] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations (ICLR), 2014.
- [34] Uchenna Akujuobi, Han Yufei, Qiannan Zhang, and Xiangliang Zhang. Collaborative graph walk for semi-supervised multi-label node classification. In 2019 IEEE International Conference on Data Mining (ICDM), pages 1–10. IEEE, 2019.
- [35] Geetika Lakshmanan and Martin Oberhofer. Knowledge discovery in the blogosphere: Approaches and challenges. IEEE Internet Computing, 14(2):24–32, 2010.
- [36] Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, and Megha Khosla. Multi-label node classification on graph-structured data. arXiv preprint arXiv:2304.10398, 2023.
- [37] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017. OpenReview.net, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
- [38] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
- [39] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–93, 2008.
- [40] Paul Jaccard. The distribution of the flora in the alpine zone. 1. New Phytologist, 11(2):37–50, 1912.
- [41] XuanLong Nguyen, Martin J. Wainwright, and Michael I. Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847–5861, 2010.
- [42] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.