Text-attributed Graph Condensation via Text Selection and Attribute Matching

Guojia Wan; Hao Huang; Hao Wang; Haowei Han; Jiawei Jiang; Shanshan Feng; Xiao Yan; Yuxiang Wang

arxiv: 2606.03839 · v1 · pith:GXX2D6W5new · submitted 2026-06-02 · 💻 cs.LG

Haowei Han, Yuxiang Wang, Guojia Wan, Hao Wang, Shanshan Feng, Hao Huang, Jiawei Jiang, Xiao Yan

Pith reviewed 2026-06-28 10:39 UTC · model grok-4.3

classification 💻 cs.LG

keywords: text-attributed graphs, graph condensation, mutual information, attribute similarity, GNN, language models, data compression, training efficiency

The pith

TAGSAM condenses text-attributed graphs to 1% of their original size while preserving competitive accuracy for joint GNN and language model training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to lower the space and time costs of jointly training graph neural networks with language models on text-attributed graphs, where each node carries a text description. It does so by proposing TAGSAM, which selects and merges representative text chunks from related nodes through mutual information maximization and aligns stable attribute similarity matrices to compress the graph topology without the variance seen in trajectory-matching approaches. A sympathetic reader would care because full-scale joint training becomes impractical on large datasets, and a reliable condensation method could make such models usable on ordinary hardware. The reported results show an average 4.9% accuracy gain over the strongest baseline at matched sizes and sustained performance down to 1% condensation.

Core claim

By selecting subgraph texts that maximize mutual information and matching attribute similarities to produce stable matrices, TAGSAM produces condensed text-attributed graphs that retain the information needed for accurate joint GNN-LM training, outperforming prior condensation baselines by 4.9% on average at the same size and remaining competitive at 1% size.

What carries the argument

Subgraph text selection by mutual information maximization together with attribute similarity matching that aligns stable similarity matrices, which compress text content and graph topology respectively.
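A minimal sketch of what "aligning similarity matrices" could look like, under stated assumptions. The cosine similarity measure, the squared Frobenius loss, and the function names `similarity_matrix` and `matching_loss` are illustrative choices, not details confirmed by this summary. In particular, how TAGSAM reconciles the differing node counts of the original and condensed graphs is not described here, so the sketch assumes both matrices already have the same shape.

```python
import math

def similarity_matrix(features):
    """Pairwise cosine similarity between node attribute vectors."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0
    return [[cos(u, v) for v in features] for u in features]

def matching_loss(target, candidate):
    """Squared Frobenius distance between two same-shaped similarity matrices."""
    return sum(
        (t - c) ** 2
        for t_row, c_row in zip(target, candidate)
        for t, c in zip(t_row, c_row)
    )

# Toy attribute vectors (hypothetical): the reference nodes are orthogonal,
# while the collapsed candidate has two identical nodes.
reference = similarity_matrix([[1.0, 0.0], [0.0, 1.0]])
collapsed = similarity_matrix([[1.0, 0.0], [1.0, 0.0]])

loss_same = matching_loss(reference, reference)       # identical structure
loss_collapsed = matching_loss(reference, collapsed)  # off-diagonal entries differ
```

The point of the illustration is that a similarity matrix is a deterministic function of the attributes, so the matching target does not change from run to run, unlike a recorded training trajectory.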

If this is right

Training accuracy stays competitive even when the text-attributed graph is reduced to 1% of its original size.
Accuracy improves by 4.9% on average over the strongest baseline at matched compressed sizes.
Space and time requirements for joint GNN-LM training drop substantially on large text-attributed graphs.
The variance problem in training-trajectory matching methods is reduced by using stable attribute similarity matrices instead.
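The "average of 4.9%" figure can be read in more than one way, and this summary does not say whether it is an absolute difference in accuracy points or a relative improvement. The sketch below shows both computations side by side. All dataset names, method names, and accuracy values are placeholders made up for illustration and are not numbers from the paper.

```python
# Placeholder accuracies in percent; NOT results from the paper.
results = {
    "dataset_a": {"method": 80.0, "baselines": {"b1": 70.0, "b2": 75.0}},
    "dataset_b": {"method": 60.0, "baselines": {"b1": 57.0, "b2": 50.0}},
}

def average_gains(results):
    """Return (mean absolute gain in points, mean relative gain in percent)
    of the method over the best baseline on each dataset."""
    absolute, relative = [], []
    for entry in results.values():
        best = max(entry["baselines"].values())  # best baseline chosen per dataset
        absolute.append(entry["method"] - best)
        relative.append(100.0 * (entry["method"] - best) / best)
    n = len(results)
    return sum(absolute) / n, sum(relative) / n

abs_gain, rel_gain = average_gains(results)
```

With these placeholder values the absolute gain is 4.0 points while the relative gain is close to 6%, so the two readings can differ noticeably for the same underlying table.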

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selection and matching steps could be tested on graphs whose node attributes are images or other modalities rather than text.
Whether the condensed graphs support downstream tasks such as link prediction or node classification beyond the training objective remains open.
Combining the condensation with existing model-compression techniques might allow further size reductions while tracking accuracy.

Load-bearing premise

Maximizing mutual information for text chunk selection and aligning stable attribute similarity matrices will preserve the information required for accurate joint GNN and language-model training across the evaluated datasets and compression ratios.

What would settle it

If accuracy on any of the evaluated datasets falls below the best baseline when the graph is condensed to 1% size, the claim that the two mechanisms preserve necessary training information would not hold.

Figures

Figures reproduced from arXiv: 2606.03839 by Guojia Wan, Hao Huang, Hao Wang, Haowei Han, Jiawei Jiang, Shanshan Feng, Xiao Yan, Yuxiang Wang.

**Figure 3.** The overview of TAGSAM framework. Subgraph text selection (left) selects the representative chunks and merge …

**Figure 3.** The weight for each node’s features is determined by the …

**Figure 4.** Visualization of attribute similarity matrices for …

**Figure 5.** Running time (in log scale) for hyperparameter …

**Figure 6.** The impact of subgraph size for subgraph text se …

**Figure 7.** The ablation study for different variants. The …

Original abstract

Text-Attributed Graph (TAG) is an important type of graph structured data, where each node has a text description. TAG models usually train a Graph Neural Network (GNN) and language model jointly, which leads to high space and time consumption, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM comes with two key designs, i.e., subgraph text Selection and Attribute similarity Matching, which compress the text description and graph topology of TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices. We evaluate TAGSAM against six state-of-the-art baselines, where it showcases superior performance. For the same compressed size, TAGSAM improves upon the best-performing baseline by an average of 4.9% in accuracy. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% size. Our code is available at https://github.com/SundayVHan/TAGSAM
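To make the text-selection idea concrete, here is a small greedy selection sketch. It is an illustration under assumptions, not the authors' algorithm: the abstract does not state how mutual information is estimated, so the code substitutes a facility-location coverage score over chunk embeddings as a stand-in objective. The names `cosine`, `coverage`, `greedy_select`, and the toy embeddings and texts are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def coverage(selected, embeddings):
    """Facility-location score: every chunk is credited with its best match in `selected`."""
    if not selected:
        return 0.0
    return sum(max(cosine(e, embeddings[j]) for j in selected) for e in embeddings)

def greedy_select(embeddings, budget):
    """Pick up to `budget` chunk indices, each step adding the chunk with the highest resulting score."""
    selected = []
    candidates = set(range(len(embeddings)))
    while candidates and len(selected) < budget:
        best = max(sorted(candidates), key=lambda j: coverage(selected + [j], embeddings))
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (hypothetical): two near-duplicate chunks and one distinct chunk,
# drawn from the text descriptions of related nodes.
texts = ["graph neural networks", "graph neural nets", "language model pretraining"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

picked = greedy_select(embeddings, budget=2)
merged = " ".join(texts[j] for j in sorted(picked))  # merge the selected chunks into one text
```

On this toy input the greedy loop keeps one of the two near-duplicate chunks plus the distinct one, which is the behaviour a representativeness objective is meant to encourage.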

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

TAGSAM pairs mutual-information text selection with attribute-similarity matching for TAG condensation, but the 4.9% gain is reported without variance figures or significance tests, even though the method is motivated by the variance of MTT-based approaches.

TAGSAM introduces mutual-information maximization to pick and merge representative text chunks across related nodes, plus attribute-similarity matrix alignment to stabilize the topology side of condensation. It reports an average accuracy gain of 4.9% over the strongest baseline at matched sizes and remains competitive at 1% condensation.

What is actually new is the specific pairing of those two pieces for text-attributed graphs. The text-selection step is a direct response to having multiple node descriptions, and the similarity-matching step is positioned as a fix for the high run-to-run variance that MTT-style methods show. The paper releases code, which is useful.

The designs are straightforward and address a practical cost problem in joint GNN plus language-model training on TAGs. The 1% size result is aggressive enough to be worth noting for anyone who needs to shrink large graphs.

The soft spot is the evaluation. The central accuracy claims rest on point estimates with no standard deviations, no seed counts, and no direct variance comparison to the MTT baselines. Since the attribute-matching component is sold as the solution to variance, the absence of any variance data makes it impossible to tell whether the reported delta is reliable or just noise. Dataset characteristics and exact baseline setups are also thin in the available description.

This is for people who work on graph condensation or efficient TAG training and want concrete implementation ideas. A reader could extract the two design choices and test them, but anyone needing dependable numbers would have to wait for stronger evidence.

I would send it to peer review once the authors add variance reporting, ablations on each component, and full experimental details; the core approach is grounded enough to justify referee time.

Referee Report

2 major / 1 minor

Summary. The paper proposes TAGSAM, a condensation method for text-attributed graphs (TAGs) that jointly compresses text descriptions and graph topology to enable efficient training of GNNs with language models. Key components are subgraph text selection (via mutual information maximization on representative chunks) and attribute similarity matching (to align stable similarity matrices and reduce variance in MTT-based condensation). Empirical evaluation against six baselines reports an average 4.9% accuracy gain at matched compressed sizes and competitive performance even at 1% condensation ratio, with code released.

Significance. If the empirical claims hold under proper statistical validation, the work would be significant for scalable TAG modeling on large datasets, as it directly targets the high space/time costs of joint GNN+LM training. The release of code supports reproducibility, which strengthens the contribution if the variance-reduction claim can be substantiated.

major comments (2)

[Abstract and evaluation] The central claim of a 4.9% average accuracy improvement over the best-performing baseline (and competitiveness at 1% size) is reported only as point estimates with no standard deviations, number of random seeds, variance comparisons to MTT baselines, or statistical tests. This directly undermines the motivation that attribute similarity matching mitigates high variance in MTT methods, as no evidence is provided that the observed delta exceeds run-to-run fluctuation.
[Evaluation] The selection of the 'best-performing baseline' for the 4.9% comparison is post-hoc without details on exact baseline implementations, hyperparameter tuning protocols, or dataset characteristics (e.g., node counts, text lengths, label distributions), making the cross-method claim difficult to verify or reproduce from the reported results alone.

minor comments (1)

[Abstract] The six baselines are not named; listing them would improve clarity without lengthening the abstract substantially.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on statistical validation and experimental transparency. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

Referee: [Abstract and evaluation] The central claim of a 4.9% average accuracy improvement over the best-performing baseline (and competitiveness at 1% size) is reported only as point estimates with no standard deviations, number of random seeds, variance comparisons to MTT baselines, or statistical tests. This directly undermines the motivation that attribute similarity matching mitigates high variance in MTT methods, as no evidence is provided that the observed delta exceeds run-to-run fluctuation.

Authors: We agree this is a substantive limitation in the current manuscript. The reported 4.9% gain and variance-reduction motivation rest on point estimates alone, which does not allow readers to assess whether the improvement exceeds typical run-to-run variation. In the revised version we will rerun all experiments using at least five random seeds, report mean accuracy ± standard deviation for every method and condensation ratio, include a direct variance comparison between TAGSAM and the MTT baselines, and add paired statistical significance tests (e.g., t-tests with p-values) against the strongest baseline. These additions will be placed in the main evaluation section and an expanded appendix.

Revision: yes

Referee: [Evaluation] The selection of the 'best-performing baseline' for the 4.9% comparison is post-hoc without details on exact baseline implementations, hyperparameter tuning protocols, or dataset characteristics (e.g., node counts, text lengths, label distributions), making the cross-method claim difficult to verify or reproduce from the reported results alone.

Authors: We acknowledge that greater detail is required for reproducibility. The revised manuscript will include (1) an appendix table listing the exact re-implementation choices and hyperparameter grids used for each of the six baselines, (2) a new dataset-statistics table reporting node counts, average text length, label balance, and condensation ratios for every dataset, and (3) explicit language clarifying that the "best-performing baseline" is the method achieving the highest accuracy under the same evaluation protocol we applied to TAGSAM. These changes will allow independent verification of the 4.9% figure.

Revision: yes
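The reporting promised in the first response (mean ± standard deviation over seeds, plus a paired test against the strongest baseline) can be sketched with the Python standard library alone. The per-seed accuracies below are placeholders invented for illustration, not results from the paper, and the helper names `summarize` and `paired_t` are hypothetical. Only the t statistic and degrees of freedom are computed; turning them into a p-value is left to a statistics package.

```python
import math
import statistics

def summarize(accuracies):
    """Mean and sample standard deviation over seeds."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-seed accuracy pairs.
    Assumes the per-seed differences are not all identical (non-zero spread)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Placeholder accuracies (percent) for five seeds; NOT numbers from the paper.
method_runs = [72.0, 74.0, 73.0, 75.0, 71.0]
baseline_runs = [70.0, 71.0, 72.0, 71.0, 70.0]

method_mean, method_std = summarize(method_runs)
baseline_mean, baseline_std = summarize(baseline_runs)
t_stat, dof = paired_t(method_runs, baseline_runs)

print(f"method   {method_mean:.1f} ± {method_std:.2f}")
print(f"baseline {baseline_mean:.1f} ± {baseline_std:.2f}")
print(f"paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

Pairing by seed only makes sense if both methods are run with the same seeds and data splits. For five seeds the two-sided 5% critical value of the t distribution with 4 degrees of freedom is about 2.776, so a statistic above that threshold would indicate a difference larger than the observed run-to-run spread.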

Circularity Check

0 steps flagged

No circularity; empirical method with independent components

The paper introduces TAGSAM, a condensation technique using subgraph text selection (via mutual information maximization) and attribute similarity matching to address variance in MTT-based methods. All central claims rest on empirical accuracy comparisons against baselines at matched compression ratios, with no equations, predictions, or derivations presented that reduce to fitted inputs or self-citations by construction. The method components are defined independently of the evaluation outcomes, and no load-bearing self-citation chains or ansatzes are invoked in the provided text. This is a standard empirical contribution without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard information-theoretic and similarity-based assumptions common to condensation literature; no new entities are postulated and no free parameters are explicitly fitted in the abstract description.

axioms (2)

domain assumption: Maximizing mutual information between selected text chunks and original descriptions preserves task-relevant semantics.
Invoked to justify the text-selection component.

domain assumption: Stable attribute similarity matrices are sufficient proxies for training dynamics in MTT-style condensation.
Invoked to justify replacement of trajectory matching.

WWW ’26, April 13–17, 2026, Dubai, United Arab Emirates. Haowei Han et al.

[1] [1]

William Brannon, Wonjune Kang, Suyash Fulay, Hang Jiang, Brandon Roy, Deb Roy, and Jad Kabbara. 2024. ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings. InProceedings of the Workshop on Graph-based Methods for Natural Language Processing (TextGraphs)

2024

[2] [2]

Vladimir Braverman, Vincent Cohen-Addad, H-C Shaofeng Jiang, Robert Krauthgamer, Chris Schwiegelshohn, Mads Bech Toftrup, and Xuan Wu. 2022. The power of uniform sampling for coresets. InProceedings of the Annual IEEE Symposium on Foundations of Computer Science (FOCS)

2022

[3] [3]

Jonathan Chang and David Blei. 2009. Relational Topic Models for Document Networks. InProceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)

2009

[4] [4]

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolu- tional neural networks on graphs with fast localized spectral filtering. InProceed- ings of the Conference on Neural Information Processing Systems (NeurIPS)

2016

[5] [5]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

2019

[6] [6]

Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xi- ang Wang, and Xiangnan He. 2024. EXGC: Bridging Efficiency and Explainability in Graph Condensation. InProceedings of the Web Conference (WWW)

2024

[7] [7]

2009.Facility location: concepts, models, algorithms and case studies

Reza Zanjirani Farahani and Masoud Hekmatfar. 2009.Facility location: concepts, models, algorithms and case studies. Springer Science & Business Media

2009

[8] [8]

Xinyi Gao, Guanhua Ye, Tong Chen, Wentao Zhang, Junliang Yu, and Hongzhi Yin. 2025. Rethinking and Accelerating Graph Condensation: A Training-Free Approach with Class Partition. InProceedings of the Web Conference (WWW)

2025

[9] [9]

Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. InProceedings of the Conference on Neural Information Processing Systems (NeurIPS)

2017

[10] [10]

Sariel Har-Peled, Dan Roth, and Dav A Zimak. 2006. Maximum margin coresets for active and noise tolerant learning. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI)

2006

[11] [11]

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: datasets for machine learning on graphs. InProceedings of the Conference on Neural Information Processing Systems (NeurIPS)

2020

[12] [12]

Qiang Huang, Xiao Yan, Xin Wang, Susie Xi Rao, Zhichao Han, Fangcheng Fu, Wentao Zhang, and Jiawei Jiang. 2024. Retrofitting temporal graph neural networks with transformer.arXiv preprint arXiv:2409.05477(2024)

work page arXiv 2024

[13] [13]

Song Jin, Xiantao Cai, and Jiawei Jiang. 2025. EDGaE: Efficient Distributed Graph Neural Network Training System at the Edge. InProceedings of the International Conference on Intelligent Computing (ICIC)

2025

[14] [14]

Wei Jin, Lingxiao Zhao, Shichang Zhang, Yozen Liu, Jiliang Tang, and Neil Shah

[15] [15]

InProceedings of the International Conference on Learning Representations (ICLR)

Graph condensation for graph neural networks. InProceedings of the International Conference on Learning Representations (ICLR)

[16] [16]

Siddharth Joshi, Jiayi Ni, and Baharan Mirzasoleiman. 2024. Dataset distilla- tion via knowledge distillation: towards efficient self-supervised pre-training of deep networks. InProceedings of the International Conference on Learning Representations (ICLR)

2024

[17] [17]

Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. InProceedings of the International Conference on Learning Representations (ICLR)

2016

[18] [18]

Saksham Singh Kushwaha, Siva Sai Nagender Vasireddy, Kai Wang, and Yapeng Tian. 2024. Audio-visual dataset distillation.Transactions on Machine Learning Research (TMLR)(2024)

2024

[19] [19]

Yichuan Li, Kaize Ding, and Kyumin Lee. 2023. GRENADE: graph-centric lan- guage model for self-supervised representation learning on text-attributed graphs. InFindings of the Association for Computational Linguistics: EMNLP

2023

[20] [20]

Yuxuan Liang, Wentao Zhang, Zeang Sheng, Ling Yang, Quanqing Xu, Jiawei Jiang, Yunhai Tong, and Bin Cui. 2025. Towards Scalable and Deep Graph Neural Networks via Noise Masking. InProceedings of the AAAI Conference on Artificial Intelligence (AAAI)

2025

[21] [21]

Mengyang Liu, Shanchuan Li, Xinshi Chen, and Le Song. 2022. Graph conden- sation via receptive field distribution matching.arXiv preprint arXiv:2206.13697 (2022)

work page arXiv 2022

[22] [22]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: a robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019

[23] [23]

Yunhui Liu, Qizhuo Xie, Jinwei Shi, Jiaxu Shen, and Tieke He. 2025. Multi- Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains. In Proceedings of the Web Conference (WWW)

2025

[24] [24]

Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Xiaonan Nie, Jiawei Jiang, Wentao Zhang, and Bin Cui. 2025. Acceleration Algorithms in GNNs: A Survey.Transactions on Knowledge and Data Engineering (TKDE) (2025)

2025

[25] [25]

Bishwas Mandal, Sarthak Khanal, and Doina Caragea. 2024. Contrastive Learning for Multimodal Classification of Crisis related Tweets. InProceedings of the Web Conference (WWW)

2024

[26] [26]

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approxima- tions for maximizing submodular set functions–I.Math. Program.(1978)

1978

[27] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).

[28] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog (2019). https://api.semanticscholar.org/CorpusID:160025533

[29] Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016).

[30] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. 2008. Collective classification in network data. AI Magazine (2008).

[31] Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. 2018. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868 (2018).

[32] A Vaswani. 2017. Attention is all you need. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS).

[33] Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. 2020. Microsoft academic graph: When experts are not enough. Quantitative Science Studies (2020).

[34] Lin Wang, Wenqi Fan, Jiatong Li, Yao Ma, and Qing Li. 2024. Fast Graph Condensation with Structure-based Neural Tangent Kernel. In Proceedings of the Web Conference (WWW).

[35] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. 2018. Dataset distillation. arXiv preprint arXiv:1811.10959 (2018).

[36] Xin Wang, Jiawei Jiang, Xiao Yan, and Qiang Huang. 2025. TESA: A Trajectory and Semantic-aware Dynamic Heterogeneous Graph Neural Network. In Proceedings of the Web Conference (WWW).

[37] Yuxiang Wang, Xiao Yan, Chuang Hu, Quanqing Xu, Chuanhui Yang, Fangcheng Fu, Wentao Zhang, Hao Wang, Bo Du, and Jiawei Jiang. 2024. Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning. In International Conference on Data Engineering (ICDE).

[38] Yuxiang Wang, Xiao Yan, Shiyu Jin, Hao Huang, Quanqing Xu, Qingchen Zhang, Bo Du, and Jiawei Jiang. 2024. Self-Supervised Learning for Graph Dataset Condensation. In Proceedings of the Conference on Knowledge Discovery and Data Mining (KDD).

[39] Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuang Hu, Yuanyuan Zhu, Bo Du, Jia Wu, and Jiawei Jiang. 2025. Exploiting Text Semantics for Few and Zero Shot Node Classification on Text-attributed Graph. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).

[40] Max Welling. 2009. Herding dynamical weights to learn. In Proceedings of the International Conference on Machine Learning (ICML).

[41] Zhihao Wen and Yuan Fang. 2023. Augmenting low-resource text classification with graph-grounded pre-training and prompting. In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR).

[42] Xindi Wu, Zhiwei Deng, and Olga Russakovsky. 2023. Vision-Language Dataset Distillation. Transactions on Machine Learning Research (TMLR) (2023).

[43] Zhenbang Xiao, Yu Wang, Shunyu Liu, Bingde Hu, Huiqiong Wang, Mingli Song, and Tongya Zheng. 2025. Disentangled Condensation for Large-scale Graphs. In Proceedings of the Web Conference (WWW).

[44] Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, and Yong-Lu Li. 2024. Low-rank similarity mining for multimodal dataset distillation. In Proceedings of the International Conference on Machine Learning (ICML).

[45] Hongzhi Yin, Xinyi Gao, Junliang Yu, Ruihong Qiu, Tong Chen, Quoc Viet Hung Nguyen, and Zi Huang. 2025. Graph Condensation: Foundations, Methods and Prospects. In Proceedings of the Web Conference (WWW).

[46] Qi Zhang, Yifei Wang, and Yisen Wang. 2023. On the generalization of multi-modal contrastive learning. In Proceedings of the International Conference on Machine Learning (ICML).

[47] Bo Zhao and Hakan Bilen. 2023. Dataset condensation with distribution matching. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV).

[48] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. 2020. Dataset condensation with gradient matching. In Proceedings of the International Conference on Learning Representations (ICLR).

[49] Yun Zhu, Haizhou Shi, Xiaotang Wang, Yongchao Liu, Yaoke Wang, Boci Peng, Chuntao Hong, and Siliang Tang. 2024. Graphclip: enhancing transferability in graph foundation models for text-attributed graphs. In Proceedings of the Web Conference (WWW).

WWW ’26, April 13–17, 2026, Dubai, United Arab Emirates. Haowei Han et al.

A Proof of Theorem 1

Proof. Firstly...