Text-attributed Graph Condensation via Text Selection and Attribute Matching
Pith reviewed 2026-06-28 10:39 UTC · model grok-4.3
The pith
TAGSAM condenses text-attributed graphs to 1% size while preserving competitive accuracy for joint GNN and language model training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By selecting subgraph text chunks that maximize mutual information, and by matching attribute similarities through stable similarity matrices, TAGSAM produces condensed text-attributed graphs that retain the information needed for accurate joint GNN-LM training. At the same compressed size it outperforms the strongest prior condensation baseline by 4.9% on average, and it remains competitive at 1% of the original size.
What carries the argument
Subgraph text selection by mutual information maximization together with attribute similarity matching that aligns stable similarity matrices, which compress text content and graph topology respectively.
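The text-selection half of this machinery can be pictured with a small sketch. It assumes each text chunk from a node's subgraph is already embedded as a vector, and it swaps in a facility-location coverage score as a stand-in for the mutual-information objective. Facility location is a monotone submodular function, so greedy selection under a chunk budget is guaranteed to reach at least (1 - 1/e) of the optimum (Nemhauser, Wolsey and Fisher, 1978). The names `select_chunks` and `facility_location`, the cosine-similarity scoring and the budget parameter are hypothetical illustrations, not TAGSAM's actual objective.

```python
import numpy as np


def facility_location(sim: np.ndarray, selected: list[int]) -> float:
    """Coverage of all chunks by the selected ones: sum over chunks of the best similarity."""
    if not selected:
        return 0.0
    return float(sim[:, selected].max(axis=1).sum())


def select_chunks(embeddings: np.ndarray, budget: int) -> list[int]:
    """Greedily pick up to `budget` chunk indices maximizing facility-location coverage.

    Hypothetical surrogate for mutual-information maximization, not the paper's objective.
    """
    # Cosine similarity between all chunk pairs, shifted into [0, 1] so the
    # facility-location function is monotone (adding a chunk never lowers coverage).
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (normed @ normed.T + 1.0) / 2.0
    selected: list[int] = []
    for _ in range(min(budget, len(embeddings))):
        base = facility_location(sim, selected)
        # Marginal gain of each not-yet-selected chunk; already selected chunks are excluded.
        gains = [
            facility_location(sim, selected + [j]) - base if j not in selected else float("-inf")
            for j in range(len(embeddings))
        ]
        selected.append(int(np.argmax(gains)))
    return selected
```

With two tight clusters of chunk embeddings and a budget of two, the greedy step picks one representative per cluster, because a second chunk from an already covered cluster adds almost no coverage.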
If this is right
- Training accuracy stays competitive even when the text-attributed graph is reduced to 1% of its original size.
- Accuracy improves by an average of 4.9% over the strongest baseline at matched compressed sizes.
- Space and time requirements for joint GNN-LM training drop substantially on large text-attributed graphs.
- The variance problem in training-trajectory matching methods is reduced by using stable attribute similarity matrices instead.
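The contrast with trajectory matching can be made concrete with a sketch of one possible similarity-matching loss. It assumes node attributes are fixed embeddings (for example, from a frozen language model), builds a C x C cosine-similarity matrix of class-mean embeddings for the original and for the condensed graph, and penalizes their squared Frobenius difference. The class-mean construction and the names `class_similarity` and `similarity_matching_loss` are hypothetical choices for illustration; the paper's exact similarity matrices may differ. In this sketch the target matrix is computed directly from the data rather than from sampled training checkpoints, so it is identical across random seeds.

```python
import numpy as np


def class_similarity(x: np.ndarray, y: np.ndarray, num_classes: int) -> np.ndarray:
    """C x C cosine similarity between class-mean attribute vectors (every class must be present)."""
    means = np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    return means @ means.T


def similarity_matching_loss(
    x_orig: np.ndarray, y_orig: np.ndarray,
    x_cond: np.ndarray, y_cond: np.ndarray,
    num_classes: int,
) -> float:
    """Squared Frobenius distance between original and condensed class-similarity matrices."""
    s_orig = class_similarity(x_orig, y_orig, num_classes)
    s_cond = class_similarity(x_cond, y_cond, num_classes)
    return float(np.sum((s_orig - s_cond) ** 2))
```

NumPy only evaluates the loss here; to condense a graph, the same expression would be minimized with respect to the condensed attributes in an autodiff framework.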
Where Pith is reading between the lines
- The same selection and matching steps could be tested on graphs whose node attributes are images or other modalities rather than text.
- Whether the condensed graphs support downstream tasks such as link prediction or node classification beyond the training objective remains open.
- Combining the condensation with existing model-compression techniques might allow further size reductions while tracking accuracy.
Load-bearing premise
Maximizing mutual information for text chunk selection and aligning stable attribute similarity matrices will preserve the information required for accurate joint GNN and language-model training across the evaluated datasets and compression ratios.
What would settle it
If accuracy on any of the evaluated datasets falls below the best baseline when the graph is condensed to 1% size, the claim that the two mechanisms preserve necessary training information would not hold.
Original abstract
A Text-Attributed Graph (TAG) is an important type of graph-structured data in which each node has a text description. TAG models usually train a Graph Neural Network (GNN) and a language model jointly, which leads to high space and time consumption, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM comes with two key designs, i.e., subgraph text Selection and Attribute similarity Matching, which compress the text descriptions and the graph topology of a TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices. We evaluate TAGSAM against six state-of-the-art baselines, where it shows superior performance. For the same compressed size, TAGSAM improves upon the best-performing baseline by an average of 4.9% in accuracy. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% of its size. Our code is available at https://github.com/SundayVHan/TAGSAM
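The headline figure is an average, over evaluation settings, of the gap between TAGSAM and the best baseline at the same compressed size. Assuming the gap is measured in absolute accuracy points, that bookkeeping can be sketched as follows. The accuracies, dataset names and baseline names are made-up placeholders, not the paper's results. The same loop also flags any setting where TAGSAM falls below the best baseline, which is the failure condition described under "What would settle it" when restricted to the 1% ratio.

```python
from statistics import mean

# Placeholder accuracies (%) keyed by (dataset, condensation ratio); NOT results from the paper.
results = {
    ("dataset_a", 0.01): {"TAGSAM": 70.0, "baseline_1": 66.0, "baseline_2": 64.5},
    ("dataset_a", 0.05): {"TAGSAM": 73.0, "baseline_1": 70.5, "baseline_2": 71.0},
    ("dataset_b", 0.01): {"TAGSAM": 58.0, "baseline_1": 59.0, "baseline_2": 55.0},
}


def gains_over_best_baseline(results: dict, method: str = "TAGSAM") -> dict:
    """Per setting: accuracy of `method` minus the best competing method's accuracy."""
    gains = {}
    for setting, accs in results.items():
        # The "best-performing baseline" is chosen per setting under the same protocol.
        best = max(acc for name, acc in accs.items() if name != method)
        gains[setting] = accs[method] - best
    return gains


gains = gains_over_best_baseline(results)
print(f"average gain: {mean(list(gains.values())):.2f} points")
print("settings below best baseline:", [s for s, g in gains.items() if g < 0])
```

Because the best baseline is picked per setting, the average gain is a conservative comparison against whichever competitor is strongest in each cell, rather than against one fixed method.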
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TAGSAM, a condensation method for text-attributed graphs (TAGs) that jointly compresses text descriptions and graph topology to enable efficient training of GNNs with language models. Key components are subgraph text selection (via mutual information maximization on representative chunks) and attribute similarity matching (to align stable similarity matrices and reduce variance in MTT-based condensation). Empirical evaluation against six baselines reports an average 4.9% accuracy gain at matched compressed sizes and competitive performance even at 1% condensation ratio, with code released.
Significance. If the empirical claims hold under proper statistical validation, the work would be significant for scalable TAG modeling on large datasets, as it directly targets the high space/time costs of joint GNN+LM training. The release of code supports reproducibility, which strengthens the contribution if the variance-reduction claim can be substantiated.
major comments (2)
- [Abstract and evaluation] Abstract and evaluation section: The central claim of a 4.9% average accuracy improvement over the best-performing baseline (and competitiveness at 1% size) is reported only as point estimates with no standard deviations, number of random seeds, variance comparisons to MTT baselines, or statistical tests. This directly undermines the motivation that attribute similarity matching mitigates high variance in MTT methods, as no evidence is provided that the observed delta exceeds run-to-run fluctuation.
- [Evaluation] Evaluation: The selection of the 'best-performing baseline' for the 4.9% comparison is post-hoc without details on exact baseline implementations, hyperparameter tuning protocols, or dataset characteristics (e.g., node counts, text lengths, label distributions), making the cross-method claim difficult to verify or reproduce from the reported results alone.
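The statistical check requested in the first major comment can be sketched with the standard library alone: given per-seed accuracies for TAGSAM and the strongest baseline on the same splits, report mean ± sample standard deviation and a paired t statistic on the per-seed differences. The helper names `summarize` and `paired_t` are hypothetical; `scipy.stats.ttest_rel` computes the same statistic and also returns a p-value.

```python
import math
from statistics import mean, stdev


def summarize(accs: list[float]) -> str:
    """Mean ± sample standard deviation over random seeds."""
    return f"{mean(accs):.2f} ± {stdev(accs):.2f}"


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-seed differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two seeds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1
```

With the five seeds the authors propose, there are 4 degrees of freedom, and a two-sided test at the 0.05 level needs |t| > 2.776. Pairing by seed matters: it removes split-to-split variation that an unpaired comparison would count as noise.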
minor comments (1)
- [Abstract] Abstract: The six baselines are not named; listing them would improve clarity without lengthening the abstract substantially.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on statistical validation and experimental transparency. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.
Point-by-point responses
- Referee: [Abstract and evaluation] Abstract and evaluation section: The central claim of a 4.9% average accuracy improvement over the best-performing baseline (and competitiveness at 1% size) is reported only as point estimates with no standard deviations, number of random seeds, variance comparisons to MTT baselines, or statistical tests. This directly undermines the motivation that attribute similarity matching mitigates high variance in MTT methods, as no evidence is provided that the observed delta exceeds run-to-run fluctuation.
  Authors: We agree this is a substantive limitation in the current manuscript. The reported 4.9% gain and variance-reduction motivation rest on point estimates alone, which does not allow readers to assess whether the improvement exceeds typical run-to-run variation. In the revised version we will rerun all experiments using at least five random seeds, report mean accuracy ± standard deviation for every method and condensation ratio, include a direct variance comparison between TAGSAM and the MTT baselines, and add paired statistical significance tests (e.g., t-tests with p-values) against the strongest baseline. These additions will be placed in the main evaluation section and an expanded appendix.
  Revision: yes.
- Referee: [Evaluation] Evaluation: The selection of the 'best-performing baseline' for the 4.9% comparison is post-hoc without details on exact baseline implementations, hyperparameter tuning protocols, or dataset characteristics (e.g., node counts, text lengths, label distributions), making the cross-method claim difficult to verify or reproduce from the reported results alone.
  Authors: We acknowledge that greater detail is required for reproducibility. The revised manuscript will include (1) an appendix table listing the exact re-implementation choices and hyperparameter grids used for each of the six baselines, (2) a new dataset-statistics table reporting node counts, average text length, label balance, and condensation ratios for every dataset, and (3) explicit language clarifying that the "best-performing baseline" is the method achieving the highest accuracy under the same evaluation protocol we applied to TAGSAM. These changes will allow independent verification of the 4.9% figure.
  Revision: yes.
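The dataset-statistics table promised in the rebuttal reduces to a few aggregates per dataset. A minimal sketch, assuming each dataset is available as parallel lists of node texts and integer labels, could compute them as below. Text length is counted in whitespace-separated tokens and label balance is taken as the smallest class count divided by the largest; both are simplifying assumptions, and `dataset_stats` is a hypothetical helper name.

```python
from collections import Counter


def dataset_stats(texts: list[str], labels: list[int], condensed_nodes: int) -> dict:
    """Per-dataset aggregates for a statistics table (hypothetical helper)."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and the same length")
    counts = Counter(labels)
    return {
        "nodes": len(texts),
        # Whitespace tokens; a language-model tokenizer would give different counts.
        "avg_text_len": sum(len(t.split()) for t in texts) / len(texts),
        "num_classes": len(counts),
        # 1.0 means perfectly balanced; smaller values mean a more skewed label distribution.
        "label_balance": min(counts.values()) / max(counts.values()),
        "condensation_ratio": condensed_nodes / len(texts),
    }
```

Reporting the condensation ratio next to the raw node count makes a "1% size" setting directly comparable across datasets of very different scale.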
Circularity Check
No circularity; empirical method with independent components
Full rationale
The paper introduces TAGSAM, a condensation technique using subgraph text selection (via mutual information maximization) and attribute similarity matching to address variance in MTT-based methods. All central claims rest on empirical accuracy comparisons against baselines at matched compression ratios, with no equations, predictions, or derivations presented that reduce to fitted inputs or self-citations by construction. The method components are defined independently of the evaluation outcomes, and no load-bearing self-citation chains or ansatzes are invoked in the provided text. This is a standard empirical contribution without self-referential reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Maximizing mutual information between selected text chunks and original descriptions preserves task-relevant semantics.
- Domain assumption: Stable attribute similarity matrices are sufficient proxies for training dynamics in MTT-style condensation.
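The first assumption is usually made operational through a tractable lower bound on mutual information. One standard choice is the InfoNCE bound from contrastive predictive coding: for N paired embeddings, I(X; Y) >= log N - L_InfoNCE, where L_InfoNCE is the cross-entropy of picking each true partner among all N candidates. The sketch below evaluates that bound for hypothetical (selected-chunk, original-description) embedding pairs; whether TAGSAM uses this particular estimator is an assumption, and `infonce_lower_bound` and the temperature value are illustrative.

```python
import numpy as np


def infonce_lower_bound(chunk_emb: np.ndarray, desc_emb: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE lower bound on I(chunk; description); row i of each array is a true pair."""
    a = chunk_emb / np.linalg.norm(chunk_emb, axis=1, keepdims=True)
    b = desc_emb / np.linalg.norm(desc_emb, axis=1, keepdims=True)
    # Row i scores chunk i against every description; the diagonal holds the true pairs.
    logits = (a @ b.T) / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_probs))
    return float(np.log(len(a)) - loss)
```

The bound can never exceed log N, so with few pairs it saturates quickly; it is a lower bound, and a weak critic can even drive it below zero without implying negative mutual information.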
WWW ’26, April 13–17, 2026, Dubai, United Arab Emirates. Haowei Han et al.