
arxiv: 2606.03839 · v1 · pith:GXX2D6W5new · submitted 2026-06-02 · 💻 cs.LG

Text-attributed Graph Condensation via Text Selection and Attribute Matching

Pith reviewed 2026-06-28 10:39 UTC · model grok-4.3

classification 💻 cs.LG
keywords text-attributed graphs · graph condensation · mutual information · attribute similarity · GNN · language models · data compression · training efficiency

The pith

TAGSAM condenses text-attributed graphs to 1% size while preserving competitive accuracy for joint GNN and language model training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to lower the space and time costs of jointly training graph neural networks with language models on text-attributed graphs, where each node carries a text description. It does so by proposing TAGSAM, which selects and merges representative text chunks from related nodes through mutual information maximization and aligns stable attribute similarity matrices to compress the graph topology without the variance seen in trajectory-matching approaches. A sympathetic reader would care because full-scale joint training becomes impractical on large datasets, and a reliable condensation method could make such models usable on ordinary hardware. The reported results show an average 4.9% accuracy gain over the strongest baseline at matched sizes and sustained performance down to 1% condensation.

Core claim

By selecting subgraph texts that maximize mutual information and matching attribute similarities to produce stable matrices, TAGSAM produces condensed text-attributed graphs that retain the information needed for accurate joint GNN-LM training, outperforming prior condensation baselines by 4.9% on average at the same size and remaining competitive at 1% size.

What carries the argument

Two mechanisms carry it: subgraph text selection by mutual information maximization, which compresses the text content, and attribute similarity matching, which aligns stable similarity matrices to compress the graph topology.
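To make the text-selection idea concrete, here is a minimal sketch of budgeted chunk selection. It is not the paper's algorithm: this page does not say how TAGSAM estimates mutual information, so the sketch substitutes a facility-location coverage score over chunk embeddings as a stand-in objective and maximizes it greedily. The function names, the use of cosine similarity, and the assumption that each chunk already has an embedding vector are all illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coverage(selected, embeddings):
    """Facility-location score: each chunk is credited with its best match in `selected`.

    ASSUMPTION: this is a proxy for the mutual-information objective, not the
    estimator used by TAGSAM. Negative similarities are clipped to zero so the
    score never decreases when a chunk is added.
    """
    if not selected:
        return 0.0
    return sum(
        max(max(cosine(e, embeddings[j]), 0.0) for j in selected)
        for e in embeddings
    )


def greedy_select(embeddings, k):
    """Greedily pick up to k chunk indices, each time taking the largest score gain."""
    selected = []
    for _ in range(min(k, len(embeddings))):
        base = coverage(selected, embeddings)
        best, best_gain = None, -1.0
        for i in range(len(embeddings)):
            if i in selected:
                continue
            gain = coverage(selected + [i], embeddings) - base
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected
```

With four chunk embeddings that form two tight clusters and a budget of two, the greedy loop picks one chunk from each cluster, which is the "representative chunks from related nodes" behaviour the summary describes. The selected chunks would then be merged into the text of a condensed node; how TAGSAM performs that merge is not specified on this page.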

If this is right

  • Training accuracy stays competitive even when the text-attributed graph is reduced to 1% of its original size.
  • At matched compressed sizes, accuracy improves by an average of 4.9% over the strongest baseline.
  • Space and time requirements for joint GNN-LM training drop substantially on large text-attributed graphs.
  • The variance problem in training-trajectory matching methods is reduced by using stable attribute similarity matrices instead.
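The last point turns on replacing a trajectory-matching target with a similarity-matrix target. As a rough illustration of what such an objective could look like, the sketch below compares class-level cosine similarity matrices of the original and condensed node attributes. Matching at the class level is an assumption made here so that the two matrices have the same shape despite different node counts; the paper's actual matrices, granularity, and loss are not described on this page, and every function name is hypothetical.

```python
import math


def class_prototypes(features, labels, num_classes):
    """Mean attribute vector per class. Assumes every class has at least one node."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def similarity_matrix(vectors):
    """Pairwise cosine similarity between the given vectors."""
    # A zero vector gets norm 1.0 so that its similarities are 0 rather than NaN.
    norms = [math.sqrt(sum(a * a for a in v)) or 1.0 for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def matching_loss(sim_a, sim_b):
    """Squared Frobenius distance between two equally sized similarity matrices."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(sim_a, sim_b)
        for a, b in zip(row_a, row_b)
    )
```

In a real condensation loop the condensed attributes would be trainable and this loss would be minimized by gradient descent in an autodiff framework. The point of the sketch is only that the target matrix is computed once from the original data, so it does not fluctuate from run to run the way a sampled training trajectory does.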

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same selection and matching steps could be tested on graphs whose node attributes are images or other modalities rather than text.
  • Whether the condensed graphs support downstream tasks such as link prediction or node classification beyond the training objective remains open.
  • Combining the condensation with existing model-compression techniques might allow further size reductions, provided the accuracy cost is tracked.

Load-bearing premise

Maximizing mutual information for text chunk selection and aligning stable attribute similarity matrices will preserve the information required for accurate joint GNN and language-model training across the evaluated datasets and compression ratios.

What would settle it

If accuracy on any of the evaluated datasets falls below the best baseline when the graph is condensed to 1% size, the claim that the two mechanisms preserve necessary training information would not hold.

Figures

Figures reproduced from arXiv: 2606.03839 by Guojia Wan, Hao Huang, Hao Wang, Haowei Han, Jiawei Jiang, Shanshan Feng, Xiao Yan, Yuxiang Wang.

Figure 1: Comparison of different dataset condensation…
Figure 3: The overview of TAGSAM framework. Subgraph text selection (left) selects the representative chunks and merge…
Figure 3: The weight for each node’s features is determined by the…
Figure 4: Visualization of attribute similarity matrices for…
Figure 5: Running time (in log scale) for hyperparameter…
Figure 6: The impact of subgraph size for subgraph text se…
Figure 7: The ablation study for different variants. The…
Original abstract

Text-Attributed Graph (TAG) is an important type of graph structured data, where each node has a text description. TAG models usually train a Graph Neural Network (GNN) and language model jointly, which leads to high space and time consumption, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM comes with two key designs, i.e., subgraph text Selection and Attribute similarity Matching, which compress the text description and graph topology of TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices. We evaluate TAGSAM against six state-of-the-art baselines, where it showcases superior performance. For the same compressed size, TAGSAM improves upon the best-performing baseline by an average of 4.9% in accuracy. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% size. Our code is available at https://github.com/SundayVHan/TAGSAM

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes TAGSAM, a condensation method for text-attributed graphs (TAGs) that jointly compresses text descriptions and graph topology to enable efficient training of GNNs with language models. Key components are subgraph text selection (via mutual information maximization on representative chunks) and attribute similarity matching (to align stable similarity matrices and reduce variance in MTT-based condensation). Empirical evaluation against six baselines reports an average 4.9% accuracy gain at matched compressed sizes and competitive performance even at 1% condensation ratio, with code released.

Significance. If the empirical claims hold under proper statistical validation, the work would be significant for scalable TAG modeling on large datasets, as it directly targets the high space/time costs of joint GNN+LM training. The release of code supports reproducibility, which strengthens the contribution if the variance-reduction claim can be substantiated.

major comments (2)
  1. [Abstract and evaluation] Abstract and evaluation section: The central claim of a 4.9% average accuracy improvement over the best-performing baseline (and competitiveness at 1% size) is reported only as point estimates with no standard deviations, number of random seeds, variance comparisons to MTT baselines, or statistical tests. This directly undermines the motivation that attribute similarity matching mitigates high variance in MTT methods, as no evidence is provided that the observed delta exceeds run-to-run fluctuation.
  2. [Evaluation] Evaluation: The selection of the 'best-performing baseline' for the 4.9% comparison is post-hoc without details on exact baseline implementations, hyperparameter tuning protocols, or dataset characteristics (e.g., node counts, text lengths, label distributions), making the cross-method claim difficult to verify or reproduce from the reported results alone.
minor comments (1)
  1. [Abstract] Abstract: The six baselines are not named; listing them would improve clarity without lengthening the abstract substantially.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on statistical validation and experimental transparency. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

Point-by-point responses
  1. Referee: [Abstract and evaluation] Abstract and evaluation section: The central claim of a 4.9% average accuracy improvement over the best-performing baseline (and competitiveness at 1% size) is reported only as point estimates with no standard deviations, number of random seeds, variance comparisons to MTT baselines, or statistical tests. This directly undermines the motivation that attribute similarity matching mitigates high variance in MTT methods, as no evidence is provided that the observed delta exceeds run-to-run fluctuation.

    Authors: We agree this is a substantive limitation in the current manuscript. The reported 4.9% gain and variance-reduction motivation rest on point estimates alone, which does not allow readers to assess whether the improvement exceeds typical run-to-run variation. In the revised version we will rerun all experiments using at least five random seeds, report mean accuracy ± standard deviation for every method and condensation ratio, include a direct variance comparison between TAGSAM and the MTT baselines, and add paired statistical significance tests (e.g., t-tests with p-values) against the strongest baseline. These additions will be placed in the main evaluation section and an expanded appendix.

    revision: yes

  2. Referee: [Evaluation] Evaluation: The selection of the 'best-performing baseline' for the 4.9% comparison is post-hoc without details on exact baseline implementations, hyperparameter tuning protocols, or dataset characteristics (e.g., node counts, text lengths, label distributions), making the cross-method claim difficult to verify or reproduce from the reported results alone.

    Authors: We acknowledge that greater detail is required for reproducibility. The revised manuscript will include (1) an appendix table listing the exact re-implementation choices and hyperparameter grids used for each of the six baselines, (2) a new dataset-statistics table reporting node counts, average text length, label balance, and condensation ratios for every dataset, and (3) explicit language clarifying that the "best-performing baseline" is the method achieving the highest accuracy under the same evaluation protocol we applied to TAGSAM. These changes will allow independent verification of the 4.9% figure.

    revision: yes
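The statistical reporting promised in the first response is simple to compute. The sketch below shows one way to produce mean ± standard deviation across seeds and a paired t statistic between two methods evaluated on the same seeds. It uses only the Python standard library; the function names are illustrative, and any accuracy lists passed in are the caller's own measurements, not figures from the paper.

```python
import math
import statistics


def summarize(accs):
    """Return (mean, sample standard deviation) of per-seed accuracies."""
    return statistics.mean(accs), statistics.stdev(accs)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-seed accuracies a vs. b.

    a[i] and b[i] must come from the same seed (and the same data split) so
    that the per-seed differences are meaningful.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least 2 seeds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("per-seed differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

The t statistic and degrees of freedom can be turned into a p-value with a Student's t table, or the whole test can be delegated to `scipy.stats.ttest_rel` if SciPy is available. With only five seeds the test has little power, so reporting the per-seed differences alongside the p-value would make the comparison easier to judge.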

Circularity Check

0 steps flagged

No circularity; empirical method with independent components

full rationale

The paper introduces TAGSAM, a condensation technique using subgraph text selection (via mutual information maximization) and attribute similarity matching to address variance in MTT-based methods. All central claims rest on empirical accuracy comparisons against baselines at matched compression ratios, with no equations, predictions, or derivations presented that reduce to fitted inputs or self-citations by construction. The method components are defined independently of the evaluation outcomes, and no load-bearing self-citation chains or ansatzes are invoked in the provided text. This is a standard empirical contribution without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The method rests on standard information-theoretic and similarity-based assumptions common to condensation literature; no new entities are postulated and no free parameters are explicitly fitted in the abstract description.

axioms (2)
  • domain assumption: Maximizing mutual information between selected text chunks and original descriptions preserves task-relevant semantics.
    Invoked to justify the text-selection component.
  • domain assumption: Stable attribute similarity matrices are sufficient proxies for training dynamics in MTT-style condensation.
    Invoked to justify replacement of trajectory matching.
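For readers who want the first axiom in symbols, the block below gives the textbook definition of mutual information, followed by one way the selection problem could be written. The second line is an assumed formalization: the symbols C, S, and k are introduced here for illustration and do not come from the paper, whose exact objective and estimator are not given on this page.

```latex
% Standard definition of mutual information between discrete variables X and Y.
I(X;Y) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \;=\; H(X) - H(X \mid Y)

% Hypothetical selection objective (notation assumed, not taken from the paper):
% C = candidate text chunks in a subgraph, S = selected subset, k = chunk budget.
S^{\star} \;=\; \arg\max_{S \subseteq C,\; |S| \le k} \; I(S;\, C)
```

Read this way, the axiom amounts to assuming that a subset which is highly informative about the full set of chunks is also sufficient for the downstream classification task, which is a stronger statement than the definition alone provides.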

pith-pipeline@v0.9.1-grok · 5782 in / 1278 out tokens · 34911 ms · 2026-06-28T10:39:21.259808+00:00 · methodology


WWW ’26, April 13–17, 2026, Dubai, United Arab Emirates. Haowei Han et al.