When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected

Haotian Xu; Tengfei Ma; Yuning You

arxiv: 2511.16767 · v2 · submitted 2025-11-20 · 💻 cs.LG

Pith reviewed 2026-05-17 20:14 UTC · model grok-4.3

classification 💻 cs.LG

keywords text-attributed graphs · LLM graph reasoning · structural encoding · node descriptions · graph neural networks · semantics vs structure

The pith

LLMs achieve strong graph task performance from node text alone, with explicit structure encodings adding little or hurting results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to test whether feeding graph structure into large language models improves their reasoning on text-attributed graphs. It finds that simply giving the model the textual descriptions of nodes already yields competitive accuracy across the tasks examined. Adding common structural encodings—either through templates or separate graph neural networks—produces only small gains in some cases and clear drops in others. This result directly challenges the long-standing assumption in graph learning that relational structure must be modeled explicitly. If the finding holds, it suggests that future systems can rely more on semantic content and less on hand-crafted structural priors when powerful language models are available.

Core claim

Systematic experiments show that LLMs using only node textual descriptions already achieve strong performance across graph tasks, while most structural encoding strategies deliver marginal or negative gains, indicating that explicit structural priors are often unnecessary and sometimes counterproductive for powerful language models on text-attributed graphs.

What carries the argument

Direct comparison of LLM prompting strategies that include node text only versus node text plus explicit structure encodings (template-based or GNN-derived) on the same text-attributed graph benchmarks.
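
The comparison above can be illustrated with a minimal sketch of the two prompt styles. This is not the paper's code: the function names, the prompt wording, the 1-hop neighbor template, and the toy citation graph are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def text_only_prompt(node_text: Dict[int, str], target: int, question: str) -> str:
    """Prompt that shows the LLM only the target node's own description."""
    return f"Node description: {node_text[target]}\nQuestion: {question}\nAnswer:"


def template_prompt(
    node_text: Dict[int, str],
    edges: List[Tuple[int, int]],
    target: int,
    question: str,
) -> str:
    """Same prompt plus a templated list of the target's 1-hop neighbors."""
    # Treat edges as undirected and collect every node adjacent to the target.
    neighbors = sorted(
        {v for u, v in edges if u == target} | {u for u, v in edges if v == target}
    )
    if neighbors:
        lines = [f"- Neighbor {n}: {node_text[n]}" for n in neighbors]
        structure = "Connected nodes:\n" + "\n".join(lines)
    else:
        structure = "Connected nodes: none"
    return (
        f"Node description: {node_text[target]}\n"
        f"{structure}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical toy citation graph, invented for illustration.
node_text = {
    0: "Paper on graph neural networks.",
    1: "Paper on language models.",
    2: "Paper on molecular property prediction.",
}
edges = [(0, 1), (2, 0)]
question = "Which research area does this paper belong to?"

print(text_only_prompt(node_text, 0, question))
print()
print(template_prompt(node_text, edges, 0, question))
```

Both prompts would be sent to the same model and scored on the same labels, so any accuracy difference can be attributed to the added structure block. The structure-augmented prompt is also longer, which is one of the confounds the referee report raises.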

If this is right

LLM-based graph reasoning can proceed effectively from semantics without separate structural modules.
Traditional graph learning pipelines that prioritize structure may require redesign when paired with large language models.
Semantics-driven approaches become a viable alternative to structure-first methods in the LLM era.
Some structural encodings can actively interfere with an LLM's ability to use textual cues.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Model builders might safely drop graph-specific encoders in favor of richer text prompting for many real-world networks.
Evaluation benchmarks that assume structure is always additive may need re-examination when LLMs are the reasoner.
Hybrid systems could allocate compute away from structural pre-processing and toward better text alignment instead.

Load-bearing premise

The tested structural encoding strategies and chosen tasks and datasets are representative enough to support broad claims about the limited value of structure for LLM-based graph reasoning.

What would settle it

A follow-up study that applies the same suite of structural encodings to a new collection of text-attributed graphs and reports consistent accuracy lifts larger than those seen in the original experiments would falsify the central claim.

Figures

Figures reproduced from arXiv: 2511.16767 by Haotian Xu, Tengfei Ma, Yuning You.

**Figure 1.** We present a common paradigm for aligning graph type data into LLMs. On the left, one needs to define the graph (citation network, molecule, protein, etc) and parameterize it with proper structures. In the middle, we briefly delineate the strategies encoding graphs into a LLM-favored representations: Template-based encoding will arrange each node inside graph according to a predefined sequence, while GNN-…

**Figure 2.** Increasing the number of adapter layers leads to notable performance degradation for GNN-based adapters, particularly GIN, which loses much of its generalizability in deeper configurations. In contrast, MLP adapters, without relying on structural information, maintain stable performance and exhibit greater robustness across varying depths. Furthermore, we observe that increasing the adapter depth in Graph…

**Figure 3.** How Pretrained Encoders Impact. [Bar chart: Pretrained Graph Encoder (GraphMVP) vs. Pretrained Language Encoder (TinyBERT); accuracy (%) on BACE, BBBP, and HIV. Values as extracted from the plot, in order: 49.76±3.64, 51.63±1.66, 96.74±1.50, 58.99±1.58, 54.57±0.10, 96.85±0.01.] This experiment further reinforces our central finding: LLMs tend to prioritize semantic content over structural information when processing graph-related inputs. Even when structura…

read the original abstract

Graphs provide a unified representation of semantic content and relational structure, making them a natural fit for domains such as molecular modeling, citation networks, and social graphs. Meanwhile, large language models (LLMs) have excelled at understanding natural language and integrating cross-modal signals, sparking interest in their potential for graph reasoning. Recent work has explored this by either designing template-based graph templates or using graph neural networks (GNNs) to encode structural information. In this study, we investigate how different strategies for encoding graph structure affect LLM performance on text-attributed graphs. Surprisingly, our systematic experiments reveal that: (i) LLMs leveraging only node textual descriptions already achieve strong performance across tasks; and (ii) most structural encoding strategies offer marginal or even negative gains. We show that explicit structural priors are often unnecessary and, in some cases, counterproductive when powerful language models are involved. This represents a significant departure from traditional graph learning paradigms and highlights the need to rethink how structure should be represented and utilized in the LLM era. Our study is to systematically challenge the foundational assumption that structure is inherently beneficial for LLM-based graph reasoning, opening the door to new, semantics-driven approaches for graph learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LLMs on text-attributed graphs often match or beat structure-injected versions using text alone, but the experiments may not cover all cases where structure matters most.

The main thing to know is that this paper reports that LLMs using only the textual descriptions of nodes in text-attributed graphs achieve competitive or superior performance compared to approaches that add graph structure via templates or GNNs. The central claim is that explicit structural priors are often unnecessary and can be counterproductive. To support it, the paper runs a series of experiments ablating different structure-encoding strategies and measuring their impact relative to the text-only case, which provides new empirical evidence on the topic.

It does well in setting up clear baselines and showing consistent patterns where structure adds little. Credit is due for tackling the assumption head-on with multiple methods rather than a single proposal.

The soft spots are around the scope of the tests. The encodings used may not be optimal for LLMs, and the tasks and datasets might lean toward cases where text is rich enough to stand alone. In domains like molecular graphs, where relational topology can provide information not present in node text, different results might appear with better integration techniques. If the full paper includes statistical tests and controls for factors like prompt size, that would help. Otherwise the broad conclusion rests on how representative these setups are.

This paper is aimed at researchers in LLM-graph integration who are questioning the need for complex structural modeling. A reader interested in simplifying graph reasoning pipelines or exploring semantics-first methods will get practical insights from the results. It deserves a serious referee because the question it raises is timely and the empirical approach, if executed well, can inform future work. I would recommend putting it through peer review rather than a desk reject, as the findings could influence how the field approaches these hybrids.

Referee Report

2 major / 2 minor

Summary. The paper claims that LLMs using only node textual descriptions on text-attributed graphs achieve strong performance across tasks in domains such as molecular modeling, citation networks, and social graphs. Systematic experiments show that most structural encoding strategies (template-based or GNN-based) yield only marginal or negative gains, leading to the conclusion that explicit structural priors are often unnecessary and sometimes counterproductive for powerful LLMs, challenging traditional graph learning assumptions.

Significance. If the central empirical findings hold after addressing representativeness concerns, this work could meaningfully shift focus toward semantics-driven LLM approaches for graph reasoning and reduce reliance on complex structural integrations. The systematic comparison to explicit baselines and the falsifiable nature of the 'structure often unhelpful' claim are strengths that support its potential impact.

major comments (2)

[Experiments] The broad claim that explicit structural priors are 'often unnecessary' or counterproductive depends on the tested encodings and datasets being representative. The paper should include controls showing that the GNN and template encodings capture relational topology effectively (e.g., via GNN-only baselines on the same datasets) and explicitly test or discuss molecular graphs where global topology is known to dominate node text; otherwise the negative results may reflect encoding inefficiency rather than inherent dispensability of structure.
[§4 (Results) and abstract] The reported 'marginal or even negative gains' from structural encodings require clearer statistical controls, variance reporting, and significance testing across tasks. Without these, it is hard to determine whether observed decrements are robust or artifacts of prompt length, attention dilution, or specific implementation choices.

minor comments (2)

[Methods] Clarify in the methods how node text is exactly tokenized and inserted into LLM prompts for the 'text-only' baseline to allow reproducibility.
[Figures and Tables] Add error bars and dataset statistics (size, text length distribution) to performance tables and figures for better interpretability.
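
As a rough sketch of what the last minor comment asks for, the snippet below formats per-seed accuracies as mean±std and summarizes the length of node descriptions. The helper names are hypothetical, the sample numbers are invented, and whitespace-split word counts stand in for whatever tokenizer the paper actually uses.

```python
from statistics import mean, median, stdev


def mean_pm_std(accuracies):
    """Format per-seed accuracies (in %) as 'mean±std' using the sample std."""
    return f"{mean(accuracies):.2f}±{stdev(accuracies):.2f}"


def text_length_stats(node_texts):
    """Word-count summary of node descriptions (assumes whitespace tokenization)."""
    lengths = sorted(len(text.split()) for text in node_texts)
    return {
        "count": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
    }


# Hypothetical accuracies from three seeds and three node descriptions.
print(mean_pm_std([50.0, 52.0, 54.0]))
print(text_length_stats(["a b c", "a", "a b"]))
```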

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below regarding experimental representativeness and statistical rigor. Where revisions are needed to strengthen the claims, we indicate our plans to update the manuscript accordingly.

Referee: [Experiments] The broad claim that explicit structural priors are 'often unnecessary' or counterproductive depends on the tested encodings and datasets being representative. The paper should include controls showing that the GNN and template encodings capture relational topology effectively (e.g., via GNN-only baselines on the same datasets) and explicitly test or discuss molecular graphs where global topology is known to dominate node text; otherwise the negative results may reflect encoding inefficiency rather than inherent dispensability of structure.

Authors: We agree that validating the effectiveness of the structural encodings is necessary to rule out implementation artifacts. In the revised manuscript we will add GNN-only baselines on the identical datasets and tasks; these will demonstrate that the GNN encoders recover useful topology when applied in a conventional graph-learning setting. Our current suite already contains molecular datasets, but we will expand the discussion to explicitly consider regimes in which global topology is known to dominate (e.g., certain 3-D molecular property tasks) and will report additional results on such datasets where available. These additions will clarify that the observed marginal or negative gains are not solely attributable to weak encodings.

Revision: yes

Referee: [§4 (Results) and abstract] The reported 'marginal or even negative gains' from structural encodings require clearer statistical controls, variance reporting, and significance testing across tasks. Without these, it is hard to determine whether observed decrements are robust or artifacts of prompt length, attention dilution, or specific implementation choices.

Authors: We accept that the current presentation would benefit from stronger statistical grounding. The revised version will include per-task standard deviations across multiple random seeds, error bars on all bar plots, and paired statistical significance tests (e.g., t-tests with Bonferroni correction) comparing text-only versus structure-augmented runs. We will also add a short analysis of prompt-length effects and attention dilution as potential confounds. These changes will allow readers to assess the robustness of the reported decrements.

Revision: yes
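
A paired comparison with Bonferroni correction of the kind promised here might look like the sketch below. To stay within the standard library it swaps the t-test for an exact sign-flip permutation test on the per-seed differences, so it is a stand-in rather than the procedure the authors describe. The task names and accuracies are invented for illustration.

```python
from itertools import product
from statistics import mean, stdev


def sign_flip_pvalue(text_only, with_structure):
    """Exact two-sided paired permutation p-value for the mean per-seed difference."""
    diffs = [a - b for a, b in zip(text_only, with_structure)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    # Under the null hypothesis each paired difference is equally likely to be +d or -d.
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / 2 ** n


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(pvalues)
    return {task: min(1.0, p * m) for task, p in pvalues.items()}


# Hypothetical per-seed accuracies (five seeds per task), invented for illustration.
runs = {
    "task_a": ([0.80, 0.82, 0.81, 0.83, 0.84], [0.78, 0.79, 0.80, 0.80, 0.82]),
    "task_b": ([0.70, 0.71, 0.69, 0.72, 0.70], [0.70, 0.71, 0.69, 0.72, 0.70]),
}
raw = {task: sign_flip_pvalue(a, b) for task, (a, b) in runs.items()}
adjusted = bonferroni(raw)
for task, (a, b) in runs.items():
    print(
        f"{task}: text-only {mean(a):.3f}±{stdev(a):.3f}, "
        f"with structure {mean(b):.3f}±{stdev(b):.3f}, "
        f"p={raw[task]:.4f}, Bonferroni p={adjusted[task]:.4f}"
    )
```

One practical consequence: with n seeds the smallest two-sided p-value this test can produce is 2/2^n, so five seeds can never go below 0.0625 even before correction. The number of seeds therefore limits how strong a claim the corrected tests can support.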

Circularity Check

0 steps flagged

Empirical study with no derivation chain or self-referential reductions

The paper is an empirical investigation that reports experimental results comparing LLM performance using only node text versus various structural encoding strategies on text-attributed graphs. No mathematical derivations, first-principles predictions, or fitted parameters are presented that could reduce to the authors' own inputs or prior results by construction. Claims rest on direct comparisons against baselines across tasks and datasets rather than any self-definitional, fitted-input, or self-citation load-bearing steps. The work is therefore self-contained with no circularity in its reasoning chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical outcomes of the described experiments. No new mathematical axioms or invented entities are introduced. The key domain assumption is that the selected tasks, datasets, and encoding strategies adequately represent the space of text-attributed graph problems.

axioms (1)

Domain assumption: The chosen tasks and datasets are representative of typical text-attributed graph reasoning problems. Generalization of the finding that structure is often unnecessary depends on this assumption.

pith-pipeline@v0.9.0 · 5513 in / 1181 out tokens · 45000 ms · 2026-05-17T20:14:27.567508+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each entry gives the Lean source file, the theorem name, and a tag for the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "LLMs leveraging only node textual descriptions already achieve strong performance across tasks; most structural encoding strategies offer marginal or even negative gains."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "explicit structural priors are often unnecessary and, in some cases, counterproductive"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
