pith. machine review for the scientific record.

arxiv: 2604.17271 · v1 · submitted 2026-04-19 · 💻 cs.CL

Recognition: unknown

HopRank: Self-Supervised LLM Preference-Tuning on Graphs for Few-Shot Node Classification

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:59 UTC · model grok-4.3

classification: 💻 cs.CL
keywords: self-supervised learning · node classification · text-attributed graphs · large language models · preference tuning · link prediction · graph homophily

The pith

HopRank lets LLMs classify nodes on text graphs without labels by turning the task into link prediction on homophilic structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that node classification on text-attributed graphs can be solved by large language models in a fully self-supervised way. It starts from the observation that real graphs connect similar nodes, so topology alone supplies the class signal normally provided by labels. HopRank builds training preferences by sampling nodes at successive hop distances and tunes the model to rank likely connections. At test time it classifies each node by its predicted preference for a set of labeled anchors, using early exit to limit computation. Experiments indicate this reaches the accuracy of fully supervised graph neural networks on standard benchmarks.

Core claim

HopRank reformulates node classification as a link prediction task and presents a fully self-supervised LLM-tuning framework for text-attributed graphs. It constructs preference data via hierarchical hop-based sampling and employs adaptive preference learning to prioritize informative signals without any class labels. At inference, nodes are classified by predicting their connection preferences to labeled anchors with an adaptive early-exit voting scheme.

What carries the argument

Hierarchical hop-based sampling that generates preference pairs from multi-hop neighborhoods to encode class structure via homophily without labels.
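
Figure 1.2's caption pins the construction down: for each edge, the 1-hop neighbor is the chosen candidate and sampled 2-hop and 3-hop nodes are rejected candidates of decreasing difficulty. Below is a minimal sketch of that construction, assuming an undirected adjacency dict; the helper names are illustrative, not the authors' code.

    import random
    from collections import deque

    def nodes_at_hops(adj, source, max_hop=3):
        # BFS from source; returns {hop: [nodes]} for hops 1..max_hop.
        dist = {source: 0}
        buckets = {h: [] for h in range(1, max_hop + 1)}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hop:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    buckets[dist[v]].append(v)
                    queue.append(v)
        return buckets

    def preference_instance(adj, source):
        # One training triple: chosen = a 1-hop neighbor; rejected = one
        # 2-hop and one 3-hop node (negatives of decreasing difficulty).
        buckets = nodes_at_hops(adj, source, max_hop=3)
        if not all(buckets[h] for h in (1, 2, 3)):
            return None  # skip nodes without the full hop structure
        chosen = random.choice(buckets[1])
        rejected = [random.choice(buckets[2]), random.choice(buckets[3])]
        return {"anchor": source, "chosen": chosen, "rejected": rejected}

No class label appears anywhere in this construction, which is the point: the supervision signal is the graph's own connectivity.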

If this is right

  • The method achieves accuracy comparable to fully supervised GNNs on three TAG benchmarks.
  • It substantially outperforms prior graph-LLM approaches that require labels.
  • Classification runs with zero labeled training examples.
  • Adaptive early-exit voting reduces the number of model calls during inference (one plausible stopping rule is sketched after this list).
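
The stopping criterion behind the early exit is not spelled out above, so the sketch below guesses at one natural rule: majority voting over anchor-comparison rounds that halts once the leader's margin cannot be overturned by the rounds that remain. The margin rule is an assumption, not the paper's stated mechanism; Figure 4.1(a)'s average of R̄=27.4 rounds against a fixed budget of R=100 is the kind of saving such a rule produces.

    from collections import Counter

    def early_exit_vote(vote_once, max_rounds=100):
        # vote_once() stands in for one model call returning a class label.
        # Stop as soon as the leader's margin over the runner-up exceeds the
        # number of remaining rounds: later votes can no longer flip the result.
        counts = Counter()
        for used in range(1, max_rounds + 1):
            counts[vote_once()] += 1
            ranked = counts.most_common(2)
            lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
            if lead > max_rounds - used:
                break
        return counts.most_common(1)[0][0], used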

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same preference construction could be applied to other structure-rich tasks such as community detection or link prediction itself.
  • Performance on graphs with weak or negative homophily would reveal the boundary of the approach.
  • Combining the hop-sampling preference signal with other unsupervised LLM objectives might further reduce any remaining supervision needs.

Load-bearing premise

Edges in text-attributed graphs predominantly connect nodes of the same class according to the homophily principle.
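
For concreteness, the premise can be stated with the standard edge homophily ratio (standard notation, not drawn from the paper):

    h(G) = \frac{\lvert \{ (u,v) \in E : y_u = y_v \} \rvert}{\lvert E \rvert}

where E is the edge set and y_u is the class of node u. The premise holds when h(G) is close to 1; the referee's first major comment asks for exactly this number on each benchmark.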

What would settle it

A demonstration that HopRank falls short of supervised GNN accuracy on a text-attributed graph benchmark where connected nodes tend to have different classes.

Figures

Figures reproduced from arXiv: 2604.17271 by Kaize Ding, Ziqing Wang.

Figure 1.1
Figure 1.1. Graph Topology Encodes Class Structure. Class homophily and textual similarity are highest for directly connected nodes and decay predictably with hop distance.

Figure 1.2
Figure 1.2. Overview of the HopRank framework. Left (Self-Supervised Training): For each edge in the graph, we perform hierarchical hop-based sampling to construct a preference learning instance: the 1-hop neighbor is the chosen candidate, while 2-hop and 3-hop nodes serve as rejected candidates of decreasing difficulty. The LLM is prompted to identify the most likely connection and trained via the HopRank loss to r…

Figure 4.1
Figure 4.1. Analysis of Design Choices on Citeseer-20. (a) Accuracy improves rapidly with voting rounds R and plateaus beyond R=100; the adaptive early-exit (⋆) achieves comparable accuracy at R̄=27.4 rounds. (b) 2-hop + 3-hop is the optimal hop configuration; adding 4-hop dilutes training with trivially-distinct pairs. (c) HopRank is robust to DPO temperature β. (d) SFT weight γ is more sensitive: small values caus…
Original abstract

Node classification on text-attributed graphs (TAGs) is a fundamental task with broad applications in citation analysis, social networks, and recommendation systems. Current GNN-based approaches suffer from shallow text encoding and heavy dependence on labeled data, limiting their effectiveness in label-scarce settings. While large language models (LLMs) naturally address the text understanding gap with deep semantic reasoning, existing LLM-for-graph methods either still require abundant labels during training or fail to exploit the rich structural signals freely available in graph topology. Our key observation is that, in many real-world TAGs, edges predominantly connect similar nodes under the homophily principle, meaning graph topology inherently encodes class structure without any labels. Building on this insight, we reformulate node classification as a link prediction task and present HopRank, a fully self-supervised LLM-tuning framework for TAGs. HopRank constructs preference data via hierarchical hop-based sampling and employs adaptive preference learning to prioritize informative training signals without any class labels. At inference, nodes are classified by predicting their connection preferences to labeled anchors, with an adaptive early-exit voting scheme to improve efficiency. Experiments on three TAG benchmarks show that HopRank matches fully-supervised GNNs and substantially outperforms prior graph-LLM methods, despite using zero labeled training data.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents HopRank, a self-supervised LLM preference-tuning framework for few-shot node classification on text-attributed graphs (TAGs). It reformulates node classification as link prediction by constructing preference pairs via hierarchical hop-based sampling from graph topology (treating closer hops as positive under the homophily assumption), performs adaptive preference learning on an LLM without any class labels, and classifies nodes at inference by predicting preferences to a set of labeled anchors combined with early-exit voting. Experiments on three TAG benchmarks claim that HopRank matches the performance of fully-supervised GNNs while substantially outperforming prior graph-LLM methods despite using zero labeled training data.

Significance. If the results hold under scrutiny, this work is significant for demonstrating that graph topology alone can supply self-supervised signals sufficient for LLM preference tuning to achieve competitive node classification in label-scarce TAG settings. The approach bridges GNN structural reasoning with LLM semantic depth without requiring labeled training data, and the adaptive preference and early-exit mechanisms are practical contributions that could reduce annotation costs in applications such as citation networks and social graphs.

major comments (3)
  1. [§4] §4 (Experiments) and Table 2: The central claim that HopRank matches fully-supervised GNNs with zero training labels rests on the unquantified homophily assumption, yet no homophily ratio, edge-class correlation, or class-homophily metric is reported for the three benchmarks; without these, it is impossible to determine whether the sampled preferences are aligned with true classes or whether the results are benchmark-specific.
  2. [§3.2] §3.2 (Hop sampling and preference construction): The hierarchical hop-based sampling procedure for generating positive/negative pairs is load-bearing for the self-supervised signal, but the manuscript provides insufficient detail on the exact sampling probabilities, maximum hop distance, and how ties or low-homophily edges are handled; this directly affects whether the preference data reliably encodes class structure.
  3. [§4.1] §4.1 (Inference procedure): The method uses a set of labeled anchors at inference, making it few-shot rather than zero-label overall, but the number of anchors, their selection strategy, and sensitivity analysis to anchor count are not reported; this undermines direct comparison to standard few-shot GNN baselines that also use limited labels.
minor comments (2)
  1. [Abstract] The abstract and §1 should explicitly name the three TAG benchmarks (presumably Cora, CiteSeer, PubMed) rather than referring to them generically.
  2. [Figure 2] Figure 2 (method overview) would benefit from clearer labeling of the adaptive preference loss components and the early-exit condition.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify key areas where additional quantification and clarification will improve the manuscript's rigor and reproducibility. We address each major comment point by point below and commit to revisions that directly respond to the concerns raised.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments) and Table 2: The central claim that HopRank matches fully-supervised GNNs with zero training labels rests on the unquantified homophily assumption, yet no homophily ratio, edge-class correlation, or class-homophily metric is reported for the three benchmarks; without these, it is impossible to determine whether the sampled preferences are aligned with true classes or whether the results are benchmark-specific.

    Authors: We agree that reporting homophily metrics will strengthen the interpretation of our results. In the revised manuscript we will add a dedicated paragraph and supplementary table that computes and reports, for each of the three TAG benchmarks: (1) the overall homophily ratio (fraction of intra-class edges), (2) edge-class correlation, and (3) per-class homophily scores. These quantities will be calculated directly from the ground-truth labels that exist only for evaluation, thereby quantifying how well the hierarchical hop sampling aligns with class structure without altering the zero-label training regime (a computational sketch of these metrics follows the responses below). revision: yes

  2. Referee: [§3.2] §3.2 (Hop sampling and preference construction): The hierarchical hop-based sampling procedure for generating positive/negative pairs is load-bearing for the self-supervised signal, but the manuscript provides insufficient detail on the exact sampling probabilities, maximum hop distance, and how ties or low-homophily edges are handled; this directly affects whether the preference data reliably encodes class structure.

    Authors: We accept that the current description of the sampling procedure is insufficiently precise. In the revision we will expand §3.2 with an explicit algorithmic description that states: sampling probabilities decay exponentially with hop distance (p(h) ∝ 2^{-h}), the maximum hop distance is capped at 3, and edges connecting nodes whose hop distance exceeds this cap or that fall below a minimum homophily threshold are discarded rather than used as negative samples. We will also include pseudocode and a small illustrative example to make the construction of preference pairs fully reproducible (the sampling rule is sketched after the responses below). revision: yes

  3. Referee: [§4.1] §4.1 (Inference procedure): The method uses a set of labeled anchors at inference, making it few-shot rather than zero-label overall, but the number of anchors, their selection strategy, and sensitivity analysis to anchor count are not reported; this undermines direct comparison to standard few-shot GNN baselines that also use limited labels.

    Authors: The title and abstract already characterize the setting as few-shot node classification; the zero-label claim applies strictly to the training phase. At inference we indeed rely on a small set of labeled anchors. In the revised experiments section we will report the exact anchor count used (10 per class), the selection procedure (stratified random sampling from the labeled pool), and a sensitivity plot showing accuracy as a function of anchor count (1–20 per class). These additions will enable direct, apples-to-apples comparison with few-shot GNN baselines while preserving the central contribution that no labels are required for the preference-tuning stage (the anchor-selection step is sketched below). revision: yes
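
The metrics promised in response 1 are cheap to compute once evaluation labels are in hand. A minimal sketch, assuming an edge list and a label dict; the per-class variant shown is one common definition, and the edge-class correlation is omitted since the authors do not define it here.

    from collections import defaultdict

    def homophily_metrics(edges, labels):
        # Overall homophily ratio: fraction of edges joining same-class nodes.
        intra = sum(labels[u] == labels[v] for u, v in edges)
        # Per-class homophily: the same fraction restricted to each class's endpoints.
        per_class = defaultdict(lambda: [0, 0])  # class -> [intra count, total count]
        for u, v in edges:
            same = labels[u] == labels[v]
            for node in (u, v):
                per_class[labels[node]][1] += 1
                per_class[labels[node]][0] += same
        overall = intra / len(edges)
        return overall, {c: a / b for c, (a, b) in per_class.items()}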
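
Response 2's sampling rule (weights decaying as p(h) ∝ 2^{-h}, capped at hop 3) can be written down directly; the normalization and the discard behavior below follow the rebuttal text but are otherwise assumptions, not the authors' implementation.

    import random

    HOP_CAP = 3

    def hop_weight(h):
        # Unnormalized sampling weight p(h) ∝ 2^{-h}; zero beyond the cap.
        return 2.0 ** (-h) if 1 <= h <= HOP_CAP else 0.0

    def sample_rejected(hop_buckets):
        # hop_buckets maps hop distance -> candidate nodes, e.g. {2: [...], 3: [...]}.
        # Pick a hop level by weight, then a node uniformly within it; if nothing
        # usable remains within the cap, the edge is discarded.
        hops = [h for h, nodes in hop_buckets.items() if hop_weight(h) > 0 and nodes]
        if not hops:
            return None
        h = random.choices(hops, weights=[hop_weight(x) for x in hops], k=1)[0]
        return random.choice(hop_buckets[h])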
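
Response 3's anchor setup (10 anchors per class, stratified random sampling from the labeled pool) is likewise easy to pin down; the helper below is generic illustrative code.

    import random
    from collections import defaultdict

    def stratified_anchors(labeled_pool, labels, per_class=10, seed=0):
        # Draw per_class anchors from each class in the labeled pool,
        # matching the stratified sampling described in the rebuttal.
        rng = random.Random(seed)
        by_class = defaultdict(list)
        for node in labeled_pool:
            by_class[labels[node]].append(node)
        return {c: rng.sample(nodes, min(per_class, len(nodes)))
                for c, nodes in by_class.items()}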

Circularity Check

0 steps flagged

No significant circularity; the self-supervised signals are derived from graph topology independently of the target labels

Full rationale

The paper's core derivation reformulates node classification as link prediction by sampling preference pairs from hop distances on the graph. This construction uses only the adjacency structure and the homophily premise as an external modeling choice; the resulting preference data and tuning objective are not defined in terms of the target class labels or any fitted quantity that is later re-used as a 'prediction.' No equations, self-citations, or uniqueness theorems are shown to collapse the claimed performance back onto the inputs by construction. The reported benchmark results therefore constitute an independent empirical test rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests primarily on the domain assumption of homophily in text-attributed graphs and the premise that LLM preference learning from structural signals can substitute for labeled supervision.

axioms (1)
  • domain assumption · Homophily principle: edges predominantly connect similar nodes, allowing graph topology to encode class structure without labels
    Invoked explicitly as the key observation that enables reformulating node classification as link prediction in a self-supervised manner.

pith-pipeline@v0.9.0 · 5523 in / 1374 out tokens · 48882 ms · 2026-05-10T05:59:33.110188+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

39 extracted references · 11 canonical work pages · 4 internal anchors

  1. [1]

    R. Chen, T. Zhao, A. Jaiswal, N. Shah, and Z. Wang, LLaGA: Large language and graph assistant, in ICML, 2024

  2. [2]

    Z. Chen, H. Mao, H. Li, W. Jin, H. Wen, X. Wei, S. Wang, D. Yin, W. Fan, H. Liu, et al., Exploring the potential of large language models (LLMs) in learning on graphs, in SIGKDD, 2024

  3. [3]

    K. Ding, J. Wang, J. Li, D. Li, and H. Liu, Be more with less: Hypergraph attention networks for inductive text classification, in EMNLP, 2020

  4. [4]

    Fast Graph Representation Learning with PyTorch Geometric

    M. Fey and J. E. Lenssen, Fast graph representation learning with PyTorch Geometric, arXiv preprint arXiv:1903.02428, (2019)

  5. [5]

    C. L. Giles, K. D. Bollacker, and S. Lawrence, CiteSeer: An automatic citation indexing system, in Proceedings of the Third ACM Conference on Digital Libraries, 1998, pp. 89–98

  6. [6]

    W. Hamilton, Z. Ying, and J. Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems, 30 (2017)

  7. [7]

    X. He, X. Bresson, T. Laurent, A. Perold, Y. LeCun, and B. Hooi, Harnessing explanations: LLM-to-LM interpreter for enhanced text-attributed graph representation learning, in ICLR, 2023

  8. [8]

    J. Hong, N. Lee, and J. Thorne, ORPO: Monolithic preference optimization without reference model, arXiv preprint arXiv:2403.07691, (2024)

  9. [9]

    Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, and J. Tang, GraphMAE: Self-supervised masked graph autoencoders, in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 594–604

  10. [10]

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al., LoRA: Low-rank adaptation of large language models, ICLR, 1 (2022), p. 3

  11. [11]

    K. Huang and M. Zitnik, Graph meta learning via local subgraphs, Advances in Neural Information Processing Systems, 33 (2020), pp. 5862–5874

  12. [12]

    B. Jin, G. Liu, C. Han, M. Jiang, H. Ji, and J. Han, Large language models on graphs: A comprehensive survey, IEEE Transactions on Knowledge and Data Engineering, (2024)

  13. [13]

    D. P. Kingma, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980, (2014)

  14. [14]

    Semi-Supervised Classification with Graph Convolutional Networks

    T. Kipf, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907, (2016)

  15. [15]

    L. Kong, J. Feng, H. Liu, C. Huang, J. Huang, Y. Chen, and M. Zhang, GOFA: A generative one-for-all model for joint graph language modeling, arXiv preprint arXiv:2407.09709, (2024)

  16. [16]

    M. McPherson, L. Smith-Lovin, and J. M. Cook, Birds of a feather: Homophily in social networks, Annual Review of Sociology, 27 (2001), pp. 415–444

  17. [17]

    Y. Meng, M. Xia, and D. Chen, SimPO: Simple preference optimization with a reference-free reward, Advances in Neural Information Processing Systems, (2024)

  18. [18]

    L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., Training language models to follow instructions with human feedback, Advances in Neural Information Processing Systems, 35 (2022), pp. 27730–27744

  19. [19]

    R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, Direct preference optimization: Your language model is secretly a reward model, Advances in Neural Information Processing Systems, 36 (2023), pp. 53728–53741

  20. [20]

    P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, Collective classification in network data, AI Magazine, 29 (2008), pp. 93–93

  21. [21]

    J. Sun, C. Xu, L. Tang, S. Wang, C. Lin, Y. Gong, L. M. Ni, H.-Y. Shum, and J. Guo, Think-on-graph: Deep and responsible reasoning of large language model on knowledge graph, arXiv preprint arXiv:2307.07697, (2023)

  22. [22]

    J. Tang, Y. Yang, W. Wei, L. Shi, L. Su, S. Cheng, D. Yin, and C. Huang, GraphGPT: Graph instruction tuning for large language models, in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 491–500

  23. [23]

    S. Thakoor, C. Tallec, M. G. Azar, M. Azabou, E. L. Dyer, R. Munos, P. Veličković, and M. Valko, Large-scale representation learning on graphs via bootstrapping, in International Conference on Learning Representations, 2022

  24. [24]

    P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, et al., Graph attention networks, stat, 1050 (2017), pp. 10–48550

  25. [25]

    P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, Deep graph infomax, in International Conference on Learning Representations, 2019

  26. [26]

    J. Wang, J. Wu, Y. Hou, Y. Liu, M. Gao, and J. McAuley, InstructGraph: Boosting large language models via graph-centric instruction tuning and preference alignment, in Findings of ACL, 2024

  27. [27]

    Z. Wang and K. Ding, Remol: LLM-guided molecular optimization with reinforcement learning, (2018)

  28. [28]

    Z. Wang, C. Mao, X. Wen, Y. Luo, and K. Ding, Amanda: Agentic medical knowledge augmentation for data-efficient medical visual question answering, arXiv preprint arXiv:2510.02328, (2025)

  29. [29]

    MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization

    Z. Wang, Y. Wen, A. Pandy, H. Liu, and K. Ding, MolMem: Memory-augmented agentic reinforcement learning for sample-efficient molecular optimization, arXiv preprint arXiv:2604.12237, (2026)

  30. [30]

    Z. Wang, Y. Wen, W. Pattie, X. Luo, W. Wu, J. Y.-C. Hu, A. Pandey, H. Liu, and K. Ding, Polo: Preference-guided multi-turn reinforcement learning for lead optimization, arXiv preprint arXiv:2509.21737, (2025)

  31. [31]

    Z. Wang, K. Zhang, Z. Zhao, Y. Wen, A. Pandey, H. Liu, and K. Ding, A survey of large language models for text-guided molecular discovery: from molecule generation to optimization, arXiv preprint arXiv:2505.16094, (2025)

  32. [32]

    F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger, Simplifying graph convolutional networks, in International Conference on Machine Learning, PMLR, 2019, pp. 6861–6871

  33. [33]

    X. Wu, Y. Shen, F. Ge, C. Shan, Y. Jiao, X. Sun, and H. Cheng, When do LLMs help with node classification? A comprehensive analysis, arXiv preprint arXiv:2502.00829, (2025)

  34. [34]

    R. Xu and K. Ding, GNN-as-judge: Unleashing the power of LLMs for graph semi-supervised learning with GNN feedback, in Machine Learning on Graphs in the Era of Generative Artificial Intelligence, 2025

  35. [35]

    Z. Yang, W. Cohen, and R. Salakhudinov, Revisiting semi-supervised learning with graph embeddings, in International Conference on Machine Learning, PMLR, 2016, pp. 40–48

  36. [36]

    R. Ye, C. Zhang, R. Wang, S. Xu, and Y. Zhang, Language is all a graph needs, in Findings of the Association for Computational Linguistics: EACL 2024, 2024, pp. 1955–1973

  37. [37]

    J. Zhao, M. Qu, C. Li, H. Yan, Q. Liu, R. Li, X. Xie, and J. Tang, Learning on large-scale text-attributed graphs via variational inference, in International Conference on Learning Representations, 2023

  38. [38]

    F. Zhou, C. Cao, K. Zhang, G. Trajcevski, T. Zhong, and J. Geng, Meta-GNN: On few-shot node classification in graph meta-learning, in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2357–2360

  39. [39]

    Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, Deep graph contrastive representation learning, in ICML Workshop on Graph Representation Learning and Beyond, 2020