From Node2Vec to GPT-based GraphRAG: scientific impact prediction across graph and language models

Adilson Vital Jr.; Diego R. Amancio; Filipi N. Silva

arxiv: 2605.18410 · v1 · pith:LPRDPQ2Tnew · submitted 2026-05-18 · 💻 cs.DL

Pith reviewed 2026-05-19 23:29 UTC · model grok-4.3

classification 💻 cs.DL

keywords: scientific impact prediction, citation networks, Node2Vec, GraphRAG, large language models, text embeddings, supervised classification, cold start

The pith

Directed citation graphs combined with textual embeddings predict scientific impact with 0.84-0.85 AUC, while GPT prompts without retrieval often match GraphRAG performance at 0.87.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests ways to forecast which newly published scientific papers will go on to collect many citations, using only data available at the time of publication. It builds citation networks and text-similarity graphs, then applies Node2Vec embeddings either by themselves or together with OpenAI text vectors in supervised classifiers. The best results come from mixing directed citation structure with language model embeddings. With large language models such as GPT, adding retrieved graph context does not reliably beat prompting with the target paper's text alone. Accurate early impact prediction matters because it can help editors, funders, and researchers focus effort on work that is likely to matter most.
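The text-similarity graphs mentioned above link each paper to the papers closest to it in embedding space. A minimal sketch, assuming cosine similarity over precomputed text embeddings and a directed, weighted top-K construction (the paper's body reports top-K graphs with K ∈ {3, 5, 7, 9}); the function names, the similarity measure, and the tie-breaking rule are illustrative assumptions, not the authors' code:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similarity_graph(embeddings, k):
    """Directed, weighted top-K graph: each paper points to its k most similar papers.

    embeddings: dict mapping paper id -> text embedding (list of floats).
    Returns a dict mapping each paper id -> list of (neighbor id, similarity).
    """
    graph = {}
    for src, vec in embeddings.items():
        scored = [
            (dst, cosine(vec, other))
            for dst, other in embeddings.items()
            if dst != src  # no self-loops
        ]
        # Most similar first; ties broken by neighbor id so the output is deterministic.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        graph[src] = scored[:k]
    return graph


# Toy usage with 2-d "embeddings" (real text embeddings have far more dimensions).
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
graph = top_k_similarity_graph(emb, k=1)
# graph["a"][0][0] == "b" and graph["c"][0][0] == "d"
```

An undirected variant can be obtained by symmetrizing the edge lists and an unweighted one by dropping the similarity values. Restricting the candidate papers to those already published at the prediction cutoff would add the temporal constraint the paper emphasizes.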

Core claim

The authors formulate impact prediction as classifying papers into cohort-normalized top-P% citation ranks and show that supervised models using Node2Vec on directed citation graphs plus textual embeddings reach about 0.84-0.85 AUC. GPT-based GraphRAG using graph neighborhoods as context achieves up to 0.87, but target-only prompts perform as well as or better, indicating that structural and textual signals complement each other in supervised settings while retrieval augmentation needs careful comparison to simple baselines.
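The cohort-normalized target can be made concrete. A minimal sketch, assuming the cohort is the publication year and that citations are counted at a fixed horizon; `top_p_labels` and `quantile_to_p` are hypothetical helper names, and breaking ties by paper id is an arbitrary assumption:

```python
import math
from collections import defaultdict


def top_p_labels(papers, p):
    """Label each paper 1 if it ranks in the top p% of its cohort by citations, else 0.

    papers: iterable of (paper_id, cohort, citations), where cohort is e.g. the
            publication year and citations are counted at a fixed horizon Y.
    p: percentage in (0, 100], e.g. 20 for the top-20% class.
    """
    by_cohort = defaultdict(list)
    for pid, cohort, cites in papers:
        by_cohort[cohort].append((pid, cites))

    labels = {}
    for members in by_cohort.values():
        # Rank within the cohort only: most-cited first, ties broken by id (arbitrary).
        ranked = sorted(members, key=lambda m: (-m[1], m[0]))
        n_top = math.ceil(len(ranked) * p / 100)
        for rank, (pid, _) in enumerate(ranked):
            labels[pid] = 1 if rank < n_top else 0
    return labels


def quantile_to_p(q_value):
    """Map a quantile threshold (e.g. 0.8) to the equivalent top-P% (e.g. 20)."""
    return round((1 - q_value) * 100)


papers = [
    ("a", 2010, 50), ("b", 2010, 5), ("c", 2010, 1), ("d", 2010, 0), ("e", 2010, 0),
    ("f", 2011, 3), ("g", 2011, 4), ("h", 2011, 2), ("i", 2011, 1), ("j", 2011, 0),
]
labels = top_p_labels(papers, 20)
# labels["g"] == 1 with 4 citations, while labels["b"] == 0 with 5 citations
```

Because ranking happens within each cohort, a paper can be labeled top with fewer citations than a non-top paper from another cohort. The quantile mapping mirrors the prompt wording that a paper is top if it falls in the top (1 − q_value) fraction.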

What carries the argument

Temporally constrained citation and textual-similarity graphs processed with Node2Vec embeddings, fused with OpenAI text embeddings for supervised classification, alongside GPT models prompted with or without graph neighborhood context.
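Node2Vec learns node vectors by generating biased random walks and feeding them to a skip-gram model. The sketch below shows only the walk step for an unweighted graph, using the return parameter p and in-out parameter q described by Grover and Leskovec; it is an illustration, not the authors' pipeline. Weighted variants would additionally multiply each transition weight by the edge weight, and for a directed citation graph `adj` would hold out-neighbors only.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One second-order biased random walk in the style of node2vec (unweighted graph).

    adj: dict mapping node -> list of neighbors (out-neighbors for a directed graph).
    p: return parameter (small p -> walk tends to step back to the previous node).
    q: in-out parameter (small q -> walk tends to move away, DFS-like).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = adj.get(current, [])
        if not neighbors:
            break  # dead end, e.g. a paper with no outgoing citations
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        previous = walk[-2]
        previous_neighbors = set(adj.get(previous, []))
        weights = []
        for x in neighbors:
            if x == previous:
                weights.append(1.0 / p)  # distance 0 from the previous node
            elif x in previous_neighbors:
                weights.append(1.0)  # distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # distance 2 from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy usage: in practice many walks per node are generated and passed to skip-gram.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = [node2vec_walk(adj, node, 8, p=0.5, q=2.0, rng=random.Random(node)) for node in adj]
```

The walks are usually treated as sentences for a skip-gram model (gensim's `Word2Vec` is a common choice), and the learned node vector can then be concatenated with the paper's text embedding before classification.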

Load-bearing premise

Cohort-normalized top-P% citation rank years after publication acts as a stable, unbiased proxy for scientific impact that can be predicted from publication-time data without future leakage.
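Whether features avoid future leakage depends on how graph edges are filtered in time. A minimal sketch, assuming each paper carries a publication year and that a citation edge becomes visible in the year the citing paper appears; `temporal_citation_edges` is a hypothetical helper, not code from the paper:

```python
def temporal_citation_edges(edges, pub_year, cutoff_year):
    """Keep only citation edges that were observable by the end of cutoff_year.

    edges: iterable of (citing_id, cited_id) pairs.
    pub_year: dict mapping paper id -> publication year.
    """
    kept = []
    for citing, cited in edges:
        # A citation exists only once the citing paper is published; the second
        # check guards against metadata errors where the cited paper looks newer.
        if pub_year[citing] <= cutoff_year and pub_year[cited] <= cutoff_year:
            kept.append((citing, cited))
    return kept


pub_year = {"a": 2010, "b": 2012, "c": 2015}
edges = [("b", "a"), ("c", "a"), ("c", "b")]
visible_2012 = temporal_citation_edges(edges, pub_year, 2012)  # [("b", "a")]
```

Setting `cutoff_year` to the target's publication year gives the strict cold-start view; any later cutoff lets post-publication citations into the graph, which is the leakage this premise rules out.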

What would settle it

Model AUC dropping to near 0.5 or below when applied to predict impact in a completely new scientific field or later publication year cohort using the same training setup.
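AUC has a probabilistic reading that makes the 0.5 threshold concrete: it is the chance that a randomly chosen top paper receives a higher score than a randomly chosen non-top paper, so an uninformative model sits at 0.5. A minimal quadratic-time sketch (the function name is illustrative; scikit-learn's `roc_auc_score` computes the same quantity more efficiently):

```python
def roc_auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count as half.

    labels: 0/1 class labels; scores: model scores, higher meaning "more likely top".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
auc = roc_auc(labels, [0.9, 0.8, 0.3, 0.1])  # 0.75
constant = roc_auc(labels, [0.5] * 4)  # 0.5: an uninformative score
```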

Figures

Figures reproduced from arXiv: 2605.18410 by Adilson Vital Jr., Diego R. Amancio, Filipi N. Silva.

**Figure 2.** Number of papers published annually in ACS Applied Materials & Interfaces (American ...

**Figure 3.** Overview of the LLM-based GraphRAG methodology for top-paper prediction. The process reuses the citation and textual-similarity graphs constructed in the ...

**Figure 4.** Comparison of graph construction strategies and input representations for top-20% pa...

**Figure 5.** Performance comparison of textual-similarity graphs built with Top-...

**Figure 6.** Effect of quantile-based thresholds on top-paper prediction performance. The figure reports ...

**Figure 7.** GraphRAG-based prediction performance under different neighbor retrieval strategies using ...

**Figure 8.** Effect of graph-retrieved context on LLM-based top-paper prediction. The figure compares ...

**Figure 9.** Cross-journal evaluation of LLM-based top-paper prediction with and without graph ...

Original abstract

Identifying which newly published scientific papers are likely to become highly cited is important for prioritizing research attention, supporting editorial decisions, and guiding the allocation of scientific resources, particularly under cold-start conditions where little direct evidence is available at publication time. In this work, we formulate impact prediction as a cohort-normalized top-P% classification task and compare graph-based and LLM-based approaches under a unified framework. We construct citation and textual-similarity graphs under temporal constraints and generate Node2Vec representations, either alone or combined with OpenAI text embeddings. The best supervised configuration combines directed citation graphs with textual embeddings, reaching approximately 0.84-0.85 AUC. We also evaluate a GPT-based GraphRAG setup, using GPT 5.5 and 5.4 Nano, in which graph neighborhoods are used as contextual evidence for prediction. Although the LLM-based approach achieves high performance, retrieved context does not consistently improve results; target-only prompts often perform as well as or better than GraphRAG prompts achieving the 0.87 mark. These findings indicate that structural and textual signals are complementary for supervised prediction, while retrieval augmentation must be carefully evaluated against simpler LLM baselines.
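The paper's body describes two ways of choosing the graph neighborhood placed in a GraphRAG prompt: random sampling from the target's immediate neighbors, and similarity-based selection of the top five most similar papers. A minimal sketch, assuming similarity-based selection ranks the target's immediate neighbors by cosine similarity of their text embeddings (the exact candidate set is truncated in the excerpt); `retrieve_context` is a hypothetical name:

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_context(target, neighbors, embeddings, k=5, strategy="random", rng=None):
    """Select up to k neighbors of `target` to serialize into the prompt as context.

    neighbors: dict mapping paper id -> list of immediate graph neighbors.
    embeddings: dict mapping paper id -> text embedding (used by "similarity").
    """
    candidates = list(neighbors.get(target, []))
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(candidates, min(k, len(candidates)))
    if strategy == "similarity":
        # Assumption: rank immediate neighbors by similarity to the target paper.
        ranked = sorted(
            candidates,
            key=lambda n: (-cosine(embeddings[target], embeddings[n]), n),
        )
        return ranked[:k]
    raise ValueError(f"unknown strategy: {strategy}")


emb = {"t": [1.0, 0.0], "a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7],
       "d": [1.0, 0.05], "e": [0.2, 0.9], "f": [0.5, 0.5]}
neighbors = {"t": ["a", "b", "c", "d", "e", "f"]}
similar = retrieve_context("t", neighbors, emb, k=2, strategy="similarity")  # ["d", "a"]
sampled = retrieve_context("t", neighbors, emb, k=5, strategy="random", rng=random.Random(0))
```

In this framing the target-only baseline corresponds to passing an empty context list, which is what makes the with/without comparison a clean ablation.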

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper shows that directed citation graphs plus text embeddings reach 0.84-0.85 AUC for cohort-normalized top-P% impact prediction under temporal constraints, while GPT GraphRAG hits 0.87 but rarely improves over target-only prompts.

This paper lines up Node2Vec on citation graphs against a GPT-based GraphRAG setup for predicting which new papers will end up highly cited. The best supervised run hits around 0.84-0.85 AUC when it uses directed citation graphs together with text embeddings. The LLM version gets a bit higher at 0.87, but pulling in graph neighborhoods often doesn't beat just prompting with the target paper alone.

Referee Report

3 major / 1 minor

Summary. The manuscript formulates scientific impact prediction as a cohort-normalized top-P% citation classification task and compares Node2Vec embeddings derived from temporally constrained citation and textual-similarity graphs (alone or combined with OpenAI text embeddings) against GPT-based GraphRAG prompting using GPT-5.5 and GPT-5.4 Nano. It reports that the strongest supervised configuration reaches approximately 0.84-0.85 AUC while LLM prompting achieves up to 0.87 AUC, with the observation that target-only prompts frequently match or exceed GraphRAG performance.

Significance. If the reported performance differences hold under rigorous validation, the work usefully demonstrates complementarity between graph structure and textual features for cold-start prediction and supplies a practical reminder that retrieval augmentation must be benchmarked against simpler LLM baselines. The emphasis on temporal graph construction to respect publication-time information is a positive methodological choice.

major comments (3)

[Abstract] Abstract: The central AUC figures (0.84-0.85 supervised, 0.87 LLM) are presented without any description of dataset size, the concrete value chosen for P, the exact temporal train/test split dates, cross-validation procedure, or statistical significance testing of differences across configurations. These omissions are load-bearing for assessing whether the headline performance claims are reliable.
[Abstract] Abstract: The cohort-normalized top-P% future citation rank is adopted as the prediction target with no explicit verification that post-publication information (e.g., journal effects, topic popularity shifts, or early visibility signals) does not leak into the label or the temporally constrained node features/neighborhoods. A concrete test—such as reporting AUC when using only pre-publication metadata or stratifying results by field—would strengthen the interpretation that the models are predicting intrinsic impact rather than recovering early cues.
[Abstract] Abstract: The statement that 'retrieved context does not consistently improve results' requires supporting quantitative evidence; a table or figure comparing AUC (with confidence intervals) for target-only versus GraphRAG prompts across all model variants would make this claim verifiable rather than qualitative.
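The comparison requested here can be computed with a paired bootstrap: resample the same papers for both prompt conditions, recompute each AUC, and take percentiles of the difference. A minimal sketch, assuming per-paper scores from two runs on identical papers with 0/1 labels; the function names, resample count, and 95% percentile interval are illustrative choices, and the toy scores are not results from the paper:

```python
import random


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUC(scores_a) - AUC(scores_b) on the same papers."""
    point = roc_auc(labels, scores_a) - roc_auc(labels, scores_b)  # fails fast on one class
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample papers with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip single-class resamples, where AUC is undefined
            auc_a = roc_auc(ys, [scores_a[i] for i in idx])
            auc_b = roc_auc(ys, [scores_b[i] for i in idx])
            diffs.append(auc_a - auc_b)
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return point, (lower, upper)


labels = [1, 0] * 20
condition_a = [0.9 if y else 0.1 for y in labels]           # toy: perfectly separating
condition_b = [0.4 if i % 3 == 0 else 0.6 for i in range(40)]  # toy: unrelated to labels
point, (lower, upper) = paired_bootstrap_auc_diff(labels, condition_a, condition_b, n_boot=200)
```

An interval for the difference that straddles zero would support the reading that retrieved context does not consistently help.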

minor comments (1)

[Abstract] Abstract: The phrasing 'GPT 5.5 and 5.4 Nano' is ambiguous and is not written consistently with the model names used in the Methods section; the exact model identifiers should be stated explicitly and uniformly throughout.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important areas for improving transparency and rigor. We address each major comment point by point below and have incorporated revisions to strengthen the manuscript.

Referee: [Abstract] Abstract: The central AUC figures (0.84-0.85 supervised, 0.87 LLM) are presented without any description of dataset size, the concrete value chosen for P, the exact temporal train/test split dates, cross-validation procedure, or statistical significance testing of differences across configurations. These omissions are load-bearing for assessing whether the headline performance claims are reliable.

Authors: We agree that the abstract would benefit from these key details to allow readers to properly evaluate the claims. The full manuscript provides this information in the Methods and Experimental Setup sections, including the scale of the paper collection, the specific P threshold for the top-P% task, the exact temporal cutoffs for train/test splits, the cross-validation procedure used, and statistical tests comparing AUC differences. We will revise the abstract to concisely include summaries of these elements. revision: yes
Referee: [Abstract] Abstract: The cohort-normalized top-P% future citation rank is adopted as the prediction target with no explicit verification that post-publication information (e.g., journal effects, topic popularity shifts, or early visibility signals) does not leak into the label or the temporally constrained node features/neighborhoods. A concrete test—such as reporting AUC when using only pre-publication metadata or stratifying results by field—would strengthen the interpretation that the models are predicting intrinsic impact rather than recovering early cues.

Authors: This concern is well-taken. The temporal constraints on graph construction and feature extraction are explicitly designed to use only information available at publication time, thereby avoiding post-publication leakage. To further bolster the interpretation, we will add new analyses that report AUC using only pre-publication metadata and that stratify performance by scientific field. These results will be presented in a dedicated subsection of the revised manuscript. revision: yes
Referee: [Abstract] Abstract: The statement that 'retrieved context does not consistently improve results' requires supporting quantitative evidence; a table or figure comparing AUC (with confidence intervals) for target-only versus GraphRAG prompts across all model variants would make this claim verifiable rather than qualitative.

Authors: We agree that the claim requires explicit quantitative backing to be fully verifiable. While detailed per-variant comparisons appear in the Results section, we will add a consolidated table (or expand an existing results table) that directly reports AUC values together with confidence intervals for target-only versus GraphRAG prompts across all GPT variants. This will make the observation that retrieved neighborhoods do not consistently outperform target-only baselines immediately evident and quantifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on temporal held-out data

The paper trains supervised classifiers on Node2Vec and text embeddings, and prompts GPT models with graph-retrieved context, using temporally constrained citation and similarity graphs to predict cohort-normalized future citation rank. AUC values are measured on held-out test sets after temporal splits, not obtained by fitting a parameter that directly encodes the target label or by renaming an input. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the derivation. The approach is a standard supervised comparison whose reported AUC values are not fixed by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard assumptions from graph representation learning and LLM prompting literature plus the domain-specific choice of citation rank as impact proxy; no new entities are postulated.

free parameters (1)

top-P% threshold
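The percentile cutoff used to define the positive class is a modeling choice whose exact value is not stated in the abstract; the body evaluates P ∈ {10, 20, 30, 40, 50}, with the main graph comparison (Figure 4) reported for the top 20%.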

axioms (1)

Domain assumption: Citation and textual-similarity graphs can be constructed under strict temporal constraints that simulate cold-start conditions at publication time.
Invoked when building the input graphs for both Node2Vec and GraphRAG experiments.

pith-pipeline@v0.9.0 · 5746 in / 1264 out tokens · 34026 ms · 2026-05-19T23:29:39.693665+00:00 · methodology

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 3 internal anchors

[1]

Data processing. The first part of this work started by selecting 35,354 academic papers from the journal ACS Applied Materials & Interfaces (American Chemical Society, ACS), spanning 11 publication years from 2009 to 2020 and exhibiting a natural year-over-year growth in volume. The number of papers published yearly is shown in Figure 2. The only informati...

[2]

Graph construction. Once the database was organized into Y-year post-publication windows and the target metric labels were defined, we used the data available up to that point to construct two graphs representing complementary relational views: citations and semantic similarity. For both graph types we created four variations based on (i) edge direction (directed vs ...

[3]

Embedding construction. After the graph construction phase, we have the following graph families: (i) the citation graph built from citation relations, and (ii) the similarity graph built by connecting the top-K most similar papers, producing four similarity-graph variants according to K ∈ {3, 5, 7, 9}. Each graph family is further expanded by two edge direction types (directed vs ...

[4]

Impact classification. In the final phase, we use the embeddings from the previous phase as inputs to a supervised classification model that predicts whether each paper will be a “top paper” under a given definition, percentile threshold P ∈ {10, 20, 30, 40, 50}, and prediction horizon Y ∈ {0, ..., 10} (when observable). Concretely, each training instance co...

[5]

Graph construction. The LLM-based experiments use the graphs constructed in the previous stage; consequently, no additional graph-construction procedures are introduced here. We retain the ... (FIG. 3: Overview of the LLM-based GraphRAG methodology for top-paper prediction. The process reuses the citation and textual-similarity graphs constructed in the ...)

[6]

Context retrieval. For each sampled target paper, we extracted a local neighborhood from the graph to serve as contextual information within the prompt. We evaluated two distinct retrieval strategies: (i) random sampling from the target node’s immediate graph neighbors, and (ii) similarity-based selection, where we identified the top five most similar pap...

[7]

Prompt construction. We employed GPT-5.5 and GPT-5.4 Nano as the underlying models, configuring the prompting protocol to function as a specialized scientific impact prediction engine rather than a general-purpose assistant. To achieve this, each request was structured into three distinct layers: a system prompt, a developer prompt, and a programmatically generated user prompt...

[8]

Prediction and evaluation. For each target paper, the LLM produces a single structured response centered on a probability vector for top-paper prediction across all requested horizon years, together with 16 additional auxiliary outputs included for completeness. Thus, unlike the graph-based classifier, which solves separate binary classification problems...

[9]

Comparison between citation and textual-similarity graphs. Figure 4 presents the AUC scores for the classification of papers belonging to the top 20% of their respective cohorts for each year following publication. We evaluate two distinct graph construction strategies (Paper Citation and Textual Similarity) against two input representations for the neural c...

[10]

Sensitivity analysis of Top-K textual similarity graphs. In Figure 5, we examine the textual-similarity graphs by varying the number of neighbors (K) from 3 to 9. The most prominent finding is that concatenating textual embeddings with Node2Vec consistently outperforms the standalone Node2Vec model across all values of K. This confirms that explicitly prese...

[11]

Effect of quantile-based thresholds on top-paper prediction performance. In Figure 6, we extend the experiment across various quantile thresholds (50th to 90th percentiles in increments of 10) used to define the positive class. While the figure displays results exclusively for directed and weighted graphs, the observed trends were consistent across other g...

[12]

Effect of neighbor retrieval strategy in GraphRAG-based prediction. We employed the graph structure as a retrieval mechanism, selecting neighbors of the target paper and injecting them into the LLM prompt as contextual evidence. We evaluated multiple graph configurations by varying edge direction and edge weighting, and compared two neighborhood-selection ...

[13]

Effect of graph-retrieved context on LLM prediction performance. In Figure 8, we replicate the GraphRAG setup under a simplified and controlled configuration, using directed and unweighted graphs with random retrieval, and compare prompts with and without graph-retrieved neighbors. The goal of this experiment is to isolate the contribution of retrieved grap...

[14]

Cross-journal evaluation with and without graph-retrieved context. To assess whether the behavior observed in the main corpus was specific to a single journal or reflected a more general property of the LLM-based prediction setup, we repeated the context-ablation experiment on three additional journals: Informetrics, PNAS, and PRL. We kept the same controlled configuration used in the previous comparison: directed and unweighted graphs, with five randomly selected neighbors in the context-augmented condition...

[15]

G. Abramo, C. A. D’Angelo, and G. Felici. Predicting publication long-term impact through a combination of early citations and journal impact factor. Journal of Informetrics, 13(1):32–49, 2019.

[16]

D. W. Aksnes, L. Langfeldt, and P. Wouters. Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9(1):2158244019829575, 2019.

[17]

D. R. Amancio, M. d. G. V. Nunes, O. N. Oliveira Jr, and L. da F. Costa. Using complex networks concepts to assess approaches for citations in scientific papers. Scientometrics, 91(3):827–842, 2012.

[18]

T. Azad, I. Al Azher, S. R. Choudhury, and H. Alhoori. Predicting the scholarly impact of research papers using retrieval-augmented LLMs. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 124–131, 2025.

[19]

I. Beltagy, K. Lo, and A. Cohan. SciBERT: A pretrained language model for scientific text. In K. Inui, J. Jiang, V. Ng, and X. Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, Nov. ...

[20]

A. C. M. Brito, F. N. Silva, and D. R. Amancio. A complex network approach to political analysis: Application to the Brazilian Chamber of Deputies. PLOS ONE, 15(3):e0229928, 2020.

[21]

A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. Weld. SPECTER: Document-level representation learning using citation-informed transformers. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2270–2282, Online, July 2020. Association for Comp...

[22]

J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018.

[23]

A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. CoRR, abs/1607.00653, 2016.

[24]

G. He, Z. Xue, Z. Jiang, Y. Kang, S. Zhao, and W. Lu. H2CGL: Modeling dynamics of citation network for impact prediction. Information Processing & Management, 60(6):103512, 2023.

[25]

L. He, L. Bai, X. Yang, H. Du, and J. Liang. High-order graph attention network. Information Sciences, 630:222–234, 2023. ISSN 0020-0255.

[26]

J. H. Kim, J. Son, H. Kim, and E. Lee. Node embedding for homophilous graphs with ARGEW: Augmentation of random walks by graph edge weights. arXiv preprint arXiv:2308.05957, 2023.

[27]

K. Kousha and M. Thelwall. Factors associating with or predicting more cited or higher quality journal articles: An Annual Review of Information Science and Technology (ARIST) paper. Journal of the Association for Information Science and Technology, 75(3):215–244, 2024.

[28]

T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space, 2013.

[29]

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546, 2013.

[30]

A. P. Millán, H. Sun, L. Giambagli, R. Muolo, T. Carletti, J. J. Torres, F. Radicchi, J. Kurths, and G. Bianconi. Topology shapes dynamics of higher-order networks. Nature Physics, 21(3):353–361, 2025.

[31]

OpenAI. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/, Jan. 2024. Accessed: 2026-01-16.

[32]

O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, and S. Fortunato. On the predictability of future impact in science. Scientific Reports, 3(1):3052, 2013.

[33]

A. M. Petersen, R. K. Pan, F. Pammolli, and S. Fortunato. Methods to account for citation inflation in research evaluation. Research Policy, 48(7):1855–1865, 2019.

[34]

C. Stegehuis, N. Litvak, and L. Waltman. Predicting the long-term citation impact of recent publications. Journal of Informetrics, 9(3):642–657, 2015. ISSN 1751-1577.

[35]

M. Stella, T. J. Swanson, A. S. Teixeira, B. N. Richson, Y. Li, T. T. Hills, K. T. Forbush, and D. Watson. Cognitive networks and text analysis identify anxiety as a key dimension of distress in genuine suicide notes. Big Data and Cognitive Computing, 9(7):171, 2025.

[36]

A. Vital and D. R. Amancio. A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks. Scientometrics, 127(10):6011–6028, 2022.

[37]

A. Vital, Jr., F. N. Silva, and D. R. Amancio. Comparing random walks in graph embedding and link prediction. PLOS ONE, 19(11):1–22, Nov. 2024.

[38]

A. Vital Jr, F. N. Silva, and D. R. Amancio. Recovering link-weight structure in complex networks with weight-aware random walks. arXiv preprint arXiv:2508.07489, 2025.

[39]

A. Vital Jr, F. N. Silva, O. N. Oliveira Jr, and D. R. Amancio. Predicting citation impact of research papers using GPT and other text embeddings. Physica A: Statistical Mechanics and its Applications, 674:130789, 2025.

[40]

L. Waltman. A review of the literature on citation impact indicators. Journal of Informetrics, 10(2):365–391, 2016.

[41]

L. Waltman and M. Schreiber. On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2):372–379, 2013.

[42]

D. Wang, C. Song, and A.-L. Barabási. Quantifying long-term scientific impact. Science, 342(6154):127–132, 2013.

[43]

X. Wu, H. Pang, Y. Fan, Y. Linghu, and Y. Luo. ProbWalk: A random walk approach in weighted graph embedding. Procedia Computer Science, 183:683–689, 2021.

[44]

Z. Ye, Y. Hou, R. Pan, T. Gao, and H. Wang. Are large language models able to predict highly cited papers? Evidence from statistical publications. arXiv preprint arXiv:2601.13627, 2026.

[45]

Q. Zhao and X. Feng. Utilizing citation network structure to predict paper citation counts: A deep learning approach. Journal of Informetrics, 16(1):101235, 2022.

[46]

X. Zhou, J. Wang, J. Wang, and Q. Guan. Predicting air quality using a multi-scale spatiotemporal graph attention network. Information Sciences, 680:121072, 2024. ISSN 0020-0255.

Appendix A: Prompt Templates Used in the LLM-Based Experiments. This appendix presents the prompt templates used in the LLM-based prediction experiments. The prompting proto...

[47]

System Prompt. You are a scientific impact prediction engine for journal articles. Your job is to estimate calibrated probabilities for whether a target paper will become a top paper within its journal at each requested horizon year. Output rules:

[48]

Output valid JSON only. No Markdown, no explanation, and no extra text.

[49]

The JSON must have exactly one top-level key: "response". 3. "response" must have exactly one key: "y_acc_vector".

[50]

The required structure is: {"response":{"y_acc_vector":[...]}} 5. "y_acc_vector" must contain numeric probabilities in [0, 1].

[51]

Probabilities must be numbers, not strings.

[52]

The length of "y_acc_vector" must be exactly equal to <OUTPUT_SPEC><n_years>.

[53]

Do not reveal reasoning or chain-of-thought. Return only the final JSON.

[54]

Use only the information explicitly present in the XML input.

[55]

Do not use external facts or hidden assumptions about papers, authors, journals, venues, identifiers, files, or datasets.

[56]

Developer Prompt. Task: Predict, for the target journal article, the probability that it will be a top paper by accumulated citations at each requested horizon year. Positive event: The positive event is defined by <CONFIG><q_value>: • q_value is a quantile threshold. • A paper is considered top if it belongs to the top (1−q_value) fraction within its...

[57]

User Prompt Template. The user prompt was generated dynamically from the experiment payload and serialized in XML format. It always contained a <REQUEST> root block. The <OUTPUT_SPEC> section specified the required JSON schema and the exact output vector length. The <CONFIG> section described the graph and retrieval settings. The <TARGET> section contained the me...
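The output rules quoted in entries [48] to [52] are strict enough to check mechanically before a prediction is scored. A minimal validator sketch, assuming the reply arrives as a raw string and that `n_years` stands for the value of `<OUTPUT_SPEC><n_years>`; `parse_prediction` is a hypothetical helper, not part of the authors' code:

```python
import json


def parse_prediction(raw, n_years):
    """Parse a model reply and enforce the quoted output rules; return the probabilities.

    Raises ValueError (json.JSONDecodeError is a subclass) on any violation.
    """
    obj = json.loads(raw)
    if not isinstance(obj, dict) or set(obj) != {"response"}:
        raise ValueError('top level must have exactly one key: "response"')
    resp = obj["response"]
    if not isinstance(resp, dict) or set(resp) != {"y_acc_vector"}:
        raise ValueError('"response" must have exactly one key: "y_acc_vector"')
    vec = resp["y_acc_vector"]
    if not isinstance(vec, list) or len(vec) != n_years:
        raise ValueError(f'"y_acc_vector" must be a list of length {n_years}')
    for p in vec:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(p, bool) or not isinstance(p, (int, float)):
            raise ValueError("probabilities must be numbers, not strings")
        if not 0.0 <= p <= 1.0:  # also rejects NaN, since comparisons with NaN are False
            raise ValueError("probabilities must lie in [0, 1]")
    return [float(p) for p in vec]


probs = parse_prediction('{"response": {"y_acc_vector": [0.1, 0.5, 1]}}', n_years=3)
# probs == [0.1, 0.5, 1.0]
```

Rejecting malformed replies up front keeps format failures from being silently scored as low-probability predictions.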

[1] [1]

The number of papers published yearly is shown in Figure 2

Data processing The first part of this work started by selecting35,354academic papers from the journal ACS Applied Materials & Interfaces(American Chemical Society, ACS), spanning 11 pub- lication years from 2009 to 2020 and exhibiting a natural year-over-year growth in volume. The number of papers published yearly is shown in Figure 2. The only informati...

work page 2009

[2] [2]

For both graph types we created four variations based on (i) edge direction (directed vs

Graph Construction Once the database was organized intoY-year post-publication windows and the target metric labels were defined, we used the data available up to that point to construct two graphs representing complementary relational views: citations and semantic similarity. For both graph types we created four variations based on (i) edge direction (di...

work page 2009

[3] [3]

Each graph family is further expanded by two edge direction types (directed vs

Embedding construction After the graph construction phase, we have the following graph families: (i) the citation graph built from citation relations, and (ii) the similarity graph built by connecting top-K 9 most similar papers, producing four similarity-graph variants according toK∈ {3,5,7,9}. Each graph family is further expanded by two edge direction ...

work page

[4] [4]

top paper

Impact classification In the final phase, we use the embeddings from the previous phase as inputs to a su- pervised classification model that predicts whether each paper will be a “top paper” un- der a given definition, percentile thresholdP∈ {10,20,30,40,50}, and prediction horizon Y∈ {0, . . . ,10}(when observable). Concretely, each training instance co...

work page 2048

[5] [5]

We retain the 13 FIG

Graph construction The LLM-based experiments use the graphs constructed in the previous stage; conse- quently, no additional graph-construction procedures are introduced here. We retain the 13 FIG. 3. Overview of the LLM-based GraphRAG methodology for top-paper prediction. The pro- cess reuses the citation and textual-similarity graphs constructed in the ...

[6]

Context retrieval. For each sampled target paper, we extracted a local neighborhood from the graph to serve as contextual information within the prompt. We evaluated two distinct retrieval strategies: (i) random sampling from the target node’s immediate graph neighbors, and (ii) similarity-based selection, where we identified the top five most similar pap...
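The two retrieval strategies can be sketched as follows. Two assumptions apply. First, strategy (ii) is read here as ranking the target's graph neighbors; the source sentence is truncated, so it may instead rank the whole corpus. Second, similarity is taken to be cosine similarity between embedding vectors. Both function names are hypothetical.

```python
import random

import numpy as np


def random_neighbors(adj, target, n=5, seed=0):
    """Strategy (i): sample up to n immediate graph neighbors uniformly at random."""
    nbrs = sorted(adj.get(target, {}))  # sort for reproducible sampling
    return random.Random(seed).sample(nbrs, min(n, len(nbrs)))


def most_similar_neighbors(adj, emb, target, n=5):
    """Strategy (ii), as assumed here: the n neighbors most cosine-similar to the target."""
    nbrs = list(adj.get(target, {}))
    if not nbrs:
        return []
    emb = np.asarray(emb, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit[nbrs] @ unit[target]
    order = np.argsort(-sims)[:n]  # highest similarity first
    return [nbrs[i] for i in order]
```

Both functions return at most `n` paper indices. A paper with no neighbors (for example, an isolated node in a directed citation graph) gets an empty context.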

[7]

Prompt construction. We employed GPT-5.5 and GPT-5.4 Nano as the underlying models, configuring the prompting protocol to function as a specialized scientific impact prediction engine rather than a general-purpose assistant. To achieve this, each request was structured into three distinct layers: a system prompt, a developer prompt, and a programmatically ...
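The three-layer request structure maps naturally onto a chat-style message list. The sketch below assumes an API that accepts `system`, `developer`, and `user` roles. The helper name is hypothetical, and no model call is made here.

```python
def build_messages(system_prompt: str, developer_prompt: str, user_xml: str) -> list[dict[str, str]]:
    """Stack the three prompt layers in order (hypothetical helper)."""
    layers = [
        ("system", system_prompt),  # role definition and output rules
        ("developer", developer_prompt),  # task definition and positive-event semantics
        ("user", user_xml),  # per-paper payload, generated programmatically
    ]
    for role, text in layers:
        if not text.strip():
            raise ValueError(f"empty {role} prompt")
    return [{"role": role, "content": text} for role, text in layers]
```

Only the user layer changes from paper to paper. The system and developer layers stay fixed across requests, which keeps the task definition identical for every target.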

[8]

Prediction and evaluation. For each target paper, the LLM produces a single structured response centered on a probability vector for top-paper prediction across all requested horizon years, together with 16 additional auxiliary outputs included for completeness. Thus, unlike the graph-based classifier, which solves separate binary classification problems...
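Because the LLM returns one probability per horizon, each position of the vector can be scored like a separate binary classifier. The sketch below computes ROC AUC through the Mann-Whitney rank statistic, which is one standard way to compute it; the source does not say which implementation was used. The helper names are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # undefined when a horizon has only one class
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def auc_per_horizon(prob_vectors, label_matrix):
    """prob_vectors[i][h]: predicted probability for paper i at horizon h;
    label_matrix[i][h]: 1 if paper i is a top paper at horizon h."""
    probs = np.asarray(prob_vectors, dtype=float)
    labels = np.asarray(label_matrix, dtype=int)
    return [auc(probs[:, h], labels[:, h]) for h in range(probs.shape[1])]
```

An AUC of 0.5 corresponds to random ranking, and 1.0 means every positive paper is scored above every negative one. This makes per-horizon LLM scores directly comparable with the graph-based classifier's AUCs.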

[9]

Comparison between citation and textual-similarity graphs. Figure 4 presents the AUC scores for classifying papers in the top 20% of their respective cohorts for each year following publication. We evaluate two distinct graph construction strategies (Paper Citation and Textual Similarity) against two input representations for the neural c...

[10]

Sensitivity analysis of top-K textual-similarity graphs. In Figure 5, we examine the textual-similarity graphs by varying the number of neighbors (K) from 3 to 9. The most prominent finding is that concatenating textual embeddings with Node2Vec consistently outperforms the standalone Node2Vec model across all values of K. This confirms that explicitly prese...

[11]

Effect of quantile-based thresholds on top-paper prediction performance. In Figure 6, we extend the experiment across various quantile thresholds (50th to 90th percentiles in steps of 10) used to define the positive class. While the figure shows results only for directed, weighted graphs, the observed trends were consistent across other g...

[12]

Effect of neighbor retrieval strategy in GraphRAG-based prediction. We used the graph structure as a retrieval mechanism, selecting neighbors of the target paper and injecting them into the LLM prompt as contextual evidence. We evaluated multiple graph configurations by varying edge direction and edge weighting, and compared two neighborhood-selection ...

[13]

Effect of graph-retrieved context on LLM prediction performance. In Figure 8, we replicate the GraphRAG setup under a simplified and controlled configuration, using directed and unweighted graphs with random retrieval, and compare prompts with and without graph-retrieved neighbors. The goal of this experiment is to isolate the contribution of retrieved grap...

[14]

Cross-journal evaluation with and without graph-retrieved context. To assess whether the behavior observed in the main corpus was specific to a single journal or reflected a more general property of the LLM-based prediction setup, we repeated the context-ablation experiment on three additional journals: Informetrics, PNAS, and PRL. We kept the same controlle...

[15]

G. Abramo, C. A. D’Angelo, and G. Felici. Predicting publication long-term impact through a combination of early citations and journal impact factor. Journal of Informetrics, 13(1):32–49, 2019.

[16]

D. W. Aksnes, L. Langfeldt, and P. Wouters. Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9(1):2158244019829575, 2019.

[17]

D. R. Amancio, M. d. G. V. Nunes, O. N. Oliveira Jr, and L. da F. Costa. Using complex networks concepts to assess approaches for citations in scientific papers. Scientometrics, 91(3):827–842, 2012.

[18]

T. Azad, I. Al Azher, S. R. Choudhury, and H. Alhoori. Predicting the scholarly impact of research papers using retrieval-augmented LLMs. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 124–131, 2025.

[19]

I. Beltagy, K. Lo, and A. Cohan. SciBERT: A pretrained language model for scientific text. In K. Inui, J. Jiang, V. Ng, and X. Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, Nov. ...

[20]

A. C. M. Brito, F. N. Silva, and D. R. Amancio. A complex network approach to political analysis: Application to the Brazilian chamber of deputies. PLOS ONE, 15(3):e0229928, 2020.

[21]

A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. Weld. SPECTER: Document-level representation learning using citation-informed transformers. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2270–2282, Online, July 2020. Association for Comp...

[22]

J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018.

[23]

A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. CoRR, abs/1607.00653, 2016.

work page internal anchor Pith review Pith/arXiv arXiv 2016

[24] [24]

G. He, Z. Xue, Z. Jiang, Y. Kang, S. Zhao, and W. Lu. H2cgl: Modeling dynamics of citation network for impact prediction. Information Processing & Management, 60(6):103512, 2023.

[25]

L. He, L. Bai, X. Yang, H. Du, and J. Liang. High-order graph attention network. Information Sciences, 630:222–234, 2023. ISSN 0020-0255.

[26]

J. H. Kim, J. Son, H. Kim, and E. Lee. Node embedding for homophilous graphs with ARGEW: Augmentation of random walks by graph edge weights. arXiv preprint arXiv:2308.05957, 2023.

[27]

K. Kousha and M. Thelwall. Factors associating with or predicting more cited or higher quality journal articles: An annual review of information science and technology (ARIST) paper. Journal of the Association for Information Science and Technology, 75(3):215–244, 2024.

[28]

T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space, 2013

[29]

T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546, 2013.

[30]

A. P. Millán, H. Sun, L. Giambagli, R. Muolo, T. Carletti, J. J. Torres, F. Radicchi, J. Kurths, and G. Bianconi. Topology shapes dynamics of higher-order networks. Nature Physics, 21(3):353–361, 2025.

[31]

OpenAI. New embedding models and API updates. https://openai.com/index/new-embedding-models-and-api-updates/, Jan. 2024. Accessed: 2026-01-16.

[32]

O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, and S. Fortunato. On the predictability of future impact in science. Scientific Reports, 3(1):3052, 2013.

[33]

A. M. Petersen, R. K. Pan, F. Pammolli, and S. Fortunato. Methods to account for citation inflation in research evaluation. Research Policy, 48(7):1855–1865, 2019.

[34]

C. Stegehuis, N. Litvak, and L. Waltman. Predicting the long-term citation impact of recent publications. Journal of Informetrics, 9(3):642–657, 2015. ISSN 1751-1577.

[35]

M. Stella, T. J. Swanson, A. S. Teixeira, B. N. Richson, Y. Li, T. T. Hills, K. T. Forbush, and D. Watson. Cognitive networks and text analysis identify anxiety as a key dimension of distress in genuine suicide notes. Big Data and Cognitive Computing, 9(7):171, 2025.

[36]

A. Vital and D. R. Amancio. A comparative analysis of local similarity metrics and machine learning approaches: Application to link prediction in author citation networks. Scientometrics, 127(10):6011–6028, 2022.

[37]

A. Vital, Jr., F. N. Silva, and D. R. Amancio. Comparing random walks in graph embedding and link prediction. PLOS ONE, 19(11):1–22, Nov. 2024.

[38]

A. Vital Jr, F. N. Silva, and D. R. Amancio. Recovering link-weight structure in complex networks with weight-aware random walks. arXiv preprint arXiv:2508.07489, 2025.

[39]

A. Vital Jr, F. N. Silva, O. N. Oliveira Jr, and D. R. Amancio. Predicting citation impact of research papers using GPT and other text embeddings. Physica A: Statistical Mechanics and its Applications, 674:130789, 2025.

[40]

L. Waltman. A review of the literature on citation impact indicators. Journal of Informetrics, 10(2):365–391, 2016.

[41]

L. Waltman and M. Schreiber. On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2):372–379, 2013.

[42]

D. Wang, C. Song, and A.-L. Barabási. Quantifying long-term scientific impact. Science, 342(6154):127–132, 2013.

[43]

X. Wu, H. Pang, Y. Fan, Y. Linghu, and Y. Luo. ProbWalk: A random walk approach in weighted graph embedding. Procedia Computer Science, 183:683–689, 2021.

[44]

Z. Ye, Y. Hou, R. Pan, T. Gao, and H. Wang. Are large language models able to predict highly cited papers? Evidence from statistical publications. arXiv preprint arXiv:2601.13627, 2026.

[45]

Q. Zhao and X. Feng. Utilizing citation network structure to predict paper citation counts: A deep learning approach. Journal of Informetrics, 16(1):101235, 2022.

[46]

X. Zhou, J. Wang, J. Wang, and Q. Guan. Predicting air quality using a multi-scale spatiotemporal graph attention network. Information Sciences, 680:121072, 2024. ISSN 0020-0255.

Appendix A: Prompt Templates Used in the LLM-Based Experiments

This appendix presents the prompt templates used in the LLM-based prediction experiments. The prompting proto...

[47]

System Prompt

You are a scientific impact prediction engine for journal articles. Your job is to estimate calibrated probabilities for whether a target paper will become a top paper within its journal at each requested horizon year.

Output rules:

[48]

Output valid JSON only. No Markdown, no explanation, and no extra text

[49]

The JSON must have exactly one top-level key: "response".
3. "response" must have exactly one key: "y_acc_vector"

[50]

The required structure is: {"response":{"y_acc_vector":[...]}}
5. "y_acc_vector" must contain numeric probabilities in [0,1]

[51]

Probabilities must be numbers, not strings

[52]

The length of "y_acc_vector" must be exactly equal to <OUTPUT_SPEC><n_years>

[53]

Do not reveal reasoning or chain-of-thought. Return only the final JSON

[54]

Use only the information explicitly present in the XML input.

[55]

Do not use external facts or hidden assumptions about papers, authors, journals, venues, identifiers, files, or datasets
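The output rules above lend themselves to an automatic check on every reply. The sketch below assumes the reply arrives as a raw string and that `n_years` is the value sent in `<OUTPUT_SPEC><n_years>`. `parse_response` is a hypothetical helper, not part of the authors' code. It enforces only the rules visible here: a single top-level key, a single inner key, numeric probabilities in [0, 1], and an exact vector length.

```python
import json


def parse_response(raw: str, n_years: int) -> list[float]:
    """Validate a model reply against the output rules; return the probability vector."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON output
    if not isinstance(data, dict) or set(data) != {"response"}:
        raise ValueError('top level must be exactly {"response": ...}')
    inner = data["response"]
    if not isinstance(inner, dict) or set(inner) != {"y_acc_vector"}:
        raise ValueError('"response" must contain only "y_acc_vector"')
    vec = inner["y_acc_vector"]
    if not isinstance(vec, list) or len(vec) != n_years:
        raise ValueError(f"y_acc_vector must be a list of length {n_years}")
    for p in vec:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(p, bool) or not isinstance(p, (int, float)):
            raise ValueError("probabilities must be numbers, not strings")
        if not 0.0 <= p <= 1.0:  # also rejects NaN, which fails every comparison
            raise ValueError("probabilities must lie in [0, 1]")
    return [float(p) for p in vec]
```

A reply that fails any check raises `ValueError`, so malformed outputs can be logged or retried instead of being silently scored.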

[56]

Developer Prompt

Task: Predict, for the target journal article, the probability that it will be a top paper by accumulated citations at each requested horizon year.

Positive event: The positive event is defined by <CONFIG><q_value>:
• q_value is a quantile threshold.
• A paper is considered top if it belongs to the top (1 − q_value) fraction within its...

[57]

User Prompt Template

The user prompt was generated dynamically from the experiment payload and serialized in XML format. It always contained a <REQUEST> root block. The <OUTPUT_SPEC> section specified the required JSON schema and the exact output vector length. The <CONFIG> section described the graph and retrieval settings. The <TARGET> section contained the me...
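A serializer for this layout might look like the sketch below. The `<REQUEST>`, `<OUTPUT_SPEC>`, `<n_years>`, `<CONFIG>`, `<q_value>`, and `<TARGET>` tags come from the templates above. The `<CONTEXT>`/`<NEIGHBOR>` tags for graph-retrieved papers, and the field names passed in (such as `title`), are hypothetical, since the rest of the template is not shown. Omitting the neighbors produces the without-context condition used in the ablation.

```python
import xml.etree.ElementTree as ET


def build_user_prompt(target: dict, n_years: int, q_value: float, config: dict, neighbors=None) -> str:
    """Serialize one request as XML (hypothetical helper; field names are illustrative)."""
    root = ET.Element("REQUEST")
    spec = ET.SubElement(root, "OUTPUT_SPEC")
    ET.SubElement(spec, "n_years").text = str(n_years)  # required vector length
    cfg = ET.SubElement(root, "CONFIG")
    ET.SubElement(cfg, "q_value").text = str(q_value)  # quantile defining "top"
    for key, value in config.items():  # e.g. graph direction, retrieval strategy
        ET.SubElement(cfg, key).text = str(value)
    tgt = ET.SubElement(root, "TARGET")
    for key, value in target.items():
        ET.SubElement(tgt, key).text = str(value)
    if neighbors:  # left out entirely in the no-context condition
        ctx = ET.SubElement(root, "CONTEXT")  # hypothetical tag name
        for nb in neighbors:
            node = ET.SubElement(ctx, "NEIGHBOR")  # hypothetical tag name
            for key, value in nb.items():
                ET.SubElement(node, key).text = str(value)
    # ElementTree escapes special characters such as "&" and "<" automatically.
    return ET.tostring(root, encoding="unicode")
```

Building the XML with a library rather than by string concatenation keeps titles and abstracts that contain `&` or `<` from corrupting the prompt structure.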
