arxiv: 2605.18410 · v1 · pith:LPRDPQ2Tnew · submitted 2026-05-18 · 💻 cs.DL

From Node2Vec to GPT-based GraphRAG: scientific impact prediction across graph and language models

Pith reviewed 2026-05-19 23:29 UTC · model grok-4.3

classification 💻 cs.DL
keywords scientific impact prediction · citation networks · Node2Vec · GraphRAG · large language models · text embeddings · supervised classification · cold start

The pith

Directed citation graphs combined with textual embeddings predict scientific impact with 0.84-0.85 AUC, while GPT prompts without retrieval often match GraphRAG performance at 0.87.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests ways to forecast which newly published scientific papers will gain many citations later on, using only data available at the time of publication. It builds citation networks and text-similarity graphs, then applies Node2Vec embeddings either by themselves or together with OpenAI text vectors in supervised classifiers. The best results come from mixing directed citation structure with language model embeddings. When using large language models like GPT, adding retrieved graph context does not reliably beat just prompting with the target paper's text alone. Accurate early impact prediction matters because it can help editors, funders, and researchers focus effort on work that is likely to matter most.

Core claim

The authors formulate impact prediction as classifying papers into cohort-normalized top-P% citation ranks and show that supervised models using Node2Vec on directed citation graphs plus textual embeddings reach about 0.84-0.85 AUC. GPT-based GraphRAG, which uses graph neighborhoods as context, achieves up to 0.87 AUC, but target-only prompts perform as well or better. Structural and textual signals therefore complement each other in supervised settings, while retrieval augmentation needs careful comparison against simple baselines.
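To make the cohort-normalized top-P% target concrete, here is a minimal sketch of how such labels could be derived. It assumes a cohort is a publication year and that ties at the cutoff are all labelled positive; the function name `top_p_labels`, the rounding rule, and the toy citation counts are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict

def top_p_labels(papers, p=20.0):
    """Label each paper 1 if it is in the top p% of its cohort, else 0.

    `papers` is an iterable of (paper_id, cohort_year, citation_count).
    """
    cohorts = defaultdict(list)
    for paper_id, year, citations in papers:
        cohorts[year].append((paper_id, citations))

    labels = {}
    for members in cohorts.values():
        counts = sorted((c for _, c in members), reverse=True)
        # Assumption: every cohort has at least one positive, and papers
        # tied with the cutoff count are all labelled positive.
        n_top = max(1, round(len(counts) * p / 100.0))
        cutoff = counts[n_top - 1]
        for paper_id, citations in members:
            labels[paper_id] = int(citations >= cutoff)
    return labels

# Toy data: the same raw count (5) is ordinary in 2010 but top-ranked in 2015.
papers = [
    ("a", 2010, 50), ("b", 2010, 10), ("c", 2010, 5), ("d", 2010, 1), ("e", 2010, 0),
    ("f", 2015, 5), ("g", 2015, 2), ("h", 2015, 1), ("i", 2015, 0), ("j", 2015, 0),
]
labels = top_p_labels(papers, p=20.0)
```

Ranking within each cohort is what lets older and newer papers share one target even though older papers have had more time to accumulate citations.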

What carries the argument

Temporally constrained citation and textual-similarity graphs processed with Node2Vec embeddings, fused with OpenAI text embeddings for supervised classification, alongside GPT models prompted with or without graph neighborhood context.
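The two ingredients named here, a temporal constraint on edges and the fusion of graph and text vectors, can be sketched as follows. This is an illustration under stated assumptions: the rule "both endpoints published by the cutoff year", the helper names `edges_known_at` and `fuse_features`, and the placeholder vectors are hypothetical, and the Node2Vec and OpenAI embedding steps themselves are not shown.

```python
def edges_known_at(edges, pub_year, cutoff_year):
    """Keep (citing, cited) edges whose endpoints were both published by cutoff_year."""
    return [
        (src, dst)
        for src, dst in edges
        if pub_year[src] <= cutoff_year and pub_year[dst] <= cutoff_year
    ]

def fuse_features(graph_vec, text_vec):
    """Concatenate a graph embedding and a text embedding into one feature vector."""
    return list(graph_vec) + list(text_vec)

# Toy citation data; identifiers and years are made up for illustration.
pub_year = {"p1": 2012, "p2": 2014, "p3": 2016}
edges = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]

# Only the edge between papers that already exist in 2014 survives.
visible = edges_known_at(edges, pub_year, cutoff_year=2014)

# Placeholder vectors standing in for a Node2Vec embedding and a text embedding.
features = fuse_features([0.1, -0.3], [0.7, 0.2, 0.5])
```

Concatenation keeps both signals intact and leaves it to the downstream classifier to weigh structure against text.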

Load-bearing premise

Cohort-normalized top-P% citation rank years after publication acts as a stable, unbiased proxy for scientific impact that can be predicted from publication-time data without future leakage.

What would settle it

Model AUC dropping to near 0.5 or below when applied to predict impact in a completely new scientific field or later publication year cohort using the same training setup.

Figures

Figures reproduced from arXiv: 2605.18410 by Adilson Vital Jr., Diego R. Amancio, Filipi N. Silva.

Figure 1. Overview of the methodology employed to predict scientific impact. The process begins ...
Figure 2. Number of papers published annually in ACS Applied Materials & Interfaces (American ...
Figure 3. Overview of the LLM-based GraphRAG methodology for top-paper prediction. The pro...
Figure 4. Comparison of graph construction strategies and input representations for top-20% pa...
Figure 5. Performance comparison of textual-similarity graphs built with Top-...
Figure 6. Effect of quantile-based thresholds on top-paper prediction performance. The figure reports ...
Figure 7. GraphRAG-based prediction performance under different neighbor retrieval strategies using ...
Figure 8. Effect of graph-retrieved context on LLM-based top-paper prediction. The figure compares ...
Figure 9. Cross-journal evaluation of LLM-based top-paper prediction with and without graph ...
Original abstract

Identifying which newly published scientific papers are likely to become highly cited is important for prioritizing research attention, supporting editorial decisions, and guiding the allocation of scientific resources, particularly under cold-start conditions where little direct evidence is available at publication time. In this work, we formulate impact prediction as a cohort-normalized top-P% classification task and compare graph-based and LLM-based approaches under a unified framework. We construct citation and textual-similarity graphs under temporal constraints and generate Node2Vec representations, either alone or combined with OpenAI text embeddings. The best supervised configuration combines directed citation graphs with textual embeddings, reaching approximately 0.84-0.85 AUC. We also evaluate a GPT-based GraphRAG setup, using GPT 5.5 and 5.4 Nano, in which graph neighborhoods are used as contextual evidence for prediction. Although the LLM-based approach achieves high performance, retrieved context does not consistently improve results; target-only prompts often perform as well as or better than GraphRAG prompts achieving the 0.87 mark. These findings indicate that structural and textual signals are complementary for supervised prediction, while retrieval augmentation must be carefully evaluated against simpler LLM baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript formulates scientific impact prediction as a cohort-normalized top-P% citation classification task and compares Node2Vec embeddings derived from temporally constrained citation and textual-similarity graphs (alone or combined with OpenAI text embeddings) against GPT-based GraphRAG prompting using GPT 5.5 and GPT 5.4 Nano. It reports that the strongest supervised configuration reaches approximately 0.84-0.85 AUC while LLM prompting achieves up to 0.87 AUC, with the observation that target-only prompts frequently match or exceed GraphRAG performance.

Significance. If the reported performance differences hold under rigorous validation, the work usefully demonstrates complementarity between graph structure and textual features for cold-start prediction and supplies a practical reminder that retrieval augmentation must be benchmarked against simpler LLM baselines. The emphasis on temporal graph construction to respect publication-time information is a positive methodological choice.

major comments (3)
  1. [Abstract] Abstract: The central AUC figures (0.84-0.85 supervised, 0.87 LLM) are presented without any description of dataset size, the concrete value chosen for P, the exact temporal train/test split dates, cross-validation procedure, or statistical significance testing of differences across configurations. These omissions are load-bearing for assessing whether the headline performance claims are reliable.
  2. [Abstract] Abstract: The cohort-normalized top-P% future citation rank is adopted as the prediction target with no explicit verification that post-publication information (e.g., journal effects, topic popularity shifts, or early visibility signals) does not leak into the label or the temporally constrained node features/neighborhoods. A concrete test—such as reporting AUC when using only pre-publication metadata or stratifying results by field—would strengthen the interpretation that the models are predicting intrinsic impact rather than recovering early cues.
  3. [Abstract] Abstract: The statement that 'retrieved context does not consistently improve results' requires supporting quantitative evidence; a table or figure comparing AUC (with confidence intervals) for target-only versus GraphRAG prompts across all model variants would make this claim verifiable rather than qualitative.
minor comments (1)
  1. [Abstract] Abstract: The phrasing 'GPT 5.5 and 5.4 Nano' is unclear and likely contains a typographical error; the exact model identifiers (e.g., GPT-3.5-turbo and GPT-4o) should be stated explicitly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which highlights important areas for improving transparency and rigor. We address each major comment point by point below and have incorporated revisions to strengthen the manuscript.

  1. Referee: [Abstract] Abstract: The central AUC figures (0.84-0.85 supervised, 0.87 LLM) are presented without any description of dataset size, the concrete value chosen for P, the exact temporal train/test split dates, cross-validation procedure, or statistical significance testing of differences across configurations. These omissions are load-bearing for assessing whether the headline performance claims are reliable.

    Authors: We agree that the abstract would benefit from these key details to allow readers to properly evaluate the claims. The full manuscript provides this information in the Methods and Experimental Setup sections, including the scale of the paper collection, the specific P threshold for the top-P% task, the exact temporal cutoffs for train/test splits, the cross-validation procedure used, and statistical tests comparing AUC differences. We will revise the abstract to concisely include summaries of these elements. revision: yes

  2. Referee: [Abstract] Abstract: The cohort-normalized top-P% future citation rank is adopted as the prediction target with no explicit verification that post-publication information (e.g., journal effects, topic popularity shifts, or early visibility signals) does not leak into the label or the temporally constrained node features/neighborhoods. A concrete test—such as reporting AUC when using only pre-publication metadata or stratifying results by field—would strengthen the interpretation that the models are predicting intrinsic impact rather than recovering early cues.

    Authors: This concern is well-taken. The temporal constraints on graph construction and feature extraction are explicitly designed to use only information available at publication time, thereby avoiding post-publication leakage. To further bolster the interpretation, we will add new analyses that report AUC using only pre-publication metadata and that stratify performance by scientific field. These results will be presented in a dedicated subsection of the revised manuscript. revision: yes

  3. Referee: [Abstract] Abstract: The statement that 'retrieved context does not consistently improve results' requires supporting quantitative evidence; a table or figure comparing AUC (with confidence intervals) for target-only versus GraphRAG prompts across all model variants would make this claim verifiable rather than qualitative.

    Authors: We agree that the claim requires explicit quantitative backing to be fully verifiable. While detailed per-variant comparisons appear in the Results section, we will add a consolidated table (or expand an existing results table) that directly reports AUC values together with confidence intervals for target-only versus GraphRAG prompts across all GPT variants. This will make the observation that retrieved neighborhoods do not consistently outperform target-only baselines immediately evident and quantifiable. revision: yes
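A comparison of the kind requested in comment 3 could be quantified with a paired bootstrap over test papers. The sketch below is illustrative only: `roc_auc` and `paired_bootstrap_auc_diff` are hypothetical helper names, the percentile interval is one of several possible choices, and the scores are toy values rather than results from the paper.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for AUC(a) - AUC(b), resampling the same papers for both."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample lost a class, so AUC is undefined; draw again
        auc_a = roc_auc(y, [scores_a[i] for i in idx])
        auc_b = roc_auc(y, [scores_b[i] for i in idx])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy scores for a target-only prompt and a GraphRAG prompt on the same papers.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
target_only = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
graph_rag = [0.8, 0.3, 0.5, 0.6, 0.7, 0.2, 0.9, 0.1]
interval = paired_bootstrap_auc_diff(y_true, target_only, graph_rag, n_boot=500)
```

Resampling the same indices for both prompt types keeps the comparison paired, so paper-level difficulty cancels out; an interval that straddles zero would be consistent with "no consistent improvement".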

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on temporal held-out data

The paper trains supervised classifiers on Node2Vec embeddings and prompts GPT models, using temporally constrained citation and similarity graphs, to predict cohort-normalized future citation rank. AUC values are measured on held-out test sets after temporal splits, not obtained by fitting a parameter that directly encodes the target label or by renaming an input. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the derivation. The approach is a standard supervised comparison whose reported results are not fixed by its inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard assumptions from graph representation learning and LLM prompting literature plus the domain-specific choice of citation rank as impact proxy; no new entities are postulated.

free parameters (1)
  • top-P% threshold
    The percentile cutoff used to define positive class is a modeling choice whose exact value is not stated in the abstract.
axioms (1)
  • domain assumption: Citation and textual-similarity graphs can be constructed under strict temporal constraints that simulate cold-start conditions at publication time.
    Invoked when building the input graphs for both Node2Vec and GraphRAG experiments.

pith-pipeline@v0.9.0 · 5746 in / 1264 out tokens · 34026 ms · 2026-05-19T23:29:39.693665+00:00 · methodology

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 3 internal anchors

  1. [1]

    The number of papers published yearly is shown in Figure 2

    Data processing. The first part of this work started by selecting 35,354 academic papers from the journal ACS Applied Materials & Interfaces (American Chemical Society, ACS), spanning 11 publication years from 2009 to 2020 and exhibiting a natural year-over-year growth in volume. The number of papers published yearly is shown in Figure 2. The only informati...

  2. [2]

    For both graph types we created four variations based on (i) edge direction (directed vs

    Graph Construction. Once the database was organized into Y-year post-publication windows and the target metric labels were defined, we used the data available up to that point to construct two graphs representing complementary relational views: citations and semantic similarity. For both graph types we created four variations based on (i) edge direction (di...

  3. [3]

    Each graph family is further expanded by two edge direction types (directed vs

    Embedding construction. After the graph construction phase, we have the following graph families: (i) the citation graph built from citation relations, and (ii) the similarity graph built by connecting top-K most similar papers, producing four similarity-graph variants according to K ∈ {3, 5, 7, 9}. Each graph family is further expanded by two edge direction ...

  4. [4]

    Impact classification. In the final phase, we use the embeddings from the previous phase as inputs to a supervised classification model that predicts whether each paper will be a “top paper” under a given definition, percentile threshold P ∈ {10, 20, 30, 40, 50}, and prediction horizon Y ∈ {0, ..., 10} (when observable). Concretely, each training instance co...

  5. [5]

    Graph construction. The LLM-based experiments use the graphs constructed in the previous stage; consequently, no additional graph-construction procedures are introduced here. We retain the ... [FIG. 3. Overview of the LLM-based GraphRAG methodology for top-paper prediction. The process reuses the citation and textual-similarity graphs constructed in the ...]

  6. [6]

    Context retrieval. For each sampled target paper, we extracted a local neighborhood from the graph to serve as contextual information within the prompt. We evaluated two distinct retrieval strategies: (i) random sampling from the target node’s immediate graph neighbors, and (ii) similarity-based selection, where we identified the top five most similar pap...

  7. [7]

    To achieve this, each request was structured into three distinct layers: a system prompt, a developer prompt, and a programmatically generated user prompt

    Prompt construction. We employed GPT-5.5 and GPT 5.4 Nano as the underlying models, configuring the prompting protocol to function as a specialized scientific impact prediction engine rather than a general-purpose assistant. To achieve this, each request was structured into three distinct layers: a system prompt, a developer prompt, and a programmatically ...

  8. [8]

    Prediction and evaluation. For each target paper, the LLM produces a single structured response centered on a probability vector for top-paper prediction across all requested horizon years, together with 16 additional auxiliary outputs included for completeness. Thus, unlike the graph-based classifier, which solves separate binary classification problems...

  9. [9]

    Comparison between citation and textual-similarity graphs. Figure 4 presents the AUC scores for the classification of papers belonging to the top 20% of their respective cohorts for each year following publication. We evaluate two distinct graph construction strategies (Paper Citation and Textual Similarity) against two input representations for the neural c...

  10. [10]

    The most prominent finding is that concatenating textual embeddings with Node2Vec consistently outperforms the standalone Node2Vec model across all values of K

    Sensitivity analysis of Top-K textual similarity graphs. In Figure 5, we examine the textual-similarity graphs by varying the number of neighbors (K) from 3 to 9. The most prominent finding is that concatenating textual embeddings with Node2Vec consistently outperforms the standalone Node2Vec model across all values of K. This confirms that explicitly prese...

  11. [11]

    Effect of quantile-based thresholds on top-paper prediction performance. In Figure 6, we extend the experiment across various quantile thresholds (50th to 90th percentiles in increments of 10) used to define the positive class. While the figure displays results exclusively for directed and weighted graphs, the observed trends were consistent across other g...

  12. [12]

    Effect of neighbor retrieval strategy in GraphRAG-based prediction. We employed the graph structure as a retrieval mechanism, selecting neighbors of the target paper and injecting them into the LLM prompt as contextual evidence. We evaluated multiple graph configurations by varying edge direction and edge weighting, and compared two neighborhood-selection ...

  13. [13]

    Effect of graph-retrieved context on LLM prediction performance. In Figure 8, we replicate the GraphRAG setup under a simplified and controlled configuration, using directed and unweighted graphs with random retrieval, and compare prompts with and without graph-retrieved neighbors. The goal of this experiment is to isolate the contribution of retrieved grap...

  14. [14]

    We kept the same controlled configuration used in the previous comparison: directed and unweighted graphs, with five randomly selected neighbors in the context-augmented condition

    Cross-journal evaluation with and without graph-retrieved context. To assess whether the behavior observed in the main corpus was specific to a single journal or reflected a more general property of the LLM-based prediction setup, we repeated the context-ablation experiment on three additional journals: Informetrics, PNAS, and PRL. We kept the same controlle...

  15. [15]

    G. Abramo, C. A. D’Angelo, and G. Felici. Predicting publication long-term impact through a combination of early citations and journal impact factor. Journal of Informetrics, 13(1):32–49, 2019.

  16. [16]

    D. W. Aksnes, L. Langfeldt, and P. Wouters. Citations, citation indicators, and research quality: An overview of basic concepts and theories. Sage Open, 9(1):2158244019829575, 2019

  17. [17]

    D. R. Amancio, M. d. G. V. Nunes, O. N. Oliveira Jr, and L. da F. Costa. Using complex networks concepts to assess approaches for citations in scientific papers. Scientometrics, 91(3):827–842, 2012

  18. [18]

    T. Azad, I. Al Azher, S. R. Choudhury, and H. Alhoori. Predicting the scholarly impact of research papers using retrieval-augmented llms. In Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025), pages 124–131, 2025

  19. [19]

    I. Beltagy, K. Lo, and A. Cohan. SciBERT: A pretrained language model for scientific text. In K. Inui, J. Jiang, V. Ng, and X. Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, Nov. ...

  20. [20]

    A. C. M. Brito, F. N. Silva, and D. R. Amancio. A complex network approach to political analysis: Application to the Brazilian chamber of deputies. Plos one, 15(3):e0229928, 2020

  21. [21]

    A. Cohan, S. Feldman, I. Beltagy, D. Downey, and D. Weld. SPECTER: Document-level representation learning using citation-informed transformers. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2270–2282, Online, July 2020. Association for Comp...

  22. [22]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805, 2018

  23. [23]

    node2vec: Scalable Feature Learning for Networks

    A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. CoRR, abs/1607.00653, 2016

  24. [24]

    G. He, Z. Xue, Z. Jiang, Y. Kang, S. Zhao, and W. Lu. H2cgl: Modeling dynamics of citation network for impact prediction. Information Processing & Management, 60(6):103512, 2023

  25. [25]

    L. He, L. Bai, X. Yang, H. Du, and J. Liang. High-order graph attention network. Information Sciences, 630:222–234, 2023. ISSN 0020-0255

  26. [26]

    J. H. Kim, J. Son, H. Kim, and E. Lee. Node embedding for homophilous graphs with argew: Augmentation of random walks by graph edge weights. arXiv preprint arXiv:2308.05957, 2023.

  27. [27]

    K. Kousha and M. Thelwall. Factors associating with or predicting more cited or higher quality journal articles: An annual review of information science and technology (arist) paper. Journal of the Association for Information Science and Technology, 75(3):215–244, 2024

  28. [28]

    T. Mikolov, K. Chen, G. S. Corrado, and J. Dean. Efficient estimation of word representations in vector space, 2013

  29. [29]

    Distributed Representations of Words and Phrases and their Compositionality

    T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546, 2013

  30. [30]

    A. P. Millán, H. Sun, L. Giambagli, R. Muolo, T. Carletti, J. J. Torres, F. Radicchi, J. Kurths, and G. Bianconi. Topology shapes dynamics of higher-order networks. Nature Physics, 21(3):353–361, 2025

  31. [31]

    OpenAI. New embedding models and api updates. https://openai.com/index/new-embedding-models-and-api-updates/, Jan. 2024. Accessed: 2026-01-16

  32. [32]

    O. Penner, R. K. Pan, A. M. Petersen, K. Kaski, and S. Fortunato. On the predictability of future impact in science. Scientific reports, 3(1):3052, 2013

  33. [33]

    A. M. Petersen, R. K. Pan, F. Pammolli, and S. Fortunato. Methods to account for citation inflation in research evaluation. Research Policy, 48(7):1855–1865, 2019

  34. [34]

    C. Stegehuis, N. Litvak, and L. Waltman. Predicting the long-term citation impact of recent publications. Journal of Informetrics, 9(3):642–657, 2015. ISSN 1751-1577

  35. [35]

    M. Stella, T. J. Swanson, A. S. Teixeira, B. N. Richson, Y. Li, T. T. Hills, K. T. Forbush, and D. Watson. Cognitive networks and text analysis identify anxiety as a key dimension of distress in genuine suicide notes. Big Data and Cognitive Computing, 9(7):171, 2025

  36. [36]

    A. Vital and D. R. Amancio. A comparative analysis of local similarity metrics and machine learning approaches: application to link prediction in author citation networks. Scientometrics, 127(10):6011–6028, 2022

  37. [37]

    A. Vital, Jr., F. N. Silva, and D. R. Amancio. Comparing random walks in graph embedding and link prediction. PLOS ONE, 19(11):1–22, 11 2024

  38. [38]

    A. Vital Jr, F. N. Silva, and D. R. Amancio. Recovering link-weight structure in complex networks with weight-aware random walks. arXiv preprint arXiv:2508.07489, 2025

  39. [39]

    A. Vital Jr, F. N. Silva, O. N. Oliveira Jr, and D. R. Amancio. Predicting citation impact of research papers using gpt and other text embeddings. Physica A: Statistical Mechanics and its Applications, 674:130789, 2025.

  40. [40]

    L. Waltman. A review of the literature on citation impact indicators. Journal of informetrics, 10(2):365–391, 2016

  41. [41]

    L. Waltman and M. Schreiber. On the calculation of percentile-based bibliometric indicators. Journal of the American Society for information Science and Technology, 64(2):372–379, 2013

  42. [42]

    D. Wang, C. Song, and A.-L. Barabási. Quantifying long-term scientific impact. Science, 342(6154):127–132, 2013

  43. [43]

    X. Wu, H. Pang, Y. Fan, Y. Linghu, and Y. Luo. Probwalk: A random walk approach in weighted graph embedding. Procedia Computer Science, 183:683–689, 2021

  44. [44]

    Z. Ye, Y. Hou, R. Pan, T. Gao, and H. Wang. Are large language models able to predict highly cited papers? evidence from statistical publications. arXiv preprint arXiv:2601.13627, 2026

  45. [45]

    Q. Zhao and X. Feng. Utilizing citation network structure to predict paper citation counts: A deep learning approach. Journal of Informetrics, 16(1):101235, 2022

  46. [46]

    X. Zhou, J. Wang, J. Wang, and Q. Guan. Predicting air quality using a multi-scale spatiotemporal graph attention network. Information Sciences, 680:121072, 2024. ISSN 0020-0255.

    Appendix A: Prompt Templates Used in the LLM-Based Experiments. This appendix presents the prompt templates used in the LLM-based prediction experiments. The prompting proto...

  47. [47]

    Your job is to estimate calibrated probabilities for whether a target paper will become a top paper within its journal at each requested horizon year

    System Prompt. You are a scientific impact prediction engine for journal articles. Your job is to estimate calibrated probabilities for whether a target paper will become a top paper within its journal at each requested horizon year. Output rules

  48. [48]

    No Markdown, no explanation, and no extra text

    Output valid JSON only. No Markdown, no explanation, and no extra text

  49. [49]

    The JSON must have exactly one top-level key: "response". 3. "response" must have exactly one key: "y_acc_vector"

  50. [50]

    The required structure is: {"response":{"y_acc_vector":[...]}} 5. "y_acc_vector" must contain numeric probabilities in [0,1]

  51. [51]

    Probabilities must be numbers, not strings

  52. [52]

    The length of "y_acc_vector" must be exactly equal to <OUTPUT_SPEC><n_years>

  53. [53]

    Return only the final JSON

    Do not reveal reasoning or chain-of-thought. Return only the final JSON

  54. [54]

    Use only the information explicitly present in the XML input.

  55. [55]

    Do not use external facts or hidden assumptions about papers, authors, journals, venues, identifiers, files, or datasets

  56. [56]

    Developer Prompt. Task: Predict, for the target journal article, the probability that it will be a top paper by accumulated citations at each requested horizon year. Positive event: The positive event is defined by <CONFIG><q_value>: • q_value is a quantile threshold. • A paper is considered top if it belongs to the top (1−q_value) fraction within its...

  57. [57]

    User Prompt Template. The user prompt was generated dynamically from the experiment payload and serialized in XML format. It always contained a <REQUEST> root block. The <OUTPUT_SPEC> section specified the required JSON schema and the exact output vector length. The <CONFIG> section described the graph and retrieval settings. The <TARGET> section contained the me...
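The output rules quoted in the system-prompt fragments above (a single top-level "response" key, a single "y_acc_vector" key, numeric probabilities in [0,1], and a length equal to the requested number of horizon years) lend themselves to a strict parser on the receiving side. The following is a minimal sketch of such a check; the function name `parse_llm_response` and the decision to reject booleans are assumptions, and it does not reproduce the authors' evaluation code.

```python
import json

def parse_llm_response(raw, n_years):
    """Validate a raw model reply against the quoted output rules and return the vector."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on non-JSON text
    if not isinstance(data, dict) or set(data) != {"response"}:
        raise ValueError('expected exactly one top-level key: "response"')
    inner = data["response"]
    if not isinstance(inner, dict) or set(inner) != {"y_acc_vector"}:
        raise ValueError('"response" must have exactly one key: "y_acc_vector"')
    vector = inner["y_acc_vector"]
    if not isinstance(vector, list) or len(vector) != n_years:
        raise ValueError("y_acc_vector must be a list of length n_years")
    for value in vector:
        # Assumption: booleans are rejected even though Python treats them as ints.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("probabilities must be numbers, not strings")
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return [float(v) for v in vector]

# A reply that satisfies every rule for three horizon years.
probs = parse_llm_response('{"response":{"y_acc_vector":[0.1,0.5,0.9]}}', n_years=3)
```

Rejecting malformed replies outright, rather than coercing them, keeps formatting failures visible instead of letting them silently shift the reported AUC.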