LinkNav: Surfacing Interconnected Information in Scientific Articles

Ani Nenkova; Jennifer Healey; Junyi Jessy Li; Sebastian Joseph

arxiv: 2606.06650 · v1 · pith:TYP2GZYKnew · submitted 2026-06-04 · 💻 cs.HC

Pith reviewed 2026-06-27 23:19 UTC · model grok-4.3

keywords: academic paper navigation · intra-document links · language model questions · passage connections · scientific reading · answer detection · non-adjacent segments

The pith

LinkNav uses a language model to generate questions from passages in academic papers and locate their answers in distant sections, surfacing connections that average ten segments apart.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a reading tool that turns implicit relationships inside a single scientific article into explicit links. It prompts a language model to create questions that might occur to a reader at one location and then searches the rest of the document for passages that answer those questions. When matches are found, the system marks the pair as connected. Across the papers tested, these linked passages sit on average ten segments away, a distance at which readers would not naturally notice the relationship. The result is a navigation aid that makes the document's internal structure more visible without requiring the reader to hunt manually.

Core claim

LinkNav instructs a language model to generate questions that may arise while reading a passage and then searches for answer passages elsewhere in the document, forming intra-document connections when answers are found. An answer detection pipeline operates with high precision and yields a reasonable number of connections per document. On a dataset of academic papers, connected passages are on average ten segments away from each other.

What carries the argument

A question-generation step produces potential reader questions from a passage, and an answer-detection step then identifies matching passages elsewhere in the same document.
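The two steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the top-five candidate retrieval follows the Figure 1 caption, but the bag-of-words cosine similarity is a stand-in for whatever similarity measure the paper uses, and `generate_questions` and `is_answer` are hypothetical callables standing in for the two language-model calls.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector (an assumed stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_links(
    segments: List[str],
    generate_questions: Callable[[str], List[str]],  # hypothetical LLM call
    is_answer: Callable[[str, str], bool],           # hypothetical LLM judge
    top_k: int = 5,
) -> List[Tuple[int, int, str]]:
    """Return (source index i, answer index j, question) for each link found."""
    vectors = [bow(s) for s in segments]
    links = []
    for i, segment in enumerate(segments):
        for question in generate_questions(segment):
            qv = bow(question)
            # Rank every other segment by similarity to the question.
            candidates = sorted(
                (j for j in range(len(segments)) if j != i),
                key=lambda j: cosine(qv, vectors[j]),
                reverse=True,
            )[:top_k]
            # Ask the judge about each candidate; keep the first accepted one.
            for j in candidates:
                if is_answer(question, segments[j]):
                    links.append((i, j, question))
                    break
    return links


# Toy document and rule-based stubs, invented purely for illustration.
segments = [
    "We tune hyperparameters on a held-out set.",
    "The dataset contains papers from several venues.",
    "The learning rate was 0.001 and batch size 32.",
]
links = find_links(
    segments,
    generate_questions=lambda s: (
        ["What learning rate was used?"] if "hyperparameters" in s else []
    ),
    is_answer=lambda q, s: "learning rate" in s.lower(),
)
print(links)
```

Keeping the judge separate from retrieval means the cheap similarity ranking only narrows the candidates, while the decision to create a link rests on the more expensive answer check.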

If this is right

Readers gain access to connections between sections that are on average ten segments apart and would otherwise remain hidden.
The question-and-answer pipeline produces enough valid links to create a usable navigation layer without overwhelming the document.
Explicit links turn a linear text into a set of traversable relationships inside one paper.
The same building blocks can be applied to any document where non-adjacent passages contain related content.
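The distance statistic above can be read as the mean absolute difference between the segment indices of each linked pair. The sketch below assumes that definition, which the summary does not spell out; the link list is invented for illustration and is not data from the paper.

```python
from statistics import mean
from typing import Iterable, Tuple


def mean_segment_distance(links: Iterable[Tuple[int, int]]) -> float:
    """Average |i - j| over linked segment pairs (i, j)."""
    distances = [abs(i - j) for i, j in links]
    if not distances:
        raise ValueError("no links to measure")
    return mean(distances)


# Hypothetical links: distances are 3, 6 and 9 segments.
example_links = [(0, 3), (4, 10), (12, 3)]
print(mean_segment_distance(example_links))
```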

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could support new interfaces that let readers jump between related ideas rather than scrolling sequentially.
Similar question-generation techniques might help surface connections across multiple papers or across different document types.
If the links prove reliable, standard PDF and e-reader tools might eventually embed them by default.

Load-bearing premise

The questions created by the language model and the passages flagged as answers are both accurate and useful to actual human readers.

What would settle it

A study in which readers examine the generated links and judge most of them to be either factually wrong or already obvious from the surrounding text.

Figures

Figures reproduced from arXiv: 2606.06650 by Ani Nenkova, Jennifer Healey, Junyi Jessy Li, Sebastian Joseph.

**Figure 1.** LinkNav pipeline and experience. (a) A paper is broken down into segments and a language model generates questions for each segment i. (b) Similarity is computed between the questions and all other segments. An LLM decides whether one of the top five segments, j, is the answer. (c) If so, i and j are considered connected. (d) Connections are surfaced via links and a side panel.

**Figure 2.** LinkNav surfaces links between different segments in papers. Text portions that likely triggered the

**Figure 3.** Cumulative percent of PeerQA-Gold answers

Original abstract

We present LinkNav, an enhanced experience for reading academic papers which makes explicit connections between related but non-adjacent passages. To create the experience, we instruct a language model to generate questions that may arise while reading a passage and then search for answer passages elsewhere in the document, forming intra-document connections when answers are found. We confirm that these building blocks work well to power the experience, with an answer detection pipeline that works with high precision, resulting in a reasonable number of connections being made for a document. On a dataset of academic papers, we find that connected passages are on average ten segments away from each other, making explicit connections that a reader may have otherwise missed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

LinkNav applies LLM question generation plus retrieval to link non-adjacent passages in papers and reports an average distance of ten segments, but supplies no human data on whether those links are accurate or useful.

The paper's main move is to take a passage, prompt an LLM for questions a reader might ask, then retrieve answer passages elsewhere in the same document and surface the resulting links. On their set of academic papers the linked segments sit about ten segments apart on average. That distance number is the most concrete result.

The pipeline itself is straightforward, and the authors report that the answer-detection step can run at high precision with a usable number of connections per paper. The specific combination is reasonably new for this use case, and the distance observation gives a sense of how non-local the connections tend to be.

The soft spot is the complete absence of human evaluation. The abstract claims the links make explicit connections a reader may have missed, yet there are no ratings of link quality, no inter-annotator checks, and no user study measuring whether readers find the links helpful or even notice them. The precision claim is also stated without numbers, dataset size, or baselines, so it cannot be checked from the given text.

This is aimed at HCI people who build reading interfaces or experiment with LLMs for scholarly tools. A reader looking for a working prototype idea could pull the pipeline and the distance statistic. Anyone expecting evidence that the system actually improves reading will not find it.

I would bring it to a reading group only if the group is specifically covering LLM-supported document navigation. I would not cite it in the next year. It could go to peer review as a short system paper if the full version adds implementation details and at least preliminary human feedback; on the abstract alone the central usefulness claim is unanchored.

Referee Report

2 major / 2 minor

Summary. The paper presents LinkNav, a reading interface for academic papers that uses an LLM to generate questions from a passage and then detects answer passages elsewhere in the same document to create explicit intra-document links. The central claims are that the answer-detection pipeline achieves high precision, that a reasonable number of connections are surfaced per document, and that on a dataset of papers the connected passages are on average ten segments apart, thereby surfacing links a reader might otherwise miss.

Significance. If the generated connections prove both accurate and useful to readers, the work could contribute a practical HCI technique for improving navigation and comprehension in long scientific documents. The approach of question-generation plus answer retrieval is straightforward and leverages existing LLM capabilities, but the absence of human validation leaves the claimed reader benefit unanchored.

major comments (2)

[Abstract and §4 (Evaluation)] The claim of 'high precision' for the answer detection pipeline is stated without any quantitative results, dataset size, error bars, baseline comparisons, or inter-annotator agreement; this directly undermines the assertion that the pipeline 'works well to power the experience.'
[Abstract and §5 (Results/Discussion)] The claim that connected passages are 'on average ten segments away' and 'making explicit connections that a reader may have otherwise missed' rests on automated metrics alone; no human evaluation, usefulness ratings, or user study is reported to support the HCI benefit or the 'may have otherwise missed' assertion.
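One way the precision claim could be reported with an error bar is sketched below. It assumes each surfaced link has been judged correct or incorrect by a human annotator, and it uses a Wilson score interval as one reasonable choice of interval; neither the judgment data nor the choice of interval comes from the paper.

```python
import math
from typing import List, Tuple


def precision(judgments: List[bool]) -> float:
    """Fraction of surfaced links judged correct."""
    if not judgments:
        raise ValueError("no judgments provided")
    return sum(judgments) / len(judgments)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical annotation outcome: 45 of 50 surfaced links judged correct.
judgments = [True] * 45 + [False] * 5
low, high = wilson_interval(sum(judgments), len(judgments))
print(f"precision={precision(judgments):.2f}, 95% CI=({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the point estimate makes the dataset size visible in the result itself: the same precision measured on fewer links yields a wider interval.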

minor comments (2)

[§4] The manuscript should clarify the exact segmentation method used to compute 'segments' and report the total number of papers and segments in the evaluation dataset.
[Figure 1] Figure captions and the system diagram would benefit from explicit labels indicating which components are LLM-driven versus rule-based.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, acknowledging where the manuscript requires strengthening and outlining specific revisions.

Referee: [Abstract and §4 (Evaluation)] The claim of 'high precision' for the answer detection pipeline is stated without any quantitative results, dataset size, error bars, baseline comparisons, or inter-annotator agreement; this directly undermines the assertion that the pipeline 'works well to power the experience.'

Authors: We agree the current text states 'high precision' without supporting numbers, dataset details, or comparisons. We will revise §4 to report the exact precision of the answer-detection pipeline, the number of papers and segments evaluated, any variance or error analysis performed, and relevant baselines. The abstract will be updated to reference these results.

Revision: yes

Referee: [Abstract and §5 (Results/Discussion)] The claim that connected passages are 'on average ten segments away' and 'making explicit connections that a reader may have otherwise missed' rests on automated metrics alone; no human evaluation, usefulness ratings, or user study is reported to support the HCI benefit or the 'may have otherwise missed' assertion.

Authors: The reported average distance and connection counts are derived solely from automated segment analysis. We acknowledge that this does not directly demonstrate reader benefit or that connections would otherwise be missed. We will revise the abstract and §5 to qualify these claims as automated observations, add an explicit limitations paragraph, and include future-work discussion of planned human validation studies.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical system description with no derivations or fitted predictions

The paper presents LinkNav as a system that uses an LLM to generate questions from passages and detect answers elsewhere in the document to form intra-document links. The central empirical claim (connected passages average ten segments apart) is a direct measurement on a dataset of papers, not a model prediction derived from fitted parameters or prior self-citations. No equations, ansatzes, uniqueness theorems, or self-referential definitions appear. The pipeline is described as a sequence of LLM calls and detection steps whose outputs are evaluated for precision on held-out data; these steps do not reduce to tautologies or rename their own inputs. The absence of human evaluation noted by the skeptic is a question of external validity, not circularity in the reported chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a system description with no mathematical model, so the ledger contains no free parameters, axioms, or invented entities.
