Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Baotian Hu; Huiyao Chen; Meishan Zhang; Min Zhang; Yinghui Li; Yi Yang

arXiv: 2506.06313 · v5 · submitted 2025-05-26 · cs.IR · cs.AI · cs.CL

Pith reviewed 2026-05-19 14:06 UTC · model grok-4.3

keywords: long document question answering · discourse structure · rhetorical structure theory · hierarchical retrieval · discourse parsing · LLM node enhancement

The pith

A discourse-aware framework using rhetorical structure trees improves long document question answering over flat chunking methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that converting discourse trees from rhetorical structure theory into hierarchical representations, enhanced by language models, allows retrieval systems to follow natural text organization for better answers on long documents. Existing systems rely on flat sequences or heuristics that ignore how humans use discourse cues to comprehend extended texts. The authors test this on four datasets spanning genres and languages and report consistent gains from adding the structure layer. A reader would care because accurate retrieval on lengthy sources like reports or books directly affects the reliability of question answering tools.

Core claim

The paper claims that a discourse-aware hierarchical framework for long document question answering, built on language-universal discourse parsing, LLM-enhanced discourse relation nodes, and structure-guided hierarchical retrieval, delivers consistent improvements over prior approaches across four datasets, multiple genres, and languages while showing robustness to varied document types.

What carries the argument

Rhetorical structure theory discourse trees turned into sentence-level representations with LLM-enhanced nodes, which supply the structural scaffold for hierarchical retrieval that combines discourse relations with semantic similarity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same discourse hierarchy could be reused for related tasks like long-document summarization without retraining the parser.
Accuracy of the language-universal parser sets an upper bound on how much the retrieval gains can grow if parsing quality improves.

Load-bearing premise

Reliable rhetorical structure trees can be produced for long documents and these trees add retrieval value beyond what semantic similarity alone provides.

What would settle it

An ablation on the same four datasets in which removing the discourse-tree component leaves the gains over baseline chunking intact would undercut the central claim, since the improvement would then be attributable to something other than discourse structure.

Figures

Figures reproduced from arXiv: 2506.06313 by Baotian Hu, Huiyao Chen, Meishan Zhang, Min Zhang, Yinghui Li, Yi Yang.

**Figure 1.** Comparison of document modeling approaches for long-text retrieval. Numbers (1-6) show sentence order in the original document, with similar colors indicating semantic relationships. Four approaches are compared: (a) flat sequential modeling, (b) bottom-up semantic clustering of RAPTOR, (c) bisection-based adjacent grouping, and (d) our discourse-aware DISRetrieval that preserves both semantic and discourse …

**Figure 2.** Overview of the DISRetrieval framework. The framework consists of three main steps: (1) …

**Figure 3.** Illustration of bottom-up LLM enhancement in Phase 2 of discourse tree construction.

**Figure 4.** (Caption not recovered; the extracted text comes from the accompanying discussion.) We compare five variants: leaf-only baseline, summary-based retrieval, all filtered leaves, Top-K with ranking order, and our final Top-K with original order. The results reveal three key insights: (1) using summaries of intermediate nodes performs worse than the leaf baseline, indicating that preserving original text details is crucial; (2) while using all filtered leaves shows slight imp…

**Figure 5.** Impact of discourse parser capability on …

**Figure 6.** Comparative analysis of distribution difference of two datasets. Figure (a) shows the …

**Figure 7.** Ablation results for different values of K. The horizontal axis represents different choices of K, and the vertical axis indicates generation performance (F1-match for QASPER and accuracy for QuALITY). All question answering tasks are conducted on the UnifiedQA-3B model. Accompanying text (fragment): "… gradually declines, or exhibits minor fluctuations. This consistent pattern suggests an optimal balance point where sufficient context is pro…"
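The difference between the "Top-K with ranking order" and "Top-K with original order" variants named in the Figure 4 text can be illustrated with a minimal Python sketch. It assumes each leaf sentence already has a relevance score for the query; the scores, the function names, and the absence of any filtering step are all hypothetical simplifications, not the paper's implementation. Both variants select the same K leaves, and the second simply re-sorts them by document position.

```python
from typing import List


def top_k_ranking_order(scores: List[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring leaves, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def top_k_original_order(scores: List[float], k: int) -> List[int]:
    """Return the same k leaves, re-sorted by their position in the document."""
    return sorted(top_k_ranking_order(scores, k))


if __name__ == "__main__":
    # Hypothetical relevance scores for six sentences, listed in document order.
    scores = [0.10, 0.80, 0.30, 0.95, 0.20, 0.60]
    print(top_k_ranking_order(scores, 3))   # selected leaves, sorted by score
    print(top_k_original_order(scores, 3))  # same leaves, in document order
```

Restoring document order keeps the retrieved sentences in the sequence a reader would meet them, which is one plausible reading of why the paper's final variant prefers it.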

Original abstract

Existing long-document question answering systems typically process texts as flat sequences or use heuristic chunking, which overlook the discourse structures that naturally guide human comprehension. We present a discourse-aware hierarchical framework that leverages rhetorical structure theory (RST) for long document question answering. Our approach converts discourse trees into sentence-level representations and employs LLM-enhanced node representations to bridge structural and semantic information. The framework involves three key innovations: language-universal discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval. Extensive experiments on four datasets demonstrate consistent improvements over existing approaches through the incorporation of discourse structure, across multiple genres and languages. Moreover, the proposed framework exhibits strong robustness across diverse document types and linguistic settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a hierarchical retrieval system from RST discourse trees with LLM node boosts, but the abstract gives no numbers or ablations to show the structure actually adds value over semantic methods.

The main point is that they replace flat chunking with a pipeline that turns rhetorical structure trees into sentence representations, enhances the nodes with an LLM, and then retrieves hierarchically along the tree for long-document QA. This targets the real issue that simple chunks can break logical connections in reports, books, or legal texts. The three claimed pieces (language-universal parsing for long documents, LLM-enhanced discourse nodes, and structure-guided retrieval) are presented as the framework's contributions, and they do extend standard discourse parsing into the retrieval setting in a direct way. The framing of the problem is clear and the pipeline looks like a sensible attempt to keep discourse relations intact during retrieval.

The soft spot is the missing evidence. The abstract states consistent gains across four datasets and multiple languages, yet supplies no quantitative results, baseline details, error bars, or parser accuracy numbers on long texts. Without those, it is hard to tell whether the discourse hierarchy drives the improvement or whether the LLM enhancement is doing most of the work. The worry that RST parsers degrade on documents beyond a few thousand tokens is worth checking; if node attachments are noisy, the method risks becoming an expensive semantic chunker with little extra signal.

This paper is aimed at retrieval researchers who already work with long documents and want to test linguistic structure. A reader focused on practical QA systems could extract useful implementation ideas once the experiments are visible. I would send it for peer review so the full results, scaling details, and any ablations can be examined.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a discourse-aware hierarchical retrieval framework for long-document question answering. It converts documents into rhetorical structure theory (RST) trees via language-universal discourse parsing, augments node representations with LLMs to combine structural and semantic signals, and performs structure-guided hierarchical retrieval. The central claim is that this yields consistent improvements over prior chunking-based and flat-retrieval baselines across four datasets spanning multiple genres and languages.

Significance. If the empirical claims hold after proper validation, the work would offer a concrete step beyond heuristic chunking by showing that discourse hierarchy can supply retrieval signals orthogonal to pure semantic similarity. The combination of RST parsing with LLM node enhancement is a plausible direction for improving interpretability and accuracy in long-document QA.

major comments (2)

[Abstract and §4 (Experiments)] The abstract states that 'extensive experiments on four datasets demonstrate consistent improvements' yet supplies no quantitative results, baseline descriptions, error bars, or statistical significance tests. Without these data it is impossible to assess whether the reported gains are attributable to discourse structure rather than the LLM enhancements or other implementation choices.
[§3.1 (Discourse Parsing)] The framework relies on language-universal RST parsing of lengthy documents as a load-bearing component, but no parser accuracy metrics (e.g., F1 on attachment or relation labeling) or scaling behavior for documents beyond a few thousand tokens are reported. Existing RST parsers are known to suffer error propagation on long inputs; without an ablation isolating the structural signal from semantic similarity alone, the claimed orthogonality remains unverified.

minor comments (2)

[§3.2] The description of how discourse trees are converted into sentence-level representations would benefit from an explicit algorithm or pseudocode block.
[§4] Figure captions should explicitly state the number of documents and average length per dataset to allow readers to judge the long-document regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to improve clarity and completeness where the points are valid.

Referee: [Abstract and §4 (Experiments)] The abstract states that 'extensive experiments on four datasets demonstrate consistent improvements' yet supplies no quantitative results, baseline descriptions, error bars, or statistical significance tests. Without these data it is impossible to assess whether the reported gains are attributable to discourse structure rather than the LLM enhancements or other implementation choices.

Authors: We agree that the abstract would benefit from including key quantitative results to allow immediate assessment of the claims. In the revised manuscript we will update the abstract to report the main average improvements over the strongest baselines across the four datasets, along with a brief note on the statistical significance tests already detailed in Section 4. The experiments section already contains baseline descriptions, error bars, and significance results; the abstract revision will simply surface the most salient numbers.

Revision: yes

Referee: [§3.1 (Discourse Parsing)] The framework relies on language-universal RST parsing of lengthy documents as a load-bearing component, but no parser accuracy metrics (e.g., F1 on attachment or relation labeling) or scaling behavior for documents beyond a few thousand tokens are reported. Existing RST parsers are known to suffer error propagation on long inputs; without an ablation isolating the structural signal from semantic similarity alone, the claimed orthogonality remains unverified.

Authors: We acknowledge the value of reporting parser-level metrics. We will add a short discussion of the language-universal parser's published attachment and relation F1 scores on standard benchmarks and note its documented behavior on documents up to several thousand tokens. On the orthogonality question, the experiments already include a direct comparison between the full discourse-guided model and a flat semantic-retrieval baseline that removes the hierarchical structure; the persistent gains in this controlled setting support that the discourse signal is additive to pure semantic similarity. We will make this ablation more prominent in the revised text.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline with external benchmarks

The paper describes an empirical framework that converts discourse trees into representations, applies LLM enhancements, and performs structure-guided retrieval, then evaluates performance on four external datasets across genres and languages. No equations, fitted parameters, or self-citations are presented as reducing the central claims (consistent improvements via discourse structure) to inputs by construction. The derivation relies on independent experimental results rather than self-referential definitions or renamings, satisfying the criteria for a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The framework rests on the utility of RST discourse trees for long documents and the ability of LLMs to meaningfully enhance structural nodes; no explicit free parameters, new axioms, or invented entities are stated in the abstract.

Method excerpts from the paper's appendix (mis-extracted as reference entries [47] to [63])

The transition system uses three actions:

1. A "shift" action moves a sentence from the queue to the stack when we need new content to process.
2. A "reduce" action combines two adjacent subtrees on top of the stack into a new subtree by identifying their discourse relationship.
3. A "pop root" action concludes the process when we have successfully built a complete tree.

Each state of the system is represented as $c = (\sigma, \beta)$, starting from $c_0 = ([\,], S_i)$ with all sentences in the queue, and ending at $c_f = ([T_i], [\,])$ with a complete discourse tree $T_i$. The transition system follows a deterministic process guided by the neural scoring model:

1. Initialize $\sigma = [\,]$ and $\beta = S_i$.
2. While $\beta$ is not empty or $|\sigma| > 1$:
   (a) If $|\sigma| < 2$ and $\beta$ is not empty, perform a "shift" action to move the next sentence from $\beta$ to $\sigma$;
   (b) Else if $\beta$ is empty, perform a "reduce" action to combine the top two subtrees in $\sigma$;
   (c) Else, use the neural scoring model to decide between a "shift" or "reduce" action based on the current state of $\sigma$ and $\beta$.
3. Return the single tree $T_i$ remaining on the stack $\sigma$.

The scoring model considers the three topmost subtrees on the stack ($s_1$, $s_2$, $s_3$) and the next sentence in the queue $q_1$. This design is motivated by several factors:

1. $s_1$ and $s_2$ are the immediate candidates for the next potential "reduce" action.
2. $s_3$ provides crucial context about the recently built structure.
3. $q_1$ helps determine if we should introduce new content via a "shift" action.

For each tree node $v$, we compute its representation $h_v$ recursively:

$$h_v = \begin{cases} \mathrm{PLM}(s_i), & \text{if } v \text{ is a sentence} \\ \dfrac{1}{|C(v)|} \sum_{u \in C(v)} h_u, & \text{if } v \text{ is a relationship node} \end{cases} \tag{4}$$

where $C(v)$ denotes the set of child nodes of $v$, and $\mathrm{PLM}(\cdot)$ is a pre-trained language model that encodes the semantic meaning...
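The control loop and the recursive node representation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' parser: the `decide` callback stands in for the neural scoring model (which is not reproduced and here sees the whole stack and queue rather than only $s_1$, $s_2$, $s_3$, $q_1$), the `encode` callback stands in for the pre-trained language model, relation labels on "reduce" are omitted, and the names `Node`, `parse`, and `represent` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class Node:
    """A discourse tree node: a sentence leaf or a relationship node."""
    text: str = ""                                  # sentence text (leaves only)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


# Stand-in for the neural scoring model: returns "shift" or "reduce".
Decide = Callable[[Sequence[Node], Sequence[Node]], str]


def parse(sentences: List[str], decide: Decide) -> Node:
    """Build one tree with the shift/reduce control loop from the text."""
    if not sentences:
        raise ValueError("need at least one sentence")
    stack: List[Node] = []                                   # sigma
    queue: List[Node] = [Node(text=s) for s in sentences]    # beta
    while queue or len(stack) > 1:
        if len(stack) < 2 and queue:
            action = "shift"                 # rule (a): too few subtrees
        elif not queue:
            action = "reduce"                # rule (b): nothing left to shift
        else:
            action = decide(stack, queue)    # rule (c): ask the scorer
        if action == "shift":
            stack.append(queue.pop(0))
        else:
            right = stack.pop()
            left = stack.pop()
            # The relation label is omitted in this sketch.
            stack.append(Node(children=[left, right]))
    return stack[0]                          # "pop root": the finished tree


def represent(node: Node, encode: Callable[[str], Vector]) -> Vector:
    """Eq. (4): encode leaves, average the children's vectors otherwise."""
    if node.is_leaf:
        return encode(node.text)
    child_vecs = [represent(child, encode) for child in node.children]
    dim = len(child_vecs[0])
    return [sum(v[d] for v in child_vecs) / len(child_vecs) for d in range(dim)]


if __name__ == "__main__":
    # Toy stand-ins: a two-feature "encoder" and a scorer that always reduces.
    toy_encode = lambda s: [float(len(s)), float(s.count(" "))]
    always_reduce = lambda stack, queue: "reduce"
    tree = parse(["A b.", "C d e.", "F."], always_reduce)
    print(represent(tree, toy_encode))
```

With a scorer that always answers "reduce", the loop yields a left-branching tree; a scorer that always answers "shift" yields a right-branching one, so the learned scorer is what determines the tree shape.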

• Initializes each paragraph’s parsing state with an empty stack and sentence queue: $c_0 = ([\,], S_i)$
• Processes paragraphs independently, enabling parallel computation
• Applies transition actions iteratively until a complete tree is formed
• Stores both the resulting paragraph-level tree $T_i$ and its root representation $h_{T_i}$

Phase 2: Document-level Tree Construction. The second phase focuses on capturing document-level discourse structure. After obtaining all paragraph-level trees $T_1, T_2, \ldots, T_n$, we:

1. For each paragraph-level tree $T_i$, apply bottom-up LLM-enhanced summarization:
   • For each non-leaf node $v$ with children $c_l$ and $c_r$:

$$
t_v =
\begin{cases}
f_{\mathrm{LLM}}(t_l, t_r), & \text{if } |t_l| + |t_r| \geq \tau \\
t_l \oplus t_r, & \text{otherwise}
\end{cases}
\tag{8}
$$

     where $t_l$ and $t_r$ are the textual content of the child nodes
   • Continue until reaching the root node to obtain the semantic unit $u_i$
2. Form the semantic units set $U = \{u_1, u_2, \ldots, u_n\}$ from root representations
3. Apply the discourse parser to these units to construct a document-level tree $T_{doc}$ using the same transition-based parsing system:

$$
T_{doc} = f_{\mathrm{discourse}}(U)
\tag{9}
$$

4. Apply bottom-up LLM-enhanced summarization to $T_{doc}$:
   • For each non-leaf node $v \in T_{doc}$ with children $c_l$ and $c_r$:

$$
t_v =
\begin{cases}
f_{\mathrm{LLM}}(t_l \oplus t_r), & \text{if } |t_l \oplus t_r| \geq \tau \\
t_l \oplus t_r, & \text{otherwise}
\end{cases}
\tag{10}
$$

   • Process nodes level by level from bottom to top until reaching the root of $T_{doc}$

This step effectively captures the high-level discourse relationships between paragraphs while mainta...
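The thresholded bottom-up summarization of Eq. (10) can be illustrated with a short Python sketch. Several choices here are assumptions rather than details from the paper: `TextNode` and `summarize_bottom_up` are hypothetical names, text length is measured in characters, the concatenation $\oplus$ is rendered as joining with a single space, every non-leaf node is taken to have exactly two children, and the `summarize` callable is a stand-in for the LLM call $f_{\mathrm{LLM}}$. A post-order recursion is used in place of an explicit level-by-level pass; both visit children before their parent, so they fill in the same node texts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TextNode:
    """Binary tree node; leaves carry source text, inner nodes get text filled in."""
    text: Optional[str] = None
    left: Optional["TextNode"] = None
    right: Optional["TextNode"] = None


def summarize_bottom_up(
    node: TextNode,
    summarize: Callable[[str], str],  # assumption: stand-in for the LLM call
    tau: int,                         # length threshold, in characters here
) -> str:
    """Fill in node.text for every inner node, children first, and return it."""
    if node.left is None or node.right is None:
        return node.text or ""        # leaf: keep the original text
    t_l = summarize_bottom_up(node.left, summarize, tau)
    t_r = summarize_bottom_up(node.right, summarize, tau)
    joined = f"{t_l} {t_r}"           # assumption: concatenation joins with a space
    # Summarize only when the concatenation reaches the threshold, as in Eq. (10).
    node.text = summarize(joined) if len(joined) >= tau else joined
    return node.text
```

Short sibling pairs are therefore kept verbatim, and only spans whose combined length reaches the threshold are passed to the summarizer, which limits the number of LLM calls on small subtrees.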

A Limitation and Future Work

We here discuss the limitations and future work o...
