TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-14 21:29 UTC · model grok-4.3
The pith
TimelineReasoner uses large reasoning models to actively track events globally and fill gaps through targeted retrieval, producing more accurate and coherent timelines than passive LLM approaches.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TimelineReasoner shifts timeline summarization from static generation to an active, reasoning-driven process using large reasoning models. The framework consists of a Global Cognition stage that tracks events at a macroscopic level and continuously updates a global event memory, and a Detail Exploration stage that identifies informational gaps and refines the timeline via targeted document retrieval. It incorporates an Event Scraper for retrieving temporal event descriptions, a Timeline Updater for refining the timeline, and a Supervisor for detecting gaps and guiding retrieval. Experimental results demonstrate that this approach significantly outperforms existing LLM-based TLS methods on open-domain datasets in timeline accuracy, coverage, and coherence, while matching or exceeding state-of-the-art approaches on closed-domain datasets.
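To make the division of labor concrete, here is a minimal sketch of that loop. The stage and module names come from the paper; every interface below (the `lrm` and `retrieve` callables, the Event structure, the round limit) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
# Minimal sketch of the two-stage loop described above. Stage and module names
# (Global Cognition, Detail Exploration, Event Scraper, Timeline Updater,
# Supervisor) come from the paper; `lrm`, `retrieve`, and the Event structure
# are hypothetical stand-ins, not the authors' code.
from dataclasses import dataclass, field

@dataclass
class Event:
    date: str        # e.g. "2024-03-01"
    summary: str

@dataclass
class Timeline:
    events: list[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        # Timeline Updater role: insert and keep chronological order.
        self.events.append(event)
        self.events.sort(key=lambda e: e.date)

def build_timeline(query, corpus, lrm, retrieve, max_rounds=3):
    timeline = Timeline()
    memory = []  # global event memory, updated continuously

    # Stage 1: Global Cognition -- macroscopic pass over retrieved documents.
    for doc in retrieve(query, corpus):
        # Event Scraper role: ask the model for dated event descriptions.
        for event in lrm.extract_events(doc):
            memory.append(event)
            timeline.add(event)

    # Stage 2: Detail Exploration -- Supervisor flags gaps, retrieval fills them.
    for _ in range(max_rounds):
        gaps = lrm.detect_gaps(timeline, memory)  # Supervisor role
        if not gaps:
            break
        for gap in gaps:
            for doc in retrieve(gap, corpus):
                for event in lrm.extract_events(doc):
                    timeline.add(event)
    return timeline
```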
What carries the argument
The two-stage reasoning process: Global Cognition handles macroscopic event tracking with continuous memory updates, and Detail Exploration handles gap identification and targeted retrieval, supported by the Event Scraper, Timeline Updater, and Supervisor mechanisms.
If this is right
- Timelines produced from news will contain fewer missing events and fewer ordering inconsistencies.
- The method enables iterative acquisition of evidence and explicit validation of temporal consistency during construction.
- Performance gains hold on open-domain settings and remain competitive on closed-domain settings.
- The framework demonstrates that large reasoning models can move beyond passive generation to structured, memory-augmented information extraction.
Where Pith is reading between the lines
- The same two-stage structure could be adapted to build timelines from legal case files or historical archives where events are scattered across documents.
- Real-time news streams could feed directly into the Global Cognition stage for continuously updated public timelines.
- If the supervisor reliably detects gaps, the approach may reduce the need for exhaustive initial retrieval and lower overall token cost.
Load-bearing premise
The specialized mechanisms of Event Scraper, Timeline Updater, and Supervisor can be implemented reliably on top of existing large reasoning models without introducing new hallucinations or retrieval errors that cancel out the reported gains.
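The premise is at least partly testable without an LRM in the loop. As one illustration, a Supervisor-style gap check can start from a purely statistical heuristic; the sketch below flags unusually long silent stretches in a timeline as candidate retrieval targets. The heuristic and its threshold are our illustration, not the paper's mechanism.

```python
# Illustrative heuristic, not the paper's Supervisor: flag stretches of a
# timeline that stay silent for much longer than the median gap between
# events, as candidate spans for targeted retrieval.
from datetime import date
from statistics import median

def candidate_gaps(event_dates: list[date], factor: float = 3.0) -> list[tuple[date, date]]:
    ds = sorted(event_dates)
    if len(ds) < 3:
        return []
    spacings = [(b - a).days for a, b in zip(ds, ds[1:])]
    threshold = factor * median(spacings)
    return [(a, b) for a, b in zip(ds, ds[1:]) if (b - a).days > threshold]

# Example: a timeline quiet between March and August gets flagged.
dates = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 2),
         date(2024, 3, 1), date(2024, 8, 15)]
print(candidate_gaps(dates))  # [(datetime.date(2024, 3, 1), datetime.date(2024, 8, 15))]
```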
What would settle it
A controlled experiment on the same open-domain TLS datasets would settle it: if TimelineReasoner produced timelines with measurably lower accuracy, coverage, or coherence than a strong baseline LLM prompt, the central claim would be falsified.
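Any such experiment needs a concrete scoring rule. The paper's exact metric implementations are not visible in this excerpt, but date selection F1 is a standard TLS measure and serves as a minimal example, assuming gold and predicted timelines are reduced to sets of ISO date strings.

```python
# Date selection F1, a common TLS score: how well the predicted timeline's
# dates match the gold timeline's dates. This is a standard proxy, not the
# paper's own accuracy/coverage/coherence definitions.
def date_f1(predicted: set[str], gold: set[str]) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # dates present in both timelines
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(date_f1({"2024-01-05", "2024-02-02"}, {"2024-01-05", "2024-03-01"}))  # 0.5
```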
Original abstract
The proliferation of online news poses a challenge to extracting structured timelines from unstructured content. While recent studies have shown that Large Language Models (LLMs) can assist Timeline Summarization (TLS), these approaches primarily treat models as passive generators. The emergence of Large Reasoning Models (LRMs) presents an opportunity to reason over events actively, enabling iterative evidence acquisition, the detection of missing events, and the validation of temporal consistency. To systematically leverage the reasoning capabilities of LRMs, we propose TimelineReasoner, a novel framework that shifts TLS from static generation to an active, reasoning-driven process. Unlike prior work, TimelineReasoner adopts a two-stage framework: Global Cognition, which tracks events at a macroscopic level and continuously updates a global event memory, and Detail Exploration, which identifies informational gaps and refines the timeline via targeted document retrieval. To support this, TimelineReasoner incorporates several specialized mechanisms, including an Event Scraper for retrieving temporal event descriptions, a Timeline Updater for refining the timeline, and a Supervisor for detecting gaps in the timeline and guiding retrieval. Experimental results on open-domain TLS datasets demonstrate that TimelineReasoner significantly outperforms existing LLM-based TLS methods in terms of timeline accuracy, coverage, and coherence. On closed-domain TLS datasets, our method performs on par with or exceeds state-of-the-art approaches. This work not only pushes the boundaries of TLS but also highlights the broader potential of LRM-based reasoning frameworks for timeline summarization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TimelineReasoner, a framework that leverages Large Reasoning Models (LRMs) to advance timeline summarization (TLS) by shifting from passive LLM generation to an active, iterative reasoning process. It introduces a two-stage architecture consisting of Global Cognition (macroscopic event tracking and global memory updates) and Detail Exploration (gap detection and targeted retrieval), supported by specialized modules including an Event Scraper, Timeline Updater, and Supervisor. Experiments on open-domain TLS datasets are reported to show significant gains in timeline accuracy, coverage, and coherence over prior LLM-based methods, with competitive or superior performance on closed-domain datasets.
Significance. If the empirical claims hold under rigorous controls, the work could meaningfully advance TLS by demonstrating how LRM reasoning enables dynamic evidence acquisition and consistency validation, moving beyond static prompting. This may have broader implications for applying active reasoning loops to other structured extraction tasks in news and knowledge summarization.
Major comments (2)
- [Abstract / Experimental Results] The central claim of significant outperformance on open-domain TLS datasets is presented without any description of the baselines employed, the precise definitions or implementations of the accuracy/coverage/coherence metrics, statistical significance testing, or experimental controls (e.g., retrieval-only ablations). This absence leaves the headline result unsupported by visible evidence and prevents assessment of whether the gains derive from the LRM-specific two-stage loop or from ancillary retrieval components.
- [Framework Description] For Global Cognition + Detail Exploration, no quantitative breakdown is supplied for the reliability of the added mechanisms, such as precision/recall of the Event Scraper on temporal event extraction, gap-detection accuracy of the Supervisor, or error-propagation rates through the Timeline Updater. Without such diagnostics, it remains unclear whether the iterative reasoning reduces hallucinations and retrieval noise or amplifies them relative to simpler LLM baselines.
Minor comments (1)
- [Abstract] The abstract would be strengthened by naming the specific open-domain and closed-domain datasets used and by reporting at least one key quantitative delta (e.g., absolute improvement in a primary metric).
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important areas for strengthening the presentation of our experimental results and framework analysis. We agree that greater transparency on baselines, metrics, controls, and component reliability is needed to fully support the claims. We will revise the manuscript to incorporate these elements.
Point-by-point responses
- Referee: [Abstract / Experimental Results] The central claim of significant outperformance on open-domain TLS datasets is presented without any description of the baselines employed, the precise definitions or implementations of the accuracy/coverage/coherence metrics, statistical significance testing, or experimental controls (e.g., retrieval-only ablations). This absence leaves the headline result unsupported by visible evidence and prevents assessment of whether the gains derive from the LRM-specific two-stage loop or from ancillary retrieval components.
  Authors: We acknowledge that the current version does not sufficiently detail these elements in the abstract and results summary. The full experimental setup section describes the baselines as prior LLM-based TLS approaches (e.g., static prompting methods from recent work on news timeline extraction). Metrics follow standard TLS evaluation protocols: accuracy assesses factual correctness and temporal ordering of events, coverage measures the proportion of key events captured, and coherence evaluates logical consistency and readability. In the revision, we will explicitly define and implement these metrics, report statistical significance via paired t-tests or bootstrap resampling (a paired-bootstrap sketch follows below), and add ablation studies, including retrieval-only variants, to isolate the contribution of the Global Cognition + Detail Exploration reasoning loop. This will clarify that performance gains stem from the LRM-driven iterative process rather than from retrieval alone. (Revision: yes)
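As a concrete reference point for the promised significance testing, here is a minimal paired-bootstrap sketch over per-topic scores. The score arrays are placeholders; nothing here reproduces the paper's data.

```python
# Paired bootstrap resampling over per-topic score differences, one of the
# significance tests the rebuttal proposes. The score lists are placeholders;
# real inputs would be per-topic metric values for TimelineReasoner and a
# baseline on the same topics.
import random

def paired_bootstrap_p(system: list[float], baseline: list[float],
                       n_resamples: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    diffs = [s - b for s, b in zip(system, baseline)]
    losses = 0
    for _ in range(n_resamples):
        sample = [rng.choice(diffs) for _ in diffs]
        if sum(sample) / len(sample) <= 0:
            losses += 1  # resampled mean does not favor the system
    return losses / n_resamples  # approximate one-sided p-value

sys_scores = [0.52, 0.61, 0.48, 0.70, 0.55]   # placeholder per-topic scores
base_scores = [0.45, 0.58, 0.44, 0.62, 0.51]
print(paired_bootstrap_p(sys_scores, base_scores))
```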
- Referee: [Framework Description] For Global Cognition + Detail Exploration, no quantitative breakdown is supplied for the reliability of the added mechanisms, such as precision/recall of the Event Scraper on temporal event extraction, gap-detection accuracy of the Supervisor, or error-propagation rates through the Timeline Updater. Without such diagnostics, it remains unclear whether the iterative reasoning reduces hallucinations and retrieval noise or amplifies them relative to simpler LLM baselines.
  Authors: We agree that quantitative diagnostics on component reliability would strengthen the framework analysis and help demonstrate the value of the iterative reasoning. In the revised manuscript, we will add targeted evaluations: precision/recall for the Event Scraper on temporal event extraction using a held-out annotated sample (a minimal scoring sketch follows below); accuracy metrics for the Supervisor's gap detection; and an error-propagation study tracing how mistakes in one module affect downstream timeline quality. These will be compared against simpler LLM baselines to show that the two-stage loop reduces hallucinations and noise rather than amplifying them. (Revision: yes)
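For the Event Scraper diagnostic, a minimal precision/recall scorer might look like the sketch below. The matching rule (exact date plus case-folded summary text) is our assumption; a real evaluation would likely need softer matching.

```python
# Sketch of the component diagnostic promised above: precision/recall of an
# Event Scraper's extracted events against a held-out annotated sample. Events
# are (date, summary) pairs; the exact-match rule is an assumption.
def precision_recall(extracted: list[tuple[str, str]],
                     annotated: list[tuple[str, str]]) -> tuple[float, float]:
    norm = lambda ev: (ev[0], ev[1].strip().lower())
    pred = {norm(e) for e in extracted}
    gold = {norm(e) for e in annotated}
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

extracted = [("2024-02-02", "Ceasefire talks resume"), ("2024-02-09", "Aid convoy blocked")]
annotated = [("2024-02-02", "ceasefire talks resume"), ("2024-02-20", "Vote delayed")]
print(precision_recall(extracted, annotated))  # (0.5, 0.5)
```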
Circularity Check
No circularity: independent engineering framework on pre-trained LRMs
Full rationale
The paper proposes TimelineReasoner as a two-stage engineering framework (Global Cognition for macroscopic event tracking and Detail Exploration for gap-filling via retrieval) built directly on existing Large Reasoning Models, with auxiliary modules (Event Scraper, Timeline Updater, Supervisor) described as implementation components rather than derived quantities. No equations, fitted parameters, or mathematical derivations appear in the provided text; claims rest on experimental comparisons to prior LLM-based TLS methods. No self-definitional reductions, fitted-input predictions, load-bearing self-citations, uniqueness theorems, or ansatz smuggling are present. The contribution is self-contained as an applied system design whose performance is evaluated externally on open- and closed-domain datasets.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "two-stage framework: Global Cognition... Detail Exploration... Event Scraper... Timeline Updater... Supervisor"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "iterative reasoning and targeted information retrieval... dynamic timeline memory"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.