Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

Alvaro J. López-López; José Portela; Roberto Martínez-Cruz

arXiv: 2606.10716 · v2 · pith:GJTTDEFD · submitted 2026-06-09 · cs.CL · cs.AI


Pith reviewed 2026-06-27 13:04 UTC · model grok-4.3


keywords: keyphrase extraction · long documents · attention expansion · pre-trained language models · contextualized embeddings · information fusion · natural language processing


The pith

An attention expansion mechanism augments PLM token representations with out-of-context embeddings to improve keyphrase extraction from long documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that its attention expansion mechanism improves keyphrase extraction from long documents by fusing semantic information from out-of-context chunks into pre-trained language model representations. It achieves this fusion using pre-trained word embeddings and attention, without needing full-document processing or costly long-context models. A sympathetic reader would care because salient keyphrase evidence is often scattered across distant sections that standard models cannot see together. If correct, the mechanism supplies complementary signals that raise performance even on models already built for longer contexts or specific domains.

Core claim

The attention expansion mechanism augments PLM token representations with information from surrounding out-of-context chunks using pre-trained word embeddings. This expands the effective contextual scope of PLM-based KPE models without requiring full-document attention or expensive LLM-based inference. Experimental results demonstrate that attention expansion consistently enhances KPE performance across all evaluation settings, outperforming state-of-the-art models and yielding notable improvements in F1 score. The improvements extend to domain-specific, task-specialized, and native long-context models, showing that the proposed mechanism provides complementary information rather than merely compensating for limited input length.
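The summary does not say how the surrounding out-of-context chunks are chosen or how their tokens are looked up. As a minimal sketch under stated assumptions: the document is split into fixed-size word windows, the focal chunk is the one the PLM encodes, and every other chunk (or, optionally, only chunks within a hypothetical `window` distance) contributes static word vectors from a toy lookup table. The function names, the `window` parameter, and the whitespace tokenization are illustrative assumptions, not the authors' procedure.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical helpers: one plausible reading of "out-of-context chunks",
# not the paper's implementation.


def split_into_chunks(words: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a tokenized document into consecutive fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def out_of_context_vectors(
    chunks: List[List[str]],
    focal: int,
    table: Dict[str, List[float]],
    window: Optional[int] = None,
) -> List[List[float]]:
    """Collect static embeddings for words outside the focal chunk.

    window=None uses every other chunk; window=k keeps only chunks whose
    index is within k of the focal chunk. Out-of-vocabulary words are skipped.
    """
    vectors: List[List[float]] = []
    for j, chunk in enumerate(chunks):
        if j == focal:
            continue  # the PLM already sees this chunk in context
        if window is not None and abs(j - focal) > window:
            continue  # too far from the focal chunk under this setting
        for word in chunk:
            vec = table.get(word.lower())
            if vec is not None:
                vectors.append(vec)
    return vectors


# Toy usage with a made-up 2-d embedding table.
words = "attention expansion fuses static embeddings from distant chunks into token states".split()
chunks = split_into_chunks(words, chunk_size=4)
table = {
    "attention": [1.0, 0.0],
    "embeddings": [0.0, 1.0],
    "chunks": [0.5, 0.5],
    "token": [0.2, 0.8],
}
all_other = out_of_context_vectors(chunks, focal=0, table=table)
neighbours = out_of_context_vectors(chunks, focal=0, table=table, window=1)
print(len(chunks), len(all_other), len(neighbours))  # 3 3 2
```

Restricting `window` trades coverage of distant evidence for a smaller, possibly less noisy set of fused vectors; which choice the paper makes is not stated here.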

What carries the argument

The attention expansion mechanism that fuses pre-trained word embeddings from out-of-context chunks into PLM token representations via attention.
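A minimal NumPy sketch of this kind of fusion, assuming single-head scaled dot-product cross-attention in which the PLM token states are the queries and the static embeddings of out-of-context tokens supply the keys and values, followed by a residual addition. The projection matrices `w_q`, `w_k`, `w_v` and the residual form are assumptions for illustration; the paper's exact architecture is not given in this summary.

```python
import numpy as np


def attention_expansion(h, e_out, w_q, w_k, w_v):
    """Single-head cross-attention from PLM token states to static embeddings.

    h:     (n, d)     contextual states of the focal chunk's tokens (from the PLM)
    e_out: (m, d_e)   static word embeddings of out-of-context tokens
    w_q:   (d, d_a)   query projection (hypothetical learned parameter)
    w_k:   (d_e, d_a) key projection (hypothetical learned parameter)
    w_v:   (d_e, d)   value projection back to the PLM width (hypothetical)
    Returns (n, d): each token state plus an attention-weighted summary of
    the out-of-context embeddings (residual fusion is an assumption).
    """
    q = h @ w_q
    k = e_out @ w_k
    v = e_out @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, m) scaled dot products
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return h + weights @ v


rng = np.random.default_rng(0)
n, m, d, d_e, d_a = 4, 10, 8, 6, 5
h = rng.normal(size=(n, d))
e_out = rng.normal(size=(m, d_e))
w_q = rng.normal(size=(d, d_a))
w_k = rng.normal(size=(d_e, d_a))
w_v = rng.normal(size=(d_e, d))
fused = attention_expansion(h, e_out, w_q, w_k, w_v)
print(fused.shape)  # (4, 8)
```

The residual form means that with a zero value projection the PLM states pass through unchanged, so the fused signal can only add information on top of the contextual representation.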

If this is right

It raises F1 scores on scientific and news domain corpora using five different PLM backbones.
It delivers gains under two training regimes and works with general-purpose, scientific, task-specific, and long-context encoders.
It supplies complementary information rather than only fixing limited context length.
It outperforms prior state-of-the-art KPE models while remaining computationally lighter than full long-context LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fusion idea could be tested on other long-text tasks such as summarization or relation extraction.
It points to a general pattern where embeddings from shorter, cheaper models can be combined with stronger contextual ones without retraining either.
Scaling the number of out-of-context chunks or choosing them by relevance might further increase the observed gains.

Load-bearing premise

Pre-trained word embeddings from out-of-context chunks supply complementary semantic signals that can be fused into PLM representations without introducing noise or requiring task-specific retraining.

What would settle it

An experiment in which adding the attention expansion mechanism produces no F1 improvement or a decrease on the five benchmark corpora across the tested PLM backbones and training regimes.

Figures

Figures reproduced from arXiv: 2606.10716 by Alvaro J. López-López, José Portela, Roberto Martínez-Cruz.

Figure 2: Schematic depiction of the attention expansion mechanism. The PLM contextualises a single chunk, while …

Original abstract

Pre-trained language models (PLMs) have achieved strong performance in keyphrase extraction (KPE), largely due to their ability to generate rich contextualized representations. However, long-document KPE remains challenging because salient keyphrase evidence may be scattered across distant document sections that cannot be jointly captured within the limited context window of most PLMs. Although long-context large language models (LLMs) can process broader textual contexts, their computational cost limits their practicality for efficient and high-throughput KPE. To overcome this limitation, we propose an attention expansion mechanism that augments PLM token representations with information from surrounding out-of-context chunks using pre-trained word embeddings. The proposed mechanism expands the effective contextual scope of PLM-based KPE models without requiring full-document attention or expensive LLM-based inference. We evaluate our approach across five PLM backbones, including general-purpose, scientific, task-specific, and long-context encoders, using two training regimes and five benchmark corpora from scientific and news domains. Experimental results demonstrate that attention expansion consistently enhances KPE performance across all evaluation settings, outperforming state-of-the-art models and yielding notable improvements in F1 score. The improvements extend to domain-specific, task-specialized, and native long-context models, showing that the proposed mechanism provides complementary information rather than merely compensating for limited input length. These results establish attention expansion as an efficient and effective strategy for long-document KPE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Attention expansion is a practical augmentation for long-document KPE that reports gains across backbones, but the evidence that it adds value beyond length compensation remains unconvincing without clearer baseline details.


The paper's core idea is to fuse static word embeddings from out-of-context document chunks into PLM token representations via an attention expansion step. This lets standard encoders pull in distant signals for keyphrase extraction without running full long-context attention or expensive LLMs.

It does a few things right. The evaluation covers five backbones (general, scientific, task-specific, and long-context models), two training regimes, and five corpora from scientific and news domains. Running the same augmentation on native long-context encoders like Longformer is a reasonable check, and the abstract states consistent F1 lifts in all cases.

The main weakness is the long-context stress test. The claim that the method supplies complementary information, rather than just offsetting truncation, holds only if the long-context baselines actually processed full documents. If those models were also fed truncated inputs, the reported gains on them are consistent with simple length extension and do not demonstrate the stronger complementarity argument. The abstract reports no input lengths or truncation points, so this cannot be checked from the summary alone. Lesser issues include the lack of a visible ablation of the fusion step itself and no mention of statistical significance or run-to-run variance.

This work is aimed at practitioners who need efficient KPE on long documents and are willing to add a lightweight embedding layer rather than retrain or switch to heavier models. A reader looking for incremental engineering improvements on an established task will find the multi-backbone results useful.

The paper is coherent on its own terms and shows honest engagement with the practical constraints of PLMs, so it deserves a serious referee even if the complementarity claim needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper proposes an attention expansion mechanism that augments token representations from pre-trained language models (PLMs) with semantic signals from out-of-context document chunks via static word embeddings. This is intended to improve keyphrase extraction (KPE) on long documents without incurring the cost of full long-context attention. The central empirical claim is that the mechanism yields consistent F1 gains across five PLM backbones (general, scientific, task-specific, and long-context) and five corpora, outperforming prior SOTA and, crucially, supplying complementary information even to native long-context models rather than merely mitigating input truncation.

Significance. If the experimental claims hold after clarification of the long-context baseline protocol, the work would offer a lightweight, plug-in augmentation for existing PLM-based KPE pipelines that avoids both truncation artifacts and the inference cost of full-document LLMs. The breadth of backbones and domains tested would strengthen the case for practical utility in scientific and news KPE.

major comments (2)

[Experimental Results] Abstract and Experimental Results: The claim that attention expansion supplies complementary signals 'rather than merely compensating for limited input length' on native long-context models (Longformer, BigBird, etc.) is load-bearing. The manuscript does not state whether those long-context baselines were run on full documents or on the same truncated inputs used for standard PLMs; without this protocol detail, the complementarity interpretation cannot be distinguished from a length-compensation effect.
[Experimental Results] Experimental Results: No ablation isolating the contribution of the static embedding fusion versus simple length extension is reported, nor are statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) provided for the reported F1 gains across the five backbones and two training regimes. These omissions leave the consistency claim difficult to evaluate.
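A minimal sketch of the paired bootstrap the referee asks for, assuming per-document F1 scores are available for the baseline and the attention-expanded model on the same test documents. It resamples documents with replacement and reports a percentile interval for the mean per-document gain; this is a macro-style quantity, so a corpus-level micro-F1 comparison would instead recompute pooled counts inside each resample. The function name and the scores in the usage lines are made up for illustration, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap_ci(
    baseline: Sequence[float],
    expanded: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Mean per-document F1 gain and its percentile bootstrap interval."""
    if len(baseline) != len(expanded) or not baseline:
        raise ValueError("need non-empty, per-document paired scores")
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(baseline, expanded)]
    n = len(diffs)
    means: List[float] = []
    for _ in range(n_resamples):
        # Resample whole documents (keeping each pair together) with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int(round(alpha / 2 * n_resamples))]
    hi = means[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return sum(diffs) / n, (lo, hi)


# Illustrative per-document F1 scores (made up, not from the paper).
baseline = [0.30, 0.25, 0.40, 0.35, 0.28]
expanded = [0.33, 0.27, 0.41, 0.39, 0.30]
mean_gain, (lo, hi) = paired_bootstrap_ci(baseline, expanded)
print(f"gain={mean_gain:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

An interval that excludes zero on each corpus and backbone would support the consistency claim; with real test sets the number of documents is far larger than in this toy example.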

minor comments (2)

[Method] The description of how out-of-context chunks are selected and aligned to PLM tokens is underspecified; a concrete example or pseudocode would clarify the fusion step.
[Tables] Table captions and axis labels should explicitly indicate whether reported F1 scores are macro- or micro-averaged and whether they reflect exact-match or partial-match evaluation.
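To make the macro/micro distinction concrete, here is a minimal exact-match sketch, assuming keyphrases are compared after lowercasing and whitespace normalization only (many KPE evaluations also apply stemming, which this sketch omits). Macro-F1 averages per-document F1; micro-F1 pools true positives, predictions, and gold phrases across documents before computing F1, so the two can differ on identical outputs. The function names and example phrases are illustrative.

```python
from typing import Iterable, List, Sequence, Tuple


def _normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (no stemming in this sketch)."""
    return " ".join(phrase.lower().split())


def _f1(tp: int, n_pred: int, n_gold: int) -> float:
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def exact_match_f1(
    predictions: Sequence[Iterable[str]], golds: Sequence[Iterable[str]]
) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) over documents under exact matching."""
    if len(predictions) != len(golds):
        raise ValueError("one prediction list per gold list")
    per_doc: List[float] = []
    tp_total = pred_total = gold_total = 0
    for pred, gold in zip(predictions, golds):
        pred_set = {_normalize(p) for p in pred}
        gold_set = {_normalize(g) for g in gold}
        tp = len(pred_set & gold_set)
        per_doc.append(_f1(tp, len(pred_set), len(gold_set)))
        tp_total += tp
        pred_total += len(pred_set)
        gold_total += len(gold_set)
    macro = sum(per_doc) / len(per_doc) if per_doc else 0.0
    micro = _f1(tp_total, pred_total, gold_total)
    return macro, micro


preds = [["attention expansion", "keyphrase extraction"],
         ["static embeddings", "PLM", "chunks", "F1"]]
golds = [["Keyphrase  Extraction", "long documents"],
         ["static embeddings"]]
macro, micro = exact_match_f1(preds, golds)
print(f"macro={macro:.3f} micro={micro:.3f}")  # macro=0.450 micro=0.444
```

A partial-match variant would replace the set intersection with a token-overlap criterion, which is why the table captions need to state which one was used.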

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to improve experimental clarity.


Referee: [Experimental Results] Abstract and Experimental Results: The claim that attention expansion supplies complementary signals 'rather than merely compensating for limited input length' on native long-context models (Longformer, BigBird, etc.) is load-bearing. The manuscript does not state whether those long-context baselines were run on full documents or on the same truncated inputs used for standard PLMs; without this protocol detail, the complementarity interpretation cannot be distinguished from a length-compensation effect.

Authors: We agree that the protocol detail is necessary to support the complementarity claim. The long-context models were evaluated on full documents using their native extended context windows, while standard PLMs used 512-token truncations. We will explicitly state this in the Experimental Setup, Results, and abstract sections of the revised manuscript. revision: yes
Referee: [Experimental Results] Experimental Results: No ablation isolating the contribution of the static embedding fusion versus simple length extension is reported, nor are statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) provided for the reported F1 gains across the five backbones and two training regimes. These omissions leave the consistency claim difficult to evaluate.

Authors: The existing comparisons against native long-context models already control for length by using models designed for extended contexts, thereby isolating the benefit of static-embedding attention expansion. We will nevertheless add paired t-tests (and bootstrap intervals where appropriate) for the F1 gains in the revised Experimental Results section. An explicit ablation against naive length extension can be included if space allows. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical augmentation with no derivations or self-referential reductions


The paper proposes an attention expansion mechanism that fuses pre-trained word embeddings from out-of-context chunks into PLM representations, then reports empirical F1 gains across five backbones and five corpora. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim rests on experimental comparisons rather than any derivation that reduces to its own inputs by construction. The method is framed as a practical augmentation relying on external embeddings, with no load-bearing self-citations or ansatzes. This is the common case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the utility of fusing static word embeddings with PLM representations and on the experimental outcomes; both rest on domain assumptions about embedding complementarity that are not independently verified in the abstract.

axioms (1)

domain assumption: Pre-trained word embeddings capture useful semantic information from out-of-context chunks that can be fused into PLM representations
The mechanism explicitly relies on this to expand effective context.

invented entities (1)

attention expansion mechanism (no independent evidence)
purpose: Augment PLM token representations with distant context information
New technique introduced by the paper to address long-document limitations.

pith-pipeline@v0.9.1-grok · 5797 in / 1335 out tokens · 23609 ms · 2026-06-27T13:04:42.298630+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 6 canonical work pages · 2 internal anchors

[1]

A study on automatically extracted keywords in text categorization

Anette Hulth and Beáta Megyesi. A study on automatically extracted keywords in text categorization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 537–544, 2006

2006
[2]

Corephrase: Keyphrase extraction for document clustering

Khaled M Hammouda, Diego N Matute, and Mohamed S Kamel. Corephrase: Keyphrase extraction for document clustering. In International workshop on machine learning and data mining in pattern recognition, pages 265–274. Springer, 2005

2005
[3]

Citation summarization through keyphrase extraction

Vahed Qazvinian, Dragomir Radev, and Arzucan Özgür. Citation summarization through keyphrase extraction. In Proceedings of the 23rd international conference on computational linguistics (COLING 2010), pages 895–903, 2010

2010
[4]

World wide web site summarization

Yongzheng Zhang, Nur Zincir-Heywood, and Evangelos Milios. World wide web site summarization. Web Intelligence and Agent Systems: An International Journal, 2(1):39–53, 2004

2004
[5]

Improving browsing in digital libraries with keyphrase indexes

Carl Gutwin, Gordon Paynter, Ian Witten, Craig Nevill-Manning, and Eibe Frank. Improving browsing in digital libraries with keyphrase indexes. Decision Support Systems, 27(1-2):81–104, 1999

1999
[6]

Keyphrase extraction-based query expansion in digital libraries

Il Yeol Song, Robert B Allen, Zoran Obradovic, and Min Song. Keyphrase extraction-based query expansion in digital libraries. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’06), pages 202–209. IEEE, 2006

2006
[7]

Phrasier: a system for interactive document retrieval using keyphrases

Steve Jones and Mark S Staveley. Phrasier: a system for interactive document retrieval using keyphrases. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 160–167, 1999

1999
[8]

From fundamentals to recent advances: A tutorial on keyphrasification

Rui Meng, Debanjan Mahata, and Florian Boudin. From fundamentals to recent advances: A tutorial on keyphrasification. In Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, pages 582–588. Springer, 2022

2022
[9]

Automatic keyphrase extraction using graph-based methods

Josiane Mothe, Faneva Ramiandrisoa, and Michael Rasolomanana. Automatic keyphrase extraction using graph-based methods. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 728–730, 2018

2018
[10]

A comparison of centrality measures for graph-based keyphrase extraction

Florian Boudin. A comparison of centrality measures for graph-based keyphrase extraction. In Proceedings of the sixth international joint conference on natural language processing, pages 834–838, 2013

2013
[11]

Automatic keyphrase extraction: A survey of the state of the art

Kazi Saidul Hasan and Vincent Ng. Automatic keyphrase extraction: A survey of the state of the art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1262–1273, 2014

2014
[12]

Efficient estimation of word representations in vector space

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013

2013
[13]

GloVe: Global vectors for word representation

Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar, October 2014. Association for Computational Linguistics

2014
[14]

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2018

2018
[15]

Keyphrase extraction as sequence labeling using contextualized embeddings

Dhruva Sahrawat, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma, Rakesh Gosangi, Amanda Stent, Yaman Kumar, Rajiv Ratn Shah, and Roger Zimmermann. Keyphrase extraction as sequence labeling using contextualized embeddings. In European Conference on Information Retrieval, pages 328–335. Springer, 2020

2020
[16]

Learning rich representation of keyphrases from text

Mayank Kulkarni, Debanjan Mahata, Ravneet Arora, and Rajarshi Bhowmik. Learning rich representation of keyphrases from text. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 891–906, Seattle, United States, July 2022. Association for Computational Linguistics

2022
[17]

Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning

Seoyeon Park and Cornelia Caragea. Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5409–5419, 2020

2020
[18]

Longformer: The Long-Document Transformer

Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv:2004.05150, 2020

2020
[19]

Big bird: Transformers for longer sequences

Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big bird: Transformers for longer sequences. 2020

2020
[20]

Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference

Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, and Iacopo Poli. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 2024

2024
[21]

Thinking, Fast and Slow

Daniel Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011

2011
[22]

Do artificial intelligence systems understand?

Carlos Blanco Pérez and Eduardo Garrido-Merchán. Do artificial intelligence systems understand? Claridades. Revista de Filosofía, 16(1):171–205, 2024

2024
[23]

Enhancing keyphrase extraction from long scientific documents using graph embeddings

Roberto Martínez-Cruz, Debanjan Mahata, Alvaro J. López-López, and José Portela. Enhancing keyphrase extraction from long scientific documents using graph embeddings, 2023

2023
[24]

LongKey: Keyphrase extraction for long documents

Jeovane Honorio Alves, Radu State, Cinthia Obladen de Almendra Freitas, and Jean Paul Barddal. LongKey: Keyphrase extraction for long documents. InProceedings of the 2024 IEEE International Conference on Big Data, 2024. arXiv:2411.17863

2024
[25]

MAPEX: A multi-agent pipeline for keyphrase extraction

Liting Zhang, Shiwan Zhao, Aobo Kong, and Qicheng Li. MAPEX: A multi-agent pipeline for keyphrase extraction, 2025

2025
[26]

TextRank: Bringing order into text

Rada Mihalcea and Paul Tarau. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411, Barcelona, Spain, July 2004. Association for Computational Linguistics

2004
[27]

TopicRank: Graph-based topic ranking for keyphrase extraction

Adrien Bougouin, Florian Boudin, and Béatrice Daille. TopicRank: Graph-based topic ranking for keyphrase extraction. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 543–551, Nagoya, Japan, October 2013. Asian Federation of Natural Language Processing

2013
[28]

Corpus-independent generic keyphrase extraction using word embedding vectors

Rui Wang, Wei Liu, and Chris McDonald. Corpus-independent generic keyphrase extraction using word embedding vectors. In Software engineering research conference, volume 39, pages 1–8, 2014

2014
[29]

Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings

Debanjan Mahata, John Kuriakose, Rajiv Shah, and Roger Zimmermann. Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 634–639, 2018

2018
[30]

Theme-weighted ranking of keywords from text documents using phrase embeddings

Debanjan Mahata, Rajiv Ratn Shah, John Kuriakose, Roger Zimmermann, and John R Talburt. Theme-weighted ranking of keywords from text documents using phrase embeddings. In 2018 IEEE conference on multimedia information processing and retrieval (MIPR), pages 184–189. IEEE, 2018

2018
[31]

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, and Martin Jaggi. Simple unsupervised keyphrase extraction using sentence embeddings. arXiv preprint arXiv:1801.04470, 2018

2018
[32]

PatternRank: Leveraging pretrained language models and part of speech for unsupervised keyphrase extraction

Tim Schopf, Simon Klimek, and Florian Matthes. PatternRank: Leveraging pretrained language models and part of speech for unsupervised keyphrase extraction. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. SCITEPRESS - Science and Technology Publications, 2022

2022
[33]

PromptRank: Unsupervised keyphrase extraction using prompt

Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, and Xiaoyan Bai. PromptRank: Unsupervised keyphrase extraction using prompt. arXiv preprint arXiv:2305.04490, 2023

2023
[34]

Incorporating expert knowledge into keyphrase extraction

Sujatha Das Gollapalli, Xiao-li Li, and Peng Yang. Incorporating expert knowledge into keyphrase extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), Feb. 2017

2017
[35]

Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents

Rabah Alzaidy, Cornelia Caragea, and C. Lee Giles. Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents. In The World Wide Web Conference, WWW ’19, page 2551–2557, New York, NY, USA, 2019. Association for Computing Machinery

2019
[37]

Exploring word embeddings in crf-based keyphrase extraction from research papers

Krutarth Patel and Cornelia Caragea. Exploring word embeddings in crf-based keyphrase extraction from research papers. In Proceedings of the 10th International Conference on Knowledge Capture, pages 37–44, 2019

2019
[38]

TransKP: Transformer based key-phrase extraction

Mukund Rungta, Rishabh Kumar, Mehak Preet Dhaliwal, Hemant Tiwari, and Vanraj Vala. TransKP: Transformer based key-phrase extraction. 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020

2020
[39]

TNT-KID: Transformer-based neural tagger for keyword identification

Matej Martinc, Blaž Škrlj, and Senja Pollak. TNT-KID: Transformer-based neural tagger for keyword identification. Natural Language Engineering, 28(4):409–448, June 2021

2021
[40]

SciBERT: A pretrained language model for scientific text

Iz Beltagy, Kyle Lo, and Arman Cohan. SciBERT: A pretrained language model for scientific text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, November 2019

2019
[42]

Ldkp: A dataset for identifying keyphrases from long scientific documents

Debanjan Mahata, Naveen Agarwal, Dibya Gautam, Amardeep Kumar, Swapnil Parekh, Yaman Kumar Singla, Anish Acharya, and Rajiv Ratn Shah. Ldkp: A dataset for identifying keyphrases from long scientific documents. arXiv preprint arXiv:2203.15349, 2022

2022
[43]

Keyphrase generation beyond the boundaries of title and abstract

Krishna Garg, Jishnu Ray Chowdhury, and Cornelia Caragea. Keyphrase generation beyond the boundaries of title and abstract, 2022

2022
[44]

Query-based keyphrase extraction from long documents

Martin Dočekal and Pavel Smrž. Query-based keyphrase extraction from long documents. The International FLAIRS Conference Proceedings, 35, May 2022

2022
[45]

UFORank: Unified framework of unsupervised keyphrase extraction for long documents

Doyoon Kim and Pilsung Kang. UFORank: Unified framework of unsupervised keyphrase extraction for long documents. IEEE Access, 14:9986–10001, 2026

2026
[46]

ChatGPT vs state-of-the-art models: A benchmarking study in keyphrase generation task

Roberto Martínez-Cruz, Alvaro J. López-López, and José Portela. ChatGPT vs state-of-the-art models: A benchmarking study in keyphrase generation task, 2023

2023
[47]

Empirical study of zero-shot keyphrase extraction with large language models

Byungha Kang and Youhyun Shin. Empirical study of zero-shot keyphrase extraction with large language models. InProceedings of the 31st International Conference on Computational Linguistics, pages 3670–3686, Abu Dhabi, UAE, 2025. Association for Computational Linguistics

2025
[48]

LongDocRank: Graph-augmented large language models for unsupervised keyphrase extraction from long documents

Haoran Ding and Xiao Luo. LongDocRank: Graph-augmented large language models for unsupervised keyphrase extraction from long documents. Journal of Big Data, 13(14), 2026

2026
[49]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017

2017
[50]

Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles

Su Nam Kim, Olena Medelyan, Min-Yen Kan, and Timothy Baldwin. Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles. InProceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, page 21–26, USA, 2010. Association for Computational Linguistics

2010
[51]

Keyphrase extraction in scientific publications

Thuy Dung Nguyen and Min-Yen Kan. Keyphrase extraction in scientific publications. In Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, and Edie Rasmussen, editors, Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317–326, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg

2007
[52]

Single document keyphrase extraction using neighborhood knowledge

Xiaojun Wan and Jianguo Xiao. Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, AAAI’08, page 855–860. AAAI Press, 2008

2008
[53]

Improved automatic keyword extraction given more linguistic knowledge

Anette Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP ’03, page 216–223, USA, 2003. Association for Computational Linguistics

2003
[55]

GORC: A large contextual citation graph of academic papers

Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel S. Weld. GORC: A large contextual citation graph of academic papers. CoRR, abs/1911.02782, 2019

2019
[56]

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019

2019
[57]

DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing

Pengcheng He, Jianfeng Gao, and Weizhu Chen. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. In International Conference on Learning Representations (ICLR), 2023

2023
[58]

Model2Vec: Fast state-of-the-art static embeddings

Stéphan Tulkens and Thomas van Dongen. Model2Vec: Fast state-of-the-art static embeddings. https://github.com/MinishLab/model2vec, 2024. MinishLab, GitHub repository

2024


2022

[9] [9]

Automatic keyphrase extraction using graph- based methods

Josiane Mothe, Faneva Ramiandrisoa, and Michael Rasolomanana. Automatic keyphrase extraction using graph- based methods. InProceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 728–730, 2018

2018

[10] [10]

A comparison of centrality measures for graph-based keyphrase extraction

Florian Boudin. A comparison of centrality measures for graph-based keyphrase extraction. InProceedings of the sixth international joint conference on natural language processing, pages 834–838, 2013

2013

[11] [11]

Automatic keyphrase extraction: A survey of the state of the art

Kazi Saidul Hasan and Vincent Ng. Automatic keyphrase extraction: A survey of the state of the art. InProceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1262–1273, 2014

2014

[12] [12]

Efficient estimation of word representations in vector space, 2013

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013

2013

[13] [13]

GloVe: Global vectors for word representation

Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar, October 2014. Association for Computational Linguistics

2014

[14] [14]

Bert: Pre-training of deep bidirectional transformers for language understanding, 2018

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2018

2018

[15] [15]

Keyphrase extraction as sequence labeling using contextualized embeddings

Dhruva Sahrawat, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma, Rakesh Gosangi, Amanda Stent, Yaman Kumar, Rajiv Ratn Shah, and Roger Zimmermann. Keyphrase extraction as sequence labeling using contextualized embeddings. InEuropean Conference on Information Retrieval, pages 328–335. Springer, 2020

2020

[16] [16]

Learning rich representation of keyphrases from text

Mayank Kulkarni, Debanjan Mahata, Ravneet Arora, and Rajarshi Bhowmik. Learning rich representation of keyphrases from text. InFindings of the Association for Computational Linguistics: NAACL 2022, pages 891–906, Seattle, United States, July 2022. Association for Computational Linguistics

2022

[17] [17]

Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning

Seoyeon Park and Cornelia Caragea. Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning. InProceedings of the 28th International Conference on Computational Linguistics, pages 5409–5419, 2020

2020

[18] [18]

Longformer: The Long-Document Transformer

Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer.arXiv:2004.05150, 2020. 15 Attention Expansion for Long-Document KPE

work page internal anchor Pith review Pith/arXiv arXiv 2004

[19] [19]

Big bird: Transformers for longer sequences

Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big bird: Transformers for longer sequences. 2020

2020

[20] [20]

Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 2024

Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, and Iacopo Poli. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 2024

2024

[21] [21]

Farrar, Straus and Giroux, New York, 2011

Daniel Kahneman.Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011

2011

[22] [22]

Do artificial intelligence systems understand?Claridades

Carlos Blanco Pérez and Eduardo Garrido-Merchán. Do artificial intelligence systems understand?Claridades. Revista de Filosofía, 16(1):171–205, 2024

2024

[23] [23]

López-López, and José Portela

Roberto Martínez-Cruz, Debanjan Mahata, Alvaro J. López-López, and José Portela. Enhancing keyphrase extraction from long scientific documents using graph embeddings, 2023

2023

[24] [24]

LongKey: Keyphrase extraction for long documents

Jeovane Honorio Alves, Radu State, Cinthia Obladen de Almendra Freitas, and Jean Paul Barddal. LongKey: Keyphrase extraction for long documents. InProceedings of the 2024 IEEE International Conference on Big Data, 2024. arXiv:2411.17863

work page arXiv 2024

[25] [25]

MAPEX: A multi-agent pipeline for keyphrase extraction, 2025

Liting Zhang, Shiwan Zhao, Aobo Kong, and Qicheng Li. MAPEX: A multi-agent pipeline for keyphrase extraction, 2025

2025

[26] [26]

TextRank: Bringing order into text

Rada Mihalcea and Paul Tarau. TextRank: Bringing order into text. InProceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411, Barcelona, Spain, July 2004. Association for Computational Linguistics

2004

[27] [27]

TopicRank: Graph-based topic ranking for keyphrase extraction

Adrien Bougouin, Florian Boudin, and Béatrice Daille. TopicRank: Graph-based topic ranking for keyphrase extraction. InProceedings of the Sixth International Joint Conference on Natural Language Processing, pages 543–551, Nagoya, Japan, October 2013. Asian Federation of Natural Language Processing

2013

[28] [28]

Corpus-independent generic keyphrase extraction using word embedding vectors

Rui Wang, Wei Liu, and Chris McDonald. Corpus-independent generic keyphrase extraction using word embedding vectors. InSoftware engineering research conference, volume 39, pages 1–8, 2014

2014

[29] [29]

Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings

Debanjan Mahata, John Kuriakose, Rajiv Shah, and Roger Zimmermann. Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings. InProceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 634–639, 2018

2018

[30] [30]

Theme-weighted ranking of keywords from text documents using phrase embeddings

Debanjan Mahata, Rajiv Ratn Shah, John Kuriakose, Roger Zimmermann, and John R Talburt. Theme-weighted ranking of keywords from text documents using phrase embeddings. In2018 IEEE conference on multimedia information processing and retrieval (MIPR), pages 184–189. IEEE, 2018

2018

[31] [31]

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, and Martin Jaggi. Simple unsupervised keyphrase extraction using sentence embeddings.arXiv preprint arXiv:1801.04470, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[32] [32]

PatternRank: Leveraging pretrained language models and part of speech for unsupervised keyphrase extraction

Tim Schopf, Simon Klimek, and Florian Matthes. PatternRank: Leveraging pretrained language models and part of speech for unsupervised keyphrase extraction. InProceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. SCITEPRESS - Science and Technology Publications, 2022

2022

[33] [33]

Promptrank: Unsupervised keyphrase extraction using prompt.arXiv preprint arXiv:2305.04490, 2023

Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, and Xiaoyan Bai. Promptrank: Unsupervised keyphrase extraction using prompt.arXiv preprint arXiv:2305.04490, 2023

work page arXiv 2023

[34] [34]

Incorporating expert knowledge into keyphrase extraction

Sujatha Das Gollapalli, Xiao-li Li, and Peng Yang. Incorporating expert knowledge into keyphrase extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), Feb. 2017

2017

[35] [35]

Lee Giles

Rabah Alzaidy, Cornelia Caragea, and C. Lee Giles. Bi-lstm-crf sequence labeling for keyphrase extraction from scholarly documents. InThe World Wide Web Conference, WWW ’19, page 2551–2557, New York, NY , USA,

[36] [36]

Association for Computing Machinery

[37] [37]

Exploring word embeddings in crf-based keyphrase extraction from research papers

Krutarth Patel and Cornelia Caragea. Exploring word embeddings in crf-based keyphrase extraction from research papers. InProceedings of the 10th International Conference on Knowledge Capture, pages 37–44, 2019

2019

[38] [38]

Transkp: Transformer based key-phrase extraction.2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020

Mukund Rungta, Rishabh Kumar, Mehak Preet Dhaliwal, Hemant Tiwari, and Vanraj Vala. Transkp: Transformer based key-phrase extraction.2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020

2020

[39] [39]

TNT-KID: Transformer-based neural tagger for keyword identifica- tion.Natural Language Engineering, 28(4):409–448, jun 2021

Matej Martinc, Blaž Škrlj, and Senja Pollak. TNT-KID: Transformer-based neural tagger for keyword identifica- tion.Natural Language Engineering, 28(4):409–448, jun 2021. 16 Attention Expansion for Long-Document KPE

2021

[40] [40]

SciBERT: A pretrained language model for scientific text

Iz Beltagy, Kyle Lo, and Arman Cohan. SciBERT: A pretrained language model for scientific text. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, November

2019

[41] [42]

Ldkp: A dataset for identifying keyphrases from long scientific documents

Debanjan Mahata, Naveen Agarwal, Dibya Gautam, Amardeep Kumar, Swapnil Parekh, Yaman Kumar Singla, Anish Acharya, and Rajiv Ratn Shah. Ldkp: A dataset for identifying keyphrases from long scientific documents. arXiv preprint arXiv:2203.15349, 2022

work page arXiv 2022

[42] [43]

Keyphrase generation beyond the boundaries of title and abstract, 2022

Krishna Garg, Jishnu Ray Chowdhury, and Cornelia Caragea. Keyphrase generation beyond the boundaries of title and abstract, 2022

2022

[43] [44]

Query-based keyphrase extraction from long documents.The International FLAIRS Conference Proceedings, 35, may 2022

Martin Doˇcekal and Pavel Smrž. Query-based keyphrase extraction from long documents.The International FLAIRS Conference Proceedings, 35, may 2022

2022

[44] [45]

UFORank: Unified framework of unsupervised keyphrase extraction for long documents.IEEE Access, 14:9986–10001, 2026

Doyoon Kim and Pilsung Kang. UFORank: Unified framework of unsupervised keyphrase extraction for long documents.IEEE Access, 14:9986–10001, 2026

2026

[45] [46]

López-López, and José Portela

Roberto Martínez-Cruz, Alvaro J. López-López, and José Portela. Chatgpt vs state-of-the-art models: A bench- marking study in keyphrase generation task, 2023

2023

[46] [47]

Empirical study of zero-shot keyphrase extraction with large language models

Byungha Kang and Youhyun Shin. Empirical study of zero-shot keyphrase extraction with large language models. InProceedings of the 31st International Conference on Computational Linguistics, pages 3670–3686, Abu Dhabi, UAE, 2025. Association for Computational Linguistics

2025

[47] [48]

LongDocRank: Graph-augmented large language models for unsupervised keyphrase extraction from long documents.Journal of Big Data, 13(14), 2026

Haoran Ding and Xiao Luo. LongDocRank: Graph-augmented large language models for unsupervised keyphrase extraction from long documents.Journal of Big Data, 13(14), 2026

2026

[48] [49]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017

2017

[49] [50]

Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles

Su Nam Kim, Olena Medelyan, Min-Yen Kan, and Timothy Baldwin. Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles. InProceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, page 21–26, USA, 2010. Association for Computational Linguistics

2010

[50] [51]

Keyphrase extraction in scientific publications

Thuy Dung Nguyen and Min-Yen Kan. Keyphrase extraction in scientific publications. In Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, and Edie Rasmussen, editors,Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317–326, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg

2007

[51] [52]

Single document keyphrase extraction using neighborhood knowledge

Xiaojun Wan and Jianguo Xiao. Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, AAAI’08, page 855–860. AAAI Press, 2008

2008

[52] [53]

Improved automatic keyword extraction given more linguistic knowledge

Anette Hulth. Improved automatic keyword extraction given more linguistic knowledge. InProceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP ’03, page 216–223, USA,

2003

[53] [54]

Association for Computational Linguistics

[54] [55]

Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel S. Weld. GORC: A large contextual citation graph of academic papers.CoRR, abs/1911.02782, 2019

work page arXiv 1911

[55] [56]

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019

2019

[56] [57]

DeBERTaV3: Improving DeBERTa using ELECTRA-style pre- training with gradient-disentangled embedding sharing

Pengcheng He, Jianfeng Gao, and Weizhu Chen. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre- training with gradient-disentangled embedding sharing. InInternational Conference on Learning Representations (ICLR), 2023

2023

[57] [58]

Model2Vec: Fast state-of-the-art static embeddings

Stéphan Tulkens and Thomas van Dongen. Model2Vec: Fast state-of-the-art static embeddings. https:// github.com/MinishLab/model2vec, 2024. MinishLab, GitHub repository. 17

2024