pith. sign in

arxiv: 2606.10716 · v2 · pith:GJTTDEFDnew · submitted 2026-06-09 · 💻 cs.CL · cs.AI

Attention Expansion: Enhancing Keyphrase Extraction from Long Documents with Attention-Augmented Contextualized Embeddings

Pith reviewed 2026-06-27 13:04 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords keyphrase extractionlong documentsattention expansionpre-trained language modelscontextualized embeddingsinformation fusionnatural language processing
0
0 comments X

The pith

An attention expansion mechanism augments PLM token representations with out-of-context embeddings to improve keyphrase extraction from long documents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that its attention expansion mechanism improves keyphrase extraction from long documents by fusing semantic information from out-of-context chunks into pre-trained language model representations. It achieves this fusion using pre-trained word embeddings and attention, without needing full-document processing or costly long-context models. A sympathetic reader would care because salient keyphrase evidence is often scattered across distant sections that standard models cannot see together. If correct, the mechanism supplies complementary signals that raise performance even on models already built for longer contexts or specific domains.

Core claim

The attention expansion mechanism augments PLM token representations with information from surrounding out-of-context chunks using pre-trained word embeddings. This expands the effective contextual scope of PLM-based KPE models without requiring full-document attention or expensive LLM-based inference. Experimental results demonstrate that attention expansion consistently enhances KPE performance across all evaluation settings, outperforming state-of-the-art models and yielding notable improvements in F1 score. The improvements extend to domain-specific, task-specialized, and native long-context models, showing that the proposed mechanism provides complementary information rather than merely

What carries the argument

The attention expansion mechanism that fuses pre-trained word embeddings from out-of-context chunks into PLM token representations via attention.

If this is right

  • It raises F1 scores on scientific and news domain corpora using five different PLM backbones.
  • It delivers gains under two training regimes and works with general-purpose, scientific, task-specific, and long-context encoders.
  • It supplies complementary information rather than only fixing limited context length.
  • It outperforms prior state-of-the-art KPE models while remaining computationally lighter than full long-context LLMs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fusion idea could be tested on other long-text tasks such as summarization or relation extraction.
  • It points to a general pattern where embeddings from shorter, cheaper models can be combined with stronger contextual ones without retraining either.
  • Scaling the number of out-of-context chunks or choosing them by relevance might further increase the observed gains.

Load-bearing premise

Pre-trained word embeddings from out-of-context chunks supply complementary semantic signals that can be fused into PLM representations without introducing noise or requiring task-specific retraining.

What would settle it

An experiment in which adding the attention expansion mechanism produces no F1 improvement or a decrease on the five benchmark corpora across the tested PLM backbones and training regimes.

Figures

Figures reproduced from arXiv: 2606.10716 by Alvaro J. L\'opez-L\'opez, Jos\'e Portela, Roberto Mart\'inez-Cruz.

Figure 1
Figure 1. Figure 1: Schematic example of sliding-window embedding computation for long documents. The PLM sees only one [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Schematic depiction of the attention expansion mechanism. The PLM contextualises a single chunk, while [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
read the original abstract

Pre-trained language models (PLMs) have achieved strong performance in keyphrase extraction (KPE), largely due to their ability to generate rich contextualized representations. However, long-document KPE remains challenging because salient keyphrase evidence may be scattered across distant document sections that cannot be jointly captured within the limited context window of most PLMs. Although long-context large language models (LLMs) can process broader textual contexts, their computational cost limits their practicality for efficient and high-throughput KPE. To overcome this limitation, we propose an attention expansion mechanism that augments PLM token representations with information from surrounding out-of-context chunks using pre-trained word embeddings. The proposed mechanism expands the effective contextual scope of PLM-based KPE models without requiring full-document attention or expensive LLM-based inference. We evaluate our approach across five PLM backbones, including general-purpose, scientific, task-specific, and long-context encoders, using two training regimes and five benchmark corpora from scientific and news domains. Experimental results demonstrate that attention expansion consistently enhances KPE performance across all evaluation settings, outperforming state-of-the-art models and yielding notable improvements in F1 score. The improvements extend to domain-specific, task-specialized, and native long-context models, showing that the proposed mechanism provides complementary information rather than merely compensating for limited input length. These results establish attention expansion as an efficient and effective strategy for long-document KPE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an attention expansion mechanism that augments token representations from pre-trained language models (PLMs) with semantic signals from out-of-context document chunks via static word embeddings. This is intended to improve keyphrase extraction (KPE) on long documents without incurring the cost of full long-context attention. The central empirical claim is that the mechanism yields consistent F1 gains across five PLM backbones (general, scientific, task-specific, and long-context) and five corpora, outperforming prior SOTA and, crucially, supplying complementary information even to native long-context models rather than merely mitigating input truncation.

Significance. If the experimental claims hold after clarification of the long-context baseline protocol, the work would offer a lightweight, plug-in augmentation for existing PLM-based KPE pipelines that avoids both truncation artifacts and the inference cost of full-document LLMs. The breadth of backbones and domains tested would strengthen the case for practical utility in scientific and news KPE.

major comments (2)
  1. [Experimental Results] Abstract and Experimental Results: The claim that attention expansion supplies complementary signals 'rather than merely compensating for limited input length' on native long-context models (Longformer, BigBird, etc.) is load-bearing. The manuscript does not state whether those long-context baselines were run on full documents or on the same truncated inputs used for standard PLMs; without this protocol detail, the complementarity interpretation cannot be distinguished from a length-compensation effect.
  2. [Experimental Results] Experimental Results: No ablation isolating the contribution of the static embedding fusion versus simple length extension is reported, nor are statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) provided for the reported F1 gains across the five backbones and two training regimes. These omissions leave the consistency claim difficult to evaluate.
minor comments (2)
  1. [Method] The description of how out-of-context chunks are selected and aligned to PLM tokens is underspecified; a concrete example or pseudocode would clarify the fusion step.
  2. [Tables] Table captions and axis labels should explicitly indicate whether reported F1 scores are macro- or micro-averaged and whether they reflect exact-match or partial-match evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to improve experimental clarity.

read point-by-point responses
  1. Referee: [Experimental Results] Abstract and Experimental Results: The claim that attention expansion supplies complementary signals 'rather than merely compensating for limited input length' on native long-context models (Longformer, BigBird, etc.) is load-bearing. The manuscript does not state whether those long-context baselines were run on full documents or on the same truncated inputs used for standard PLMs; without this protocol detail, the complementarity interpretation cannot be distinguished from a length-compensation effect.

    Authors: We agree that the protocol detail is necessary to support the complementarity claim. The long-context models were evaluated on full documents using their native extended context windows, while standard PLMs used 512-token truncations. We will explicitly state this in the Experimental Setup, Results, and abstract sections of the revised manuscript. revision: yes

  2. Referee: [Experimental Results] Experimental Results: No ablation isolating the contribution of the static embedding fusion versus simple length extension is reported, nor are statistical significance tests (e.g., paired t-tests or bootstrap confidence intervals) provided for the reported F1 gains across the five backbones and two training regimes. These omissions leave the consistency claim difficult to evaluate.

    Authors: The existing comparisons against native long-context models already control for length by using models designed for extended contexts, thereby isolating the benefit of static-embedding attention expansion. We will nevertheless add paired t-tests (and bootstrap intervals where appropriate) for the F1 gains in the revised Experimental Results section. An explicit ablation against naive length extension can be included if space allows. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical augmentation with no derivations or self-referential reductions

full rationale

The paper proposes an attention expansion mechanism that fuses pre-trained word embeddings from out-of-context chunks into PLM representations, then reports empirical F1 gains across five backbones and five corpora. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim rests on experimental comparisons rather than any derivation that reduces to its own inputs by construction. The method is framed as a practical augmentation relying on external embeddings, with no load-bearing self-citations or ansatzes. This is the common case of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim depends on the utility of fusing static word embeddings with PLM representations and on the experimental outcomes; both rest on domain assumptions about embedding complementarity that are not independently verified in the abstract.

axioms (1)
  • domain assumption Pre-trained word embeddings capture useful semantic information from out-of-context chunks that can be fused into PLM representations
    The mechanism explicitly relies on this to expand effective context.
invented entities (1)
  • attention expansion mechanism no independent evidence
    purpose: Augment PLM token representations with distant context information
    New technique introduced by the paper to address long-document limitations.

pith-pipeline@v0.9.1-grok · 5797 in / 1335 out tokens · 23609 ms · 2026-06-27T13:04:42.298630+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 6 canonical work pages · 2 internal anchors

  1. [1]

    A study on automatically extracted keywords in text categorization

    Anette Hulth and Beáta Megyesi. A study on automatically extracted keywords in text categorization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 537–544, 2006

  2. [2]

    Corephrase: Keyphrase extraction for document clustering

    Khaled M Hammouda, Diego N Matute, and Mohamed S Kamel. Corephrase: Keyphrase extraction for document clustering. InInternational workshop on machine learning and data mining in pattern recognition, pages 265–274. Springer, 2005

  3. [3]

    Citation summarization through keyphrase extraction

    Vahed Qazvinian, Dragomir Radev, and Arzucan Özgür. Citation summarization through keyphrase extraction. In Proceedings of the 23rd international conference on computational linguistics (COLING 2010), pages 895–903, 2010

  4. [4]

    World wide web site summarization.Web Intelligence and Agent Systems: An International Journal, 2(1):39–53, 2004

    Yongzheng Zhang, Nur Zincir-Heywood, and Evangelos Milios. World wide web site summarization.Web Intelligence and Agent Systems: An International Journal, 2(1):39–53, 2004

  5. [5]

    Improving browsing in digital libraries with keyphrase indexes.Decision Support Systems, 27(1-2):81–104, 1999

    Carl Gutwin, Gordon Paynter, Ian Witten, Craig Nevill-Manning, and Eibe Frank. Improving browsing in digital libraries with keyphrase indexes.Decision Support Systems, 27(1-2):81–104, 1999

  6. [6]

    Keyphrase extraction-based query expansion in digital libraries

    Il Yeol Song, Robert B Allen, Zoran Obradovic, and Min Song. Keyphrase extraction-based query expansion in digital libraries. InProceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’06), pages 202–209. IEEE, 2006

  7. [7]

    Phrasier: a system for interactive document retrieval using keyphrases

    Steve Jones and Mark S Staveley. Phrasier: a system for interactive document retrieval using keyphrases. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 160–167, 1999

  8. [8]

    From fundamentals to recent advances: A tutorial on keyphrasi- fication

    Rui Meng, Debanjan Mahata, and Florian Boudin. From fundamentals to recent advances: A tutorial on keyphrasi- fication. InAdvances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, pages 582–588. Springer, 2022

  9. [9]

    Automatic keyphrase extraction using graph- based methods

    Josiane Mothe, Faneva Ramiandrisoa, and Michael Rasolomanana. Automatic keyphrase extraction using graph- based methods. InProceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 728–730, 2018

  10. [10]

    A comparison of centrality measures for graph-based keyphrase extraction

    Florian Boudin. A comparison of centrality measures for graph-based keyphrase extraction. InProceedings of the sixth international joint conference on natural language processing, pages 834–838, 2013

  11. [11]

    Automatic keyphrase extraction: A survey of the state of the art

    Kazi Saidul Hasan and Vincent Ng. Automatic keyphrase extraction: A survey of the state of the art. InProceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1262–1273, 2014

  12. [12]

    Efficient estimation of word representations in vector space, 2013

    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013

  13. [13]

    GloVe: Global vectors for word representation

    Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar, October 2014. Association for Computational Linguistics

  14. [14]

    Bert: Pre-training of deep bidirectional transformers for language understanding, 2018

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2018

  15. [15]

    Keyphrase extraction as sequence labeling using contextualized embeddings

    Dhruva Sahrawat, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma, Rakesh Gosangi, Amanda Stent, Yaman Kumar, Rajiv Ratn Shah, and Roger Zimmermann. Keyphrase extraction as sequence labeling using contextualized embeddings. InEuropean Conference on Information Retrieval, pages 328–335. Springer, 2020

  16. [16]

    Learning rich representation of keyphrases from text

    Mayank Kulkarni, Debanjan Mahata, Ravneet Arora, and Rajarshi Bhowmik. Learning rich representation of keyphrases from text. InFindings of the Association for Computational Linguistics: NAACL 2022, pages 891–906, Seattle, United States, July 2022. Association for Computational Linguistics

  17. [17]

    Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning

    Seoyeon Park and Cornelia Caragea. Scientific keyphrase identification and classification by pre-trained language models intermediate task transfer learning. InProceedings of the 28th International Conference on Computational Linguistics, pages 5409–5419, 2020

  18. [18]

    Longformer: The Long-Document Transformer

    Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer.arXiv:2004.05150, 2020. 15 Attention Expansion for Long-Document KPE

  19. [19]

    Big bird: Transformers for longer sequences

    Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big bird: Transformers for longer sequences. 2020

  20. [20]

    Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 2024

    Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, and Iacopo Poli. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference, 2024

  21. [21]

    Farrar, Straus and Giroux, New York, 2011

    Daniel Kahneman.Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011

  22. [22]

    Do artificial intelligence systems understand?Claridades

    Carlos Blanco Pérez and Eduardo Garrido-Merchán. Do artificial intelligence systems understand?Claridades. Revista de Filosofía, 16(1):171–205, 2024

  23. [23]

    López-López, and José Portela

    Roberto Martínez-Cruz, Debanjan Mahata, Alvaro J. López-López, and José Portela. Enhancing keyphrase extraction from long scientific documents using graph embeddings, 2023

  24. [24]

    LongKey: Keyphrase extraction for long documents

    Jeovane Honorio Alves, Radu State, Cinthia Obladen de Almendra Freitas, and Jean Paul Barddal. LongKey: Keyphrase extraction for long documents. InProceedings of the 2024 IEEE International Conference on Big Data, 2024. arXiv:2411.17863

  25. [25]

    MAPEX: A multi-agent pipeline for keyphrase extraction, 2025

    Liting Zhang, Shiwan Zhao, Aobo Kong, and Qicheng Li. MAPEX: A multi-agent pipeline for keyphrase extraction, 2025

  26. [26]

    TextRank: Bringing order into text

    Rada Mihalcea and Paul Tarau. TextRank: Bringing order into text. InProceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411, Barcelona, Spain, July 2004. Association for Computational Linguistics

  27. [27]

    TopicRank: Graph-based topic ranking for keyphrase extraction

    Adrien Bougouin, Florian Boudin, and Béatrice Daille. TopicRank: Graph-based topic ranking for keyphrase extraction. InProceedings of the Sixth International Joint Conference on Natural Language Processing, pages 543–551, Nagoya, Japan, October 2013. Asian Federation of Natural Language Processing

  28. [28]

    Corpus-independent generic keyphrase extraction using word embedding vectors

    Rui Wang, Wei Liu, and Chris McDonald. Corpus-independent generic keyphrase extraction using word embedding vectors. InSoftware engineering research conference, volume 39, pages 1–8, 2014

  29. [29]

    Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings

    Debanjan Mahata, John Kuriakose, Rajiv Shah, and Roger Zimmermann. Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings. InProceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 634–639, 2018

  30. [30]

    Theme-weighted ranking of keywords from text documents using phrase embeddings

    Debanjan Mahata, Rajiv Ratn Shah, John Kuriakose, Roger Zimmermann, and John R Talburt. Theme-weighted ranking of keywords from text documents using phrase embeddings. In2018 IEEE conference on multimedia information processing and retrieval (MIPR), pages 184–189. IEEE, 2018

  31. [31]

    Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

    Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, and Martin Jaggi. Simple unsupervised keyphrase extraction using sentence embeddings.arXiv preprint arXiv:1801.04470, 2018

  32. [32]

    PatternRank: Leveraging pretrained language models and part of speech for unsupervised keyphrase extraction

    Tim Schopf, Simon Klimek, and Florian Matthes. PatternRank: Leveraging pretrained language models and part of speech for unsupervised keyphrase extraction. InProceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. SCITEPRESS - Science and Technology Publications, 2022

  33. [33]

    Promptrank: Unsupervised keyphrase extraction using prompt.arXiv preprint arXiv:2305.04490, 2023

    Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, and Xiaoyan Bai. Promptrank: Unsupervised keyphrase extraction using prompt.arXiv preprint arXiv:2305.04490, 2023

  34. [34]

    Incorporating expert knowledge into keyphrase extraction

    Sujatha Das Gollapalli, Xiao-li Li, and Peng Yang. Incorporating expert knowledge into keyphrase extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), Feb. 2017

  35. [35]

    Lee Giles

    Rabah Alzaidy, Cornelia Caragea, and C. Lee Giles. Bi-lstm-crf sequence labeling for keyphrase extraction from scholarly documents. InThe World Wide Web Conference, WWW ’19, page 2551–2557, New York, NY , USA,

  36. [36]

    Association for Computing Machinery

  37. [37]

    Exploring word embeddings in crf-based keyphrase extraction from research papers

    Krutarth Patel and Cornelia Caragea. Exploring word embeddings in crf-based keyphrase extraction from research papers. InProceedings of the 10th International Conference on Knowledge Capture, pages 37–44, 2019

  38. [38]

    Transkp: Transformer based key-phrase extraction.2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020

    Mukund Rungta, Rishabh Kumar, Mehak Preet Dhaliwal, Hemant Tiwari, and Vanraj Vala. Transkp: Transformer based key-phrase extraction.2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020

  39. [39]

    TNT-KID: Transformer-based neural tagger for keyword identifica- tion.Natural Language Engineering, 28(4):409–448, jun 2021

    Matej Martinc, Blaž Škrlj, and Senja Pollak. TNT-KID: Transformer-based neural tagger for keyword identifica- tion.Natural Language Engineering, 28(4):409–448, jun 2021. 16 Attention Expansion for Long-Document KPE

  40. [40]

    SciBERT: A pretrained language model for scientific text

    Iz Beltagy, Kyle Lo, and Arman Cohan. SciBERT: A pretrained language model for scientific text. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3615–3620, Hong Kong, China, November

  41. [42]

    Ldkp: A dataset for identifying keyphrases from long scientific documents

    Debanjan Mahata, Naveen Agarwal, Dibya Gautam, Amardeep Kumar, Swapnil Parekh, Yaman Kumar Singla, Anish Acharya, and Rajiv Ratn Shah. Ldkp: A dataset for identifying keyphrases from long scientific documents. arXiv preprint arXiv:2203.15349, 2022

  42. [43]

    Keyphrase generation beyond the boundaries of title and abstract, 2022

    Krishna Garg, Jishnu Ray Chowdhury, and Cornelia Caragea. Keyphrase generation beyond the boundaries of title and abstract, 2022

  43. [44]

    Query-based keyphrase extraction from long documents.The International FLAIRS Conference Proceedings, 35, may 2022

    Martin Doˇcekal and Pavel Smrž. Query-based keyphrase extraction from long documents.The International FLAIRS Conference Proceedings, 35, may 2022

  44. [45]

    UFORank: Unified framework of unsupervised keyphrase extraction for long documents.IEEE Access, 14:9986–10001, 2026

    Doyoon Kim and Pilsung Kang. UFORank: Unified framework of unsupervised keyphrase extraction for long documents.IEEE Access, 14:9986–10001, 2026

  45. [46]

    López-López, and José Portela

    Roberto Martínez-Cruz, Alvaro J. López-López, and José Portela. Chatgpt vs state-of-the-art models: A bench- marking study in keyphrase generation task, 2023

  46. [47]

    Empirical study of zero-shot keyphrase extraction with large language models

    Byungha Kang and Youhyun Shin. Empirical study of zero-shot keyphrase extraction with large language models. InProceedings of the 31st International Conference on Computational Linguistics, pages 3670–3686, Abu Dhabi, UAE, 2025. Association for Computational Linguistics

  47. [48]

    LongDocRank: Graph-augmented large language models for unsupervised keyphrase extraction from long documents.Journal of Big Data, 13(14), 2026

    Haoran Ding and Xiao Luo. LongDocRank: Graph-augmented large language models for unsupervised keyphrase extraction from long documents.Journal of Big Data, 13(14), 2026

  48. [49]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017

  49. [50]

    Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles

    Su Nam Kim, Olena Medelyan, Min-Yen Kan, and Timothy Baldwin. Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles. InProceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, page 21–26, USA, 2010. Association for Computational Linguistics

  50. [51]

    Keyphrase extraction in scientific publications

    Thuy Dung Nguyen and Min-Yen Kan. Keyphrase extraction in scientific publications. In Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, and Edie Rasmussen, editors,Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers, pages 317–326, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg

  51. [52]

    Single document keyphrase extraction using neighborhood knowledge

    Xiaojun Wan and Jianguo Xiao. Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, AAAI’08, page 855–860. AAAI Press, 2008

  52. [53]

    Improved automatic keyword extraction given more linguistic knowledge

    Anette Hulth. Improved automatic keyword extraction given more linguistic knowledge. InProceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP ’03, page 216–223, USA,

  53. [54]

    Association for Computational Linguistics

  54. [55]

    Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel S. Weld. GORC: A large contextual citation graph of academic papers.CoRR, abs/1911.02782, 2019

  55. [56]

    DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019

    Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019

  56. [57]

    DeBERTaV3: Improving DeBERTa using ELECTRA-style pre- training with gradient-disentangled embedding sharing

    Pengcheng He, Jianfeng Gao, and Weizhu Chen. DeBERTaV3: Improving DeBERTa using ELECTRA-style pre- training with gradient-disentangled embedding sharing. InInternational Conference on Learning Representations (ICLR), 2023

  57. [58]

    Model2Vec: Fast state-of-the-art static embeddings

    Stéphan Tulkens and Thomas van Dongen. Model2Vec: Fast state-of-the-art static embeddings. https:// github.com/MinishLab/model2vec, 2024. MinishLab, GitHub repository. 17