Less is More: Quality-Aware Training Data Selection for Scientific Summarization

Grigorios Tsoumakas; Maria Nefeli Paraskevopoulou; Tatiana Passali

arxiv: 2606.24828 · v1 · pith:LFGGTTFUnew · submitted 2026-06-23 · 💻 cs.CL

Pith reviewed 2026-06-25 23:40 UTC · model grok-4.3

classification 💻 cs.CL

keywords scientific summarization · training data selection · quality metrics · biomedical summarization · factuality evaluation · reference quality · data efficiency

The pith

Quality-aware selection of training abstracts outperforms random sampling at matched sizes and can match larger random sets on factuality metrics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles a dataset of 1.88 million biomedical articles and measures how closely author-written abstracts align with their source documents. It applies source-grounded and model-based metrics to score this alignment and uses the scores to pick high-quality subsets for training. Models trained on these selected subsets produce more factual summaries than models trained on random subsets of the same size. In several cases the smaller selected sets also match or exceed the factuality of models trained on much larger random collections. The work treats reference quality as a controllable variable that directly affects how efficiently a summarization model learns from scientific text.

Core claim

Author-written abstracts vary substantially in alignment with their full articles. Source-grounded and model-based quality metrics identify higher-quality subsets. Training on these subsets yields better factuality-oriented performance than random sampling at equal training size and can reach or surpass larger random subsets.

What carries the argument

Quality scoring of reference abstracts with source-grounded and model-based metrics to select training-data subsets for summarization models.
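
As a rough illustration of this selection step, the sketch below scores each abstract with a simple token-coverage proxy, keeps the top-k pairs, and draws a size-matched random baseline. The coverage function, the helper names, and the toy pairs are assumptions made for illustration only; this page does not name the paper's source-grounded and model-based metrics, and they may differ substantially.

```python
import random
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_coverage(abstract, source):
    """Hypothetical proxy metric: share of distinct abstract tokens found in the source."""
    abstract_tokens = tokenize(abstract)
    if not abstract_tokens:
        return 0.0
    return len(abstract_tokens & tokenize(source)) / len(abstract_tokens)


def select_top_k(pairs, k):
    """Return the k (source, abstract) pairs with the highest proxy score."""
    return sorted(pairs, key=lambda p: token_coverage(p[1], p[0]), reverse=True)[:k]


def select_random(pairs, k, seed=0):
    """Size-matched random baseline, seeded for repeatability."""
    return random.Random(seed).sample(pairs, k)


# Toy (source, abstract) pairs, invented for this example.
pairs = [
    ("aspirin reduced fever in 40 patients", "aspirin reduced fever"),
    ("aspirin reduced fever in 40 patients", "ibuprofen cured headaches"),
    ("mice gained weight on the diet", "mice gained weight"),
    ("mice gained weight on the diet", "rats lost weight quickly"),
]
selected = select_top_k(pairs, 2)
baseline = select_random(pairs, 2)
```

Both subsets have the same size, so any downstream difference between models trained on `selected` and `baseline` cannot be attributed to training-set size alone.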

If this is right

Fewer but higher-quality examples can replace larger volumes of lower-quality examples without loss of factuality performance.
Filtering low-alignment abstracts before training raises the efficiency of data use in scientific summarization.
Reference quality acts as a limiting factor on what models can learn from author abstracts.
Quality-aware selection offers a direct way to improve training when high-quality labeled data remain scarce.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selection logic could be tested on other long-document summarization domains where reference quality also varies.
If the metrics generalize, they could reduce the total compute needed to reach a target factuality level.
The approach invites direct comparison against other data-filtering strategies such as perplexity-based or diversity-based selection.

Load-bearing premise

The metrics correctly identify which abstracts will produce models with higher factuality when used as training targets.

What would settle it

Train summarization models on the metric-selected high-quality subsets and observe no gain or a loss in factuality metrics relative to random subsets of identical size.

Figures

Figures reproduced from arXiv: 2606.24828 by Grigorios Tsoumakas, Maria Nefeli Paraskevopoulou, Tatiana Passali.

**Figure 2.** Token-count distributions for article bodies

Original abstract

Scientific long-document summarization datasets commonly treat author-written abstracts as gold reference summaries, although their quality and alignment with the source article vary. At the same time, publicly available scientific summarization datasets remain limited in scale and structure for modern long-context models. In this work, we address both challenges by a) constructing and releasing one of the largest biomedical and life science datasets for long-document summarization, containing 1.88 million PMC articles, and b) analyzing the reference quality of author-written abstracts with source-grounded and model-based metrics. We show that author-written abstracts vary in their alignment with the full article and that these quality signals can guide training-data selection. Training on selected high-quality subsets outperforms random sampling at matched training sizes and can match or exceed larger random subsets on factuality-oriented metrics. Our findings suggest that reference quality is an important factor in scientific summarization and that quality-aware data selection can improve training efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper releases a 1.88M-article PMC dataset and shows quality-filtered subsets beat random sampling on factuality at matched sizes.

The main things to know are the release of one of the larger biomedical summarization datasets and the empirical result that training on high-quality abstract subsets improves factuality metrics over random selection at the same training size.

They pull 1.88 million PMC articles, score the author abstracts with source-grounded and model-based alignment metrics, then train summarization models on the top-scoring subsets versus random draws. The selected subsets match or exceed larger random sets on factuality-oriented measures. Releasing the data is the clearest contribution, and the efficiency angle is straightforward for anyone constrained by compute.

The experiments are direct and the gains appear in their reported comparisons. That part holds up as a practical demonstration.

The soft spot is that the design does not rule out other differences between the subsets. High-scoring abstracts could simply be longer, more lexically varied, or cover different subtopics than the random ones, and those properties might explain the factuality lift rather than the quality scores themselves. No controls that match subsets on length or diversity are described, so the causal role of the specific metrics stays open.
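
One way such a control could look is a random baseline that is matched to the selected subset on abstract length. The sketch below is a minimal version under stated assumptions: whitespace token counts stand in for length, the bin width of 50 tokens is arbitrary, and the function names are hypothetical. The paper does not describe this procedure.

```python
import random
from collections import defaultdict


def length_bin(abstract, width=50):
    """Bucket an abstract by its whitespace token count (bin width is an assumption)."""
    return len(abstract.split()) // width


def length_matched_sample(selected, pool, width=50, seed=0):
    """Draw one abstract from `pool` per selected abstract, from the same length bin.

    Selected abstracts whose bin has no remaining candidate are skipped,
    so the result can be shorter than `selected`.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for abstract in pool:
        by_bin[length_bin(abstract, width)].append(abstract)
    for candidates in by_bin.values():
        rng.shuffle(candidates)
    matched = []
    for abstract in selected:
        candidates = by_bin[length_bin(abstract, width)]
        if candidates:
            matched.append(candidates.pop())  # sample without replacement
    return matched


# Toy data: one short and one long selected abstract, plus a candidate pool.
selected = ["a b c", " ".join(["w"] * 60)]
pool = ["x y", "p q r s", " ".join(["z"] * 55), " ".join(["v"] * 70)]
matched = length_matched_sample(selected, pool)
```

If the factuality gap persists against a baseline built this way, length alone is a less plausible explanation; the same binning idea could be extended to lexical diversity or subtopic labels.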

This is for people working on scientific or biomedical summarization who need large training resources or are testing data-selection methods. A reading group focused on long-context models or curation would find the dataset and the size-matched comparisons useful.

It should go to peer review. The dataset is new and the result is testable even if the controls need tightening.

Referee Report

3 major / 2 minor

Summary. The paper constructs and releases a dataset of 1.88 million PMC articles for biomedical long-document summarization. It analyzes author-written abstracts using source-grounded and model-based quality metrics, then shows that training summarization models on high-quality subsets selected via these metrics outperforms random sampling at matched training sizes and can match or exceed performance from larger random subsets on factuality-oriented metrics.

Significance. If the central empirical result holds after addressing confounds, the work would demonstrate that reference quality is a load-bearing factor in scientific summarization training and that quality-aware selection improves efficiency. The release of the large-scale dataset is a concrete contribution that could support further research on long-context models.

major comments (3)

[§5] §5 (Experiments): The comparison of quality-selected vs. random subsets at matched sizes does not report any controls or ablations for correlated subset properties (e.g., abstract length, lexical diversity, or domain coverage). Without these, it is impossible to isolate whether the reported factuality gains are caused by the quality metrics or by incidental differences between the subsets.
[§5.2, Table 3] §5.2 and Table 3: No statistical significance tests, confidence intervals, or effect sizes are provided for the factuality metric improvements. The abstract claims outperformance, but the lack of these details leaves the strength of the evidence unclear.
[§4] §4 (Quality Analysis): The source-grounded and model-based metrics are used to filter data, yet the paper does not test whether subsets selected by these metrics differ systematically from random subsets on non-quality dimensions that could affect downstream training (e.g., via correlation analysis or matched sampling on length).
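
For the statistical reporting requested in the second comment, one common option is a paired percentile bootstrap over per-example factuality scores from two models evaluated on the same test set. The sketch below assumes such per-example scores are available; the function name, the number of resamples, and the example scores are illustrative and not taken from the paper.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a) - mean(scores_b) on paired examples.

    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per test example")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Resample the per-example differences with replacement and record each mean.
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = resampled[int((alpha / 2) * n_resamples)]
    upper = resampled[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Made-up per-example scores for a quality-selected and a random-subset model.
quality_scores = [0.90, 0.80, 0.70, 0.95, 0.85, 0.60]
random_scores = [0.70, 0.75, 0.65, 0.80, 0.90, 0.50]
point, lower, upper = paired_bootstrap_ci(quality_scores, random_scores)
```

An interval that excludes zero would support the claimed improvement at the chosen level; an interval that straddles zero would indicate the evidence is weaker than the headline comparison suggests.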

minor comments (2)

[Abstract, §1] The abstract and §1 should explicitly name the factuality metrics (e.g., FactCC, SummaC) and the exact model architectures used in the downstream experiments.
[Figure 2] Figure 2 caption and axis labels use inconsistent terminology for 'quality score' vs. 'alignment score'; standardize notation across figures and text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on potential confounds and statistical reporting. We address each major comment below.

Referee: [§5] §5 (Experiments): The comparison of quality-selected vs. random subsets at matched sizes does not report any controls or ablations for correlated subset properties (e.g., abstract length, lexical diversity, or domain coverage). Without these, it is impossible to isolate whether the reported factuality gains are caused by the quality metrics or by incidental differences between the subsets.

Authors: We agree that additional controls are needed to better isolate the contribution of the quality metrics. In the revised manuscript we will add correlation analyses and ablations that compare abstract length, lexical diversity, and domain coverage between the quality-selected subsets and the random subsets of matched size.

Revision: yes

Referee: [§5.2, Table 3] §5.2 and Table 3: No statistical significance tests, confidence intervals, or effect sizes are provided for the factuality metric improvements. The abstract claims outperformance, but the lack of these details leaves the strength of the evidence unclear.

Authors: We acknowledge that the current presentation would benefit from formal statistical reporting. We will add statistical significance tests, confidence intervals, and effect sizes for the factuality metrics in §5.2 and Table 3 of the revised version.

Revision: yes

Referee: [§4] §4 (Quality Analysis): The source-grounded and model-based metrics are used to filter data, yet the paper does not test whether subsets selected by these metrics differ systematically from random subsets on non-quality dimensions that could affect downstream training (e.g., via correlation analysis or matched sampling on length).

Authors: This concern is closely related to the first comment. We will extend the analysis in §4 to include explicit comparisons (via correlation and matched-sampling checks) of non-quality properties such as length between the quality-selected and random subsets.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical dataset construction and subset comparison

The paper's core contribution is the release of a 1.88M-article dataset followed by empirical training experiments that compare quality-filtered subsets against random subsets of matched size. No equations, fitted parameters, or self-citations are invoked to derive the reported performance gains; the outperformance is measured directly on held-out test sets. The derivation chain is therefore self-contained against external benchmarks and contains no self-definitional, fitted-input, or self-citation reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work is data-driven and relies on standard assumptions in machine learning about reference quality affecting model performance. No free parameters are introduced beyond typical training hyperparameters. No new entities are postulated.

axioms (1)

Domain assumption: Author-written abstracts serve as usable but variable-quality reference summaries for training summarization models.
Stated directly in the abstract as the starting point for quality analysis.

Reference graph

Works this paper leans on

89 extracted references · 20 canonical work pages

[1]

ROUGE : A Package for Automatic Evaluation of Summaries

Lin, Chin-Yew. ROUGE : A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004

2004
[2]

International Conference on Learning Representations , year=

BERTScore: Evaluating Text Generation with BERT , author=. International Conference on Learning Representations , year=
[7]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=
[8]

Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension , author=. Proceedings of the 58th annual meeting of the association for computational linguistics , pages=
[9]

Journal of machine learning research , volume=

Exploring the limits of transfer learning with a unified text-to-text transformer , author=. Journal of machine learning research , volume=
[10]

International conference on machine learning , pages=

Pegasus: Pre-training with extracted gap-sentences for abstractive summarization , author=. International conference on machine learning , pages=. 2020 , organization=

2020
[11]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
[12]

Advances in neural information processing systems , volume=

Language models are few-shot learners , author=. Advances in neural information processing systems , volume=
[13]

International Journal of Data Science and Analytics , volume=

Biomedical text summarization with large language models: methodologies, challenges, and future directions , author=. International Journal of Data Science and Analytics , volume=. 2026 , publisher=

2026
[14]

Bioinformatics , volume=

BioBERT: a pre-trained biomedical language representation model for biomedical text mining , author=. Bioinformatics , volume=. 2020 , publisher=

2020
[15]

ACM Transactions on Computing for Healthcare (HEALTH) , volume=

Domain-specific language model pretraining for biomedical natural language processing , author=. ACM Transactions on Computing for Healthcare (HEALTH) , volume=. 2021 , publisher=

2021
[17]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Summn: A multi-stage summarization framework for long input dialogues and documents , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[18]

Advances in neural information processing systems , volume=

Teaching machines to read and comprehend , author=. Advances in neural information processing systems , volume=
[19]

and Lapata, Mirella

Narayan, Shashi and Cohen, Shay B. and Lapata, Mirella. Don ' t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1206

work page doi:10.18653/v1/d18-1206 2018
[20]

N ewsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies

Grusky, Max and Naaman, Mor and Artzi, Yoav. N ewsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018. doi:10.18653/v1/N18-1065

work page doi:10.18653/v1/n18-1065 2018
[21]

QMS um: A New Benchmark for Query-based Multi-domain Meeting Summarization

Zhong, Ming and Yin, Da and Yu, Tao and Zaidi, Ahmad and Mutuma, Mutethia and Jha, Rahul and Awadallah, Ahmed Hassan and Celikyilmaz, Asli and Liu, Yang and Qiu, Xipeng and Radev, Dragomir. QMS um: A New Benchmark for Query-based Multi-domain Meeting Summarization. Proceedings of the 2021 Conference of the North American Chapter of the Association for Com...

work page doi:10.18653/v1/2021.naacl-main.472 2021
[22]

SAMS um Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej and Wawer, Aleksander. SAMS um Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization. Proceedings of the 2nd Workshop on New Frontiers in Summarization. 2019. doi:10.18653/v1/D19-5409

work page doi:10.18653/v1/d19-5409 2019
[23]

DialogSum:

Chen, Yulong and Liu, Yang and Chen, Liang and Zhang, Yue. D ialog S um: A Real-Life Scenario Dialogue Summarization Dataset. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.449

work page doi:10.18653/v1/2021.findings-acl.449 2021
[24]

B ill S um: A Corpus for Automatic Summarization of US Legislation

Kornilova, Anastassia and Eidelman, Vladimir. B ill S um: A Corpus for Automatic Summarization of US Legislation. Proceedings of the 2nd Workshop on New Frontiers in Summarization. 2019. doi:10.18653/v1/D19-5406

work page doi:10.18653/v1/d19-5406 2019
[27]

Proceedings of the ACM web conference 2023 , pages=

Citationsum: Citation-aware graph contrastive learning for scientific paper summarization , author=. Proceedings of the ACM web conference 2023 , pages=

2023
[28]

Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=

Structured summarization of academic publications , author=. Joint European Conference on Machine Learning and Knowledge Discovery in Databases , pages=. 2019 , organization=

2019
[29]

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

Making science simple: Corpora for the lay summarisation of scientific literature , author=. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , pages=

2022
[31]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Automated lay language summarization of biomedical scientific reviews , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
[32]

Proceedings of the 57th annual meeting of the association for computational linguistics , pages=

On the summarization of consumer health questions , author=. Proceedings of the 57th annual meeting of the association for computational linguistics , pages=
[33]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=

Ms2: Multi-document summarization of medical studies , author=. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=

2021
[34]

JAMA , year =

Accuracy of Data in Abstracts of Published Research Articles , author =. JAMA , year =
[35]

BMC Medical Research Methodology , volume=

Abstracts in high profile journals often fail to report harm , author=. BMC Medical Research Methodology , volume=. 2008 , publisher=

2008
[36]

Research Integrity and Peer Review , volume=

Reporting quality of abstracts and inconsistencies with full text articles in pediatric orthopedic publications , author=. Research Integrity and Peer Review , volume=. 2023 , publisher=

2023
[37]

BMC medical research methodology , volume=

A scoping review of comparisons between abstracts and full reports in primary biomedical research , author=. BMC medical research methodology , volume=. 2017 , publisher=

2017
[38]

Journal of clinical epidemiology , volume=

Do not make clinical decisions based on abstracts of healthcare research: A systematic review , author=. Journal of clinical epidemiology , volume=. 2021 , publisher=

2021
[39]

BMJ Evidence-Based Medicine , volume=

Comparing data accuracy between structured abstracts and full-text journal articles: implications in their use for informing clinical decisions , author=. BMJ Evidence-Based Medicine , volume=. 2013 , publisher=

2013
[40]

BMC medical research methodology , volume=

Classification and prevalence of spin in abstracts of non-randomized studies evaluating an intervention , author=. BMC medical research methodology , volume=. 2015 , publisher=

2015
[41]

Proceedings of the National Academy of Sciences , volume=

Misrepresentation and distortion of research in biomedical literature , author=. Proceedings of the National Academy of Sciences , volume=. 2018 , publisher=

2018
[42]

PLoS medicine , volume=

CONSORT for reporting randomized controlled trials in journal and conference abstracts: explanation and elaboration , author=. PLoS medicine , volume=. 2008 , publisher=

2008
[43]

PLoS medicine , volume=

PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts , author=. PLoS medicine , volume=. 2013 , publisher=

2013
[44]

Patterns , volume=

The landscape of biomedical research , author=. Patterns , volume=. 2024 , publisher=

2024
[46]

Big Bird: Transformers for Longer Sequences , url =

Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and Ahmed, Amr , booktitle =. Big Bird: Transformers for Longer Sequences , url =
[48]

Findings of the Association for Computational Linguistics: EACL 2023 , pages=

Long document summarization with top-down and bottom-up inference , author=. Findings of the Association for Computational Linguistics: EACL 2023 , pages=

2023
[50]

International Conference on Learning Representations , volume=

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting , author=. International Conference on Learning Representations , volume=
[52]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

MarkupLM: Pre-training of text and markup language for visually rich document understanding , author=. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[54]

Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) , pages=

Evaluating the factual consistency of abstractive text summarization , author=. Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) , pages=

2020
[56]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

Understanding faithfulness and reasoning of large language models on plain biomedical summaries , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

2024
[58]

Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

Data selection curriculum for abstractive text summarization , author=. Findings of the Association for Computational Linguistics: EMNLP 2023 , pages=

2023
[59]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=

Improving truthfulness of headline generation , author=. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=
[60]

Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume , pages=

Entity-level factual consistency of abstractive text summarization , author=. Proceedings of the 16th conference of the European chapter of the association for computational linguistics: main volume , pages=
[61]

Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

Learning to revise references for faithful summarization , author=. Findings of the Association for Computational Linguistics: EMNLP 2022 , pages=

2022
[62]

Asma Ben Abacha and Dina Demner-Fushman. 2019. On the summarization of consumer health questions. In Proceedings of the 57th annual meeting of the association for computational linguistics, pages 2228--2234

2019
[63]

Griffin Adams, Han-Chin Shing, Qing Sun, Christopher Winestock, Kathleen McKeown, and No \'e mie Elhadad. 2022. Learning to revise references for faithful summarization. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4009--4027

2022
[64]

Iz Beltagy, Matthew E Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150

Pith/arXiv arXiv 2020
[65]

Enrique Bernal-Delgado and Elliot S Fisher. 2008. Abstracts in high profile journals often fail to report harm. BMC Medical Research Methodology, 8(1):14

2008
[66]

Isabelle Boutron and Philippe Ravaud. 2018. Misrepresentation and distortion of research in biomedical literature. Proceedings of the National Academy of Sciences, 115(11):2613--2619

2018
[67]

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. https://doi.org/10.18653/v1/N18-2097 A discourse-aware attention model for abstractive summarization of long documents . In Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human...

work page doi:10.18653/v1/n18-2097 2018
[68]

Daniel Deutsch and Dan Roth. 2021. https://doi.org/10.18653/v1/2021.conll-1.24 Understanding the extent to which content quality metrics measure the information quality of summaries . In Proceedings of the 25th Conference on Computational Natural Language Learning, pages 300--309, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2021.conll-1.24 2021
[69]

Jay DeYoung, Iz Beltagy, Madeleine van Zuylen, Bailey Kuehl, and Lucy Lu Wang. 2021. Ms2: Multi-document summarization of medical studies. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7494--7513

2021
[70]

Biaoyan Fang, Xiang Dai, and Sarvnaz Karimi. 2024. Understanding faithfulness and reasoning of large language models on plain biomedical summaries. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9890--9911

2024
[71]

Alexios Gidiotis and Grigorios Tsoumakas. 2019. Structured summarization of academic publications. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 636--645. Springer

2019
[72]

Alexios Gidiotis and Grigorios Tsoumakas. 2020. https://doi.org/10.1109/TASLP.2020.3037401 A divide-and-conquer approach to the summarization of long documents . IEEE/ACM Trans. Audio, Speech and Lang. Proc., 28:3029–3040

work page doi:10.1109/taslp.2020.3037401 2020
[73]

Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, and Chenghua Lin. 2023. https://doi.org/10.18653/v1/2023.bionlp-1.44 Overview of the biolaysumm 2023 shared task on lay summarization of biomedical research articles . In Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Sha...

work page doi:10.18653/v1/2023.bionlp-1.44 2023
[74]

Tomas Goldsack, Zhihao Zhang, Chenghua Lin, and Carolina Scarton. 2022. Making science simple: Corpora for the lay summarisation of scientific literature. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10589--10604

2022
[75]

Rita Gonz \'a lez-M \'a rquez, Luca Schmidt, Benjamin M Schmidt, Philipp Berens, and Dmitry Kobak. 2024. The landscape of biomedical research. Patterns, 5(6)

2024
[76]

Yue Guo, Wei Qiu, Yizhong Wang, and Trevor Cohen. 2021. Automated lay language summarization of biomedical scientific reviews. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 160--168

2021
[77]

Vivek Gupta, Prerna Bharti, Pegah Nokhiz, and Harish Karnick. 2021. https://doi.org/10.18653/v1/2021.acl-srw.30 SumPubMed : Summarization dataset of P ub M ed scientific articles . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student R...

work page doi:10.18653/v1/2021.acl-srw.30 2021
[78]

Jia He, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X Wang, and Sadid Hasan. 2024. Does prompt formatting have any impact on llm performance? arXiv preprint arXiv:2411.10541

arXiv 2024
[79]

Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, and Lu Wang. 2021. https://doi.org/10.18653/v1/2021.naacl-main.112 Efficient attentions for long document summarization . In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1419--1436, Online. Associa...

work page doi:10.18653/v1/2021.naacl-main.112 2021
[80]

Sherif Ahmed Kamel and Tamer A El-Sobky. 2023. Reporting quality of abstracts and inconsistencies with full text articles in pediatric orthopedic publications. Research Integrity and Peer Review, 8(1):11

2023
[81]

Wojciech Kry \'s ci \'n ski, Bryan McCann, Caiming Xiong, and Richard Socher. 2020. Evaluating the factual consistency of abstractive text summarization. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 9332--9346

2020
[82]

and Hearst, Marti A

Philippe Laban, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. 2022. https://doi.org/10.1162/tacl_a_00453 S umma C : Re-visiting NLI -based models for inconsistency detection in summarization . Transactions of the Association for Computational Linguistics, 10:163--177

work page doi:10.1162/tacl_a_00453 2022
[83]

Cl \'e ment Lazarus, Romana Haneef, Philippe Ravaud, and Isabelle Boutron. 2015. Classification and prevalence of spin in abstracts of non-randomized studies evaluating an intervention. BMC medical research methodology, 15(1):85

2015
[84]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th annual meeting of the association for computational linguistics, pages 7871--7880

2020
[85]

Guowei Li, Luciana PF Abbade, Ikunna Nwosu, Yanling Jin, Alvin Leenus, Muhammad Maaz, Mei Wang, Meha Bhatt, Laura Zielinski, Nitika Sanger, and 1 others. 2017. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC medical research methodology, 17(1):181

2017
[86]

Junlong Li, Yiheng Xu, Lei Cui, and Furu Wei. 2022. Markuplm: Pre-training of text and markup language for visually rich document understanding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6078--6087

2022
[87]

Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.153 G -eval: NLG evaluation using gpt-4 with better human alignment . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2511--2522, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.emnlp-main.153 2023
[88]

Zheheng Luo, Qianqian Xie, and Sophia Ananiadou. 2023. CitationSum: Citation-aware graph contrastive learning for scientific paper summarization. In Proceedings of the ACM Web Conference 2023, pages 1843–1852

[89]

Kazuki Matsumaru, Sho Takase, and Naoaki Okazaki. 2020. Improving truthfulness of headline generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1335–1346

[90]

Shafiya Mushtaq and K Veningston. 2026. Biomedical text summarization with large language models: methodologies, challenges, and future directions. International Journal of Data Science and Analytics, 22(1):29

[91]

Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, and Bing Xiang. 2021. Entity-level factual consistency of abstractive text summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2727–2733

[92]

Dafne P Nascimento, Raymond WJG Ostelo, Maurits W van Tulder, Gabrielle Z Gonzalez, Amanda C Araujo, Adriane A Vanin, and Leonardo OP Costa. 2021. Do not make clinical decisions based on abstracts of healthcare research: A systematic review. Journal of Clinical Epidemiology, 135:136–157

[93]

Bo Pang, Erik Nijkamp, Wojciech Kryściński, Silvio Savarese, Yingbo Zhou, and Caiming Xiong. 2023. Long document summarization with top-down and bottom-up inference. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1267–1284

[94]

RM Pitkin, MA Branagan, and LF Burmeister. 1999. Accuracy of data in abstracts of published research articles. JAMA, 281(12):1110–1111. https://doi.org/10.1001/jama.281.12.1110

[95]

Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. 2024. Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. In International Conference on Learning Representations, volume 2024, pages 25055–25083

