Less is More: Quality-Aware Training Data Selection for Scientific Summarization
Pith reviewed 2026-06-25 23:40 UTC · model grok-4.3
The pith
Quality-aware selection of training abstracts outperforms random sampling at matched sizes and can match larger random sets on factuality metrics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Author-written abstracts vary substantially in alignment with their full articles. Source-grounded and model-based quality metrics identify higher-quality subsets. Training on these subsets yields better factuality-oriented performance than random sampling at equal training size and can reach or surpass larger random subsets.
What carries the argument
Quality scoring of reference abstracts with source-grounded and model-based metrics to select training-data subsets for summarization models.
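As a rough illustration of the selection step described above, the sketch below ranks articles by a precomputed quality score and takes the top `k`, alongside a random baseline of the same size. It is a minimal sketch, not the authors' pipeline: the function names, the article IDs, and the toy scores are all hypothetical, and the paper's actual metrics and thresholds are not reproduced here.

```python
import random
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k article IDs with the highest quality score."""
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ranked[:k]


def select_random(scores: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Return k article IDs sampled uniformly without replacement (the baseline)."""
    rng = random.Random(seed)
    return rng.sample(sorted(scores), k)


# Hypothetical quality scores for five articles (illustrative values only).
toy_scores = {"pmc_a": 0.91, "pmc_b": 0.35, "pmc_c": 0.78, "pmc_d": 0.12, "pmc_e": 0.66}

quality_subset = select_top_k(toy_scores, k=2)            # matched size: 2
random_subset = select_random(toy_scores, k=2, seed=13)   # matched size: 2
```

Both subsets have the same size, which is the condition under which the paper compares quality-aware selection against random sampling.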
If this is right
- Fewer but higher-quality examples can replace larger volumes of lower-quality examples without loss of factuality performance.
- Filtering low-alignment abstracts before training raises the efficiency of data use in scientific summarization.
- Reference quality acts as a limiting factor on what models can learn from author abstracts.
- Quality-aware selection offers a direct way to improve training when high-quality labeled data remain scarce.
Where Pith is reading between the lines
- The same selection logic could be tested on other long-document summarization domains where reference quality also varies.
- If the metrics generalize, they could reduce the total compute needed to reach a target factuality level.
- The approach invites direct comparison against other data-filtering strategies such as perplexity-based or diversity-based selection.
Load-bearing premise
The metrics correctly identify which abstracts will produce models with higher factuality when used as training targets.
What would settle it
Train summarization models on the metric-selected high-quality subsets and observe no gain or a loss in factuality metrics relative to random subsets of identical size.
Original abstract
Scientific long-document summarization datasets commonly treat author-written abstracts as gold reference summaries, although their quality and alignment with the source article vary. At the same time, publicly available scientific summarization datasets remain limited in scale and structure for modern long-context models. In this work, we address both challenges by a) constructing and releasing one of the largest biomedical and life science datasets for long-document summarization, containing 1.88 million PMC articles, and b) analyzing the reference quality of author-written abstracts with source-grounded and model-based metrics. We show that author-written abstracts vary in their alignment with the full article and that these quality signals can guide training-data selection. Training on selected high-quality subsets outperforms random sampling at matched training sizes and can match or exceed larger random subsets on factuality-oriented metrics. Our findings suggest that reference quality is an important factor in scientific summarization and that quality-aware data selection can improve training efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs and releases a dataset of 1.88 million PMC articles for biomedical long-document summarization. It analyzes author-written abstracts using source-grounded and model-based quality metrics, then shows that training summarization models on high-quality subsets selected via these metrics outperforms random sampling at matched training sizes and can match or exceed performance from larger random subsets on factuality-oriented metrics.
Significance. If the central empirical result holds after addressing confounds, the work would demonstrate that reference quality is a load-bearing factor in scientific summarization training and that quality-aware selection improves efficiency. The release of the large-scale dataset is a concrete contribution that could support further research on long-context models.
major comments (3)
- [§5] §5 (Experiments): The comparison of quality-selected vs. random subsets at matched sizes does not report any controls or ablations for correlated subset properties (e.g., abstract length, lexical diversity, or domain coverage). Without these, it is impossible to isolate whether the reported factuality gains are caused by the quality metrics or by incidental differences between the subsets.
- [§5.2, Table 3] §5.2 and Table 3: No statistical significance tests, confidence intervals, or effect sizes are provided for the factuality metric improvements. The abstract claims outperformance, but the lack of these details leaves the strength of the evidence unclear.
- [§4] §4 (Quality Analysis): The source-grounded and model-based metrics are used to filter data, yet the paper does not test whether subsets selected by these metrics differ systematically from random subsets on non-quality dimensions that could affect downstream training (e.g., via correlation analysis or matched sampling on length).
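The two checks the referee suggests, correlation analysis and matched sampling on length, could look roughly like the sketch below. It assumes each abstract has a quality score and a token length available; the helper names and the `bin_width` parameter are hypothetical, and this is one possible control, not something the paper reports.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def length_matched_sample(selected_lengths: Sequence[int],
                          pool_lengths: Sequence[int],
                          bin_width: int = 50,
                          seed: int = 0) -> List[int]:
    """For each selected abstract, draw one unused pool index from the same length bin."""
    rng = random.Random(seed)
    bins: Dict[int, List[int]] = {}
    for idx, length in enumerate(pool_lengths):
        bins.setdefault(length // bin_width, []).append(idx)
    for members in bins.values():
        rng.shuffle(members)
    matched = []
    for length in selected_lengths:
        members = bins.get(length // bin_width, [])
        if members:  # skip when no unused pool item shares the bin
            matched.append(members.pop())
    return matched


# Toy check: does quality score track abstract length? (illustrative values only)
rho = spearman([0.2, 0.4, 0.6, 0.8], [100, 150, 200, 250])
matched = length_matched_sample([120, 260], [110, 130, 255, 400])
```

A high `rho` would suggest that length is a plausible confound, in which case a length-matched random baseline such as `matched` gives a fairer comparison than plain random sampling.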
minor comments (2)
- [Abstract, §1] The abstract and §1 should explicitly name the factuality metrics (e.g., FactCC, SummaC) and the exact model architectures used in the downstream experiments.
- [Figure 2] Figure 2 caption and axis labels use inconsistent terminology for 'quality score' vs. 'alignment score'; standardize notation across figures and text.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on potential confounds and statistical reporting. We address each major comment below.
- Referee: [§5] §5 (Experiments): The comparison of quality-selected vs. random subsets at matched sizes does not report any controls or ablations for correlated subset properties (e.g., abstract length, lexical diversity, or domain coverage). Without these, it is impossible to isolate whether the reported factuality gains are caused by the quality metrics or by incidental differences between the subsets.
  Authors: We agree that additional controls are needed to better isolate the contribution of the quality metrics. In the revised manuscript we will add correlation analyses and ablations that compare abstract length, lexical diversity, and domain coverage between the quality-selected subsets and the random subsets of matched size.
  Revision: yes
- Referee: [§5.2, Table 3] §5.2 and Table 3: No statistical significance tests, confidence intervals, or effect sizes are provided for the factuality metric improvements. The abstract claims outperformance, but the lack of these details leaves the strength of the evidence unclear.
  Authors: We acknowledge that the current presentation would benefit from formal statistical reporting. We will add statistical significance tests, confidence intervals, and effect sizes for the factuality metrics in §5.2 and Table 3 of the revised version.
  Revision: yes
- Referee: [§4] §4 (Quality Analysis): The source-grounded and model-based metrics are used to filter data, yet the paper does not test whether subsets selected by these metrics differ systematically from random subsets on non-quality dimensions that could affect downstream training (e.g., via correlation analysis or matched sampling on length).
  Authors: This concern is closely related to the first comment. We will extend the analysis in §4 to include explicit comparisons (via correlation and matched-sampling checks) of non-quality properties such as length between the quality-selected and random subsets.
  Revision: yes
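One common way to supply the confidence intervals requested for the factuality comparison is a paired bootstrap over test examples. The sketch below is an assumption about how such reporting could be done, not the authors' procedure: the function name, the resample count, and the per-example scores are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(scores_a: Sequence[float],
                        scores_b: Sequence[float],
                        n_resamples: int = 2000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean difference, lower bound, upper bound) for A minus B.

    Both score lists must come from the same test examples in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Resample per-example differences with replacement and record each mean.
    resampled = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = resampled[int((alpha / 2) * n_resamples)]
    upper = resampled[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


# Hypothetical per-example factuality scores for two models (illustrative only).
quality_model = [0.80, 0.70, 0.90, 0.75]
random_model = [0.60, 0.65, 0.70, 0.70]
point, lo, hi = paired_bootstrap_ci(quality_model, random_model)
```

If the interval `[lo, hi]` excludes zero, the difference is unlikely to be explained by test-set sampling noise alone; it does not rule out confounds such as abstract length.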
Circularity Check
No circularity: empirical dataset construction and subset comparison
The paper's core contribution is the release of a 1.88M-article dataset followed by empirical training experiments that compare quality-filtered subsets against random subsets of matched size. No equations, fitted parameters, or self-citations are invoked to derive the reported performance gains; the outperformance is measured directly on held-out test sets. The derivation chain is therefore self-contained against external benchmarks and contains no self-definitional, fitted-input, or self-citation reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Author-written abstracts serve as usable but variable-quality reference summaries for training summarization models.