Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

Dongin Nam; Halil Kilicoglu; Joe Menke; Neil Smalheiser; Shufan Ming; Sylvey Lin

arxiv: 2605.20628 · v1 · pith:VXYWTEWXnew · submitted 2026-05-20 · 💻 cs.CL

Pith reviewed 2026-05-21 05:24 UTC · model grok-4.3

keywords: biomedical abstract generation · training-free summarization · zero-shot prompting · rhetorical structure · large language models · factuality evaluation · PMC-MAD dataset

The pith

Dividing full-text biomedical articles into rhetorical facets and using LLM prompts with refinement produces abstracts that are more novel while staying factually consistent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a training-free method to create abstracts for biomedical articles that lack them. It works by breaking the full text into five standard rhetorical sections, summarizing each one separately with language model prompts, and then combining and polishing the results for smooth flow. This approach matters because missing abstracts reduce the usefulness of many papers for search tools and knowledge building in biomedicine. Tests on a large set of 46,309 articles show gains in how much new phrasing the summaries use compared to pulling sentences directly or using trained models, without adding factual mistakes. The work also finds that overly detailed prompts can actually hurt accuracy, suggesting simpler strategies work better.
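One common proxy for "how much new phrasing" a summary uses is the share of its n-grams that never appear in the source text. The sketch below is an illustration under that assumption; this page does not state which novelty metric the paper uses, and the function names and whitespace tokenization are hypothetical choices made here.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novel_ngram_ratio(source: str, summary: str, n: int = 2) -> float:
    """Fraction of summary n-grams that do not occur in the source.

    Assumption: lowercase whitespace tokenization, which is cruder than
    what a real evaluation would use.
    """
    source_grams = set(ngrams(source.lower().split(), n))
    summary_grams = ngrams(summary.lower().split(), n)
    if not summary_grams:
        return 0.0
    novel = sum(1 for gram in summary_grams if gram not in source_grams)
    return novel / len(summary_grams)
```

A purely extractive summary scores 0.0 under this measure, while a fully paraphrased one approaches 1.0, so a higher value alone says nothing about whether the new phrasing is faithful.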

Core claim

DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence, resulting in improved abstractive novelty over baselines while maintaining factual consistency on the PMC-MAD dataset.

What carries the argument

The divide-prompt-refine process that applies the Background-Objective-Methods-Results-Conclusions rhetorical schema to organize parallel zero-shot summarizations followed by coherence refinement.
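The three stages can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `llm` callable, the prompt wording, and the function name `divide_prompt_refine` are all assumptions introduced here, and the input is assumed to be already split into BOMRC facets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Facet order follows the BOMRC schema named in the paper.
FACETS = ("Background", "Objective", "Methods", "Results", "Conclusions")


def divide_prompt_refine(facet_text: Dict[str, str],
                         llm: Callable[[str], str]) -> str:
    """Summarize each facet independently, then refine the joined draft.

    `llm` is a hypothetical text-in/text-out callable; the prompts are
    placeholders, not the prompts used in the paper.
    """
    # Divide: keep only facets that have text, in schema order.
    present = [f for f in FACETS if facet_text.get(f, "").strip()]
    if not present:
        return ""
    # Prompt: one independent zero-shot call per facet, run concurrently.
    prompts = [
        f"Summarize the {f} section of this article:\n{facet_text[f]}"
        for f in present
    ]
    with ThreadPoolExecutor(max_workers=len(FACETS)) as pool:
        summaries = list(pool.map(llm, prompts))
    draft = " ".join(f"{f}: {s}" for f, s in zip(present, summaries))
    # Refine: a single pass over the concatenated facet summaries.
    return llm(
        "Rewrite these facet summaries as one coherent abstract "
        "without adding new facts:\n" + draft
    )
```

Because the facet calls do not depend on one another, they can run concurrently; only the refinement call needs all facet summaries at once.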

Load-bearing premise

The refinement stage can restore global discourse coherence without introducing factual errors or hallucinations that were not present in the individual facet summaries.

What would settle it

If automated or human fact-checking on a held-out portion of the PMC-MAD dataset reveals more factual inconsistencies in the DPR-BAG outputs than in the unrefined facet summaries, this would indicate the refinement step fails to preserve accuracy.
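A cheap first pass at this comparison is to look for numeric values that appear in the refined abstract but in none of the facet summaries. The sketch below is a heuristic under stated assumptions: the regex, the function name, and the idea of using numbers as a stand-in for facts are choices made here, not metrics from the paper, and a real check would use entailment-based or human fact-checking.

```python
import re
from typing import List, Set

# Matches integers, decimals, and an optional trailing percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(facet_summaries: List[str], refined: str) -> Set[str]:
    """Numbers present in the refined text but in no facet summary."""
    supported: Set[str] = set()
    for summary in facet_summaries:
        supported.update(NUMBER.findall(summary))
    return set(NUMBER.findall(refined)) - supported
```

An empty result does not prove the refinement is faithful; a non-empty one flags abstracts worth a closer look.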

Figures

Figures reproduced from arXiv: 2605.20628 by Dongin Nam, Halil Kilicoglu, Joe Menke, Neil Smalheiser, Shufan Ming, Sylvey Lin.

**Figure 2.** Distribution of document token lengths in the …

**Figure 3.** Token Distribution Comparison

**Figure 4.** Publication type distribution of PMC-MAD, …

Original abstract

Biomedical abstracts play a critical role in downstream NLP applications, such as information retrieval, biocuration, and biomedical knowledge discovery. However, a non-trivial number of biomedical articles do not have abstracts, diminishing the utility of these articles for downstream tasks. We propose DPR-BAG (Divide, Prompt, and Refine for Biomedical Abstract Generation), a training-free, zero-shot framework that generates coherent and factually grounded abstracts for biomedical articles with full text but no abstract. DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions (BOMRC) schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence. On PMC-MAD, a distribution-aligned dataset of 46,309 biomedical articles, DPR-BAG improves abstractive novelty over strong extractive and fine-tuned baselines, while maintaining factual consistency. Our ablation study reveals a counterintuitive finding: increasing prompt complexity or explicitly injecting entity-level guidance can degrade factual alignment, highlighting the importance of controlled prompting strategies. These findings underscore the potential of training-free, structure-aware frameworks for scalable biomedical abstract generation in low-resource settings. Our data and code are available at https://huggingface.co/datasets/pmc-mad/PMC-MAD and https://github.com/ScienceNLP-Lab/MultiTagger-v2/tree/main/DPR-BAG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DPR-BAG gives a practical training-free route to biomedical abstracts via BOMRC decomposition and LLM prompting, with released code but open questions on whether refinement preserves factuality.

This paper introduces DPR-BAG, a training-free framework that generates abstracts for biomedical articles lacking them. It divides the full text into BOMRC rhetorical sections, summarizes each section with an LLM in parallel, and refines the combined result for overall coherence. It reports gains on a large PMC dataset and releases code and data.

The work tackles a genuine practical problem: many biomedical papers have no abstract, which hurts retrieval and curation. The evaluation uses a distribution-aligned set of 46,309 articles, on which the method outperforms extractive and fine-tuned baselines on abstractive novelty without losing factual consistency. The ablation finding that more complex prompts or entity guidance can degrade factual alignment is a useful, counterintuitive observation. Public code and data support reproducibility and follow-up work.

The main soft spot is the refinement stage. It aims to restore global discourse coherence, but it could introduce factual errors or hallucinations absent from the individual facet summaries, a real risk given the paper's own finding on prompt sensitivity. The metrics do not appear to isolate the effect of refinement on factuality, for example by comparing pre- and post-refinement outputs. Without full details on baseline setups or statistical significance, the strength of the improvements is hard to gauge.

The paper will interest researchers in biomedical NLP and information retrieval who need scalable, zero-shot solutions for incomplete documents, as well as readers working on structure-aware summarization or low-resource settings. It deserves a serious referee: the problem is important, the method is straightforward to implement, and the results are promising enough to warrant detailed review and a stronger evaluation.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DPR-BAG, a training-free, zero-shot framework for generating abstracts for biomedical articles that lack them. The approach divides the full text into BOMRC (Background, Objective, Methods, Results, Conclusions) rhetorical facets, generates parallel LLM-based summaries for each, and uses a refinement stage to restore global discourse coherence. On the PMC-MAD dataset comprising 46,309 distribution-aligned biomedical articles, DPR-BAG is shown to improve abstractive novelty compared to strong extractive and fine-tuned baselines while maintaining factual consistency. The ablation study highlights that increasing prompt complexity or adding entity-level guidance can degrade factual alignment, emphasizing controlled prompting.

Significance. Should the empirical results prove robust, this framework offers a valuable contribution to biomedical NLP by providing a scalable method for abstract generation in low-resource scenarios without requiring training data or fine-tuning. The release of the PMC-MAD dataset and code supports reproducibility and further research. The finding regarding prompt complexity provides a useful cautionary insight for LLM-based summarization tasks. The significance is moderated by the need to more thoroughly validate the refinement stage's effect on factual consistency to fully support the central claims.

major comments (2)

[Evaluation and Ablation Studies] The ablation study examines the effects of prompt complexity but does not isolate the contribution of the refinement stage to factual consistency. A direct comparison of factuality metrics (e.g., entity overlap or entailment scores) between the unrefined facet summaries and the final refined abstract is missing, which is critical to confirm that the refinement does not introduce new factual errors or hallucinations as raised in the central claim of maintained consistency.
[Experimental Results] The reported improvements in abstractive novelty on the PMC-MAD dataset lack accompanying statistical significance tests, confidence intervals, or details on multiple LLM sampling runs. Given the inherent variability in LLM outputs, this omission makes it difficult to assess the reliability of the gains over baselines.

minor comments (2)

[Introduction] The BOMRC schema is used without citing prior work on rhetorical structure in biomedical abstracts, which could strengthen the motivation.
[Method] The notation for the refinement prompt could be clarified with an example in the main text rather than in the appendix.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and agree that the suggested additions will strengthen the empirical support for our claims. We plan to incorporate these changes in the revised manuscript.

Referee: [Evaluation and Ablation Studies] The ablation study examines the effects of prompt complexity but does not isolate the contribution of the refinement stage to factual consistency. A direct comparison of factuality metrics (e.g., entity overlap or entailment scores) between the unrefined facet summaries and the final refined abstract is missing, which is critical to confirm that the refinement does not introduce new factual errors or hallucinations as raised in the central claim of maintained consistency.

Authors: We agree that isolating the refinement stage's impact is essential to substantiate our claim of maintained factual consistency. In the revised version, we will add a direct comparison using factuality metrics such as entity overlap and entailment scores between the unrefined parallel facet summaries and the final refined abstract. This analysis will clarify whether the refinement step preserves alignment or introduces errors.
revision: yes

Referee: [Experimental Results] The reported improvements in abstractive novelty on the PMC-MAD dataset lack accompanying statistical significance tests, confidence intervals, or details on multiple LLM sampling runs. Given the inherent variability in LLM outputs, this omission makes it difficult to assess the reliability of the gains over baselines.

Authors: We acknowledge the need for statistical rigor given LLM output variability. We will include statistical significance tests (such as paired t-tests), confidence intervals, and results from multiple independent sampling runs in the updated experimental results to better demonstrate the reliability of the observed improvements over baselines.
revision: yes
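One way to attach a confidence interval to a per-article score difference between two systems is a paired bootstrap. The sketch below is an illustration only: it is an alternative to the paired t-test the authors mention, the function name and defaults are assumptions made here, and it uses only the Python standard library.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_ci(scores_a: Sequence[float],
                        scores_b: Sequence[float],
                        n_resamples: int = 2000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Mean difference (A minus B) with a percentile bootstrap interval.

    Both sequences must score the same articles in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample articles with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lower, upper
```

If the interval excludes zero, the observed gain is unlikely to be an artifact of which articles were sampled; it does not by itself account for run-to-run variation in LLM outputs, which needs repeated generations.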

Circularity Check

0 steps flagged

No circularity: framework evaluated on held-out external data with independent baselines

The paper describes a training-free Divide-Prompt-Refine framework that decomposes documents into BOMRC facets, generates parallel LLM summaries, and applies a refinement pass for coherence. Core claims rest on empirical results from the PMC-MAD dataset of 46,309 articles, compared against extractive and fine-tuned baselines. No equations, fitted parameters, or first-principles derivations appear; ablations test prompt variations but do not redefine outputs in terms of the framework itself. Evaluation uses held-out data and external metrics, keeping the result self-contained without reduction to author-defined inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that biomedical articles follow a consistent BOMRC rhetorical structure and that current LLMs can produce factually aligned summaries of individual facets when prompted simply.

axioms (1)

domain assumption: Biomedical full-text articles can be reliably decomposed into the Background-Objective-Methods-Results-Conclusions (BOMRC) schema.
The decomposition step is presented as the foundation for parallel summarization.
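To make the assumption concrete, suppose each sentence of the full text already carries a rhetorical label. Grouping those sentences by facet might then look like the sketch below. How the labels are actually obtained is not described on this page; the input format, the function name `group_by_facet`, and the choice to drop unlabeled sentences are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

FACETS = ("Background", "Objective", "Methods", "Results", "Conclusions")


def group_by_facet(labeled: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collect (label, sentence) pairs into one text block per facet.

    Sentences with labels outside the BOMRC schema are dropped, and
    facets with no sentences are omitted from the result.
    """
    buckets: Dict[str, List[str]] = {facet: [] for facet in FACETS}
    for label, sentence in labeled:
        if label in buckets:
            buckets[label].append(sentence.strip())
    return {facet: " ".join(sents) for facet, sents in buckets.items() if sents}
```

If the labeling is unreliable for some article types, for example editorials or letters with no clear Methods section, the missing or mislabeled facets propagate directly into the per-facet summaries.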

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions (BOMRC) schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: On PMC-MAD, a distribution-aligned dataset of 46,309 biomedical articles, DPR-BAG improves abstractive novelty over strong extractive and fine-tuned baselines, while maintaining factual consistency.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


See, Abigail and Liu, Peter J. and Manning, Christopher D. Get To The Point: Summarization with Pointer-Generator Networks. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017. doi:10.18653/v1/P17-1099

work page doi:10.18653/v1/p17-1099 2017

[22] [22]

2020 , eprint=

Longformer: The Long-Document Transformer , author=. 2020 , eprint=

work page 2020

[23] [23]

doi: 10.18653/v1/N18-2097

Cohan, Arman and Dernoncourt, Franck and Kim, Doo Soon and Bui, Trung and Kim, Seokhwan and Chang, Walter and Goharian, Nazli. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. Proceedings of the 2018 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Vo...

work page doi:10.18653/v1/n18-2097 2018

[24] [24]

Proceedings of the AAAI Conference on Artificial Intelligence , author=

Improving Biomedical Information Retrieval with Neural Retrievers , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2022 , month=. doi:10.1609/aaai.v36i10.21352 , abstractNote=

work page doi:10.1609/aaai.v36i10.21352 2022

[25] [25]

Ueda, Alberto and Santos, Rodrygo L. T. and Macdonald, Craig and Ounis, Iadh , title =. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages =. 2021 , isbn =. doi:10.1145/3404835.3463075 , abstract =

work page doi:10.1145/3404835.3463075 2021

[26] [26]

Database , volume =

Wiegers, Thomas C and Davis, Allan Peter and Wiegers, Jolene and Sciaky, Daniela and Barkalow, Fern and Wyatt, Brent and Strong, Melissa and McMorran, Roy and Abrar, Sakib and Mattingly, Carolyn J , title =. Database , volume =. 2025 , month =. doi:10.1093/database/baaf013 , url =

work page doi:10.1093/database/baaf013 2025

[27] [27]

PLoS ONE , volume=

Towards effective clinical decision support systems: A systematic review , author=. PLoS ONE , volume=. 2022 , publisher=. doi:10.1371/journal.pone.0272846 , url=

work page doi:10.1371/journal.pone.0272846 2022

[28] [28]

Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua , booktitle=

work page

[29] [29]

Understanding Faithfulness and Reasoning of Large Language Models on Plain Biomedical Summaries

Fang, Biaoyan and Dai, Xiang and Karimi, Sarvnaz. Understanding Faithfulness and Reasoning of Large Language Models on Plain Biomedical Summaries. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.578

work page doi:10.18653/v1/2024.findings-emnlp.578 2024

[30] [30]

Applied Sciences , VOLUME =

Giarelis, Nikolaos and Mastrokostas, Charalampos and Karacapilidis, Nikos , TITLE =. Applied Sciences , VOLUME =. 2023 , NUMBER =

work page 2023

[31] [31]

G en C ompare S um: a hybrid unsupervised summarization method using salience

Bishop, Jennifer and Xie, Qianqian and Ananiadou, Sophia. G en C ompare S um: a hybrid unsupervised summarization method using salience. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.22

work page doi:10.18653/v1/2022.bionlp-1.22 2022

[32] [32]

L ong T 5: E fficient Text-To-Text Transformer for Long Sequences

Guo, Mandy and Ainslie, Joshua and Uthus, David and Onta \ n \'o n, Santiago and Ni, Jianmo and Sung, Yun-Hsuan and Yang, Yinfei. L ong T 5: E fficient Text-To-Text Transformer for Long Sequences. Findings of the Association for Computational Linguistics: NAACL 2022. 2022. doi:10.18653/v1/2022.findings-naacl.55

work page doi:10.18653/v1/2022.findings-naacl.55 2022

[33] [33]

Adverse drug event detection and extraction from open data: A deep learning approach , journal =

Brandon Fan and Weiguo Fan and Carly Smith and Harold ``Skip'' Garner , keywords =. Adverse drug event detection and extraction from open data: A deep learning approach , journal =. 2020 , issn =. doi:https://doi.org/10.1016/j.ipm.2019.102131 , url =

work page doi:10.1016/j.ipm.2019.102131 2020

[34] [34]

ACM Trans

Gu, Yu and Tinn, Robert and Cheng, Hao and Lucas, Michael and Usuyama, Naoto and Liu, Xiaodong and Naumann, Tristan and Gao, Jianfeng and Poon, Hoifung , title =. ACM Trans. Comput. Healthcare , month = oct, articleno =. 2021 , issue_date =. doi:10.1145/3458754 , abstract =

work page doi:10.1145/3458754 2021

[35] [35]

, title =

Nuzzo, James L. , title =. Scientometrics , year =. doi:10.1007/s11192-021-04068-w , url =

work page doi:10.1007/s11192-021-04068-w

[36] [36]

Waaijer, Cathelijn J. F. and van Bochove, Cornelis A. and van Eck, Nees Jan , title =. Scientometrics , year =. doi:10.1007/s11192-010-0205-9 , url =

work page doi:10.1007/s11192-010-0205-9

[37] [37]

A Hybrid Approach to Generation of Missing Abstracts in Biomedical Literature

Chachra, Suchet and Ben Abacha, Asma and Shooshan, Sonya and Rodriguez, Laritza and Demner-Fushman, Dina. A Hybrid Approach to Generation of Missing Abstracts in Biomedical Literature. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2016

work page 2016

[38] [38]

AMIA Annual Symposium Proceedings , volume=

Publication Type Tagging using Transformer Models and Multi-Label Classification , author=. AMIA Annual Symposium Proceedings , volume=. 2024 , publisher=

work page 2024

[39] [39]

Does Prompt Formatting Have Any Impact on

Jia He and Mukund Rungta and David Koleczek and Arshdeep Sekhon and Franklin X Wang and Sadid Hasan , year=. Does Prompt Formatting Have Any Impact on. 2411.10541 , archivePrefix=

work page arXiv

[40] [40]

and Hearst, Marti A

Laban, Philippe and Schnabel, Tobias and Bennett, Paul N. and Hearst, Marti A. , title =. Transactions of the Association for Computational Linguistics , volume =. 2022 , month =. doi:10.1162/tacl_a_00453 , url =

work page doi:10.1162/tacl_a_00453 2022

[41] [41]

A lign S core: Evaluating Factual Consistency with A Unified Alignment Function

Zha, Yuheng and Yang, Yichi and Li, Ruichen and Hu, Zhiting. A lign S core: Evaluating Factual Consistency with A Unified Alignment Function. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023. doi:10.18653/v1/2023.acl-long.634

work page doi:10.18653/v1/2023.acl-long.634 2023

[42] [42]

M ini C heck: Efficient Fact-Checking of LLM s on Grounding Documents

Tang, Liyan and Laban, Philippe and Durrett, Greg. M ini C heck: Efficient Fact-Checking of LLM s on Grounding Documents. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024. doi:10.18653/v1/2024.emnlp-main.499

work page doi:10.18653/v1/2024.emnlp-main.499 2024

[43] [43]

2021 , publisher=

Sybrandt, Justin and Safro, Ilya , journal=. 2021 , publisher=. doi:10.1371/journal.pone.0253905 , url=

work page doi:10.1371/journal.pone.0253905 2021

[44] [44]

P ub M ed 200k RCT : a Dataset for Sequential Sentence Classification in Medical Abstracts

Dernoncourt, Franck and Lee, Ji Young. P ub M ed 200k RCT : a Dataset for Sequential Sentence Classification in Medical Abstracts. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2017

work page 2017

[45] [45]

S cispa C y: F ast and R obust M odels for B iomedical N atural L anguage P rocessing

Neumann, Mark and King, Daniel and Beltagy, Iz and Ammar, Waleed. S cispa C y: F ast and R obust M odels for B iomedical N atural L anguage P rocessing. Proceedings of the 18th BioNLP Workshop and Shared Task. 2019. doi:10.18653/v1/W19-5034

work page doi:10.18653/v1/w19-5034 2019

[46] [46]

Bodenreider, Olivier , journal =. The. 2004 , month =. doi:10.1093/nar/gkh061 , pmid =

work page doi:10.1093/nar/gkh061 2004

[47] [47]

ROUGE : A Package for Automatic Evaluation of Summaries

Lin, Chin-Yew. ROUGE : A Package for Automatic Evaluation of Summaries. Text Summarization Branches Out. 2004

work page 2004

[48] [48]

Weinberger and Yoav Artzi , booktitle=

Tianyi Zhang and Varsha Kishore and Felix Wu and Kilian Q. Weinberger and Yoav Artzi , booktitle=. 2020 , url=

work page 2020

[49] [49]

Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization

Ladhak, Faisal and Durmus, Esin and He, He and Cardie, Claire and McKeown, Kathleen. Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022. doi:10.18653/v1/2022.acl-long.100

work page doi:10.18653/v1/2022.acl-long.100 2022

[50] [50]

Pretrained Language Models for Sequential Sentence Classification

Cohan, Arman and Beltagy, Iz and King, Daniel and Dalvi, Bhavana and Weld, Dan. Pretrained Language Models for Sequential Sentence Classification. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1383

work page doi:10.18653/v1/d19-1383 2019