Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
Pith reviewed 2026-05-23 03:10 UTC · model grok-4.3
The pith
Large language models are more susceptible to spin in medical abstracts than humans.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin's impact on LLM outputs.
What carries the argument
Direct comparison of LLM and human answers on question-answering and summarization tasks using original versus spun versions of the same medical trial abstracts.
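As a rough illustration of this comparison, the sketch below computes a per-group spin effect (mean rating on spun abstracts minus mean rating on the originals) and the gap between the two groups. The function names, the rating scale, and the toy numbers are assumptions made for illustration only; they are not the paper's data, and the paper's exact metric may differ.

```python
from statistics import mean


def spin_effect(ratings_original, ratings_spun):
    """Mean rating shift when the same abstracts are shown with spin."""
    return mean(ratings_spun) - mean(ratings_original)


def susceptibility_gap(llm, human):
    """LLM spin effect minus human spin effect.

    Each argument is a dict with 'original' and 'spun' lists of ratings
    (hypothetical structure, e.g. perceived treatment benefit per abstract).
    """
    llm_effect = spin_effect(llm["original"], llm["spun"])
    human_effect = spin_effect(human["original"], human["spun"])
    return llm_effect - human_effect


# Made-up ratings purely for illustration (NOT results from the paper).
llm = {"original": [3.0, 4.0, 2.0, 3.0], "spun": [6.0, 7.0, 5.0, 6.0]}
human = {"original": [3.0, 4.0, 2.0, 3.0], "spun": [4.0, 5.0, 3.0, 4.0]}

gap = susceptibility_gap(llm, human)
print(f"LLM effect:   {spin_effect(llm['original'], llm['spun']):.2f}")
print(f"Human effect: {spin_effect(human['original'], human['spun']):.2f}")
print(f"Gap:          {gap:.2f}")
```

A positive gap in this toy setup would mean the LLM ratings moved further under spin than the human ratings did.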
Load-bearing premise
The chosen abstracts and the particular spin changes applied to them represent the spin that LLMs will meet in real medical literature.
What would settle it
A larger study using naturally occurring spin in a broader sample of published abstracts, in which LLMs matched or exceeded human resistance to spin, would undermine the central finding.
Original abstract
Medical research faces well-documented challenges in translating novel treatments into clinical practice. Publishing incentives encourage researchers to present "positive" findings, even when empirical results are equivocal. Consequently, it is well-documented that authors often spin study results, especially in article abstracts. Such spin can influence clinician interpretation of evidence and may affect patient care decisions. In this study, we ask whether the interpretation of trial results offered by Large Language Models (LLMs) is similarly affected by spin. This is important since LLMs are increasingly being used to trawl through and synthesize published medical evidence. We evaluated 22 LLMs and found that they are across the board more susceptible to spin than humans. They might also propagate spin into their outputs: We find evidence, e.g., that LLMs implicitly incorporate spin into plain language summaries that they generate. We also find, however, that LLMs are generally capable of recognizing spin, and can be prompted in a way to mitigate spin's impact on LLM outputs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical study of 22 LLMs on question-answering and summarization tasks using original versus spin-manipulated medical abstracts. It compares LLM outputs to human baselines and concludes that LLMs are across the board more susceptible to spin than humans, may propagate spin into generated plain-language summaries, yet remain capable of recognizing spin and can be prompted to mitigate its effects.
Significance. If the central empirical result holds after appropriate controls, the work is significant for medical AI applications because LLMs are already used to synthesize published evidence; greater susceptibility could systematically bias downstream clinical interpretations. The positive finding that targeted prompting reduces the effect supplies a concrete, immediately usable mitigation strategy. The study also supplies a reusable testbed of spun abstracts and human baselines that future work can extend.
major comments (2)
- [§4 and §3.2] §4 (Results) and §3.2 (Prompting regime): the headline claim that LLMs are 'across the board more susceptible to spin than humans' rests on accuracy/bias differences between original and spun conditions. The manuscript simultaneously reports that LLMs 'are generally capable of recognizing spin' when explicitly prompted; without an explicit test showing that the susceptibility gap persists after controlling for instruction-following ability, prompt length, and lexical/factual drift introduced by the spin edits, the observed difference could be an artifact of weaker instruction adherence rather than spin-specific vulnerability.
- [§3.1] §3.1 (Abstract selection and spin manipulation): the weakest assumption is that the chosen abstracts and the specific spin edits are representative of real-world medical literature. No quantitative justification (e.g., distribution of spin types, journal impact, or year range) is supplied to support generalizability of the susceptibility finding to the broader corpus LLMs will encounter.
minor comments (2)
- [Abstract] Abstract: the phrase 'across the board' is imprecise; the results section should state the range of model families, sizes, and training regimes for which the susceptibility ordering holds.
- [Results figures/tables] Table or figure captions (wherever the human-LLM comparison is presented): include exact sample sizes of abstracts, number of human raters, and the statistical test plus effect size used for the 'more susceptible' claim.
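One way the requested test and effect size could be reported is sketched below: a pooled-standard-deviation Cohen's d and a two-sided permutation test on per-abstract rating shifts (spun minus original) for the two groups. This is a minimal sketch under stated assumptions, not the authors' analysis; the function names, the choice of test, and the toy shifts are all hypothetical.

```python
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float comparison
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Made-up per-abstract shifts purely for illustration (NOT results from the paper).
llm_shift = [3.0, 2.5, 3.5, 3.0, 2.0]
human_shift = [1.0, 0.5, 1.5, 1.0, 0.0]

d = cohens_d(llm_shift, human_shift)
p = permutation_p_value(llm_shift, human_shift)
print(f"n_llm={len(llm_shift)}, n_human={len(human_shift)}, d={d:.2f}, p={p:.4f}")
```

Reporting the sample sizes alongside d and p, as in the final line, is the kind of caption detail the comment asks for.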
Simulated Author's Rebuttal
We appreciate the referee's detailed feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where revisions are warranted, we indicate the changes to be made in the revised version.
- Referee: [§4 and §3.2] §4 (Results) and §3.2 (Prompting regime): the headline claim that LLMs are 'across the board more susceptible to spin than humans' rests on accuracy/bias differences between original and spun conditions. The manuscript simultaneously reports that LLMs 'are generally capable of recognizing spin' when explicitly prompted; without an explicit test showing that the susceptibility gap persists after controlling for instruction-following ability, prompt length, and lexical/factual drift introduced by the spin edits, the observed difference could be an artifact of weaker instruction adherence rather than spin-specific vulnerability.
  Authors: We agree that distinguishing spin-specific vulnerability from general instruction-following differences is important. Our primary experiments use consistent prompting across LLMs and humans without spin-specific instructions, mirroring typical usage. The recognition capability is demonstrated in a separate prompting condition. To strengthen the claim, we will add a control experiment in the revision where we normalize for instruction adherence by using a standardized instruction-following prompt and measure the remaining gap. This will clarify whether the susceptibility is spin-specific.
  Revision: partial
- Referee: [§3.1] §3.1 (Abstract selection and spin manipulation): the weakest assumption is that the chosen abstracts and the specific spin edits are representative of real-world medical literature. No quantitative justification (e.g., distribution of spin types, journal impact, or year range) is supplied to support generalizability of the susceptibility finding to the broader corpus LLMs will encounter.
  Authors: The selection of abstracts was based on a curated set of medical trial abstracts from recent publications, with spin manipulations designed to reflect common types identified in prior literature on spin in medical abstracts. While we did not provide a full quantitative distribution in the original submission, the spin types used (e.g., overstatement of efficacy, omission of limitations) are drawn from established taxonomies in the field. In the revision, we will include additional details on the selection criteria, including the range of journals and years, and a breakdown of spin types to better support generalizability.
  Revision: yes
Circularity Check
No significant circularity; empirical evaluation only
This is a purely empirical study involving LLM evaluations on spun vs. original medical abstracts, with direct comparisons to human baselines. No derivations, equations, fitted parameters, self-citations as load-bearing premises, or renamings of results are present. All claims rest on external data collection and human comparisons rather than any self-referential construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Spin in abstracts influences interpretation of trial results.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We evaluated 22 LLMs … more susceptible to spin than humans … can be prompted … to mitigate spin’s impact"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "linear regression … β1k · (presence or absence of spin)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.