Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization
Recognition: 1 theorem link · Lean theorem
Pith reviewed 2026-05-10 18:37 UTC · model grok-4.3
The pith
Synthetic doctor-patient audio conversations with SOAP notes reveal that cascaded systems still outperform end-to-end models for long-form summarization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A pipeline of persona-driven dialogue generation followed by multi-speaker audio synthesis with acoustic modeling and then LLM-based SOAP note creation can produce 8,800 realistic conversations totaling 1.3k hours of audio; when used to evaluate open-weight systems, cascaded approaches still substantially outperform end-to-end models for long-form medical audio summarization.
What carries the argument
The three-stage synthetic data generation pipeline: (1) persona-driven dialogue generation, (2) multi-speaker audio synthesis, including overlap/pause modeling, room acoustics, and sound events, and (3) LLM-based reference SOAP note production.
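The authors' generation code is not reproduced on this page, so the following is a minimal Python sketch of the three-stage shape described above. Every helper name (sample_personas, generate_dialogue, write_soap_note, call_llm), prompt wording, and persona field is a hypothetical stand-in for illustration, not the paper's implementation.

```python
# Minimal sketch of the three-stage pipeline described above.
# All helper names and prompt wording are illustrative assumptions,
# not the authors' released code.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Persona:
    role: str          # "doctor" or "patient"
    age: int
    background: str    # e.g. presenting complaint or specialty

def sample_personas() -> Tuple[Persona, Persona]:
    # Persona and context sampling that conditions stage 1 (fixed examples here).
    return (Persona("doctor", 45, "family medicine, first visit"),
            Persona("patient", 62, "two weeks of hand swelling"))

def generate_dialogue(doctor: Persona, patient: Persona) -> List[Tuple[str, str]]:
    # Stage 1: persona-driven text dialogue via an open-weight LLM.
    prompt = (f"Write a realistic first-visit conversation between a "
              f"{doctor.background} doctor and a {patient.age}-year-old "
              f"patient with {patient.background}. Include natural disfluencies.")
    text = call_llm(prompt)
    return [tuple(line.split(": ", 1)) for line in text.splitlines() if ": " in line]

def synthesize_audio(dialogue: List[Tuple[str, str]]):
    # Stage 2: per-turn multi-speaker TTS, then overlap/pause timing,
    # room acoustics, and background sound events are layered on.
    raise NotImplementedError

def write_soap_note(dialogue: List[Tuple[str, str]]) -> str:
    # Stage 3: LLM-based reference SOAP note from the speaker-attributed transcript.
    transcript = "\n".join(f"{spk}: {utt}" for spk, utt in dialogue)
    return call_llm("Write a SOAP note for this visit:\n" + transcript)

def call_llm(prompt: str) -> str:
    # Placeholder for any open-weight chat model; returns canned text so the
    # sketch runs end-to-end without a model.
    return "doctor: Hello, what brings you in today?\npatient: Um, my hand is swollen."
```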
If this is right
- The released dataset supplies training material for models that must reason over long audio without needing private real recordings.
- The performance gap indicates that end-to-end audio models require further advances to match cascaded pipelines on extended medical dialogues.
- Controlled synthetic audio allows repeatable experiments on the effects of overlap, background noise, and speaker turns in summarization.
- Open-weight models can now bootstrap additional domain-specific long-context audio datasets following the same stages.
Where Pith is reading between the lines
- If the acoustic modeling proves sufficiently realistic, similar pipelines could generate training data for other long-form audio tasks such as meetings or lectures while avoiding privacy barriers.
- The continued superiority of cascaded systems suggests that current end-to-end audio models still struggle with the combination of long duration and precise factual extraction required in medical settings.
- The dataset opens the possibility of studying how specific acoustic degradations affect summarization accuracy in a repeatable way.
Load-bearing premise
The synthetic dialogues and audio are realistic enough to serve as both training data and a controlled evaluation environment for real-world long-form medical audio summarization.
What would settle it
A direct comparison of model rankings and absolute performance on the synthetic dataset versus a held-out collection of real recorded doctor-patient conversations.
Original abstract
Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations with SOAP note generation as the task. The pipeline has three stages: persona-driven dialogue generation; multi-speaker audio synthesis with overlap/pause modeling, room acoustics, and sound events; and LLM-based reference SOAP note production, built entirely on open-weight models. We release 8,800 synthetic conversations with 1.3k hours of corresponding audio and reference notes. Evaluating current open-weight systems, we find that cascaded approaches still substantially outperform end-to-end models.
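The second stage named in the abstract (overlap/pause modeling, room acoustics, sound events) can be approximated with off-the-shelf tools the paper cites, such as pyroomacoustics. A minimal sketch follows; the room dimensions, absorption coefficient, source positions, and the 1.5 s overlap delay are illustrative guesses rather than the authors' settings, and random noise stands in for synthesized speech.

```python
# Sketch of the room-acoustics / overlap step, assuming pyroomacoustics.
# All geometry and timing values are illustrative, not the paper's parameters.
import numpy as np
import pyroomacoustics as pra

fs = 16000
room = pra.ShoeBox([4.0, 3.5, 2.7], fs=fs, materials=pra.Material(0.35), max_order=10)

doctor_speech = 0.1 * np.random.randn(2 * fs)    # stand-in for a TTS turn
patient_speech = 0.1 * np.random.randn(2 * fs)   # stand-in for the reply

# Two spatially separated speakers; the delay makes the turns partially overlap.
room.add_source([1.0, 1.0, 1.5], signal=doctor_speech, delay=0.0)
room.add_source([3.0, 2.5, 1.5], signal=patient_speech, delay=1.5)

# One "recorder" microphone between the speakers (e.g., a device on the desk).
mic = pra.MicrophoneArray(np.array([[2.0], [1.8], [1.0]]), fs)
room.add_microphone_array(mic)

room.simulate()
mixture = room.mic_array.signals[0]  # reverberant, overlapped single-channel mix
```

Background sound events could then be layered onto the mixture, for example with Scaper, which the paper also cites.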
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a synthetic data generation pipeline for long-form doctor-patient conversations aimed at training and evaluating audio summarization models, specifically for generating SOAP notes. The pipeline involves persona-driven LLM dialogue generation, multi-speaker audio synthesis incorporating overlaps, pauses, room acoustics, and sound events, and LLM-based reference SOAP note production. All components use open-weight models. The authors release a dataset of 8,800 synthetic conversations comprising 1.3k hours of audio along with reference notes, and evaluate open-weight systems on this data, concluding that cascaded approaches (ASR + LLM) substantially outperform end-to-end models.
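For readers unfamiliar with the two system families compared here, a cascaded baseline is an ASR pass followed by a text-only LLM prompt. The sketch below is not the paper's evaluated configuration: it uses the open-source whisper package as a stand-in ASR and leaves the LLM call as a stub.

```python
# Hedged sketch of a cascaded (ASR + LLM) SOAP-note system.
# The ASR choice, prompt wording, and LLM client are assumptions, not the paper's setup.
import whisper  # pip install openai-whisper

def transcribe(audio_path: str) -> str:
    """Stage 1: speech-to-text with any ASR model (here: Whisper 'base')."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

def call_llm(prompt: str) -> str:
    """Placeholder for an open-weight instruction-tuned LLM client."""
    raise NotImplementedError("plug in your LLM of choice")

def cascaded_soap_note(audio_path: str) -> str:
    """Stage 2: prompt a text-only LLM with the transcript to draft the SOAP note."""
    transcript = transcribe(audio_path)
    prompt = ("You are a clinical scribe. Write a SOAP note (Subjective, Objective, "
              "Assessment, Plan) for the following first-visit conversation:\n" + transcript)
    return call_llm(prompt)
```

An end-to-end system would instead feed the audio directly to a large audio language model in a single step; the first major comment below asks for the metrics and significance tests behind the claimed gap between the two.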
Significance. If the synthetic data is sufficiently realistic, this work provides a valuable open resource for training and evaluating long-context audio reasoning in the medical domain, where real data is scarce due to privacy constraints. The release of 8,800 conversations with 1.3k hours of audio, built entirely on open-weight models, is a clear strength that promotes reproducibility and community use. The empirical result on cascaded vs. end-to-end performance offers a benchmark that could inform model development, provided the dataset's fidelity supports generalization.
major comments (2)
- [Evaluation of current open-weight systems] The claim that cascaded approaches substantially outperform end-to-end models is presented without details on the exact evaluation metrics for SOAP note quality (e.g., ROUGE, BERTScore, or LLM-as-judge), the specific open-weight models and configurations tested, or statistical significance of the gap. This information is necessary to assess the result's robustness.
- [Pipeline description and data release] No section reports validation of the synthetic dialogues or audio against real doctor-patient recordings, such as clinician ratings of clinical nuance, acoustic similarity metrics (e.g., MOS or spectrogram comparisons), or distributional checks (turn lengths, disfluency rates, medical terminology frequency). This is load-bearing for the assertion that the dataset serves as a controlled evaluation environment, since unvalidated artifacts could artifactually favor cascaded pipelines over end-to-end audio models.
minor comments (2)
- [Abstract] The abstract would be strengthened by explicitly naming the task (SOAP note generation from long-form audio) and the dataset scale to better frame the contribution for readers.
- [Experiments] Clarify the exact number of models evaluated and any hyperparameter details in the experimental setup to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below. We have revised the manuscript to improve clarity and completeness where feasible while remaining faithful to the synthetic nature of the work.
Point-by-point responses
- Referee: [Evaluation of current open-weight systems] The claim that cascaded approaches substantially outperform end-to-end models is presented without details on the exact evaluation metrics for SOAP note quality (e.g., ROUGE, BERTScore, or LLM-as-judge), the specific open-weight models and configurations tested, or statistical significance of the gap. This information is necessary to assess the result's robustness.
Authors: We agree that the evaluation section would benefit from greater explicitness. In the revised manuscript we have expanded the relevant section to specify the exact metrics employed for SOAP note quality (ROUGE-1/2/L, BERTScore, and an LLM-as-judge protocol), to enumerate the precise open-weight models and hyperparameter configurations used for both the cascaded (ASR + LLM) and end-to-end pipelines, and to report statistical significance testing (bootstrap confidence intervals and paired tests) for the observed performance differences. These additions directly address the robustness concern. revision: yes
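The metrics and significance test named in this response are easy to reproduce from per-conversation scores. The sketch below is one possible realization, assuming the rouge-score package the paper cites; the choice of ROUGE-L F1, the 10,000 resamples, and the 95% level are illustrative defaults, not the authors' exact protocol.

```python
# Sketch: ROUGE-L F1 per conversation plus a paired bootstrap CI for the
# cascaded-vs-end-to-end gap. Resample count and metric choice are assumptions.
import random
from rouge_score import rouge_scorer

def rouge_l_f1(references, hypotheses):
    """One ROUGE-L F1 score per (reference note, generated note) pair."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return [scorer.score(ref, hyp)["rougeL"].fmeasure
            for ref, hyp in zip(references, hypotheses)]

def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    """95% CI for mean(scores_a - scores_b); an interval excluding 0 indicates
    a significant gap between the two systems on the same conversations."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    means = sorted(sum(rng.choices(diffs, k=len(diffs))) / len(diffs)
                   for _ in range(n_boot))
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lo, hi
```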
- Referee: [Pipeline description and data release] No section reports validation of the synthetic dialogues or audio against real doctor-patient recordings, such as clinician ratings of clinical nuance, acoustic similarity metrics (e.g., MOS or spectrogram comparisons), or distributional checks (turn lengths, disfluency rates, medical terminology frequency). This is load-bearing for the assertion that the dataset serves as a controlled evaluation environment, since unvalidated artifacts could artifactually favor cascaded pipelines over end-to-end audio models.
Authors: We recognize that explicit validation against real recordings would strengthen claims about the dataset serving as a controlled environment. Because of strict privacy regulations, we have no access to real doctor-patient audio for direct comparison. In the revision we have added a new subsection detailing the pipeline's design choices that target realism (persona conditioning for clinical content, explicit modeling of overlaps/pauses/disfluencies, room acoustics, and medical terminology drawn from public medical dialogue resources). We also report internal distributional statistics of the generated data (turn lengths, disfluency rates, terminology frequency) and compare them to publicly available non-private medical dialogue corpora. We cannot supply clinician ratings or MOS scores against real audio. revision: partial
- Not provided in the revision: direct clinician ratings of clinical nuance or acoustic similarity metrics (MOS, spectrogram comparisons) against real doctor-patient recordings, since privacy regulations preclude access to such real data.
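The internal distributional statistics mentioned in the response (turn lengths, disfluency rates, terminology frequency) can be computed directly from speaker-attributed transcripts. The sketch below is one possible implementation; the filler lexicon, the regex tokenizer, and the externally supplied medical-term list are assumptions rather than the paper's method.

```python
# Sketch: distributional checks on speaker-attributed transcripts.
# The filler lexicon, tokenizer, and term list are assumptions, not the paper's.
import re
from typing import Dict, Iterable, List, Tuple

FILLERS = {"um", "uh", "er", "hmm", "mhm"}  # assumed disfluency markers

def dialogue_stats(dialogue: List[Tuple[str, str]],
                   medical_terms: Iterable[str]) -> Dict[str, float]:
    """dialogue: list of (speaker, utterance) pairs; medical_terms: e.g. a MeSH word list."""
    terms = {t.lower() for t in medical_terms}
    turn_lengths, fillers, term_hits, total = [], 0, 0, 0
    for _, utterance in dialogue:
        tokens = re.findall(r"[a-z']+", utterance.lower())
        turn_lengths.append(len(tokens))
        fillers += sum(tok in FILLERS for tok in tokens)
        term_hits += sum(tok in terms for tok in tokens)
        total += len(tokens)
    return {
        "mean_turn_length": sum(turn_lengths) / max(len(turn_lengths), 1),
        "disfluency_rate": fillers / max(total, 1),
        "terminology_rate": term_hits / max(total, 1),
    }
```

Comparing these statistics between the synthetic corpus and public non-private medical dialogue corpora is the kind of distributional check the response describes.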
Circularity Check
No circularity: empirical evaluation on generated data with no derivations or fitted predictions
full rationale
The paper describes a three-stage synthetic data pipeline (persona-driven LLM dialogue generation, multi-speaker audio synthesis, LLM SOAP note production) and reports direct empirical comparisons of cascaded vs. end-to-end models on the resulting 8,800 conversations. No equations, parameter fits, uniqueness theorems, or self-citations are invoked as load-bearing steps in any derivation chain. The central claim is an observation from evaluation on the constructed corpus rather than a reduction of outputs to inputs by construction. This is standard empirical work on synthetic benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Synthetic data generated via persona-driven dialogue, multi-speaker audio synthesis, and LLM notes sufficiently approximates real doctor-patient interactions for training and evaluation purposes.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
Unclear: relation between the paper passage and the cited Recognition theorem.
Linked passage: "We propose a synthetic data generation pipeline... persona-driven dialogue generation, multi-speaker audio synthesis... LLM-based reference SOAP note production"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
However, these benchmarks focus predominantly on short-context tasks
Introduction Toward the broader goal of human-level audio understanding, recent large audio language models (LALMs) [1, 2, 3, 4] have demonstrated impressive progress on benchmarks for audio processing and comprehension [5, 6, 7, 8]. However, these benchmarks focus predominantly on short-context tasks. Our understanding of LALM performance on long-c...
2025
-
[2]
Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization
Related Work Recent advances have dramatically expanded the context windows of Large Audio Language Models (LALMs). While early attempts at end-to-end (E2E) speech summarization struggled with the quadratic memory complexity of processing long audio sequences [11], current systems can ingest continuous audio ranging from 40 minutes to over eight hou...
2026
-
[3]
hand swelling,
Transcript and Audio Generation We now describe the three stages of our data generation pipeline, each targeting a specific gap identified above: (1) persona and context sampling, (2) persona-conditioned text dialogue generation, and (3) audio synthesis with acoustic simulation. Throughout, we use Gemma3-27B-IT [23] as the LLM, selected for its perf...
-
[4]
viral illness
SOAP Note Generation and Evaluation Using the speaker-attributed transcript produced by the pipeline, we generate reference SOAP notes and evaluate all systems via two-stage processes, both using Kimi K2 Think... [Footnote 6: Audio samples and example SOAP notes are provided as supplementary material for review. Footnote 7: "wet": the final augmented audio, optionally Opus-compressed.]
-
[5]
reference SOAP note production, built entirely on open-weight models, yielding 8,800 conversations and 1,329 hours of audio
Conclusion We presented a fully synthetic pipeline (persona-conditioned dialogue generation, multi-speaker audio synthesis, and reference SOAP note production) built entirely on open-weight models, yielding 8,800 conversations and 1,329 hours of audio. Cascaded systems subs... [Footnote 8: Medical concept F1: MeSH keyword matching + NER via en_core_sci_md (scispaCy [37]).]
-
[6]
The contribution by Markus Müller was made in his capacity as a workshop leader and does not necessarily reflect the official position of Amazon
Acknowledgement The authors would like to thank Markus Müller (Amazon AGI) for his valuable discussions, leadership, and guidance throughout the duration of the workshop. The contribution by Markus Müller was made in his capacity as a workshop leader and does not necessarily reflect the official position of Amazon. This work was supported by the 2025 ...
2025
-
[7]
Qwen3-Omni technical report,
J. Xu, Z. Guo, H. Hu et al., "Qwen3-Omni technical report,"
-
[8]
[Online]. Available: https://arxiv.org/abs/2509.17765
work page internal anchor Pith review arXiv
-
[9]
Audio flamingo 3: Advancing audio intelligence with fully open large audio language models,
S. Ghosh, A. Goel, J. Kim, S. Kumar, Z. Kong, S. gil Lee, C.-H. H. Yang, R. Duraiswami, D. Manocha, R. Valle, and B. Catanzaro, "Audio flamingo 3: Advancing audio intelligence with fully open large audio language models," in The Thirty-ninth Annual Conference on Neural Information Processing Systems,
-
[10]
Available: https://openreview.net/forum?id=FjByDpDVIO
[Online]. Available: https://openreview.net/forum?id=FjByDpDVIO
-
[11]
Amazon nova sonic: Technical report and model card,
Amazon Artificial General Intelligence, "Amazon nova sonic: Technical report and model card," Amazon Technical Reports, 2025. [Online]. Available: https://www.amazon.science/publications/amazon-nova-sonic-technical-report-and-model-card
2025
-
[12]
Baichuan-Omni-1.5 technical report,
Y. Li, J. Liu, T. Zhang et al., "Baichuan-Omni-1.5 technical report," 2025. [Online]. Available: https://arxiv.org/abs/2501.15368
2025
-
[13]
Audio flamingo 2: An audio-language model with long-audio understanding and expert reasoning abilities,
S. Ghosh, Z. Kong, S. Kumar, S. Sakshi, J. Kim, W. Ping, R. Valle, D. Manocha, and B. Catanzaro, "Audio flamingo 2: An audio-language model with long-audio understanding and expert reasoning abilities," in Forty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=xWu5qpDK6U
2025
-
[14]
Smith, Yulia Tsvetkov, and Sachin Kumar
O. Ahia, M. Bartelds, K. Ahuja et al., "BLAB: Brutally long audio bench," 2025. [Online]. Available: https://arxiv.org/abs/2505.03054
-
[15]
J. Chen, Z. Guo, J. Chun, P. Wang, A. Perrault, and M. Elsner, "Do audio LLMs really LISTEN, or just transcribe? Measuring lexical vs. acoustic emotion cues reliance," in Proceedings of the 19th Conference of the European Chapter of the Association for Computational Lingu... [Footnote 9: Train/Dev released before Interspeech 2026; Test data withheld until December 2026.]
2026
-
[16]
S. Kumar, Šimon Sedláček, V. Lokegaonkar et al., "MMAU-Pro: A challenging and comprehensive benchmark for holistic evaluation of audio general intelligence," 2025. [Online]. Available: https://arxiv.org/abs/2508.13992
-
[17]
How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation,
C.-W. Liu, R. Lowe, I. Serban, M. Noseworthy, L. Charlin, and J. Pineau, "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016
2016
-
[18]
On faithfulness and factuality in abstractive summarization,
J. Maynez, S. Narayan, B. Bohnet, and R. McDonald, "On faithfulness and factuality in abstractive summarization," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul. 2020
2020
-
[19]
End-to-end speech summarization using restricted self-attention,
R. Sharma, A. Gupta, S. Kumar, and F. Metze, "End-to-end speech summarization using restricted self-attention," in Proc. ICASSP. IEEE, 2022, pp. 8072–8076
2022
-
[20]
Closing the modality reasoning gap for speech large language models,
J. Xiang, S. Zhang, W. Zhou, and Y. Liu, "Closing the modality reasoning gap for speech large language models," in Proc. IEEE ASRU, 2025
2025
-
[21]
The cascade equivalence hypothesis: When do speech LLMs behave like ASR→LLM pipelines?
J. Billa, "The cascade equivalence hypothesis: When do speech LLMs behave like ASR→LLM pipelines?" arXiv preprint arXiv:2602.17598, 2026
-
[22]
The medical scribe: Corpus development and model performance analyses,
I. Shafran, N. Du, L. Tran, A. Perry, L. Keyes, M. Knichel, A. Domin, L. Huang, Y.-h. Chen, G. Li, M. Wang, L. El Shafey, H. Soltau, and J. S. Paul, "The medical scribe: Corpus development and model performance analyses," in Proceedings of the Twelfth Language Resources and Evaluation Conference, N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri,...
2020
-
[23]
Understanding medical conversations: Rich transcription, confidence scores & information extraction,
H. Soltau, M. Wang, I. Shafran, and L. E. Shafey, "Understanding medical conversations: Rich transcription, confidence scores & information extraction," in Interspeech, 2021
2021
-
[24]
Synthetic patient–physician conversations simulated by large language models: A multi-dimensional evaluation,
S. A. Haider, S. Prabha, C. A. Gomez-Cabello, S. Borna, A. Genovese, M. Trabilsy, B. G. Collaco, N. G. Wood, S. Bagaria, C. Tao, and A. J. Forte, "Synthetic patient–physician conversations simulated by large language models: A multi-dimensional evaluation," Sensors, vol. 25, no. 14, 2025
2025
-
[25]
NoteChat: A dataset of synthetic patient-physician conversations conditioned on clinical notes,
J. Wang, Z. Yao, Z. Yang, H. Zhou, R. Li, X. Wang, Y. Xu, and H. Yu, "NoteChat: A dataset of synthetic patient-physician conversations conditioned on clinical notes," in Findings of the Association for Computational Linguistics: ACL 2024. Association for Computational Linguistics, 2024
2024
-
[26]
PriMock57: A dataset of primary care mock consultations,
A. Papadopoulos Korfiatis, F. Moramarco, R. Sarac, and A. Savkov, "PriMock57: A dataset of primary care mock consultations," in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2022
2022
-
[27]
ACI-BENCH: a novel ambient clinical intelligence dataset for benchmarking automatic visit note generation,
W.-w. Yim, Y. Fu, A. Ben Abacha, N. Snider, T. Lin, and M. Yetisgen, "ACI-BENCH: a novel ambient clinical intelligence dataset for benchmarking automatic visit note generation," Scientific Data, vol. 10, no. 1, p. 586, 2023
2023
-
[28]
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition,
S. Cornell, J. Darefsky, Z. Duan, and S. Watanabe, "Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition," in Synthetic Data's Transformative Role in Foundational Speech Models, 2024
2024
-
[29]
Judging llm-as-a-judge with mt-bench and chatbot arena,
L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, "Judging llm-as-a-judge with mt-bench and chatbot arena," in Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., ...
2023
-
[30]
G-eval: NLG evaluation using gpt-4 with better human alignment,
Y. Liu, D. Iter, Y. Xu, S. Wang, R. Xu, and C. Zhu, "G-eval: NLG evaluation using gpt-4 with better human alignment," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J. Pino, and K. Bali, Eds., 2023
2023
-
[31]
Gemma 3 technical report,
G. Team, A. Kamath, J. Ferret et al., "Gemma 3 technical report,"
-
[32]
[Online]. Available: https://arxiv.org/abs/2503.19786
-
[33]
S. Burdisso, S. Baroudi, Y. Labrak et al., "Sdialog: A python toolkit for end-to-end agent building, user simulation, dialog generation, and evaluation," 2026. [Online]. Available: https://arxiv.org/abs/2506.10622
2026
-
[34]
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech,
H. Zen, V. Dang, R. Clark, Y. Zhang, R. J. Weiss, Y. Jia, Z. Chen, and Y. Wu, "LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech," in Interspeech 2019, 2019, pp. 1526–1530
2019
-
[35]
Scaper: A library for soundscape synthesis and augmentation,
J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, "Scaper: A library for soundscape synthesis and augmentation," in 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017, pp. 344–348
2017
-
[36]
Pyroomacoustics: A python package for audio room simulation and array processing algorithms,
R. Scheibler, E. Bezzam, and I. Dokmanić, "Pyroomacoustics: A python package for audio room simulation and array processing algorithms," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE Press, 2018, pp. 351–355. [Online]. Available: https://doi.org/10.1109/ICASSP.2018.8461310
-
[37]
Qwen3-ASR technical report,
X. Shi, X. Wang, Z. Guo et al., "Qwen3-ASR technical report,"
-
[38]
[Online]. Available: https://arxiv.org/abs/2601.21337
-
[39]
Uni-VERSA: Versatile Speech Assessment with a Unified Network,
J. Shi, H. jin Shim, and S. Watanabe, "Uni-VERSA: Versatile Speech Assessment with a Unified Network," in Interspeech 2025, 2025, pp. 1798–1802
2025
-
[40]
The t05 system for the VoiceMOS Challenge 2024: Transfer learning from deep image classifier to naturalness MOS prediction of high-quality synthetic speech,
K. Baba, W. Nakata, Y. Saito, and H. Saruwatari, "The t05 system for the VoiceMOS Challenge 2024: Transfer learning from deep image classifier to naturalness MOS prediction of high-quality synthetic speech," in IEEE Spoken Language Technology Workshop (SLT), 2024, pp. 818–824
2024
-
[41]
Yusuke Fujita, Naoyuki Kanda, Shota Horiguchi, Kenji Nagamatsu, and Shinji Watanabe
D. E, A. Meena, M. Nanivadekar, N. A, V. Azad, A. N. Shenoy, P. R. Chowdhuri, S. Banga, V. Chhabra, C. Bhat, S. babu Kalluri, S. R. Chetupalli, D. Vijayasenan, and S. Ganapathy, "Benchmarking speech systems for frontline health conversations: The displace-m challenge," 2026. [Online]. Available: https://arxiv.org/abs/2603.02813
-
[42]
Robust speech recognition via large-scale weak supervision,
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. Mcleavey, and I. Sutskever, "Robust speech recognition via large-scale weak supervision," in Proceedings of the 40th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, Eds., vol. 202. PMLR, 23– ...
2023
-
[43]
A. Yang, A. Li, B. Yang et al., "Qwen3 technical report," 2025. [Online]. Available: https://arxiv.org/abs/2505.09388
2025
-
[44]
Kimi K2: Open Agentic Intelligence
K. Team, Y. Bai, Y. Bao et al., "Kimi K2: Open agentic intelligence," 2026. [Online]. Available: https://arxiv.org/abs/2507.20534
2026
-
[45]
Revisiting text decomposition methods for NLI-based factuality scoring of summaries,
J. Glover, F. Fancellu, V. Jagannathan, M. R. Gormley, and T. Schaaf, "Revisiting text decomposition methods for NLI-based factuality scoring of summaries," in Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), A. Bosselut, K. Chandu, K. Dhole, V. Gangal, S. Gehrmann, Y. Jernite, J. Novikova, and L. Perez-B...
2022
-
[46]
rouge-score: A python implementation of rouge,
Google Research, "rouge-score: A python implementation of rouge," 2019. [Online]. Available: https://github.com/google-research/google-research/tree/master/rouge
2019
-
[47]
ScispaCy: Fast and robust models for biomedical natural language processing,
M. Neumann, D. King, I. Beltagy, and W. Ammar, "ScispaCy: Fast and robust models for biomedical natural language processing," in Proceedings of the 18th BioNLP Workshop and Shared Task, D. Demner-Fushman, K. B. Cohen, S. Ananiadou, and J. Tsujii, Eds. Florence, Italy: Association for Computational Linguistics, Aug. 2019, pp. 319–327. [Online]. Available: h...
2019
-
[48]
Leveraging pretrained models for automatic summarization of doctor-patient conversations,
L. Zhang, R. Negrinho, A. Ghosh, V. Jagannathan, H. R. Hassanzadeh, T. Schaaf, and M. R. Gormley, "Leveraging pretrained models for automatic summarization of doctor-patient conversations," in Findings of the ACL: EMNLP 2021, 2021, pp. 3693–3712. [Online]. Available: https://aclanthology.org/2021.findings-emnlp.313/
2021
-
[49]
Manuscript preparation. Large language models were used to assist with proofreading, improving conciseness, and formatting LaTeX tables
Generative AI Use Disclosure Generative AI tools were used in two distinct ways in this work. Manuscript preparation. Large language models were used to assist with proofreading, improving conciseness, and formatting LaTeX tables. All such use was directed and reviewed by an author; AI tools produced no significant portions of the manuscript without subseq...