Recoverable Identifier

arXiv:2605.00969 · detector doi_compliance · incontrovertible · 2026-05-19 17:53:01.170434+00:00

advisory doi_compliance recoverable_identifier

The DOI printed in the bibliography is fragmented by whitespace or line breaks. A longer candidate (10.18653/v1/2023.findings-emnlp.1055.12) is visible in the surrounding text, but it could not be confirmed against doi.org as printed.
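As an illustration of how a fragmented DOI of this kind might be rejoined, here is a minimal Python sketch. It is not the detector's actual implementation: the function name `rejoin_fragmented_doi`, the regular expression, and the rule "keep gluing tokens while the fragment ends in a dot" are all assumptions made for this example.

```python
import re
from typing import Optional

# Assumed pattern: a DOI prefix ("10.", 4-9 digits, "/") followed by
# non-whitespace characters. This is a heuristic, not an official DOI grammar.
DOI_START = re.compile(r"10\.\d{4,9}/\S+")


def rejoin_fragmented_doi(text: str) -> Optional[str]:
    """Rejoin a DOI that was split by whitespace right after a dot."""
    match = DOI_START.search(text)
    if match is None:
        return None
    doi = match.group(0)
    rest = text[match.end():]
    # While the fragment ends in a dot, append the next whitespace-separated
    # token. Stop as soon as the fragment no longer ends in a dot.
    while doi.endswith("."):
        nxt = re.match(r"\s+(\S+)", rest)
        if nxt is None:
            break
        doi += nxt.group(1)
        rest = rest[nxt.end():]
    return doi


printed = "URL https://doi.org/10.18653/v1/2023. findings-emnlp.1055. 12 MedMosaic: A Challenging"
print(rejoin_fragmented_doi(printed))
```

A heuristic like this can also absorb a token that does not belong to the DOI, so a rejoined candidate still has to be checked against doi.org before it is treated as correct.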

Evidence text

URL https://doi.org/10.18653/v1/2023. findings-emnlp.1055. 12 MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio A. Appendix A.1. Dataset Description • Primock57(Korfiatis et al., 2022): It consists of 57 long mock medical primary care consultations held over 5 days by 7 Babylon clinicians and 57 Babylon employees acting as patients, using case cards with presenting complaints, symptoms, medical and general history etc. • Primock-med(Korfiatis et al., 2022; Na0s, 2024): It consists of 322 short mock medical primary care consultations between 7 Babylon clinicians and 57 Babylon employees acting as patients. It is created by chunking longer consultation recordings from the original Primock57 audio corpus. Each audio segment is exactly 30 seconds in length. • Ekacare(Ekacare, 2024): It includes approximately 3,600 English recordings and 320 Hindi recordings featuring medical terminology, including branded drugs specific to the Indian context, delivered across various speaking styles such as isolated medical entities, narrated medical sentences, and impromptu conversations. • MTS Dialog(har1, 2024): It consists of 1,701 doctor-patient conversations paired with structured clinical note summaries. The dataset is divided into a training set of 1,201 conversation-summary pairs and a validation set of 100 pairs. Each clinical note follows a standardized format comprising four sections: Symptoms, Diagnosis, History of Patient, and Plan of Action, with "N/A" entries

Evidence payload

{
  "printed_excerpt": "URL https://doi.org/10.18653/v1/2023. findings-emnlp.1055. 12 MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio A. Appendix A.1. Dataset Description \u2022 Primock57(Korfiatis et al., 2022): It consists of 57 long mock medi",
  "reconstructed_doi": "10.18653/v1/2023.findings-emnlp.1055.12",
  "ref_index": 4,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
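
A payload like the one above could be consumed as in the following sketch. The helper names `lookup_url` and `needs_manual_check` are hypothetical, and reading a null `resolved_title` as "not confirmed" is an assumption based on the description in this report, not a documented schema. The sketch makes no network request; it only builds the URL a reviewer would open.

```python
import json
from urllib.parse import quote

# Subset of the evidence payload shown in this report.
payload = json.loads("""
{
  "reconstructed_doi": "10.18653/v1/2023.findings-emnlp.1055.12",
  "ref_index": 4,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
""")


def lookup_url(doi: str) -> str:
    # Percent-encode the DOI but keep "/" so the path stays readable.
    return "https://doi.org/" + quote(doi, safe="/")


def needs_manual_check(record: dict) -> bool:
    # Assumption: a null resolved_title means the candidate was not confirmed.
    return record.get("resolved_title") is None


print(lookup_url(payload["reconstructed_doi"]))
print(needs_manual_check(payload))
```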