A graphlet-anchored framework generates 119,856 factually grounded biomedical QA pairs that improve accuracy on PubMedQA and MedQA benchmarks.
arXiv preprint arXiv:2303.04360 , year=
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
other 1polarities
unclear 1representative citing papers
The MathEd-PII benchmark shows that math-aware and segment-aware LLM prompting raises PII detection F1 from 0.379 to 0.821 while cutting false redactions of instructional numbers.
LLM-rephrased synthetic clinical notes preserve core information and utility for coarse prediction tasks but lose fine-grained details such as ICD codes, with chunk-wise rephrasing as a partial mitigation that trades off factual accuracy.
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
A pipeline generates synthetic Dutch medical dialogues via fine-tuned LLM and evaluates them quantitatively and qualitatively, showing feasibility but gaps in naturalness.
Fine-tuning and data augmentation improve LLM performance on medical jargon extraction and prioritization from EHR notes, with augmented open-source models sometimes outperforming closed-source ones on 106 annotated notes.
A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.
citing papers explorer
-
BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets
A graphlet-anchored framework generates 119,856 factually grounded biomedical QA pairs that improve accuracy on PubMedQA and MedQA benchmarks.
-
Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset
The MathEd-PII benchmark shows that math-aware and segment-aware LLM prompting raises PII detection F1 from 0.379 to 0.821 while cutting false redactions of instructional numbers.
-
Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale
LLM-rephrased synthetic clinical notes preserve core information and utility for coarse prediction tasks but lose fine-grained details such as ICD codes, with chunk-wise rephrasing as a partial mitigation that trades off factual accuracy.
-
TabEmb: Joint Semantic-Structure Embedding for Table Annotation
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
-
Generating High Quality Synthetic Data for Dutch Medical Conversations
A pipeline generates synthetic Dutch medical dialogues via fine-tuned LLM and evaluates them quantitatively and qualitatively, showing feasibility but gaps in naturalness.
-
Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation
Fine-tuning and data augmentation improve LLM performance on medical jargon extraction and prioritization from EHR notes, with augmented open-source models sometimes outperforming closed-source ones on 106 annotated notes.
-
A Survey on Knowledge Distillation of Large Language Models
A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.
-
Data-Centric Foundation Models in Computational Healthcare: A Survey
The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.