arXiv preprint arXiv:2303.04360

Ruixiang Tang, Xiaotian Han, Xiaoqian Jiang, Xia Hu · 2023 · arXiv 2303.04360

8 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets

cs.CL · 2026-04-28 · conditional · novelty 7.0

A graphlet-anchored framework generates 119,856 factually grounded biomedical QA pairs that improve accuracy on PubMedQA and MedQA benchmarks.

Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset

cs.CL · 2026-02-18 · unverdicted · novelty 7.0

The MathEd-PII benchmark shows that math-aware and segment-aware LLM prompting raises PII detection F1 from 0.379 to 0.821 while cutting false redactions of instructional numbers.

Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

LLM-rephrased synthetic clinical notes preserve core information and utility for coarse prediction tasks but lose fine-grained details such as ICD codes, with chunk-wise rephrasing as a partial mitigation that trades off factual accuracy.

TabEmb: Joint Semantic-Structure Embedding for Table Annotation

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.

Generating High Quality Synthetic Data for Dutch Medical Conversations

cs.CL · 2026-03-25 · unverdicted · novelty 4.0

A pipeline generates synthetic Dutch medical dialogues via a fine-tuned LLM and evaluates them quantitatively and qualitatively, showing feasibility but gaps in naturalness.

Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation

cs.CL · 2025-02-22 · unverdicted · novelty 3.0

Fine-tuning and data augmentation improve LLM performance on medical jargon extraction and prioritization from EHR notes, with augmented open-source models sometimes outperforming closed-source ones on 106 annotated notes.

A Survey on Knowledge Distillation of Large Language Models

cs.CL · 2024-02-20 · accept · novelty 3.0

A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.

Data-Centric Foundation Models in Computational Healthcare: A Survey

cs.LG · 2024-01-04 · unverdicted · novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.

citing papers explorer

BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets cs.CL · 2026-04-28 · conditional · none · ref 24
Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset cs.CL · 2026-02-18 · unverdicted · none · ref 37
Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale cs.CL · 2026-05-18 · unverdicted · none · ref 51
TabEmb: Joint Semantic-Structure Embedding for Table Annotation cs.LG · 2026-04-21 · unverdicted · none · ref 15
Generating High Quality Synthetic Data for Dutch Medical Conversations cs.CL · 2026-03-25 · unverdicted · none · ref 13
Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation cs.CL · 2025-02-22 · unverdicted · none · ref 98
A Survey on Knowledge Distillation of Large Language Models cs.CL · 2024-02-20 · accept · none · ref 46
Data-Centric Foundation Models in Computational Healthcare: A Survey cs.LG · 2024-01-04 · unverdicted · none · ref 285