A Survey on Evaluating Quality and Trustworthiness in LLM-Generated Data

Kaituo Zhang, Mingzhi Hu, Hoang Anh Duy Le, Fariha Kabir Torsha, Zhimeng Jiang, Minh Khai Bui, Chia-Yuan Chang, Yu-Neng Chuang, Zhen Xiong, Ying Lin, Guanchu Wang, Na Zou

arXiv: 2601.17717 · v3 · pith:MSDRLNLH · submitted 2026-01-25 · cs.AI · cs.LG

Keywords: data, evaluation, across, modalities, quality, framework, generation, LLMs

Abstract

Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities. By turning data from a scarce resource into a controllable asset, LLMs ease the bottlenecks imposed by the cost of acquiring real-world data for model training, evaluation, and system iteration. However, ensuring the quality of LLM-generated synthetic data remains a critical challenge. Existing research focuses primarily on generation methodologies and gives limited direct attention to the quality of the resulting data. Furthermore, most studies are restricted to a single modality and lack a unified perspective across data types. To bridge this gap, we propose the LLM Data Auditor framework. Within this framework, we first describe how LLMs are used to generate data across six distinct modalities. More importantly, we systematically categorize intrinsic metrics for evaluating synthetic data along two dimensions: quality and trustworthiness. This approach shifts the focus from extrinsic evaluation, which relies on downstream task performance, to the inherent properties of the data itself. Using this evaluation system, we analyze the experimental evaluations of representative generation methods for each modality and identify substantial deficiencies in current evaluation practices. Based on these findings, we offer concrete recommendations for improving how the community evaluates data generation. Finally, the framework outlines methodologies for applying synthetic data in practice across different modalities.
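To make the intrinsic/extrinsic distinction concrete, the sketch below computes two simple intrinsic statistics for generated text directly on the samples, without training or evaluating any downstream model: distinct-n (the share of unique n-grams, a rough lexical-diversity signal) and exact-duplicate rate. This is an illustrative assumption only; the abstract does not say which specific metrics the survey catalogs, and the function names `distinct_n` and `duplicate_rate` are hypothetical. Tokenization is naive lowercase whitespace splitting.

```python
from collections import Counter


def distinct_n(samples: list[str], n: int = 2) -> float:
    """Hypothetical intrinsic metric: unique n-grams / total n-grams.

    Uses naive lowercase whitespace tokenization (an assumption, not a
    standard tokenizer). Returns 0.0 when no n-grams exist.
    """
    ngrams = []
    for text in samples:
        tokens = text.lower().split()
        # Slide a window of length n over the tokens of each sample.
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def duplicate_rate(samples: list[str]) -> float:
    """Hypothetical intrinsic metric: share of samples that exactly repeat
    an earlier sample after lowercasing and whitespace normalization."""
    if not samples:
        return 0.0
    counts = Counter(" ".join(s.lower().split()) for s in samples)
    # Every occurrence beyond the first of a given sample counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(samples)


synthetic = [
    "The cat sat on the mat.",
    "The cat sat on the mat.",
    "A dog barked at the mailman.",
]
print(f"distinct-2: {distinct_n(synthetic, 2):.3f}")      # distinct-2: 0.667
print(f"duplicate rate: {duplicate_rate(synthetic):.3f}")  # duplicate rate: 0.333
```

Both numbers are properties of the generated data alone, which is what separates them from extrinsic evaluation (e.g., training a classifier on the synthetic set and measuring its test accuracy). Real audits would use modality-appropriate tokenizers and richer quality and trustworthiness checks.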

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection
cs.LG · 2026-05 · unverdicted · novelty 7.0

LLM-generated synthetic datasets steered uniformly across a 2D performance space defined by two landmark algorithms improve meta-learner performance on algorithm selection for regression tasks.