Efficacy of synthetic data as a benchmark
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3
years
2026 3
verdicts
UNVERDICTED 3
representative citing papers
Tabular diffusion models leak membership information via attacks even with partial attacker knowledge, and common heuristic privacy metrics like distance-to-closest-record are unreliable.
Resampling methods achieve near-perfect utility (TSTR 0.997) but fail privacy (DCR ~0), while VAEs balance 83.3% utility with full privacy protection for synthetic educational data.
citing papers explorer
- NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
  NodeSynth generates evidence-anchored synthetic queries that trigger up to five times higher failure rates in mainstream LLMs than human-authored benchmarks.
- On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics
  Tabular diffusion models leak membership information via attacks even with partial attacker knowledge, and common heuristic privacy metrics like distance-to-closest-record are unreliable.
- Synthetic Data in Education: Empirical Insights from Traditional Resampling and Deep Generative Models
  Resampling methods achieve near-perfect utility (TSTR 0.997) but fail privacy (DCR ~0), while VAEs balance 83.3% utility with full privacy protection for synthetic educational data.