5 Pith papers cite this work.
Citing papers by year: 2026 (5)
citing papers explorer
- Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data
  ORiGAMi synthesizes sparse semi-structured mixed-type JSON data using path-encoded autoregressive tokenization and schema constraints, outperforming flattened tabular baselines on 17 of 18 fidelity, detection, and utility metrics while keeping privacy above 96%.
- A Demonstration of SQLyzr: A Platform for Fine-Grained Text-to-SQL Evaluation and Analysis
  SQLyzr is a new evaluation platform that adds diverse metrics, realistic settings, query classification, and analysis features to overcome the single-score limitations of existing text-to-SQL benchmarks.
- Synthetic Data in Education: Empirical Insights from Traditional Resampling and Deep Generative Models
  Resampling methods achieve near-perfect utility (TSTR 0.997) but fail privacy (DCR ~0), while VAEs balance 83.3% utility with full privacy protection for synthetic educational data.
- Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets
  Memisis orchestrates synthetic tabular health data generation and evaluation using LLMs and multiple synthesizers, demonstrated on a schizophrenia dataset with fairness and utility checks.
- Class-Dependent Hybrid Data Augmentation for Multiclass Migraine Classification under Severe Class Imbalance