ORiGAMi synthesizes sparse semi-structured mixed-type JSON data using path-encoded autoregressive tokenization and schema constraints, outperforming flattened tabular baselines on 17 of 18 fidelity, detection, and utility metrics while keeping privacy above 96%.
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.LG (3)
years: 2026 (3)
roles: background (1)
polarities: background (1)
representative citing papers
Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.
CausalSynth combines structural causal models with LLMs and iterative verification to produce synthetic data that respects given causal structures while remaining linguistically natural.
citing papers explorer
- Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data
  ORiGAMi synthesizes sparse semi-structured mixed-type JSON data using path-encoded autoregressive tokenization and schema constraints, outperforming flattened tabular baselines on 17 of 18 fidelity, detection, and utility metrics while keeping privacy above 96%.
- Concordia: Self-Improving Synthetic Tables for Federated LLMs
  Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.
- CausalSynth: Generating Structurally Sound Synthetic Data
  CausalSynth combines structural causal models with LLMs and iterative verification to produce synthetic data that respects given causal structures while remaining linguistically natural.