ORiGAMi synthesizes sparse semi-structured mixed-type JSON data using path-encoded autoregressive tokenization and schema constraints, outperforming flattened tabular baselines on 17 of 18 fidelity, detection, and utility metrics while keeping privacy above 96%.
8 Pith papers cite this work.
Citing papers by year: 2026 (8)
Representative citing papers
MobEvolve is an agentic self-evolving heuristic framework that generates interpretable human mobility trajectories and outperforms deep generative and LLM-based methods on Singapore and Montreal benchmarks.
After correcting prior flaws, a class-dependent hybrid augmentation strategy plus clinical subtype aggregation raises average macro-F1 robustness across eight classifiers on a 400-patient seven-subtype migraine dataset, with peak 0.914 under proportional growth.
SQLyzr is a new evaluation platform that adds diverse metrics, realistic settings, query classification, and analysis features to overcome the single-score limitations of existing text-to-SQL benchmarks.
Resampling methods achieve near-perfect utility (TSTR 0.997) but fail privacy (DCR ~0), while VAEs balance 83.3% utility with full privacy protection for synthetic educational data.
A workflow creates synthetic data from SDC-adjusted and coarsened margins via IPF, ensuring declared relationship preservation and derivation only from pre-approved safe information, shown on 1901 Scottish census tables.
Context-conditioned normalizing flows refine subnational survey distributions under severe data scarcity when conditioning covariates capture local heterogeneity.
Memisis orchestrates synthetic tabular health data generation and evaluation using LLMs and multiple synthesizers, demonstrated on a schizophrenia dataset with fairness and utility checks.
Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data