A comprehensive survey of synthetic tabular data generation
7 Pith papers cite this work. Polarity classification is still indexing.
roles: background 1
polarities: background 1
citing papers explorer
- Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data
  ORiGAMi synthesizes sparse semi-structured mixed-type JSON data using path-encoded autoregressive tokenization and schema constraints, outperforming flattened tabular baselines on 17 of 18 fidelity, detection, and utility metrics while keeping privacy above 96%.
- Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space
  MbaGCN combines message aggregation, selective state space transitions, and node state prediction to create a more scalable deep graph convolutional network.
- TabKDE: Simple and Scalable Tabular Data Generation with Kernel Density Estimates
  TabKDE generates synthetic tabular data using copula transformations followed by kernel density estimation, matching prior accuracy with negligible training time and reduced storage via coresets.
- Generative AI-Based Monte Carlo Simulation for Method Evaluation Using Synthetic Multilevel Data
  A framework using generative AI to produce synthetic multilevel data for Monte Carlo simulations that evaluate the performance and parameter recovery of quantitative methods.
- SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation
  SAGE improves LLM-based synthetic tabular data generation by enforcing sparse, value-adaptive dependency guidance, yielding higher fidelity and 10% better downstream F1 scores than prior methods.
- Enhancing Tabular Anomaly Detection via Pseudo-Label-Guided Generation
  PLAG boosts tabular anomaly detection by using pseudo-label-guided synthetic anomaly generation with a two-stage filter, achieving SOTA results and lifting F1 scores by 0.08-0.21 when added to existing detectors.
- Tabular Foundation Model for Generative Modelling
  TabFORGE generates high-quality synthetic tabular data by leveraging pretrained causality-aware representations in a two-stage diffusion-decoder architecture that mitigates latent distribution shifts.