Language models are realistic tabular data generators

2022 · arXiv 2210.06280

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 · other 1

citation-polarity summary

background 2 · unclear 2

representative citing papers

Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data

cs.LG · 2026-06-10 · unverdicted · novelty 7.0

ICL in LLMs shows a sharp ceiling on categorical distributions for high-cardinality tabular data, failing to reproduce rare classes despite examples, while numerical fidelity improves.

Concordia: Self-Improving Synthetic Tables for Federated LLMs

cs.LG · 2026-05-11 · unverdicted · novelty 7.0 · 2 refs

Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.

LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

LLM-generated synthetic datasets steered uniformly across a 2D performance space defined by two landmark algorithms improve meta-learner performance on algorithm selection for regression tasks.

AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors

cs.CV · 2026-01-28 · conditional · novelty 7.0

AnomalyVFM converts vision foundation models into zero-shot anomaly detectors via three-stage synthetic dataset generation plus low-rank adapters and weighted pixel loss, reaching 94.1% average image AUROC across nine datasets.

When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation

cs.LG · 2025-12-09 · conditional · novelty 7.0

LLM tabular generators leak memorized numeric strings, allowing a no-box attack to achieve near-perfect membership inference on some state-of-the-art models.

Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.

The Power of Order: Fooling LLMs with Adversarial Table Permutations

cs.LG · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.

Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation

cs.LG · 2025-02-06 · unverdicted · novelty 6.0

Proposes three metrics for inter-column logical relationships in synthetic tabular data and reports that current generators often fail to preserve them on an industrial dataset.

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

cs.LG · 2026-06-16 · unverdicted · novelty 5.0

PSyGenTAB is a constrained-optimization framework that generates privacy-preserving synthetic clinical tabular data while preserving clinical relationships and downstream model performance.

Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation

cs.LG · 2025-01-03 · unverdicted · novelty 3.0

CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.

citing papers explorer

Showing 11 of 11 citing papers.

Categorical Prior Lock-in: Why In-Context Learning Fails for Structured Data
cs.LG · 2026-06-10 · unverdicted · none · ref 1
ICL in LLMs shows a sharp ceiling on categorical distributions for high-cardinality tabular data, failing to reproduce rare classes despite examples, while numerical fidelity improves.

Concordia: Self-Improving Synthetic Tables for Federated LLMs
cs.LG · 2026-05-11 · unverdicted · none · ref 5 · 2 links
Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.

LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection
cs.LG · 2026-05-10 · unverdicted · none · ref 14
LLM-generated synthetic datasets steered uniformly across a 2D performance space defined by two landmark algorithms improve meta-learner performance on algorithm selection for regression tasks.

AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors
cs.CV · 2026-01-28 · conditional · none · ref 6
AnomalyVFM converts vision foundation models into zero-shot anomaly detectors via three-stage synthetic dataset generation plus low-rank adapters and weighted pixel loss, reaching 94.1% average image AUROC across nine datasets.

When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
cs.LG · 2025-12-09 · conditional · none · ref 5
LLM tabular generators leak memorized numeric strings, allowing a no-box attack to achieve near-perfect membership inference on some state-of-the-art models.

Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
cs.LG · 2026-05-06 · unverdicted · none · ref 4
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.

The Power of Order: Fooling LLMs with Adversarial Table Permutations
cs.LG · 2026-05-01 · unverdicted · none · ref 5 · 2 links
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.

Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
cs.LG · 2025-02-06 · unverdicted · none · ref 3
Proposes three metrics for inter-column logical relationships in synthetic tabular data and reports that current generators often fail to preserve them on an industrial dataset.

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization
cs.LG · 2026-06-16 · unverdicted · none · ref 14
PSyGenTAB is a constrained-optimization framework that generates privacy-preserving synthetic clinical tabular data while preserving clinical relationships and downstream model performance.

Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
cs.LG · 2026-04-21 · unverdicted · none · ref 17
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation
cs.LG · 2025-01-03 · unverdicted · none · ref 8
CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.
