Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.
arXiv preprint arXiv:2210.06280 , year=
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
LLM-generated synthetic datasets steered uniformly across a 2D performance space defined by two landmark algorithms improve meta-learner performance on algorithm selection for regression tasks.
AnomalyVFM converts vision foundation models into zero-shot anomaly detectors via three-stage synthetic dataset generation plus low-rank adapters and weighted pixel loss, reaching 94.1% average image AUROC across nine datasets.
LLM tabular generators leak memorized numeric strings, allowing a no-box attack to achieve near-perfect membership inference on some state-of-the-art models.
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
Proposes three metrics for inter-column logical relationships in synthetic tabular data and reports that current generators often fail to preserve them on an industrial dataset.
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.
citing papers explorer
-
Concordia: Self-Improving Synthetic Tables for Federated LLMs
Concordia aligns synthetic table generation with federated validation utility via client-side utility scorers and group-relative policy optimization to improve LLM adaptation on non-IID tabular tasks.
-
LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection
LLM-generated synthetic datasets steered uniformly across a 2D performance space defined by two landmark algorithms improve meta-learner performance on algorithm selection for regression tasks.
-
AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors
AnomalyVFM converts vision foundation models into zero-shot anomaly detectors via three-stage synthetic dataset generation plus low-rank adapters and weighted pixel loss, reaching 94.1% average image AUROC across nine datasets.
-
When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
LLM tabular generators leak memorized numeric strings, allowing a no-box attack to achieve near-perfect membership inference on some state-of-the-art models.
-
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.
-
The Power of Order: Fooling LLMs with Adversarial Table Permutations
Semantically invariant row and column permutations in tables can cause LLMs to output incorrect answers, and a gradient-based attack called ATP efficiently finds such permutations that degrade performance across many models.
-
Evaluating Inter-Column Logical Relationships in Synthetic Tabular Data Generation
Proposes three metrics for inter-column logical relationships in synthetic tabular data and reports that current generators often fail to preserve them on an industrial dataset.
-
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
-
Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation
CTGAN and LLMs generate synthetic student data that passes statistical and predictive utility checks for learning analytics.