LLM-TabLogic extracts inter-column logical constraints using LLMs and conditions a score-based latent diffusion model on them to generate synthetic tabular data that preserves those relationships.
arXiv preprint arXiv:2502.04055 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
Current evaluations of synthetic tabular data mainly focus on how well joint distributions are modeled, often overlooking the assessment of their effectiveness in preserving realistic event sequences and coherent entity relationships across columns.This paper proposes three evaluation metrics designed to assess the preservation of logical relationships among columns in synthetic tabular data. We validate these metrics by assessing the performance of both classical and state-of-the-art generation methods on a real-world industrial dataset.Experimental results reveal that existing methods often fail to rigorously maintain logical consistency (e.g., hierarchical relationships in geography or organization) and dependencies (e.g., temporal sequences or mathematical relationships), which are crucial for preserving the fine-grained realism of real-world tabular data. Building on these insights, this study also discusses possible pathways to better capture logical relationships while modeling the distribution of synthetic tabular data. The code is available at https://github.com/Yunbo-max/TabLogicEval.
citation-role summary
citation-polarity summary
fields
cs.LG 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
citing papers explorer
-
LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion
LLM-TabLogic extracts inter-column logical constraints using LLMs and conditions a score-based latent diffusion model on them to generate synthetic tabular data that preserves those relationships.
-
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.