Diffusion and Flow Matching Models for Tabular Data: A Survey
Pith reviewed 2026-05-25 07:58 UTC · model grok-4.3
The pith
This is the first survey dedicated to diffusion and flow matching models for tabular data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
To the best of our knowledge, this is the first survey dedicated specifically to diffusion and flow matching models for tabular data. We review work from June 2015 to May 2026, organize it around data-engineering challenges, tasks, design choices, and evaluation dimensions, and discuss open problems in scalability, feature dependency modeling, privacy, fairness, benchmarking, and constraint-aware generation.
What carries the argument
The survey's four-way organizational structure around data-engineering challenges, tasks, design choices, and evaluation dimensions.
If this is right
- Researchers can use the organization to locate methods for specific tabular tasks such as synthesis or imputation.
- Future work must address the documented gaps in scalability and constraint-aware generation.
- Standardized benchmarks would reduce the current fragmentation in evaluation protocols.
Where Pith is reading between the lines
- A shared evaluation protocol across tasks could accelerate progress by making incremental improvements visible.
- Constraint-aware variants may prove essential for regulated domains where synthetic data must obey hard rules.
- Privacy and fairness analyses could be integrated into the generative process rather than applied after the fact.
Load-bearing premise
The literature on diffusion and flow matching models for tabular data remains difficult to compare because methods target different tasks and rely on different representations, objectives, evaluation protocols, and domain assumptions.
What would settle it
Discovery of any earlier survey whose scope is limited to diffusion and flow matching models applied to tabular data.
Original abstract
Deep generative models have made rapid progress in image, text, audio, and video generation, and are increasingly being applied to structured records. For tabular data, however, generative modeling remains difficult: a dataset may contain numerical and categorical attributes, missing values, sensitive fields, imbalanced categories, complex feature dependencies, and domain constraints. Earlier tabular data modeling methods based on GANs or VAEs have achieved useful results, but they can suffer from unstable training, mode collapse, weak modeling of multimodal distributions, and fragile handling of mixed-type features. Diffusion models have therefore attracted growing interest because their noising-and-denoising formulation provides a flexible and stable way to model complex data distributions, and has been adapted to tabular synthesis, missing-value imputation, trustworthy data generation, and anomaly detection. Flow matching offers a closely related route by learning transport vector fields along probability paths, often with more direct control over path design and sampling efficiency. Despite this progress, the literature on diffusion and flow matching models for tabular data remains difficult to compare because methods target different tasks and rely on different representations, objectives, evaluation protocols, and domain assumptions. To the best of our knowledge, this is the first survey dedicated specifically to diffusion and flow matching models for tabular data. We review work from June 2015 to May 2026, organize it around data-engineering challenges, tasks, design choices, and evaluation dimensions, and discuss open problems in scalability, feature dependency modeling, privacy, fairness, benchmarking, and constraint-aware generation. We maintain updates in a GitHub repository.
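As a rough illustration of the two formulations the abstract contrasts, the sketch below builds one training pair for a single numerical row under each view. It is not taken from the survey or from any particular method it covers. The function names, the `alpha_bar_t` parameter, and the choice of a straight-line path from Gaussian noise to data are assumptions made for this example, and categorical columns would typically need a separate treatment.

```python
import math
import random


def ddpm_forward_sample(x0, alpha_bar_t, rng):
    """Noising step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.

    Hypothetical helper for numerical features only. alpha_bar_t in [0, 1]
    is the cumulative signal level at step t (assumed given by a schedule).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    x_t = [scale * v + sigma * e for v, e in zip(x0, eps)]
    # A denoiser is commonly trained to predict eps from (x_t, t).
    return x_t, eps


def flow_matching_pair(data, t, rng):
    """Straight-line path from noise (t = 0) to data (t = 1).

    Hypothetical helper. Returns the point x_t on the path and the constant
    target velocity (data - noise) that a vector field would regress onto.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in data]
    x_t = [(1.0 - t) * e + t * v for e, v in zip(noise, data)]
    velocity = [v - e for e, v in zip(noise, data)]
    return x_t, velocity


rng = random.Random(0)
row = [0.5, -1.2, 3.0]  # one standardized numerical record (made-up values)
x_t, eps = ddpm_forward_sample(row, alpha_bar_t=0.9, rng=rng)
z_t, v = flow_matching_pair(row, t=0.25, rng=rng)
```

In this linear setup, following the target velocity from `z_t` for the remaining time `1 - t` lands exactly on the data row, which is one way to read the abstract's remark about more direct control over path design.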
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey of diffusion and flow matching models for tabular data, claiming to be the first dedicated review of the topic. It reviews literature from June 2015 to May 2026, organizes existing work around data-engineering challenges, tasks, design choices, and evaluation dimensions, and discusses open problems including scalability, feature dependency modeling, privacy, fairness, benchmarking, and constraint-aware generation. The authors state that they maintain updates in a GitHub repository.
Significance. If the coverage is comprehensive and free of selection bias, the survey would be significant for organizing an emerging, heterogeneous literature on generative models for structured data. The explicit maintenance of a GitHub repository for updates strengthens the work by providing a mechanism for ongoing relevance and community contribution.
Minor comments (2)
- [Abstract] The review period is stated as extending to May 2026. The authors should clarify whether this is a projected cutoff, a typographical error, or the intended scope, as the current date of the manuscript appears to precede this endpoint.
- [Abstract] The abstract refers to a GitHub repository for updates but does not provide the URL. Including the repository link in the manuscript (and ideally in the abstract) would improve accessibility.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation of minor revision. The assessment correctly identifies the survey's scope, organization around data-engineering challenges and tasks, coverage of open problems, and the value of the maintained GitHub repository. No specific major comments were provided in the report.
Circularity Check
No significant circularity in survey paper
This manuscript is explicitly a literature survey with no derivations, equations, predictions, or technical claims whose validity depends on internal self-reference. The sole novel assertion (being the first dedicated survey) is a factual statement about external literature coverage rather than a result derived from the paper's own inputs. No self-citation chains, fitted parameters renamed as predictions, or ansatzes are present. The work is therefore self-contained with respect to external benchmarks and receives a score of 0.
Forward citations
Cited by 1 Pith paper
- Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering. TagCC anchors statistical tabular representations to LLM-derived textual semantic concepts via contrastive learning jointly optimized with a clustering objective, outperforming prior methods on benchmarks.