arXiv preprint arXiv:2401.02524

Bauer, A · 2024 · arXiv 2401.02524

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Robust Spectral Watermark for Synthetic Tabular Data

cs.CR · 2025-11-26 · unverdicted · novelty 7.0

TAB-DRW embeds detectable watermarks in the frequency domain of normalized synthetic tabular data via DFT and rank-based pseudorandom bits, achieving robustness to attacks while preserving fidelity and supporting mixed data types.

Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Adversarial competition between attacker and defender teams generates diverse multi-turn conversational data that improves LLM performance on secure code generation benchmarks by 18-29%.

Quality Degradation Attack in Synthetic Data

cs.CR · 2026-01-06 · unverdicted · novelty 6.0

Adversaries can degrade synthetic data quality via small manipulations such as label flipping or feature-importance interventions, substantially harming downstream model performance and increasing statistical divergence from real data.

Scaling Synthetic Data Creation with 1,000,000,000 Personas

cs.CL · 2024-06-28 · unverdicted · novelty 6.0

A curated set of one billion personas enables scalable, diverse synthetic data generation for LLM training across reasoning, instructions, knowledge, NPCs, and tools.

PuckTrick: A Library for Making Synthetic Data More Realistic

cs.LG · 2025-06-23 · unverdicted · novelty 5.0

PuckTrick library adds controlled imperfections to synthetic data and shows that models trained on the resulting contaminated data outperform those trained on clean synthetic data in financial dataset experiments.

citing papers explorer

Showing 5 of 5 citing papers.

Robust Spectral Watermark for Synthetic Tabular Data · cs.CR · 2025-11-26 · unverdicted · none · ref 3
Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition · cs.AI · 2026-04-20 · unverdicted · none · ref 36
Quality Degradation Attack in Synthetic Data · cs.CR · 2026-01-06 · unverdicted · none · ref 11
Scaling Synthetic Data Creation with 1,000,000,000 Personas · cs.CL · 2024-06-28 · unverdicted · none · ref 4
PuckTrick: A Library for Making Synthetic Data More Realistic · cs.LG · 2025-06-23 · unverdicted · none · ref 8
