In Proceedings of the 33rd International Joint Conference on Artificial Intelligence, pages 8038–8047
5 Pith papers cite this work.
citing papers explorer
- Robust Spectral Watermark for Synthetic Tabular Data
  TAB-DRW embeds detectable watermarks in the frequency domain of normalized synthetic tabular data via DFT and rank-based pseudorandom bits, achieving robustness to attacks while preserving fidelity and supporting mixed data types.
- HopWeaver: Cross-Document Synthesis of High-Quality and Authentic Multi-Hop Questions
  HopWeaver automatically synthesizes authentic bridge and comparison multi-hop questions from cross-document sources via a pipeline that identifies complementary documents and builds reasoning paths.
- StealthGraph: Exposing Domain-Specific Risks in LLMs through Knowledge-Graph-Guided Harmful Prompt Generation
  StealthGraph generates implicit domain-relevant harmful prompts via knowledge-graph guidance and two-strategy obfuscation to enable more realistic red-teaming of LLM safety.
- Grounding Synthetic Data Generation With Vision and Language Models
  A vision-language grounded framework generates and evaluates synthetic remote sensing data, releasing ARAS400k where augmented training outperforms real-data baselines for segmentation and captioning.
- Reflections and New Directions for Human-Centered Large Language Models
  Model developers must address human concerns, preferences, values, and goals with rigor at every stage of the LLM pipeline rather than only in post-training.