ContinuousBench shows non-private synthetic text transfers corpus-specific capabilities while state-of-the-art DP methods fail to do so even at ε=100.
arXiv preprint arXiv:2504.05571 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
method 1polarities
use method 1representative citing papers
The paper defines the MPI task and proposes TriMPI, a three-stage training pipeline of continual pretraining, supervised finetuning, and policy-aware reinforcement learning that internalizes multimodal policies into model parameters for improved adherence without prompts at inference.
EmbGen creates synthetic QA data by entity decomposition, embedding-based reassembly into clusters, and multi-level sampling with cluster-specific prompts, yielding up to 88.9% higher Binary Accuracy than baselines on heterogeneous datasets under fixed token budgets.
citing papers explorer
-
EmbGen: Teaching with Reassembled Corpora
EmbGen creates synthetic QA data by entity decomposition, embedding-based reassembly into clusters, and multi-level sampling with cluster-specific prompts, yielding up to 88.9% higher Binary Accuracy than baselines on heterogeneous datasets under fixed token budgets.