On-policy self-distillation with sampled demonstrations reduces rollout diversity by amplifying existing probability gaps in the base model, unlike ideal RL which preserves ratios among correct outputs.
arXiv preprint arXiv:2312.05328 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
An iterative bootstrapped self-filtering approach selects balanced clean and diverse subsets from noisy vision-language datasets to train improved CLIP models.
citing papers explorer
-
On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity
On-policy self-distillation with sampled demonstrations reduces rollout diversity by amplifying existing probability gaps in the base model, unlike ideal RL which preserves ratios among correct outputs.
-
Data Selection Through Iterative Self-Filtering for Vision-Language Settings
An iterative bootstrapped self-filtering approach selects balanced clean and diverse subsets from noisy vision-language datasets to train improved CLIP models.