MDM distills vision-language datasets via joint embedding clustering, weight-space model interpolation, and geometry-aware distribution matching on the unit hypersphere.
Selec- tion via proxy: Efficient data selection for deep learning
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
POES frames prompt evaluation as online adaptive testing and uses a provably submodular objective to pick informative examples, delivering 6.2% higher average accuracy and 35-60% token savings versus naive full-set scoring.
AlignPrune uses a Dynamic Alignment Score from loss trajectories to identify noisy samples more accurately than per-sample loss, improving pruning accuracy by up to 6.3% on noisy benchmarks.
A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
citing papers explorer
-
Multimodal Distribution Matching for Vision-Language Dataset Distillation
MDM distills vision-language datasets via joint embedding clustering, weight-space model interpolation, and geometry-aware distribution matching on the unit hypersphere.
-
Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees
POES frames prompt evaluation as online adaptive testing and uses a provably submodular objective to pick informative examples, delivering 6.2% higher average accuracy and 35-60% token savings versus naive full-set scoring.
-
Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment
AlignPrune uses a Dynamic Alignment Score from loss trajectories to identify noisy samples more accurately than per-sample loss, improving pruning accuracy by up to 6.3% on noisy benchmarks.
-
Learning to Reason at the Frontier of Learnability
A curriculum sampling questions with high variance in success rate improves reinforcement learning performance for LLM reasoning tasks.
-
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Step-Video-T2V describes a 30B-parameter text-to-video model with custom Video-VAE, 3D DiT, flow matching, and Video-DPO that claims state-of-the-art results on a new internal benchmark.
- Exploring and Exploiting Stability in Latent Flow Matching