The modality focusing hypothesis: Towards understanding crossmodal knowledge distillation
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
PTA framework purifies noisy multimodal data via meta-learning and distills cross-modal knowledge through diffusion to create robust single-modality models under missing modalities.
citing papers explorer
- Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge
LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.
- Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher
PTA framework purifies noisy multimodal data via meta-learning and distills cross-modal knowledge through diffusion to create robust single-modality models under missing modalities.