Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Liu, Haotian, Li, Chunyuan, Li, Yuheng, Lee, Yong Jae · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.

Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Systematic evaluation finds cross-modal skill injection via model merging succeeds in instruction-following and cross-lingual scenarios but fails in mathematical reasoning, with TA and DARE methods outperforming others after hyperparameter analysis.

citing papers explorer

Showing 2 of 2 citing papers.

Deep Pre-Alignment for VLMs · cs.CV · 2026-05-14 · unverdicted · none · ref 11
Investigating Cross-Modal Skill Injection: Scenarios, Methods, and Hyperparameters · cs.CL · 2026-05-19 · unverdicted · none · ref 10
