Learn to explain: Multimodal reasoning via thought chains for science question answering

Pan Lu, Swaroop Mishra, Tanglin Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

A More Word-like Image Tokenization for MLLMs

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

DiVT clusters patch embeddings into coherent semantic units and adapts token count to image complexity, matching or exceeding baselines with fewer visual tokens on multimodal benchmarks.

Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

CoM-PT trains vision foundation models in ascending size order using inverse knowledge transfer, allowing larger models to achieve superior performance with significantly reduced overall computational cost compared to individual training.

citing papers explorer


A More Word-like Image Tokenization for MLLMs · cs.CV · 2026-05-18 · unverdicted · none · ref 33
Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models · cs.CV · 2026-04-14 · unverdicted · none · ref 46
