MathVista: Evaluating mathematical reasoning of foundation models in visual contexts

Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

The Token Replacement Test shows VLMs keep most accuracy gains even after corrupting or replacing continuous thought token content, indicating the tokens are not used as information bottlenecks.

Improving Vision-language Models with Perception-centric Process Reward Models

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

Perceval is a perception-centric PRM that detects token-level perceptual errors in VLMs, supporting token-advantage RL training and iterative test-time scaling for improved reasoning.

citing papers explorer

Ablate-to-Validate: Are Vision-Language Models Really Using Continuous Thought Tokens?

cs.CV · 2026-05-20 · unverdicted · none · ref 16

Improving Vision-language Models with Perception-centric Process Reward Models

cs.CV · 2026-04-27 · unverdicted · none · ref 26
