TraversalBench shows self-intersections cause the sharpest performance drops for VLMs on exact path traversal, with errors localized at the first crossing.
Visualpuzzles: Decoupling multimodal reasoning evaluation from domain knowledge
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6representative citing papers
SCOLAR fixes information gain collapse in latent visual reasoning by generating independent auxiliary visual tokens via a detransformer, extending acceptable CoT length over 30x and delivering +14.12% gains on reasoning benchmarks.
ROMA improves MLLM robustness to seen and unseen visual corruptions by +2.3-2.4% over GRPO on seven reasoning benchmarks while matching clean accuracy.
MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.
SALLIE detects jailbreaks in text and vision-language models by extracting residual stream activations, scoring maliciousness per layer with k-NN, and ensembling predictions, outperforming baselines on multiple datasets.
citing papers explorer
-
TraversalBench: Challenging Paths to Follow for Vision Language Models
TraversalBench shows self-intersections cause the sharpest performance drops for VLMs on exact path traversal, with errors localized at the first crossing.
-
Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model
SCOLAR fixes information gain collapse in latent visual reasoning by generating independent auxiliary visual tokens via a detransformer, extending acceptable CoT length over 30x and delivering +14.12% gains on reasoning benchmarks.
-
Reinforcing Multimodal Reasoning Against Visual Degradation
ROMA improves MLLM robustness to seen and unseen visual corruptions by +2.3-2.4% over GRPO on seven reasoning benchmarks while matching clean accuracy.
-
MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?
MapTab is a new multimodal benchmark with 328 images and nearly 200k queries that shows current MLLMs have substantial difficulty with multi-criteria route planning when visual and tabular information must be combined.
-
SALLIE: Safeguarding Against Latent Language & Image Exploits
SALLIE detects jailbreaks in text and vision-language models by extracting residual stream activations, scoring maliciousness per layer with k-NN, and ensembling predictions, outperforming baselines on multiple datasets.
- Semantic-Enriched Latent Visual Reasoning