TableVista benchmark finds foundation models maintain performance across visual styles but degrade sharply on complex table structures and vision-only settings.
arXiv preprint arXiv:2505.21771 , year =
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
VT-Bench aggregates 14 datasets from 9 domains and evaluates 23 models to standardize visual-tabular discriminative and generative tasks.
V-tableR1 uses a critic VLM for dense step-level feedback and a new PGPO algorithm to shift multimodal table reasoning from pattern matching to verifiable logical steps, achieving SOTA accuracy with a 4B open-source model.
citing papers explorer
-
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
TableVista benchmark finds foundation models maintain performance across visual styles but degrade sharply on complex table structures and vision-only settings.
-
VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning
VT-Bench aggregates 14 datasets from 9 domains and evaluates 23 models to standardize visual-tabular discriminative and generative tasks.
-
V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization
V-tableR1 uses a critic VLM for dense step-level feedback and a new PGPO algorithm to shift multimodal table reasoning from pattern matching to verifiable logical steps, achieving SOTA accuracy with a 4B open-source model.