hub

Tablevqa-bench: A visual question answering benchmark on multiple table domains

Tablevqa-bench: a visual question answering benchmark on multiple table domains ( · 2024 · arXiv 2404.19205

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 2 background 1

citation-polarity summary

use dataset 2 background 1

representative citing papers

WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild

cs.CV · 2026-05-01 · conditional · novelty 8.0 · 2 refs

WildTableBench is the first QA benchmark for naturally occurring table images, where 21 multimodal models were evaluated and only one exceeded 50% accuracy.

TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

TableVista benchmark finds foundation models maintain performance across visual styles but degrade sharply on complex table structures and vision-only settings.

TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables

cs.AI · 2026-04-04 · conditional · novelty 7.0

TableVision benchmark shows explicit spatial grounding recovers MLLM reasoning on hierarchical tables, delivering 12.3% accuracy improvement through a decoupled perception-reasoning framework.

Internalized Reasoning for Long-Context Visual Document Understanding

cs.CV · 2026-03-31 · unverdicted · novelty 7.0

A synthetic pipeline creates and internalizes reasoning traces in VLMs for long-context visual document understanding, with a 32B model surpassing a 235B model on MMLongBenchDoc and showing 12.4x fewer output tokens.

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

cs.CV · 2025-09-09 · conditional · novelty 7.0

Visual-TableQA is a new open-domain benchmark of rendered table images and complex QA pairs created via multi-LLM collaborative generation, with fine-tuned models showing robust generalization to external tests.

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · novelty 7.0

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

Large Vision-Language Models Get Lost in Attention

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.

DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

DenTab provides 2,000 annotated dental table images and 2,208 questions to benchmark 16 systems on table structure recognition and VQA, revealing that strong layout recovery does not ensure reliable multi-step arithmetic, and proposes a Table Router Pipeline combining VLMs with rule-based execution.

DataArc-SynData-Toolkit: A Unified Closed-Loop Framework for Multi-Path, Multimodal, and Multilingual Data Synthesis

cs.LG · 2026-05-02 · unverdicted · novelty 3.0

DataArc-SynData-Toolkit is an open-source, configuration-driven framework that unifies synthetic data generation for multimodal, multilingual, and multi-task LLM training with improved usability and quality control.

Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

cs.CL · 2025-10-08 · unverdicted · novelty 3.0

A survey that categorizes TQA benchmarks and LLM modeling strategies by challenges while identifying underexplored areas such as reinforcement learning.

VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning

cs.CV · 2026-05-03

citing papers explorer

Showing 11 of 11 citing papers.

WildTableBench: Benchmarking Multimodal Foundation Models on Table Understanding In the Wild cs.CV · 2026-05-01 · conditional · none · ref 20 · 2 links
WildTableBench is the first QA benchmark for naturally occurring table images, where 21 multimodal models were evaluated and only one exceeded 50% accuracy.
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity cs.CL · 2026-05-07 · unverdicted · none · ref 12
TableVista benchmark finds foundation models maintain performance across visual styles but degrade sharply on complex table structures and vision-only settings.
TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables cs.AI · 2026-04-04 · conditional · none · ref 30
TableVision benchmark shows explicit spatial grounding recovers MLLM reasoning on hierarchical tables, delivering 12.3% accuracy improvement through a decoupled perception-reasoning framework.
Internalized Reasoning for Long-Context Visual Document Understanding cs.CV · 2026-03-31 · unverdicted · none · ref 22
A synthetic pipeline creates and internalizes reasoning traces in VLMs for long-context visual document understanding, with a 32B model surpassing a 235B model on MMLongBenchDoc and showing 12.4x fewer output tokens.
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images cs.CV · 2025-09-09 · conditional · none · ref 21
Visual-TableQA is a new open-domain benchmark of rendered table images and complex QA pairs created via multi-LLM collaborative generation, with fine-tuned models showing robust generalization to external tests.
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning cs.CV · 2024-12-31 · accept · none · ref 18
OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.
Large Vision-Language Models Get Lost in Attention cs.AI · 2026-05-07 · unverdicted · none · ref 96
In LVLMs, attention can be replaced by random Gaussian weights with little or no performance loss, indicating that current models get lost in attention rather than efficiently using visual context.
DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates cs.CV · 2026-04-17 · unverdicted · none · ref 15
DenTab provides 2,000 annotated dental table images and 2,208 questions to benchmark 16 systems on table structure recognition and VQA, revealing that strong layout recovery does not ensure reliable multi-step arithmetic, and proposes a Table Router Pipeline combining VLMs with rule-based execution.
DataArc-SynData-Toolkit: A Unified Closed-Loop Framework for Multi-Path, Multimodal, and Multilingual Data Synthesis cs.LG · 2026-05-02 · unverdicted · none · ref 20
DataArc-SynData-Toolkit is an open-source, configuration-driven framework that unifies synthetic data generation for multimodal, multilingual, and multi-task LLM training with improved usability and quality control.
Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation cs.CL · 2025-10-08 · unverdicted · none · ref 10
A survey that categorizes TQA benchmarks and LLM modeling strategies by challenges while identifying underexplored areas such as reinforcement learning.
VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning cs.CV · 2026-05-03 · unreviewed · ref 49

Tablevqa-bench: A visual question answering benchmark on multiple table domains

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer