arXiv preprint arXiv:2406.08100
3 Pith papers cite this work.
Citing papers
- VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning
  VT-Bench aggregates 14 datasets totaling over 756K samples across 9 domains and evaluates 23 models to establish a unified testbed for visual-tabular multi-modal discriminative and generative tasks.
- DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates
  DenTab provides 2,000 annotated dental table images and 2,208 questions to benchmark 16 systems on table structure recognition and VQA, revealing that strong layout recovery does not ensure reliable multi-step arithmetic, and proposes a Table Router Pipeline combining VLMs with rule-based execution.
- Bridging Language Models and Financial Analysis
  A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.