OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2131–2153, Singapore
3 Pith papers cite this work.
Citing papers

- OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
  OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.
- PoTable: Towards Systematic Thinking via Plan-then-Execute Stage Reasoning on Tables
  PoTable uses multiple analytical stages with plan-then-execute code generation to produce accurate, commented, executable programs for table reasoning on WikiTQ and TabFact.
- Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation
  A survey that categorizes TQA benchmarks and LLM modeling strategies by challenges while identifying underexplored areas such as reinforcement learning.