PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Table extraction (TE) is a key challenge in document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in developing methods, such as vision-language models (VLMs), to extract tables directly in their full page or document context. However, a lack of annotated data has made progress difficult to demonstrate. To address this, we create a new large-scale dataset, PubTables-v2. PubTables-v2 unifies TE across various levels of surrounding context and, notably, is the first benchmark for multi-page TE. Our evaluations reveal that while current frontier models strongly outperform ($+0.354\ \textrm{GriTS}_\textrm{Con}$) small models on the most complex task (full-document multi-page TE), this gap can be closed or even reversed ($-0.056\ \textrm{GriTS}_\textrm{Con}$) on narrower tasks (cropped table extraction) with targeted training. Data is available at https://huggingface.co/datasets/kensho/PubTables-v2. Code and models will be released.
fields: cs.CV (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction
POTATR extends TATR into a 29M-parameter image-to-graph model for contextual page-level table extraction, reporting 0.964 GriTS_Con on PubTables-v2 Single Pages while running 130x faster and 300x cheaper than tested alternatives including MLLMs.