TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Yin, P · 2020 · arXiv 2005.08314

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries

cs.IR · 2026-05-18 · unverdicted · novelty 6.0

PIPER retrieves and ranks tabular datasets by profiling their content and using LLM-generated queries for dense vector search, outperforming metadata baselines and TableQA methods in low-metadata settings.

Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

STC reduces tabular chunk counts by up to 56% versus baselines and raises hybrid MRR to 0.5945 and BM25 Recall@1 to 0.754 by preserving row structure during chunking.

Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation

cs.CL · 2025-02-18 · unverdicted · novelty 6.0

KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.

TabEmb: Joint Semantic-Structure Embedding for Table Annotation

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

cs.CL · 2025-07-07 · unverdicted · novelty 5.0

XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining

cs.LG · 2024-03-29 · unverdicted · novelty 5.0

Table-specific pretraining of Llama-2 yields significant gains on zero-shot, few-shot, and in-context tabular prediction tasks over prior benchmarks.

When TableQA Meets Noise: A Dual Denoising Framework for Complex Questions and Large-scale Tables

cs.CL · 2025-09-22 · unverdicted · novelty 4.0

EnoTab is a dual denoising framework for TableQA that performs evidence-based question denoising via semantic unit decomposition and evidence tree-guided table pruning with post-order rollback to improve performance on complex questions and large-scale tables.

citing papers explorer

Showing 8 of 8 citing papers.

PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries
cs.IR · 2026-05-18 · unverdicted · none · ref 35

Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation
cs.CL · 2026-05-01 · unverdicted · none · ref 11

Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation
cs.CL · 2025-02-18 · unverdicted · none · ref 53

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
cs.LG · 2026-04-23 · unverdicted · none · ref 10

TabEmb: Joint Semantic-Structure Embedding for Table Annotation
cs.LG · 2026-04-21 · unverdicted · none · ref 94

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL
cs.CL · 2025-07-07 · unverdicted · none · ref 11

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining
cs.LG · 2024-03-29 · unverdicted · none · ref 16

When TableQA Meets Noise: A Dual Denoising Framework for Complex Questions and Large-scale Tables
cs.CL · 2025-09-22 · unverdicted · none · ref 33
