PIPER retrieves and ranks tabular datasets by profiling their content and using LLM-generated queries for dense vector search, outperforming metadata baselines and TableQA methods in low-metadata settings.
Tabert: Pretraining for joint understanding of textual and tabular data
8 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 8representative citing papers
STC reduces tabular chunk counts by up to 56% versus baselines and raises hybrid MRR to 0.5945 and BM25 Recall@1 to 0.754 by preserving row structure during chunking.
KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.
TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.
Table-specific pretraining of Llama-2 yields significant gains on zero-shot, few-shot, and in-context tabular prediction tasks over prior benchmarks.
EnoTab is a dual denoising framework for TableQA that performs evidence-based question denoising via semantic unit decomposition and evidence tree-guided table pruning with post-order rollback to improve performance on complex questions and large-scale tables.
citing papers explorer
-
PIPER: Content-Based Table Search via profiling and LLM-Generated Pseudoqueries
PIPER retrieves and ranks tabular datasets by profiling their content and using LLM-generated queries for dense vector search, outperforming metadata baselines and TableQA methods in low-metadata settings.
-
Structure-Aware Chunking for Tabular Data in Retrieval-Augmented Generation
STC reduces tabular chunk counts by up to 56% versus baselines and raises hybrid MRR to 0.5945 and BM25 Recall@1 to 0.754 by preserving row structure during chunking.
-
Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation
KaSLA applies knapsack optimization hierarchically to schema linking for LLM text-to-SQL, claiming better results than large models and improved SQL generation on Spider and BIRD.
-
Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning
TaNOS decouples table semantics from numerical structure via anonymization, sketches, and program-first self-supervision, yielding 80.13% FinQA accuracy with 10% data and near-zero cross-domain gap versus over 10pp for standard fine-tuning.
-
TabEmb: Joint Semantic-Structure Embedding for Table Annotation
TabEmb decouples LLM-based semantic column embeddings from graph-based structural modeling to produce joint representations that improve table annotation tasks.
-
XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL
XiYan-SQL achieves SOTA Text-to-SQL accuracy by combining schema filtering, a multi-generator ensemble fine-tuned on varied SQL formats, and a selection model.
-
Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining
Table-specific pretraining of Llama-2 yields significant gains on zero-shot, few-shot, and in-context tabular prediction tasks over prior benchmarks.
-
When TableQA Meets Noise: A Dual Denoising Framework for Complex Questions and Large-scale Tables
EnoTab is a dual denoising framework for TableQA that performs evidence-based question denoising via semantic unit decomposition and evidence tree-guided table pruning with post-order rollback to improve performance on complex questions and large-scale tables.