Improving Robustness of Tabular Retrieval via Representational Stability

Adarsh Singh; Jianxi Gao; Kushal Raj Bhandari; Soham Dan; Vivek Gupta

arXiv: 2604.24040 · v2 · submitted 2026-04-27 · cs.CL · cs.AI · cs.IR · cs.IT · math.IT

Pith reviewed 2026-05-08 03:55 UTC · model grok-4.3

keywords: tabular retrieval · serialization sensitivity · embedding centroid · representational stability · dense retrievers · bottleneck adapter

The pith

Averaging embeddings from multiple table serializations yields more stable retrieval than any single format.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Transformer table retrievers turn structured tables into token sequences using formats such as CSV, TSV, HTML, Markdown or DDL, yet each format shifts the resulting embedding and changes which tables are retrieved. The paper treats these equivalent serializations as noisy views of one underlying semantic signal and replaces any single embedding with their centroid. This averaging step reduces format-specific variation and recovers the shared content across tables. A lightweight residual bottleneck adapter is then trained on a frozen encoder to push single-format embeddings toward the centroid target while preserving variance through covariance regularization. The centroid itself ranks higher than individual formats in aggregate tests across MPNet, BGE-M3, ReasonIR and SPLADE, and the adapter brings robustness gains for dense retrievers.

Core claim

Serialization embeddings act as noisy observations of a shared semantic signal; their centroid supplies a canonical representation that suppresses format-induced shifts when those shifts differ across tables. Centroid vectors outrank any single serialization in pairwise retrieval comparisons on multiple benchmarks. A residual bottleneck adapter maps individual embeddings toward these centroid targets while enforcing covariance regularization, delivering robustness improvements for several dense retrievers.

What carries the argument

Centroid averaging of embeddings produced by semantically equivalent serializations (csv, tsv, html, markdown, ddl) treated as independent noisy views of the same table semantics.
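
The centroid construction can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a plain arithmetic mean over unit-normalized view embeddings, and the function names `l2_normalize` and `centroid` and the toy vectors are hypothetical. Whether the paper renormalizes the centroid afterwards is not stated in the material reproduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def centroid(view_embeddings):
    """Arithmetic mean of the unit-normalized embeddings of one table's serializations."""
    views = [l2_normalize(v) for v in view_embeddings.values()]
    dim = len(views[0])
    return [sum(v[d] for v in views) / len(views) for d in range(dim)]


# Toy 3-dimensional embeddings of one table under three serializations (made-up numbers).
table_views = {
    "csv": [1.0, 0.0, 0.0],
    "html": [0.0, 1.0, 0.0],
    "markdown": [0.0, 0.0, 1.0],
}
c = centroid(table_views)
```

In a real pipeline the dictionary values would come from the frozen encoder applied to each serialization of the same table.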

If this is right

Centroid representations outperform every individual serialization in aggregate pairwise ranking across four retriever families.
The residual bottleneck adapter raises retrieval robustness for dense models while leaving sparse lexical retrievers largely unchanged.
Serialization sensitivity constitutes a measurable source of variance that post-hoc geometric correction can mitigate.
Gains remain model-dependent and are strongest when format-induced shifts are heterogeneous across tables.
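
For readers unfamiliar with the term, a residual bottleneck adapter can be sketched as follows. The source only names the component, so everything concrete here is an assumption made for illustration: the ReLU nonlinearity, the zero-initialized up-projection, the dimensions, and the helper names `make_adapter`, `matvec` and `adapt`.

```python
import random


def make_adapter(dim, bottleneck, seed=0):
    """Create weights for a residual bottleneck adapter: down-projection, then up-projection."""
    rng = random.Random(seed)
    w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
    # Assumed zero initialization: the adapter starts out as the identity map.
    w_up = [[0.0] * bottleneck for _ in range(dim)]
    return w_down, w_up


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def adapt(embedding, w_down, w_up):
    """Return e + W_up * relu(W_down * e), the residual bottleneck update."""
    hidden = [max(0.0, h) for h in matvec(w_down, embedding)]
    delta = matvec(w_up, hidden)
    return [e + d for e, d in zip(embedding, delta)]


frozen = [0.6, 0.8, 0.0, 0.0]          # stand-in for a frozen-encoder embedding
w_down, w_up = make_adapter(dim=4, bottleneck=2)
adapted = adapt(frozen, w_down, w_up)   # equals `frozen` until w_up is trained
```

The residual form means an untrained adapter leaves the frozen embedding untouched, which is one plausible reason such a design pairs well with a frozen encoder.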

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same centroid construction could be applied to other multi-view data such as code snippets rendered in different languages or documents presented with varying layouts.
A single adapter trained once on a modest set of tables might be reused across larger frozen encoders without retraining the base model.
If the assumption of independent noise holds, the method should also reduce sensitivity to prompt wording or minor table edits that preserve semantics.

Load-bearing premise

Different serializations supply independent noisy observations whose average recovers the common semantic content without correlated distortions or lost meaning in some formats.
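
A short worked calculation shows why independence matters. The notation is assumed for illustration and is not taken from the paper: write the embedding of table $T$ under serialization $s$ as $e_s(T) = u(T) + \eta_s(T)$, where $u(T)$ is the shared signal and each noise term $\eta_s$ has per-coordinate variance $\sigma^2$ and pairwise correlation $\rho$ across the $S$ formats.

```latex
\[
c(T) = \frac{1}{S}\sum_{s=1}^{S} e_s(T)
     = u(T) + \frac{1}{S}\sum_{s=1}^{S}\eta_s(T),
\qquad
\operatorname{Var}\!\left[\frac{1}{S}\sum_{s=1}^{S}\eta_s\right]
     = \frac{\sigma^2}{S}\bigl(1 + (S-1)\rho\bigr).
\]
```

With $S = 5$ and $\rho = 0$ the residual noise variance falls to $\sigma^2/5$. With $\rho = 0.5$ it falls only to $0.6\,\sigma^2$, and with $\rho = 1$ averaging removes nothing.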

What would settle it

A direct test on held-out tables that measures whether centroid or adapter-adjusted embeddings retrieve the same relevant documents as the best single serialization when the tables are presented in every format.
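
One way such a check could be scored is the overlap between the top-k results of two indexes built over the same tables. The sketch below is an assumption about how to operationalize the test, not a protocol from the paper. The names `cosine`, `top_k` and `overlap_at_k` and all vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k):
    """IDs of the k tables whose vectors are most cosine-similar to the query."""
    ranked = sorted(index, key=lambda tid: cosine(query, index[tid]), reverse=True)
    return ranked[:k]


def overlap_at_k(query, index_a, index_b, k):
    """Fraction of the top-k results shared by two indexes over the same tables."""
    return len(set(top_k(query, index_a, k)) & set(top_k(query, index_b, k))) / k


# Made-up 2-d vectors: the same three tables indexed from CSV text and from centroids.
csv_index = {"t1": [1.0, 0.0], "t2": [0.8, 0.6], "t3": [0.0, 1.0]}
centroid_index = {"t1": [0.9, 0.1], "t2": [-0.2, 1.0], "t3": [0.0, 1.0]}
query = [1.0, 0.1]
shared = overlap_at_k(query, csv_index, centroid_index, k=2)
```

Averaging this overlap over held-out queries, once per serialization, would give one number per format to compare against the best single serialization.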

Figures

Figures reproduced from arXiv: 2604.24040 by Adarsh Singh, Jianxi Gao, Kushal Raj Bhandari, Soham Dan, Vivek Gupta.

**Figure 1.** Serialization sensitivity in table retrieval. (a) Different serializations of the same WTQ table (csv/200-csv/0) map to distinct regions in the ReasonIR embedding space. (b) A single table shown under multiple serialization views with their adapter transport: circles denote original frozen embeddings, crosses denote adapted embeddings after transport, and diamonds show the centroid (CENTROID ALL) embedding…

**Figure 2.** Pairwise comparison of table formats aggregated over matched evaluation instances across all models and datasets. Panel (a): Average paired rank difference per cell, where positive values indicate the column format outperforms the row. Formats are ordered by mean rank, strongest toward the upper-left. Panel (b): Benjamini-Hochberg FDR-adjusted p-values from pairwise Wilcoxon signed-rank tests on the same i…

**Figure 3.** Heatmaps comparing adapter-based rank changes relative to the base model across three datasets, WTQ, WikiSQL, and NQ-Tables. Panels a-c show Adapter vs Base, and panels d-f show Adapter Subset vs Base. Rows correspond to input formats and columns to retriever backbones. Each cell reports the mean log-rank improvement, computed as ∆ log-rank(base - adapter) as detailed in Appendix B. Red indicates improveme…

**Figure 4.** Analysis of the format-specific shift $\delta_s(T)$ into its table-independent component $\lVert \mu_{\delta_s} \rVert$ (Equation 5) and table-dependent component $\bar{\epsilon}_s$ (Equation 6), shown per serialization format across four retrievers and three datasets. Each point represents one format, colored by its serialization family. The diagonal marks $\lVert \mu_{\delta_s} \rVert / \bar{\epsilon}_s = 1$. Formats above the diagonal carry format-specific shifts that remain largely the …

**Figure 5.** Training trajectories of the adapter under the loss weighting in Table…

**Figure 6.** Adapter transport across ten tables and four retrieval models. Each panel shows…
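
The Figure 2 caption mentions Benjamini-Hochberg FDR-adjusted p-values. As a reminder of what that adjustment does, here is a small sketch of the standard step-up procedure. It covers only the adjustment, not the Wilcoxon signed-rank test itself, and the input p-values are made up rather than taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]   # made-up p-values from four pairwise tests
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

Each raw p-value is scaled by the number of tests divided by its rank, then capped by the adjusted value of the next-larger p-value, so the adjusted values never decrease as the raw values increase.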

Original abstract

Transformer-based table retrieval systems flatten structured tables into token sequences, making retrieval sensitive to the choice of serialization even when table semantics remain unchanged. We show that semantically equivalent serializations, such as $\texttt{csv}$, $\texttt{tsv}$, $\texttt{html}$, $\texttt{markdown}$, and $\texttt{ddl}$, can produce substantially different embeddings and retrieval results across multiple benchmarks and retriever families. To address this instability, we treat serialization embedding as noisy views of a shared semantic signal and use its centroid as a canonical target representation. We show that centroid averaging suppresses format-specific variation and can recover the semantic content common to different serializations when format-induced shifts differ across tables. Empirically, centroid representations outrank individual formats in aggregate pairwise comparisons across $\texttt{MPNet}$, $\texttt{BGE-M3}$, $\texttt{ReasonIR}$, and $\texttt{SPLADE}$. We further introduce a lightweight residual bottleneck adapter on top of a frozen encoder that maps single-serialization embeddings towards centroid targets while preserving variance and enforcing covariance regularization. The adapter improves robustness for several dense retrievers, though gains are model-dependent and weaker for sparse lexical retrieval. These results identify serialization sensitivity as a major source of retrieval variance and show the promise of post hoc geometric correction for serialization-invariant table retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Centroid averaging across table serializations plus a residual adapter gives better retrieval than single formats on several dense models, but the independence assumption looks shaky for similar formats and the abstract lacks experimental details.

The paper's core point is that different ways of turning a table into text (CSV, TSV, HTML, Markdown, DDL) produce noticeably different embeddings even when the underlying table is the same, and that averaging those embeddings into a centroid often beats any one of them for retrieval. They also train a small residual adapter on a frozen encoder to push single-format embeddings toward that centroid while keeping variance and adding covariance regularization. This is a direct, practical response to a real annoyance in table search and QA pipelines where serialization choice affects results.

Referee Report

2 major / 2 minor

Summary. The paper argues that flattening tables into different serializations (csv, tsv, html, markdown, ddl) produces substantially different embeddings and retrieval rankings even when table semantics are fixed. It treats these embeddings as noisy views of a shared semantic signal, proposes their centroid as a canonical stable target, and introduces a lightweight residual bottleneck adapter (with covariance regularization) that maps single-format embeddings toward the centroid while preserving variance. Empirically, centroids are shown to outrank individual formats in aggregate pairwise comparisons across MPNet, BGE-M3, ReasonIR, and SPLADE, and the adapter yields robustness gains for several dense retrievers.

Significance. If the empirical claims hold after controlling for inter-format correlations and providing full benchmark details, the work would usefully identify serialization choice as a major source of variance in table retrieval and demonstrate a practical post-hoc geometric correction that avoids retraining the base encoder. The adapter's design (frozen encoder plus residual bottleneck plus covariance term) is a concrete, lightweight contribution that could be adopted in production table-retrieval pipelines.

major comments (2)

[Abstract and §3 (method)] The central modeling assumption that the five serializations act as independent noisy views whose centroid recovers common semantic content is load-bearing for both the centroid claim and the adapter training objective. CSV and TSV differ only by delimiter and share nearly identical token sequences; HTML and Markdown both introduce nested markup. These structural similarities make correlated rather than independent embedding shifts likely. When shifts are correlated, averaging need not suppress format-specific variation and may instead reinforce shared biases. No analysis of pairwise embedding correlations or of whether gains survive after controlling for correlation is reported.
[Abstract] The claim that 'centroid representations outrank individual formats in aggregate pairwise comparisons' is presented without naming the benchmarks, the number of tables/queries, the exact retrieval metric, baseline systems, or any statistical significance tests. This absence prevents verification of whether the reported superiority is robust or driven by particular table characteristics.

minor comments (2)

[Abstract] Notation: The paper uses 'centroid' without an explicit equation defining how it is computed from the five embeddings (simple arithmetic mean? weighted?). Adding this definition early would improve reproducibility.
[§4] The adapter description mentions 'covariance regularization' but does not state the precise loss term or the hyper-parameter controlling its strength. A short equation or pseudocode would clarify the training objective.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these insightful comments, which highlight important aspects of our modeling assumptions and the presentation of results. We respond to each major comment below and indicate the revisions we will make to the manuscript.

Referee: [Abstract and §3 (method)] The central modeling assumption that the five serializations act as independent noisy views whose centroid recovers common semantic content is load-bearing for both the centroid claim and the adapter training objective. CSV and TSV differ only by delimiter and share nearly identical token sequences; HTML and Markdown both introduce nested markup. These structural similarities make correlated rather than independent embedding shifts likely. When shifts are correlated, averaging need not suppress format-specific variation and may instead reinforce shared biases. No analysis of pairwise embedding correlations or of whether gains survive after controlling for correlation is reported.

Authors: We agree that the serializations are not fully independent due to shared structural elements, which could induce correlated embedding shifts. Our approach treats them as multiple views of the same table semantics, and the centroid is intended to capture the common signal even if some correlations exist. To strengthen this, we will include in the revised paper an analysis of pairwise cosine similarities between embeddings from different serializations, as well as an ablation study that examines performance when averaging only over less correlated formats. This will demonstrate that the benefits of the centroid are not solely due to reinforcing shared biases. (revision: yes)

Referee: [Abstract] The claim that 'centroid representations outrank individual formats in aggregate pairwise comparisons' is presented without naming the benchmarks, the number of tables/queries, the exact retrieval metric, baseline systems, or any statistical significance tests. This absence prevents verification of whether the reported superiority is robust or driven by particular table characteristics.

Authors: The abstract is necessarily concise, but we acknowledge that more specifics would aid quick assessment. The full manuscript details the benchmarks (including table retrieval datasets used), the number of tables and queries, the retrieval metrics (e.g., nDCG and Recall), comparisons to individual formats as baselines, and statistical significance where applicable in the experimental section. In the revision, we will expand the abstract slightly to name the primary benchmarks and metrics, ensuring the claim is better contextualized without exceeding length limits. (revision: partial)

Circularity Check

0 steps flagged

No significant circularity detected

The paper's core modeling choice—treating serializations as noisy views whose centroid serves as a canonical target—is presented as an explicit assumption rather than a derived claim. The subsequent adapter training uses standard losses to align single-format embeddings to these precomputed centroids, with all reported gains coming from empirical comparisons across retrievers and benchmarks. No equations reduce the claimed robustness improvement to a fitted parameter or self-referential definition, no self-citations are invoked as load-bearing uniqueness results, and no ansatz is smuggled in via prior work. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one domain assumption about shared semantics across serializations and standard representation-learning practices; no free parameters or invented entities are introduced beyond the adapter itself.

axioms (1)

Domain assumption: Semantically equivalent serializations of a table produce embeddings that are noisy views of a shared semantic signal.
Explicitly stated as the premise for centroid averaging in the abstract.

pith-pipeline@v0.9.0 · 5546 in / 1029 out tokens · 63870 ms · 2026-05-08T03:55:24.165574+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

TaPas: Weakly Supervised Table Parsing via Pre-training

Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.398. Jonathan Herzig, Thomas Müller, Syrine Krichene, and Julian Eisenschlos. Open Domain Question Answering over Tables via Dense Retrieval. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technolo...

doi:10.18653/v1/2020.acl-main.398 · 2020

[2]

TABBIE: Pretrained Representations of Tabular Data

Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.270. URL https://aclanthology.org/2021.naacl-main.270/. Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, and Jian-Guang Lou. TAPEX: Table pre-training via learning a neural SQL executor. In International Conference on Learning Representations, 2022. URL https://o...

doi:10.18653/v1/2021.naacl-main.270 · 2021

[3]

For each dataset, load cached embeddings for all available table representations $r \in R$ and align them by table ID, retaining only tables that have at least two available representations.

[4]

Build a multi-view training set where each sample corresponds to one table and contains up to max_views representation embeddings $\{e_i\}$ drawn from its aligned cached views.

[5]

For each minibatch, collect all view embeddings $e_i$ and their associated table IDs, and compute adapted embeddings $z_i = A_\theta(e_i)$. Then ℓ2-normalize the adapted embeddings: $z_i \leftarrow z_i / \lVert z_i \rVert_2$.

[6]

For each table appearing in the minibatch, compute its batch centroid in the adapted space: $c(T) = \frac{1}{|V(T)|} \sum_{i \in V(T)} z_i$, where $V(T)$ denotes the views of table $T$ present in the minibatch.

[7]

Normalize the original frozen embeddings for identity preservation: $\hat{e}_i = e_i / \lVert e_i \rVert_2$.

[8]

Optimize the adapter parameters using $L = \lambda_{\text{inv}} L_{\text{inv}}(z, \text{table IDs}) + \lambda_{\text{var}} L_{\text{var}}(z) + \lambda_{\text{cov}} L_{\text{cov}}(z) + \lambda_{\text{id}} L_{\text{id}}(z, \hat{e})$, where $L_{\text{inv}}$ pulls views of the same table toward their within-batch centroid, $L_{\text{var}}$ enforces feature-wise variance above a threshold, $L_{\text{cov}}$ penalizes feature redundancy, and $L_{\text{id}}$ preserves similarity to the frozen embedding space.

[9]

Update $\theta$ with AdamW, apply gradient clipping, and periodically save checkpoints and training logs.

Inference:

[10]

Serialize each table once using a chosen base representation and compute its frozen embedding $e = f(\mathrm{ser}(T))$.

[11]

Produce the adapted table vector $z = A_\theta(e)$.

[12]

Index $z$ in the vector database, while queries remain encoded by the frozen retriever.

G.2 Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| steps | 20000 |
| batch_size | 512 |
| lr | $3 \times 10^{-4}$ |
| weight_decay | $1 \times 10^{-4}$ |
| device | cuda if available, else cpu |
| max_views | len(REPRESENTATIVE_ORDER) |
| lam_inv | 100.0 |
| lam_var | 25.0 |
| lam_cov | 1.0 |
| lam_id | 100.0 |
| id_mode | cos |
| gamma_std | 0.... |
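
The four-term objective described in the extracted training steps can be sketched as plain loss evaluation on one batch. This is a rough illustration under stated assumptions, not the authors' implementation. The exact form of each term is not given in the excerpt, so VICReg-style variance and covariance penalties are assumed, along with a squared-distance invariance term and a cosine identity term. The `gamma_std=1.0` default is a placeholder because the source value is truncated, and no gradient step is shown.

```python
import math
from collections import defaultdict


def mean_vec(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def invariance_loss(z, table_ids):
    """Mean squared distance from each view to its table's within-batch centroid."""
    groups = defaultdict(list)
    for vec, tid in zip(z, table_ids):
        groups[tid].append(vec)
    centroids = {tid: mean_vec(vs) for tid, vs in groups.items()}
    total = sum(
        sum((a - b) ** 2 for a, b in zip(vec, centroids[tid]))
        for vec, tid in zip(z, table_ids)
    )
    return total / len(z)


def variance_loss(z, gamma_std=1.0, eps=1e-4):
    """Hinge penalty on features whose batch std is below gamma_std (placeholder value)."""
    mu = mean_vec(z)
    penalties = []
    for d, col in enumerate(zip(*z)):
        var = sum((x - mu[d]) ** 2 for x in col) / (len(z) - 1)
        penalties.append(max(0.0, gamma_std - math.sqrt(var + eps)))
    return sum(penalties) / len(penalties)


def covariance_loss(z):
    """Sum of squared off-diagonal covariance entries, divided by the feature count."""
    mu = mean_vec(z)
    centered = [[x - m for x, m in zip(vec, mu)] for vec in z]
    dim = len(mu)
    total = 0.0
    for i in range(dim):
        for j in range(dim):
            if i != j:
                cov = sum(row[i] * row[j] for row in centered) / (len(z) - 1)
                total += cov ** 2
    return total / dim


def identity_loss(z, e_hat):
    """One minus the mean dot product; equals 1 - cosine for unit-length inputs."""
    dots = [sum(a * b for a, b in zip(zi, ei)) for zi, ei in zip(z, e_hat)]
    return 1.0 - sum(dots) / len(dots)


def total_loss(z, table_ids, e_hat, lam_inv=100.0, lam_var=25.0, lam_cov=1.0, lam_id=100.0):
    """Weighted sum of the four terms, using the weights listed in the hyperparameters."""
    return (lam_inv * invariance_loss(z, table_ids) + lam_var * variance_loss(z)
            + lam_cov * covariance_loss(z) + lam_id * identity_loss(z, e_hat))


# Two tables with two views each; the adapted vectors equal the frozen ones here.
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
ids = ["a", "a", "b", "b"]
loss = total_loss(z, ids, e_hat=z)
```

In this toy batch only table "a" contributes to the invariance term, since its two views disagree, while the identity term is zero because the adapted and frozen vectors coincide.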
