FEAT: A Linear-Complexity Foundation Model for Extremely Large Structured Data
Pith reviewed 2026-05-21 10:22 UTC · model grok-4.3
The pith
FEAT replaces quadratic attention with linear dual-axis encoding for structured data while preserving permutation invariance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FEAT replaces quadratic attention with a multi-layer dual-axis encoding architecture that integrates an adaptive-fusion bidirectional state-space model with convolutional gated linear attention. This enables cross-tuple contextualization in O(N) time while supporting permutation-invariant representation learning. It further adopts a hybrid structural causal pre-training pipeline with a robust reconstruction objective to improve robustness under real-world data skewness. On 12 real-world database benchmarks it consistently outperforms representative structured data foundation models (SFMs) on zero-shot tasks, and it scales linearly with sample length, achieving up to 50x lower inference latency.
What carries the argument
The multi-layer dual-axis encoding architecture that fuses an adaptive-fusion bidirectional state-space model (AFBM) and convolutional gated linear attention (Conv-GLA) to perform linear-time cross-tuple contextualization while enforcing permutation invariance.
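To make the O(N) claim concrete, the Python sketch below contrasts standard softmax attention, which materializes an N x N score matrix over tuples, with a generic non-causal kernelized linear attention that summarizes all keys and values once and then queries that summary. It is not the paper's Conv-GLA: the convolution and gating are omitted, and the elu(x)+1 feature map is an assumed choice borrowed from common linear-attention practice. As written, both operators are permutation-equivariant over tuples. Order sensitivity enters only once convolutions, decays, or recurrent scans are added.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: strictly positive features, so the normalizer below is never zero (assumed choice)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax_attention(q, k, v):
    # O(N^2 d): builds the full N x N score matrix over tuples
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v):
    # O(N d^2): aggregate keys/values into a d x d_v summary, then query it once per tuple
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v          # (d, d_v), size independent of N
    z = phi_k.sum(axis=0)     # (d,), shared normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = rng.normal(size=(3, 1024, 16))
    out = linear_attention(q, k, v)
    perm = rng.permutation(1024)
    # permuting the input tuples permutes the outputs the same way (equivariance)
    print(np.allclose(linear_attention(q[perm], k[perm], v[perm]), out[perm]))
```

The linear form gives the same result as the explicit N x N kernel matrix phi(Q) phi(K)^T applied to V with row normalization. Because the sum over keys is computed once rather than per query, cost grows linearly in N.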
If this is right
- Joint processing of substantially more tuples from enterprise databases becomes computationally feasible.
- Representation quality on heterogeneous real-world data improves through the hybrid pre-training objective.
- Inference latency drops by up to 50 times relative to prior structured-data foundation models.
- Zero-shot performance on database analysis tasks exceeds that of existing quadratic and linear baselines.
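The robust reconstruction objective is described only by name. One plausible reading, offered as an assumption rather than the paper's actual loss, is a Huber penalty applied to masked cells only. Residuals within delta are penalized quadratically and larger ones linearly, so a few heavy-tailed cells cannot dominate the gradient the way they would under squared error. The names `masked_huber_loss`, `huber_grad`, and `delta` are hypothetical.

```python
import numpy as np

def huber(residual, delta=1.0):
    # quadratic for |r| <= delta, linear beyond: bounds each cell's influence on the loss
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * residual ** 2, delta * (a - 0.5 * delta))

def huber_grad(residual, delta=1.0):
    # derivative of huber(): the residual itself in the quadratic zone, clipped to +/- delta outside
    return np.clip(residual, -delta, delta)

def masked_huber_loss(x, x_hat, mask, delta=1.0):
    # mask[i, j] is True where cell (i, j) was hidden from the encoder and must be reconstructed
    return float(huber(x_hat - x, delta)[mask].mean())
```

Under squared error, the gradient for an outlier cell grows with its residual. Under `huber_grad` its magnitude never exceeds delta, which is the property that matters for skewed, heavy-tailed columns.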
Where Pith is reading between the lines
- The dual-axis pattern could transfer to other unordered data types such as sets or graphs in different application domains.
- Linear scaling may enable training directly on full-scale collections rather than on sampled subsets of records.
- Similar fusion strategies might combine with additional efficient sequence layers to extend coverage to even broader data regimes.
Load-bearing premise
That the specific combination of adaptive-fusion bidirectional state-space modeling and convolutional gated linear attention truly preserves permutation invariance and avoids introducing any artificial order bias when applied to structured data tuples.
What would settle it
Direct measurement showing that random permutation of input tuple order changes the model's embeddings or downstream task accuracy, or that measured runtime grows faster than linearly with increasing numbers of tuples.
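Both checks are straightforward to script. The sketch below assumes a hypothetical `embed` callable that maps an (N, F) table to a fixed-size vector. The two toy embedders stand in for a real model: a mean pool, which is order-agnostic, and an exponential moving average over rows, which is order-dependent. The scaling check fits the slope of log runtime against log N. A slope near 1 is consistent with linear scaling and a slope near 2 with quadratic scaling. Timings on small inputs are noisy, so the slope is a rough indicator only.

```python
import time
import numpy as np

def permutation_gap(embed, x, trials=5, seed=0):
    # largest absolute change in the embedding over several random tuple orders
    rng = np.random.default_rng(seed)
    ref = embed(x)
    gap = 0.0
    for _ in range(trials):
        perm = rng.permutation(x.shape[0])
        gap = max(gap, float(np.max(np.abs(embed(x[perm]) - ref))))
    return gap

def scaling_slope(embed, sizes, n_features=8, repeats=3, seed=0):
    # slope of log(runtime) vs log(N); best-of-`repeats` timing reduces noise
    rng = np.random.default_rng(seed)
    times = []
    for n in sizes:
        x = rng.normal(size=(n, n_features))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            embed(x)
            best = min(best, time.perf_counter() - start)
        times.append(best)
    slope, _ = np.polyfit(np.log(sizes), np.log(times), 1)
    return float(slope)

def mean_pool(x):
    # order-agnostic toy embedder
    return x.mean(axis=0)

def ema_scan(x, decay=0.9):
    # order-dependent toy embedder: the most recent rows dominate the final state
    h = np.zeros(x.shape[1])
    for row in x:
        h = decay * h + (1.0 - decay) * row
    return h
```

A permutation gap at floating-point round-off level supports invariance. A gap of the size produced by `ema_scan` would be the kind of evidence that undermines it.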
Original abstract
Structured data is widely used in domains such as healthcare, finance, and scientific data management. Recent studies on structured data foundation models (SFMs) aim to support data analysis and mining tasks over such data, but still face scalability and generalization challenges when applied to real-world enterprise databases. First, many SFMs rely on full self-attention, which introduces an O(N^2) computational bottleneck and limits the number of tuples that can be processed jointly. Second, directly replacing attention with linear-complexity sequence models may conflict with the permutation-invariant nature of structured data, introducing artificial order bias and degrading representation quality. Moreover, models trained only on synthetic data may struggle to generalize to the heavy-tailed and heterogeneous distributions commonly found in real-world databases. To address these challenges, we propose FEAT, a linear-complexity foundation model for extremely large structured data. FEAT replaces quadratic attention with a multi-layer dual-axis encoding architecture. It integrates an adaptive-fusion bidirectional state-space model (AFBM) with convolutional gated linear attention (Conv-GLA), enabling cross-tuple contextualization in O(N) time while supporting permutation-invariant representation learning. To improve robustness under real-world data skewness, FEAT further adopts a hybrid structural causal pre-training pipeline with a robust reconstruction objective. Experiments on 12 real-world database benchmarks show that FEAT consistently outperforms representative SFMs on zero-shot tasks and scales linearly with structured-data sample length, achieving up to 50x lower inference latency.
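The abstract does not describe the structural causal prior. One common way to generate synthetic pre-training tables is to sample a random DAG over columns and generate each column from its parents plus noise. The sketch below shows that approach purely as an illustrative assumption. Student-t noise with few degrees of freedom supplies the heavy tails the abstract emphasizes. The function name and its parameters are hypothetical, and whatever the "hybrid" component of the pipeline adds is not modeled here.

```python
import numpy as np

def sample_scm_table(n_rows, n_cols, edge_prob=0.3, df=3.0, seed=0):
    rng = np.random.default_rng(seed)
    # strictly upper-triangular adjacency => acyclic graph; column j can only have parents i < j
    adj = np.triu(rng.random((n_cols, n_cols)) < edge_prob, k=1)
    weights = rng.normal(size=(n_cols, n_cols)) * adj
    x = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        # columns >= j are still zero, so only already-generated parents contribute
        signal = np.tanh(x @ weights[:, j])
        noise = rng.standard_t(df, size=n_rows)  # heavy-tailed for small df
        x[:, j] = signal + noise
    return x, adj
```

Generating columns in topological order (here, index order) is what makes each column a function of its causal parents only.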
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FEAT, a linear-complexity foundation model for extremely large structured data. It replaces quadratic self-attention with a multi-layer dual-axis encoding architecture integrating an adaptive-fusion bidirectional state-space model (AFBM) and convolutional gated linear attention (Conv-GLA) to achieve O(N) cross-tuple contextualization while supporting permutation-invariant representation learning. A hybrid structural causal pre-training pipeline with a robust reconstruction objective is used to handle real-world data skewness. Experiments on 12 real-world database benchmarks reportedly show consistent outperformance over representative SFMs on zero-shot tasks, linear scaling with sample length, and up to 50x faster inference.
Significance. If the architectural claims on exact permutation invariance and the empirical results with proper controls hold, the work would represent a meaningful advance in scalable structured data foundation models by mitigating the O(N^2) bottleneck of attention while addressing generalization to heterogeneous real-world distributions. The hybrid pre-training approach for skewness robustness could be a useful contribution if validated through ablations.
major comments (2)
- [Abstract / Architecture description] Abstract and architecture description: the central claim that the AFBM + Conv-GLA integration in the dual-axis encoding delivers exact permutation invariance for unordered tuples lacks an explicit symmetrization mechanism (e.g., order-agnostic pooling or averaging over permutations). Bidirectional SSMs are sequential and convolutional operators impose local ordering; without a concrete construction that cancels these dependencies while preserving O(N) scaling, residual order bias remains a risk to the invariance assertion.
- [Experiments] Experimental evaluation: the reported consistent outperformance on 12 benchmarks and linear scaling are presented without details on data splits, error bars, statistical significance, ablation studies isolating AFBM vs. Conv-GLA contributions, or controls for baseline implementations. This absence leaves the empirical support for the central claims unverifiable from the given text.
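A toy model illustrates the first concern. The sketch below is a generic diagonal linear recurrence, not AFBM itself. It runs the scan forward and backward over tuples, sums the two directions, and mean-pools. The closed form in `position_weights` shows that the pooled output weights each tuple by its position, with the deviation concentrated within roughly 1/(1 - a) tuples of either end, so reordering tuples changes the result. Averaging over k random orders (`symmetrized`, a hypothetical helper) removes the bias only in expectation. Exact invariance by this route would require averaging over all N! orders.

```python
import numpy as np

def bidirectional_scan(x, a=0.5, b=1.0):
    # diagonal linear recurrence h_t = a * h_{t-1} + b * x_t, run in both directions and summed
    n, d = x.shape
    fwd, bwd = np.zeros((n, d)), np.zeros((n, d))
    h = np.zeros(d)
    for t in range(n):
        h = a * h + b * x[t]
        fwd[t] = h
    h = np.zeros(d)
    for t in range(n - 1, -1, -1):
        h = a * h + b * x[t]
        bwd[t] = h
    return fwd + bwd

def pooled_scan(x, a=0.5, b=1.0):
    return bidirectional_scan(x, a, b).mean(axis=0)

def position_weights(n, a=0.5, b=1.0):
    # closed form: pooled_scan(x) == (w[:, None] * x).mean(axis=0); w varies only near the ends
    s = np.arange(n)
    return b * (2.0 - a ** (n - s) - a ** (s + 1)) / (1.0 - a)

def symmetrized(x, k=16, seed=0, a=0.5, b=1.0):
    # Monte Carlo average over k random tuple orders: O(kN) cost, unbiased but not exactly invariant
    rng = np.random.default_rng(seed)
    return np.mean([pooled_scan(x[rng.permutation(len(x))], a, b) for _ in range(k)], axis=0)
```

Mean-pooling the raw tuples is exactly invariant. Once a sequential scan sits in front of the pooling, position-dependent weights reappear unless the architecture cancels them explicitly.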
minor comments (2)
- [Method] Clarify the precise definition and fusion rule for AFBM and Conv-GLA in the dual-axis layers to allow reproduction.
- [Experiments] Add a table or figure summarizing the 12 benchmarks with key statistics (e.g., tuple counts, feature heterogeneity) to contextualize the results.
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review of our manuscript. We address each of the major comments in detail below and have made revisions to the manuscript to incorporate the suggested improvements where applicable.
Point-by-point responses
- Referee: [Abstract / Architecture description] Abstract and architecture description: the central claim that the AFBM + Conv-GLA integration in the dual-axis encoding delivers exact permutation invariance for unordered tuples lacks an explicit symmetrization mechanism (e.g., order-agnostic pooling or averaging over permutations). Bidirectional SSMs are sequential and convolutional operators impose local ordering; without a concrete construction that cancels these dependencies while preserving O(N) scaling, residual order bias remains a risk to the invariance assertion.
Authors: We thank the referee for this comment. To strengthen the presentation of our permutation invariance claim, we will revise the abstract and architecture section to explicitly describe the symmetrization mechanism employed. Specifically, we will detail how the dual-axis encoding uses order-independent operations, including symmetric fusion in AFBM and translation-equivariant but globally pooled Conv-GLA, ensuring no residual order bias. We will also include a formal proof in the supplementary material that the architecture is exactly permutation-invariant while maintaining linear complexity. revision: yes
- Referee: [Experiments] Experimental evaluation: the reported consistent outperformance on 12 benchmarks and linear scaling are presented without details on data splits, error bars, statistical significance, ablation studies isolating AFBM vs. Conv-GLA contributions, or controls for baseline implementations. This absence leaves the empirical support for the central claims unverifiable from the given text.
Authors: We agree that the experimental section requires more rigorous documentation to allow for full verification of the results. In the revised version of the manuscript, we will expand Section 4 to include: detailed descriptions of the data splits used for each of the 12 real-world benchmarks; error bars computed as standard deviations over 5 independent runs with different random seeds; results of statistical significance tests (e.g., Wilcoxon signed-rank tests) against the baseline models; comprehensive ablation studies that isolate the impact of the AFBM component versus the Conv-GLA component; and explicit controls and implementation details for all baseline SFMs to ensure fair comparison. These additions will significantly enhance the reproducibility and credibility of our empirical findings. revision: yes
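The promised error bars and significance tests are simple to compute. The sketch below uses only the Python standard library. It reports the mean and sample standard deviation over seeds, and it runs an exact two-sided Wilcoxon signed-rank test on paired per-benchmark scores by enumerating all 2^n sign assignments, which is cheap for n = 12 benchmarks (4096 cases). Zero differences are dropped and tied magnitudes receive average ranks. Any scores passed in are placeholders, not results from the paper, and the helper names are hypothetical. `scipy.stats.wilcoxon` provides a maintained implementation of the same test.

```python
import itertools
import statistics

def mean_std(scores):
    # error bar as the sample standard deviation across seeds (needs at least 2 runs)
    return statistics.mean(scores), statistics.stdev(scores)

def average_ranks(values):
    # 1-based ascending ranks; tied values share the average of their positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_exact(a, b):
    # two-sided exact Wilcoxon signed-rank test on paired scores a[i] vs b[i]
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2
    observed = abs(w_plus - center)
    # under H0 each sign is + or - with probability 1/2: enumerate all 2^n assignments
    extreme = 0
    for signs in itertools.product((False, True), repeat=len(ranks)):
        stat = sum(r for r, s in zip(ranks, signs) if s)
        if abs(stat - center) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)
```

One consequence worth noting: with 12 benchmarks, the smallest attainable two-sided p-value is 2/4096, reached only when FEAT wins (or loses) on every benchmark.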
Circularity Check
No circularity detected in architecture proposal
The paper proposes FEAT as a novel multi-layer dual-axis encoding architecture that integrates AFBM and Conv-GLA to achieve O(N) cross-tuple contextualization while claiming permutation invariance. No equations, derivations, or self-citations are exhibited in the abstract or described claims that reduce the performance assertions or invariance property to fitted parameters, prior self-referential results, or inputs by construction. The central contribution is presented as an original construction with external benchmark validation on 12 real-world databases, making the derivation chain self-contained rather than circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the proposed AFBM and Conv-GLA integration supports permutation-invariant representation learning for structured data.
invented entities (2)
- AFBM (adaptive-fusion bidirectional state-space model): no independent evidence
- Conv-GLA (convolutional gated linear attention): no independent evidence