
arxiv: 2502.05564 · v2 · pith:YF5SB6PXnew · submitted 2025-02-08 · 💻 cs.LG · cs.AI

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

Pith reviewed 2026-05-20 13:28 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: tabular classification · in-context learning · foundation models · large-scale tabular data · attention mechanisms · synthetic pretraining · TabPFN

The pith

TabICL scales in-context learning to tabular datasets with 500K rows via a two-stage attention design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TabICL to extend in-context learning (ICL) to much larger tabular classification problems than prior foundation models could manage. Earlier models such as TabPFNv2 become computationally prohibitive on tables beyond roughly 10K samples because they alternate column-wise and row-wise attention over the full training set. TabICL instead uses a first stage of column-then-row attention to compress each row into a fixed-size vector, then runs a standard transformer on those vectors for in-context prediction. The model is pretrained once on synthetic tables of up to 60K rows and can then process real tables with 500K rows on affordable hardware. Across the 200 TALENT classification datasets, accuracy is on par with TabPFNv2 at up to 10x the speed, and on the 53 datasets with more than 10K samples TabICL surpasses both TabPFNv2 and CatBoost.
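
To make the scaling argument concrete, the following back-of-envelope sketch counts attention scores per layer for a table with n rows and m columns. It is an illustration under stated assumptions, not the paper's own complexity analysis: the function names and the feature count are hypothetical, constant factors and hidden sizes are ignored, and the cost of TabICL's first (row-embedding) stage is left out because the text above does not specify it.

```python
def alternating_scores(n_rows: int, n_cols: int) -> int:
    """Scores in one layer that attends across samples within each column
    and across features within each row, on an n_rows x n_cols cell grid."""
    across_samples = n_cols * n_rows ** 2   # one n x n score matrix per column
    across_features = n_rows * n_cols ** 2  # one m x m score matrix per row
    return across_samples + across_features


def row_vector_scores(n_rows: int) -> int:
    """Scores in one layer of a transformer over n_rows fixed-size row vectors."""
    return n_rows ** 2


if __name__ == "__main__":
    n_cols = 100  # hypothetical feature count, chosen only for illustration
    for n_rows in (10_000, 100_000, 500_000):
        ratio = alternating_scores(n_rows, n_cols) / row_vector_scores(n_rows)
        print(f"n={n_rows:>7,}  alternating / row-vector score ratio: {ratio:.1f}")
```

Under these assumptions the ratio is m + m²/n, so it stays close to the number of columns once n is large. That is the intuition behind compressing each row to a single vector before the in-context stage.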

Core claim

TabICL is a tabular foundation model for classification, pretrained on synthetic datasets with up to 60K samples and capable of handling 500K samples on affordable resources. This is enabled by a novel two-stage architecture: a column-then-row attention mechanism to build fixed-dimensional embeddings of rows, followed by a transformer for efficient ICL. Across 200 classification datasets from the TALENT benchmark, TabICL is on par with TabPFNv2 while being systematically faster (up to 10 times), and significantly outperforms all other approaches. On 53 datasets with over 10K samples, TabICL surpasses both TabPFNv2 and CatBoost, demonstrating the potential of ICL for large data.

What carries the argument

A two-stage architecture that first applies column-then-row attention to produce fixed-dimensional row embeddings, then feeds those embeddings into a transformer for in-context learning.
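
The data flow described above can be sketched as a toy forward pass in plain Python. This is a minimal illustration, not the released model. It assumes that "column attention" means cells of one column attending to each other across rows, and that "row attention" means cells of one row attending to each other. It uses identity projections instead of learned weights, a hypothetical random cell embedding, mean pooling to obtain the fixed-size row vector, and an attention-weighted label vote in place of a trained prediction head. All function names are invented for the example.

```python
import math
import random


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                           for key in tokens])
        out.append([sum(w * tok[c] for w, tok in zip(weights, tokens))
                    for c in range(dim)])
    return out


def embed_rows(table, dim=4, seed=0):
    """Stage 1 (toy): column attention, row attention, then pooling to `dim`."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Hypothetical cell embedding: scale a random per-column direction by the value.
    directions = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_cols)]
    cells = [[[v * d for d in directions[j]] for j, v in enumerate(row)]
             for row in table]
    # Column step: cells of the same column attend to each other across rows.
    for j in range(n_cols):
        column = self_attention([cells[i][j] for i in range(n_rows)])
        for i in range(n_rows):
            cells[i][j] = column[i]
    # Row step: cells of the same row attend to each other, then mean-pool,
    # so every row ends up with `dim` numbers regardless of n_cols.
    embeddings = []
    for i in range(n_rows):
        mixed = self_attention(cells[i])
        embeddings.append([sum(tok[c] for tok in mixed) / n_cols
                           for c in range(dim)])
    return embeddings


def icl_predict(train_emb, train_labels, test_emb):
    """Stage 2 (toy): each test row attends to training rows and takes a weighted vote."""
    dim = len(train_emb[0])
    preds = []
    for query in test_emb:
        weights = softmax([sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                           for key in train_emb])
        votes = {}
        for w, label in zip(weights, train_labels):
            votes[label] = votes.get(label, 0.0) + w
        preds.append(max(votes, key=votes.get))
    return preds


if __name__ == "__main__":
    train = [[0.1, 0.2, 0.0], [0.0, 0.3, 0.1], [2.0, 1.9, 2.2], [2.1, 2.0, 1.8]]
    labels = [0, 0, 1, 1]
    test = [[0.2, 0.1, 0.1], [1.9, 2.1, 2.0]]
    emb = embed_rows(train + test)  # training and test rows share stage 1
    print(icl_predict(emb[:4], labels, emb[4:]))
```

The sketch shows only the shape contract: whatever the number of columns, stage 2 sees one vector of length `dim` per row. With no trained weights, its predictions carry no meaning.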

If this is right

  • In-context learning becomes feasible for tabular classification tasks involving hundreds of thousands of rows without per-dataset retraining.
  • Inference on large tables can be performed up to ten times faster than with prior ICL models while maintaining accuracy.
  • Synthetic pretraining transfers effectively enough to deliver strong results on real data distributions with more than 10K samples.
  • Gradient-boosted trees can be challenged on large tabular problems by a single forward-pass foundation model.
  • The same two-stage compression idea could support scaling ICL to even bigger tables if further efficiency gains are added.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The architecture might be adapted to regression or multi-label tasks by changing only the final prediction head.
  • Similar factored attention patterns could be explored for other high-cardinality structured data where full self-attention is prohibitive.
  • One could measure how much the choice of synthetic data generator affects downstream performance on specific real domains.

Load-bearing premise

The column-then-row attention mechanism produces fixed-dimensional row embeddings that retain enough information for the subsequent transformer-based in-context learning to succeed on real large tables.

What would settle it

A direct accuracy comparison between TabICL and TabPFNv2 on multiple real-world tables each containing more than 100,000 rows; if TabICL falls below TabPFNv2 or strong gradient-boosted baselines, the claim that the embeddings preserve sufficient information would not hold.

Original abstract

The long-standing dominance of gradient-boosted decision trees on tabular data is currently challenged by tabular foundation models using In-Context Learning (ICL): setting the training data as context for the test data and predicting in a single forward pass without parameter updates. While TabPFNv2 foundation model excels on tables with up to 10K samples, its alternating column- and row-wise attentions make handling large training sets computationally prohibitive. So, can ICL be effectively scaled and deliver a benefit for larger tables? We introduce TabICL, a tabular foundation model for classification, pretrained on synthetic datasets with up to 60K samples and capable of handling 500K samples on affordable resources. This is enabled by a novel two-stage architecture: a column-then-row attention mechanism to build fixed-dimensional embeddings of rows, followed by a transformer for efficient ICL. Across 200 classification datasets from the TALENT benchmark, TabICL is on par with TabPFNv2 while being systematically faster (up to 10 times), and significantly outperforms all other approaches. On 53 datasets with over 10K samples, TabICL surpasses both TabPFNv2 and CatBoost, demonstrating the potential of ICL for large data. Pretraining code, inference code, and pre-trained models are available at https://github.com/soda-inria/tabicl.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces TabICL, a tabular foundation model for classification that uses in-context learning on large tables. It proposes a two-stage architecture in which a column-then-row attention mechanism first produces fixed-dimensional row embeddings, after which a standard transformer performs ICL over those embeddings. The model is pretrained on synthetic datasets containing up to 60K samples and is claimed to handle inference on tables with 500K samples. On the TALENT benchmark the paper reports that TabICL matches TabPFNv2 across 200 classification datasets while being up to 10 times faster and outperforms both TabPFNv2 and CatBoost on the 53 datasets that exceed 10K samples.

Significance. If the performance claims hold under rigorous controls, the work would demonstrate that ICL-based tabular foundation models can be scaled beyond the 10K-sample regime that limits TabPFNv2, thereby offering a practical alternative to gradient-boosted trees on large tabular data. The public release of pretraining code, inference code, and pretrained weights is a clear strength that supports reproducibility.

major comments (2)
  1. [Architecture section] Architecture description (two-stage design): The central scaling claim—that the column-then-row attention produces fixed-dimensional row embeddings that retain sufficient statistical structure for downstream ICL on tables larger than 10K samples—is not isolated by any ablation. No experiment compares the full model against a variant that omits the column-attention stage, varies the embedding dimension, or substitutes a joint column-row attention baseline on the same large TALENT subsets; without such controls it is impossible to attribute the reported gains on the >10K-sample regime to the proposed compression step rather than to other factors.
  2. [Experiments / TALENT results] Experimental results, TALENT benchmark tables: The abstract and results section state that TabICL is “on par” with TabPFNv2 and “surpasses” it on the 53 large datasets, yet no error bars, number of random seeds, or statistical significance tests are reported for any of the 200 datasets. This omission is load-bearing for the claim that the two-stage model delivers a systematic advantage on large data.
minor comments (2)
  1. [§3] The notation for the column-then-row attention blocks is introduced without an explicit equation defining the output dimensionality of the row embeddings; adding a short equation would improve clarity.
  2. [Figure 2] Figure 2 (or equivalent architecture diagram) would benefit from explicit arrows or labels indicating the transition from column-attended features to the fixed-dimensional row embeddings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the strengths and areas for improvement in our work on TabICL. We respond to each major comment below and indicate the revisions we will incorporate.

Point-by-point responses
  1. Referee: [Architecture section] Architecture description (two-stage design): The central scaling claim—that the column-then-row attention produces fixed-dimensional row embeddings that retain sufficient statistical structure for downstream ICL on tables larger than 10K samples—is not isolated by any ablation. No experiment compares the full model against a variant that omits the column-attention stage, varies the embedding dimension, or substitutes a joint column-row attention baseline on the same large TALENT subsets; without such controls it is impossible to attribute the reported gains on the >10K-sample regime to the proposed compression step rather than to other factors.

    Authors: We agree that additional ablations isolating the column-then-row attention on the large TALENT subsets would provide stronger evidence for its specific contribution to scaling. The manuscript emphasizes end-to-end comparisons against TabPFNv2 (which relies on alternating column-row attention but cannot scale beyond ~10K samples) and other baselines, showing clear gains on datasets >10K. We will add targeted ablations in the revision, including a no-column-attention variant evaluated on representative large subsets, to better attribute the benefits of the two-stage compression. revision: yes

  2. Referee: [Experiments / TALENT results] Experimental results, TALENT benchmark tables: The abstract and results section state that TabICL is “on par” with TabPFNv2 and “surpasses” it on the 53 large datasets, yet no error bars, number of random seeds, or statistical significance tests are reported for any of the 200 datasets. This omission is load-bearing for the claim that the two-stage model delivers a systematic advantage on large data.

    Authors: We acknowledge that reporting variability and statistical tests would increase the rigor of the performance claims. The TALENT results follow the benchmark's standard protocol with comparisons to published TabPFNv2 numbers and deterministic CatBoost runs. In the revised manuscript we will include standard deviations over multiple random seeds for the key large-dataset comparisons (the 53 datasets >10K samples) along with appropriate significance tests to substantiate the statements of parity and outperformance. revision: yes
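
The kind of reporting requested in the second comment can be sketched with the standard library alone. The example assumes one accuracy per dataset for each of two models, plus several seeds per dataset for the spread. It uses an exact two-sided sign test as one simple paired test. Neither the paper nor the rebuttal names a specific test, so this choice is an assumption, and the function names are hypothetical. All numbers in the usage section are placeholders, not results from the paper.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def sign_test(a, b):
    """Exact two-sided sign test on paired per-dataset scores; ties are dropped."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return wins, losses, 1.0
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(math.comb(n, i) for i in range(min(wins, losses) + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Placeholder accuracies for illustration only.
    model_a = [0.91, 0.88, 0.79, 0.95, 0.83, 0.90]
    model_b = [0.90, 0.86, 0.80, 0.93, 0.81, 0.89]
    wins, losses, p_value = sign_test(model_a, model_b)
    print(f"wins={wins} losses={losses} p={p_value:.3f}")

    mean, std = summarize([0.905, 0.912, 0.908])  # one dataset, three seeds
    print(f"accuracy = {mean:.3f} +/- {std:.3f}")
```

The sign test ignores the size of each difference, so it is conservative. A Wilcoxon signed-rank test or a critical-difference analysis across all methods would be natural alternatives.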

Circularity Check

0 steps flagged

No circularity; claims rest on external empirical benchmarks

Full rationale

The paper presents TabICL as a new architecture for scaling in-context learning to large tabular datasets and supports its claims through direct performance comparisons against TabPFNv2, CatBoost, and other baselines on the independent TALENT benchmark (200 classification datasets, with a 53-dataset subset >10K samples). No equations, fitted parameters, or self-citations are used to derive results that reduce to the model's own inputs by construction. The two-stage column-then-row attention is introduced as an engineering choice whose effectiveness is measured externally rather than assumed via internal redefinition or prior self-work.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the transferability of synthetic pretraining to real tabular data and on the information-preserving properties of the two-stage attention; these are domain assumptions rather than derived results.

free parameters (1)
  • embedding dimension and attention hyperparameters
    Standard neural-network choices that are tuned during pretraining on synthetic data and affect the row-embedding quality.
axioms (1)
  • domain assumption: Synthetic tabular datasets capture the statistical structure needed for generalization to real classification tasks
    The model is pretrained exclusively on synthetic data yet evaluated on real TALENT datasets.




Forward citations

Cited by 21 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TabArena: A Living Benchmark for Machine Learning on Tabular Data

    cs.LG 2025-06 conditional novelty 8.0

    TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small ...

  2. TabQL: In-Context Q-Learning with Tabular Foundation Models

    cs.LG 2026-05 unverdicted novelty 7.0

    TabQL is a reinforcement learning framework that substitutes a tabular foundation model with in-context capabilities for the parametric Q-network in DQN, with a warm-up phase and theoretical analysis claiming improved...

  3. MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

    cs.LG 2026-05 unverdicted novelty 7.0

    MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

  4. FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

    cs.LG 2026-05 unverdicted novelty 7.0

    FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.

  5. TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models

    cs.LG 2026-05 unverdicted novelty 7.0

    TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to...

  6. Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models

    cs.LG 2026-04 unverdicted novelty 7.0

    TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.

  7. On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses

    cs.LG 2025-06 unverdicted novelty 7.0

    Tabular foundation models suffer from test-time adversarial vulnerabilities that degrade accuracy and enable transferable attacks, but incremental adversarial in-context learning improves robustness on multiple benchmarks.

  8. Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors

    stat.ML 2026-05 conditional novelty 6.0

    Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.

  9. TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models

    cs.LG 2026-05 unverdicted novelty 6.0

    TFM-Retouche is an input-space residual adapter that lifts TabICLv2 performance by 56 Elo points on 51 tabular datasets while remaining architecture-agnostic and computationally light.

  10. Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    DiffICL breaks the quality-privacy tradeoff in small-data tabular synthesis by using in-context learning on pretrained structural priors to generate data that is both higher quality and less memorizing of training samples.

  11. Prior-Aligned Data Cleaning for Tabular Foundation Models

    cs.LG 2026-04 unverdicted novelty 6.0

    L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.

  12. Tabular foundation models for in-context prediction of molecular properties

    cs.LG 2026-04 unverdicted novelty 6.0

    Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.

  13. From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning

    cs.LG 2026-04 unverdicted novelty 6.0

    Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family...

  14. xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

    cs.LG 2025-08 unverdicted novelty 6.0

    xRFM merges kernel-based feature learning with tree structures for scalable, interpretable tabular modeling and reports top performance on 100 regression and competitive results on 200 classification datasets versus 3...

  15. When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

    cs.AI 2026-05 unverdicted novelty 5.0

    The paper proposes Strategic Prior-data Fitted Network (SPN), an inference-time method that adapts pretrained tabular foundation models to strategic feature manipulation by constructing aligned in-context examples.

  16. TabH2O: A Unified Foundation Model for Tabular Prediction

    cs.LG 2026-05 unverdicted novelty 5.0

    TabH2O presents a unified tabular foundation model with dual-head architecture and single-stage pretraining that achieves an average rank of 2.55 on the TALENT benchmark, outperforming several established methods.

  17. Foundation Models for Credit Risk Prediction: A Game Changer?

    cs.LG 2026-05 unverdicted novelty 5.0

    Tabular foundation models outperform standard methods in credit risk PD and LGD tasks, with larger gains on smaller datasets when used out-of-the-box.

  18. VIP-COP: Context Optimization for Tabular Foundation Models

    cs.LG 2026-05 unverdicted novelty 5.0

    VIP-COP is a black-box method that optimizes context for tabular foundation models by ranking and selecting high-value samples and features via online KernelSHAP regression, outperforming baselines on large high-dimen...

  19. Evaluating Tabular Representation Learning for Network Intrusion Detection

    cs.LG 2026-05 unverdicted novelty 5.0

    Tabular representation learning for network intrusion detection exhibits strong dataset-model dependency, with supervised methods outperforming unsupervised anomaly detection and limited but possible cross-dataset gen...

  20. Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models

    cs.LG 2026-05 unverdicted novelty 4.0

    Context construction strategies such as balanced sampling improve AUC-ROC by 3-4 points over uniform sampling in tabular foundation models for credit risk, exceeding differences between model families and matching cla...

  21. Challenges and opportunities for AI to help deliver fusion energy

    physics.plasm-ph 2026-03 unverdicted novelty 2.0

    AI offers opportunities to advance fusion energy R&D but requires responsible practices and expert collaborations to overcome its inherent challenges.
