Bounded Context Management for Tabular Foundation Models on Stream Learning

Doyun Choi; Jaemin Yoo; Jinmo Lee; Moongi Choi

arxiv: 2606.18677 · v1 · pith:EFANGDWXnew · submitted 2026-06-17 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-26 21:42 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords tabular stream learning · context management · tabular foundation models · in-context learning · uncertainty estimation · redundancy eviction · distribution shift

The pith

Tabular foundation models outperform classical stream learners by managing context through uncertainty-aware admission and redundancy-aware eviction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that tabular stream learning under distribution shift is best addressed by managing the labeled context window for in-context predictions in foundation models rather than by updating model parameters. A future information view is used to derive three requirements for effective management: preserve recent examples, retain uncertain examples, and remove redundant examples. These requirements are implemented in the CURE policy via entropy-gated admission of new points and redundancy-aware eviction of existing ones. Experiments across seven streams show CURE delivers up to 27% relative gains over classical methods, stays robust to different foundation model backbones, and ranks highest among tested policies.

Core claim

The future information view identifies preserving recent examples, retaining uncertain examples, and removing redundant examples as the necessary and sufficient conditions for bounded context management in tabular foundation models. These conditions are realized as CURE through entropy-gated admission and redundancy-aware eviction, producing up to 27.0% relative improvement over classical stream learners on seven streams while remaining robust across multiple TFM backbones and ranking first among policy variants.

What carries the argument

CURE, the context management policy that applies entropy-gated admission and redundancy-aware eviction to enforce the three requirements derived from the future information view.
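
The two mechanisms named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class names `Example` and `TwoBankContext`, the fixed `entropy_threshold`, and the nearest same-class distance used as a redundancy score are all assumptions made for this example. The two-bank layout (a FIFO short bank feeding a budgeted long bank) follows the description in the Figure 1 caption.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Example:
    x: tuple          # feature vector
    y: int            # class label
    entropy: float    # predictive entropy stored when x was predicted


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class TwoBankContext:
    """Hypothetical two-bank context: FIFO short bank plus budgeted long bank."""

    def __init__(self, short_size, long_size, entropy_threshold):
        self.short = deque()
        self.long = []
        self.short_size = short_size
        self.long_size = long_size
        self.entropy_threshold = entropy_threshold  # assumed gate, not from the paper

    def add(self, example):
        # Recent examples always enter the short bank first.
        self.short.append(example)
        if len(self.short) > self.short_size:
            candidate = self.short.popleft()  # oldest short-bank item
            # Entropy-gated admission: keep examples the model was unsure about.
            if candidate.entropy >= self.entropy_threshold:
                self.long.append(candidate)
                if len(self.long) > self.long_size:
                    self._evict_redundant()

    def _evict_redundant(self):
        # Redundancy-aware eviction (assumed score): drop the example whose
        # nearest same-class neighbour in the long bank is closest.
        best_index, best_dist = None, math.inf
        for i, a in enumerate(self.long):
            for j, b in enumerate(self.long):
                if i != j and a.y == b.y:
                    d = math.dist(a.x, b.x)
                    if d < best_dist:
                        best_index, best_dist = i, d
        if best_index is None:
            best_index = 0  # no same-class pair: fall back to the oldest item
        self.long.pop(best_index)

    def context(self):
        """Labeled context handed to the model: long bank, then short bank."""
        return self.long + list(self.short)
```

The quadratic eviction scan is only reasonable for small budgets; a real policy would likely maintain neighbour distances incrementally.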

If this is right

CURE achieves up to 27.0% relative improvement over classical stream learners across seven streams.
The approach remains robust when swapping among multiple TFM backbones.
CURE ranks first among the policy variants evaluated on the streams.
Context management replaces internal model-state updates as the primary adaptation mechanism for tabular foundation models.
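
For readers unfamiliar with the "relative improvement" figure, the arithmetic can be sketched as follows. The summary does not state which metric the 27.0% is computed on; this example assumes accuracy, and the numbers are illustrative only, not results from the paper.

```python
def relative_improvement(baseline, method):
    """Relative gain of `method` over `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (method - baseline) / baseline


# Illustrative values: a baseline at 0.60 accuracy and a method at 0.762
# differ by 0.162 in absolute terms, which is roughly 27% of the baseline.
gain = relative_improvement(0.60, 0.762)
```

A relative gain therefore looks larger than the absolute gain whenever the baseline score is below 1, which is worth keeping in mind when comparing streams with very different baseline accuracies.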

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same three requirements could be tested as a design template for context policies in non-tabular foundation models that perform sequential prediction.
Dynamic context sizing that explicitly tracks uncertainty and redundancy may reduce the need for fixed window lengths in other streaming settings.
The separation of admission and eviction rules offers a modular way to combine different uncertainty estimators with different redundancy metrics.

Load-bearing premise

The future information view correctly identifies preserving recent examples, retaining uncertain examples, and removing redundant examples as the necessary and sufficient conditions for effective context management in tabular foundation models.

What would settle it

A stream where a policy that violates at least one of the three requirements (recent preservation, uncertainty retention, or redundancy removal) matches or exceeds CURE accuracy on the same seven datasets would falsify the necessity of those conditions.

Figures

Figures reproduced from arXiv: 2606.18677 by Doyun Choi, Jaemin Yoo, Jinmo Lee, Moongi Choi.

**Figure 1.** Overview of CURE. A new labeled example $z_t = (x_t, y_t)$ first enters the short bank $S_t$ to preserve recent support. When $S_t$ overflows, the oldest item $z^{+}$ becomes a long-bank candidate and is admitted according to its stored prediction-time entropy. When the long bank $L_t$ exceeds its budget, CURE removes a locally redundant same-class example. Here $S_t$ is a FIFO short bank and $L_t$ is a long bank. The short bank…

**Figure 2.** Prequential accuracy trajectories of CURE and MOA baselines on NOAA.

**Figure 3.** Prequential accuracy trajectories of CURE and MOA baselines on METER.

**Figure 4.** Prequential accuracy trajectories of CURE and MOA baselines on RIALTO.

**Figure 5.** Prequential accuracy trajectories of CURE and MOA baselines on POSTURE-No8.

**Figure 6.** Prequential accuracy trajectories of CURE and MOA baselines on POKER.

**Figure 7.** Prequential accuracy trajectories of CURE and MOA baselines on NOMAO.

**Figure 8.** Prequential accuracy trajectories of CURE and MOA baselines on AGR(A).
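
The figures above plot prequential accuracy, which in stream learning usually means test-then-train: each example is predicted before its label is revealed and only then used for learning. The sketch below shows that loop under stated assumptions. `MajorityWindowLearner` is a toy stand-in invented for this example, not one of the MOA baselines or a TFM, and the paper's exact evaluation protocol (windowing, warm-up) is not given in this summary.

```python
from collections import Counter, deque


class MajorityWindowLearner:
    """Toy stand-in learner: predicts the majority label of a sliding window."""

    def __init__(self, window=50):
        self.labels = deque(maxlen=window)

    def predict(self, x):
        if not self.labels:
            return 0  # arbitrary default before any label is seen
        return Counter(self.labels).most_common(1)[0][0]

    def learn(self, x, y):
        self.labels.append(y)


def prequential_accuracy(stream, learner):
    """Test-then-train loop; returns the running accuracy after each step."""
    correct, trajectory = 0, []
    for step, (x, y) in enumerate(stream, start=1):
        correct += int(learner.predict(x) == y)  # predict before seeing y
        learner.learn(x, y)                      # then reveal the label
        trajectory.append(correct / step)
    return trajectory
```

On a stream whose label flips abruptly, the trajectory dips right after the change and recovers once the window fills with the new label, which is the kind of shape such accuracy plots are meant to expose.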

Abstract

Tabular stream learning requires predictions on sequentially arriving examples under distribution shift. While standard methods adapt by updating model states, tabular foundation models (TFMs) make predictions conditioned on a labeled context in an in-context manner, making them a natural alternative for stream learning. This shifts the challenge from how to update the model to how to manage the context. We propose a future information view that yields three practical requirements for context management: preserve recent examples, retain uncertain examples, and remove redundant examples. We instantiate these requirements as CURE (Context management via Uncertainty-aware admission and Redundancy-aware Eviction), a context-managing policy with entropy-gated admission and redundancy-aware eviction. Across seven streams, CURE shows up to 27.0% relative improvement over classical stream learners, remains robust across multiple TFM backbones, and ranks first among other policy variants. Code and datasets are available at https://github.com/morcellinus/CURE-ICML-FMSD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CURE is a practical new policy for TFM context management in streams that delivers measurable gains, but its future information view rests on informal reasoning rather than derivation.

The paper introduces CURE, a context policy that uses entropy-gated admission to keep uncertain examples and redundancy-aware eviction to drop duplicates, all while preserving recent ones. On seven streams it reports up to 27% relative improvement over classical stream learners, stays consistent across several TFM backbones, and beats the other policy variants they tried. The code and datasets are public, which is the strongest part of the submission.

The experiments look like the main value here. Anyone working on tabular data that arrives sequentially and wants to avoid full retraining will see a concrete option they can test. The robustness across backbones is useful to note.

The weaker part is the framing. The authors present a future information view that is said to produce exactly three requirements as necessary and sufficient, then map those directly to CURE. No derivation, reduction, or minimality argument is supplied; it stays at the level of reasonable intuition. If the gains trace more to the specific heuristics than to the view, the conceptual contribution shrinks. The abstract also leaves out statistical testing details and baseline implementation notes, so those need checking in the full text.

This belongs in a conference track on applied stream learning or foundation-model adaptations rather than a theory venue. Readers who need working methods for non-stationary tabular prediction will get something usable from it. It has enough concrete results and reproducibility to merit referee time even with the light theoretical grounding.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CURE, a context-management policy for tabular foundation models (TFMs) performing stream learning under distribution shift. It introduces a 'future information view' asserted to yield three requirements—preserve recent examples, retain uncertain examples, remove redundant examples—as necessary and sufficient conditions; these are instantiated via entropy-gated admission and redundancy-aware eviction. On seven streams the policy reports up to 27% relative improvement over classical stream learners, robustness across multiple TFM backbones, and first rank among policy variants. Code and datasets are released.

Significance. If the empirical ranking holds after verification of baselines and statistical testing, the work supplies a practical, bounded-context strategy that shifts the adaptation burden from model updates to context curation for in-context TFMs. The open release of code strengthens reproducibility and enables direct follow-up.

major comments (2)

[§3] Future information view: the manuscript states that the view 'yields' the three requirements as necessary and sufficient, yet supplies no derivation, reduction, or minimality argument establishing that these three conditions are jointly required and sufficient for effective TFM context management. Without such grounding, the attribution of the reported gains to the proposed framework (rather than to the specific entropy and redundancy heuristics) remains open.
[Experimental section] Results on seven streams: the abstract and summary claim 'consistent gains' and 'robustness across backbones,' but the provided text does not report statistical significance tests, variance across random seeds, or explicit baseline implementation details (e.g., hyper-parameter search budgets for classical stream learners). These omissions are load-bearing for the central empirical claim of up to 27% relative improvement.

minor comments (2)

Notation for entropy threshold and redundancy metric should be defined once in a single location rather than re-introduced in multiple subsections.
Figure captions for the policy-variant comparison should explicitly state the number of runs and error bars used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and commit to revisions that strengthen the manuscript's rigor.

Referee: [§3] Future information view: the manuscript states that the view 'yields' the three requirements as necessary and sufficient, yet supplies no derivation, reduction, or minimality argument establishing that these three conditions are jointly required and sufficient for effective TFM context management. Without such grounding, the attribution of the reported gains to the proposed framework (rather than to the specific entropy and redundancy heuristics) remains open.

Authors: We agree that the current text presents the three requirements as following from the future information view without an explicit derivation or minimality argument. In the revision we will expand §3 with step-by-step reasoning that derives the necessity and sufficiency of the three conditions directly from the view, including a brief minimality discussion. This will clarify how much of the observed gain is attributable to the framework and how much to the specific heuristics.
Revision: yes

Referee: [Experimental section] Results on seven streams: the abstract and summary claim 'consistent gains' and 'robustness across backbones,' but the provided text does not report statistical significance tests, variance across random seeds, or explicit baseline implementation details (e.g., hyper-parameter search budgets for classical stream learners). These omissions are load-bearing for the central empirical claim of up to 27% relative improvement.

Authors: We concur that statistical tests, seed-wise variance, and baseline implementation details are required to substantiate the central claims. In the revised version we will report performance averaged over multiple random seeds with standard deviations, include statistical significance tests (e.g., paired t-tests), and provide explicit hyper-parameter search budgets and procedures for all classical stream learners. These additions will be placed in the experimental section and supplementary material.
Revision: yes
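
The paired t-test mentioned in the response can be sketched with the standard library alone. This is an illustration of the generic statistic over per-seed scores, assuming both methods are run on the same seeds; the function name `paired_t_statistic` is hypothetical, and nothing here reflects what the authors will actually report.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """t statistic and degrees of freedom for paired per-seed scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two seeds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

The returned statistic is compared against a t distribution with the returned degrees of freedom to obtain a p-value. With only a handful of seeds the test has little power, so reporting the per-seed differences alongside it is usually more informative.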

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external baselines

The paper introduces a 'future information view' that informally motivates three requirements for context management, then instantiates them as the CURE policy and reports relative improvements (up to 27%) against classical stream learners across seven streams. No equations, fitted parameters, or self-citations are described that would make any reported prediction reduce to the inputs by construction. The central result is an empirical ranking that can be independently replicated or falsified on the released code and datasets; the mapping from view to requirements is conceptual rather than a self-referential derivation. This matches the default expectation of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that the future-information view supplies the correct three requirements; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: The future information view correctly identifies preserve-recent, retain-uncertain, and remove-redundant as the three practical requirements for context management.
This premise is invoked to derive the design of CURE.

pith-pipeline@v0.9.1-grok · 5702 in / 1161 out tokens · 29741 ms · 2026-06-26T21:42:26.518045+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 9 canonical work pages · 3 internal anchors

[1] Agrawal, R., Imieliński, T., and Swami, A. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207–216, 1993.

[2] de Barros, R. S. M., de Carvalho Santos, S. G. T., and Júnior, P. M. G. A boosting-like online learning ensemble. In 2016 International Joint Conference on Neural Networks (IJCNN), pp. 1871–1878. IEEE, 2016.

[3] Grinsztajn, L., Flöge, K., Key, O., Birkel, F., Jund, P., Roof, B., Jäger, B., Safaric, D., Alessi, S., Hayler, A., et al. TabPFN-2.5: Advancing the state of the art in tabular foundation models. arXiv preprint arXiv:2511.08667.

[4] Hollmann, N., Müller, S., Eggensperger, K., and Hutter, F. TabPFN: A transformer that solves small tabular classification problems in a second. arXiv preprint arXiv:2207.01848.

[5] Lourenço, A., Gama, J., Xing, E. P., and Marreiros, G. Bridging streaming continual learning via in-context large tabular models. arXiv preprint arXiv:2512.11668.

[6] Ma, J., Thomas, V., Hosseinzadeh, R., Labach, A., Kamkari, H., Cresswell, J. C., Golestan, K., Yu, G., Caterini, A. L., and Volkovs, M. TabDPT: Scaling tabular foundation models on real data. arXiv preprint arXiv:2410.18164.

[7] Manapragada, C., Webb, G. I., and Salehi, M. Extremely fast decision tree. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1953–1962.

[8] Müller, S., Hollmann, N., Arango, S. P., Grabocka, J., and Hutter, F. Transformers can do Bayesian inference. arXiv preprint arXiv:2112.10510.

[9] Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. TabICL: A tabular foundation model for in-context learning on large data. arXiv preprint arXiv:2502.05564.

[10] Qu, J., Holzmüller, D., Varoquaux, G., and Morvan, M. L. TabICLv2: A better, faster, scalable, and open tabular foundation model. arXiv preprint arXiv:2602.11139.

[11] Zhang, X., Ren, G., Yu, H., Yuan, H., Wang, H., Li, J., Wu, J., Mo, L., Mao, L., Hao, M., et al. Limix: Unleashing structured-data modeling capability for generalist intelligence. arXiv preprint arXiv:2509.03505.
[12] We fix a stream step $t$, a current context $D_t$, and a newly observed candidate $z_t = (x_t, y_t)$. The near-future feature distribution is $P^{+}_{t,X} = \frac{1}{h} \sum_{s \in H^{+}_{t}} P_{s,X}$, with $H^{+}_{t} = \{t+1, \dots, t+h\}$. For a future feature value $x'$, let $Y_{x'}$ denote its label random variable. For the current candidate feature $x_t$, we use $Y_{x_t}$ to denote the label random variable before its...
