Unlocking Compositional Generalization in Continual Few-Shot Learning

Chi-Nguyen Tran; Dao Sy Duy Minh; Huynh Trung Kiet; Long Tran-Thanh; Phu-Hoa Pham; Phu-Quy Nguyen-Lam

arXiv: 2605.11710 · v2 · pith:OK6CIMY2new · submitted 2026-05-12 · cs.LG · cs.CV

Phu-Quy Nguyen-Lam, Phu-Hoa Pham, Dao Sy Duy Minh, Chi-Nguyen Tran, Huynh Trung Kiet, Long Tran-Thanh

Pith reviewed 2026-05-20 21:35 UTC · model grok-4.3

classification: cs.LG · cs.CV

keywords: continual few-shot learning · compositional generalization · object-centric representations · self-supervised vision transformers · slot optimization · representation drift

The pith

By decoupling representation learning from compositional inference using frozen self-supervised Vision Transformers, models achieve superior generalization to novel concepts with minimal forgetting in continual few-shot learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that continual few-shot learners can generalize to new concepts by decoupling how they learn object representations from how they use them to compose scenes. The key is to optimize slot features only for whole-class identity while the self-supervised Vision Transformer backbone stays frozen, then compose those slots freely at test time for unfamiliar inputs. If this holds, it would resolve the tension in which current methods either flatten a scene into one embedding or tie parts so tightly to seen patterns that they cannot handle real novelty. The claimed result is less forgetting over time and stronger performance on benchmarks with unseen classes.

Core claim

Leveraging the inherent patch-level semantic geometry of self-supervised Vision Transformers, the framework uses a dual-phase strategy where slot representations are optimized entirely toward holistic class identity during training to preserve generalizable object-level geometries, and at inference the preserved slots are dynamically composed to match novel scenes, offering dual benefits of no representation drift and preserved capacity for novel-concept transfer.

What carries the argument

The dual-phase strategy that decouples training-time holistic optimization of slots for class identity from inference-time dynamic composition, relying on the patch-level semantic geometry of self-supervised Vision Transformers.

If this is right

The frozen backbone prevents representation drift across continual tasks.
Holistic optimization during training maintains the features' ability to transfer to novel concepts.
This results in state-of-the-art performance on unseen-concept generalization.
Minimal forgetting is achieved across standard continual learning benchmarks.
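
The "minimal forgetting" consequence is usually quantified from a table of per-task accuracies. The sketch below uses one common definition of average forgetting (the best accuracy a task ever reached before the final step, minus its final accuracy). Whether the paper reports exactly this metric is an assumption, and the function name `average_forgetting` and the accuracy values are hypothetical.

```python
def average_forgetting(acc):
    """Average forgetting after the final task.

    acc[t][j] is the accuracy on task j measured after training through
    task t, defined for j <= t (so row t has t + 1 entries).
    This is a common definition, assumed here, not taken from the paper.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # nothing can be forgotten with a single task
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any step before the final one.
        best_earlier = max(acc[t][j] for t in range(j, num_tasks - 1))
        drops.append(best_earlier - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)


# Hypothetical accuracies for three sequential tasks.
history = [
    [0.90],
    [0.80, 0.70],
    [0.75, 0.65, 0.60],
]
forgetting = average_forgetting(history)
```

With these made-up numbers the value is roughly 0.10; a method with a frozen backbone would be expected to keep it close to zero if the drift argument holds.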

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This decoupling could be tested in other architectures beyond Vision Transformers to see if similar benefits emerge.
The approach implies that general pre-trained geometries may suffice for composition if not specialized too narrowly during fine-tuning.
Extending the method to multi-modal continual learning might reveal if the same separation principle applies to language or other data types.

Load-bearing premise

The patch-level semantic geometry in self-supervised Vision Transformers remains general and composable enough for novel concepts even when slots are optimized solely for holistic class identity.

What would settle it

A direct comparison on a benchmark featuring concepts with completely different visual structures from the training set, where the method shows equivalent or worse generalization than global embedding baselines, would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.11710 by Chi-Nguyen Tran, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh, Phu-Hoa Pham, Phu-Quy Nguyen-Lam.

**Figure 1.** Top (Phase I): A frozen ViT and slot attention process images. A trainable MLP router and projection head map raw aggregates to unit-norm embeddings, optimized via cross-entropy and a cross-correlation penalty. Bottom (Phase II): Gradient-free inference composes novel scenes by centering and matching slots via bidirectional Chamfer distance.
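
The Phase II step named in the caption, centering slots and matching them with a bidirectional Chamfer distance, can be illustrated with a small gradient-free sketch. This is not the authors' implementation: the per-set mean centering, the squared Euclidean ground distance, and the names `center`, `chamfer_bidirectional`, `classify`, and `class_slot_bank` are all assumptions made for illustration.

```python
def center(slots):
    """Subtract the per-dimension mean of a slot set from every slot.

    Assumption: 'centering' is read here as per-set mean removal.
    """
    dim = len(slots[0])
    mean = [sum(s[d] for s in slots) / len(slots) for d in range(dim)]
    return [[s[d] - mean[d] for d in range(dim)] for s in slots]


def sq_dist(a, b):
    """Squared Euclidean distance between two slot vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_bidirectional(query_slots, support_slots):
    """Symmetric Chamfer distance between two centered slot sets."""
    q, s = center(query_slots), center(support_slots)
    # Each query slot is matched to its nearest support slot, and vice versa.
    q_to_s = sum(min(sq_dist(a, b) for b in s) for a in q) / len(q)
    s_to_q = sum(min(sq_dist(b, a) for a in q) for b in s) / len(s)
    return q_to_s + s_to_q


def classify(query_slots, class_slot_bank):
    """Return the class whose stored slots are closest; no gradients needed."""
    return min(
        class_slot_bank,
        key=lambda c: chamfer_bidirectional(query_slots, class_slot_bank[c]),
    )


# Hypothetical 2-D slots for two stored classes and one query scene.
bank = {"a": [[0.0, 0.0], [1.0, 0.0]], "b": [[0.0, 0.0], [0.0, 5.0]]}
query = [[10.0, 10.0], [11.0, 10.0]]
predicted = classify(query, bank)
```

Because each slot is matched independently to its nearest neighbour in the other set, a query can borrow parts from a stored class without the two sets having the same number of slots, which is the property the compositional reading relies on.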

Original abstract

Object-centric representations promise a key property for few-shot learning: Rather than treating a scene as a single unit, a model can decompose it into individual object-level parts that can be matched and compared across different concepts. In practice, this potential is rarely realized. Continual learners either collapse scenes into global embeddings, or train with part-level matching objectives that tie representations too closely to seen patterns, leaving them unable to generalize to truly novel concepts. In this paper, we identify this fundamental structural conflict and pioneer a new paradigm that strictly decouples representation learning from compositional inference. Leveraging the inherent patch-level semantic geometry of self-supervised Vision Transformers (ViTs), our framework employs a dual-phase strategy. During training, slot representations are optimized entirely toward holistic class identity, preserving highly generalizable, object-level geometries. At inference, preserved slots are dynamically composed to match novel scenes. We demonstrate that this paradigm offers dual structural benefits: The frozen backbone naturally prevents representation drift, while our lightweight, holistic optimization preserves the features' capacity for novel-concept transfer. Extensive experiments validate this approach, achieving state-of-the-art unseen-concept generalization and minimal forgetting across standard continual learning benchmarks.
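
The Figure 1 caption describes the training objective as cross-entropy on unit-norm embeddings plus a cross-correlation penalty. A minimal sketch of such a holistic objective follows. The exact form of the penalty is not given on this page, so the version below (sum of squared off-diagonal feature correlations within a batch) is an assumption, as are the names `phase1_loss`, `class_weights`, `lam`, and `temperature`.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def cross_correlation_penalty(batch):
    """Sum of squared off-diagonal entries of the feature correlation matrix.

    Assumed form of the penalty; it discourages redundant embedding dimensions.
    """
    n, d = len(batch), len(batch[0])
    mean = [sum(r[k] for r in batch) / n for k in range(d)]
    std = [
        math.sqrt(sum((r[k] - mean[k]) ** 2 for r in batch) / n) or 1.0
        for k in range(d)
    ]
    z = [[(r[k] - mean[k]) / std[k] for k in range(d)] for r in batch]
    penalty = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c = sum(row[i] * row[j] for row in z) / n
                penalty += c * c
    return penalty


def phase1_loss(embeddings, labels, class_weights, lam=0.01, temperature=0.1):
    """Class-level cross-entropy plus a weighted decorrelation penalty.

    Only whole-image embeddings and class labels enter the loss; there is
    no part-level matching term, which is the point of the holistic phase.
    """
    unit = [l2_normalize(e) for e in embeddings]
    ce = 0.0
    for e, y in zip(unit, labels):
        logits = [
            sum(a * b for a, b in zip(e, w)) / temperature for w in class_weights
        ]
        ce += cross_entropy(logits, y)
    return ce / len(unit) + lam * cross_correlation_penalty(unit)
```

In a full system the embeddings would come from the trainable router and projection head on top of the frozen backbone; here they are plain lists so the loss itself can be inspected in isolation.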

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's dual-phase decoupling of holistic slot optimization from compositional inference on a frozen ViT is a straightforward structural fix for continual few-shot issues, but the abstract gives almost no data to check if it actually works.

The main thing here is the clean separation: freeze a self-supervised ViT to keep its patch geometry, train slots only on whole-class identity during the continual phase, then compose those slots at inference for new object combinations. This targets the problem where most continual learners either flatten scenes into global vectors or tie parts too tightly to what they have seen so far. The approach is simple and avoids adding heavy new components, which is a plus. It also makes sense that freezing the backbone stops representation drift while the class-only objective might leave more room for novel recombinations than part-matching losses do. That framing of the structural conflict is useful to think about even if the execution details are still light.

The soft spot is the missing evidence. The abstract claims state-of-the-art unseen generalization and low forgetting on standard benchmarks, yet supplies no numbers, no listed baselines, and no ablations. Without those it is impossible to tell whether the method actually outperforms slot-attention or object-centric continual learners or whether the key assumption holds. The stress-test point is fair: optimizing slots for class identity could still nudge the underlying patch alignments in ways that hurt flexibility for truly novel compositions once tasks pile up, and nothing in the text shows a derivation or test that the geometry stays untouched.

This is for people working on continual learning or compositional vision who want a minimal-change way to keep object structure alive. It shows clear thinking on the trade-off, so it is worth a serious referee to see the full experiments and check the numbers. I would send it to review rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The paper proposes a paradigm for continual few-shot learning that decouples representation learning from compositional inference. It leverages the inherent patch-level semantic geometry of self-supervised Vision Transformers by keeping the backbone frozen and optimizing slot representations exclusively toward holistic class identity during training. At inference, preserved slots are dynamically composed to match novel scenes. The approach claims dual benefits of preventing representation drift and preserving capacity for novel-concept transfer, achieving state-of-the-art unseen-concept generalization and minimal forgetting on standard continual learning benchmarks.

Significance. If the central claims hold, the work could meaningfully advance compositional generalization in continual few-shot settings by resolving the tension between stable representations and flexibility for unseen recombinations. The strategy of using pre-trained ViT patch geometry without part-level matching objectives during training is a clear strength, as is the explicit separation of training and inference phases. This could offer a simpler path than methods that tie representations closely to seen patterns.

major comments (2)

[§3] §3 (Method): The claim that holistic slot optimization for class identity leaves the underlying patch-level semantic geometry of the frozen ViT untouched and recombinable for novel concepts is load-bearing for the central generalization argument, yet the manuscript provides no derivation, invariance analysis, or ablation demonstrating that the objective does not induce alignments that reduce flexibility for unseen object recombinations as tasks accumulate.
[§5] §5 (Experiments): The abstract and results sections assert state-of-the-art performance on unseen-concept generalization and minimal forgetting, but without specific quantitative numbers, baseline comparisons, or ablation tables isolating the contribution of holistic optimization versus the frozen backbone, the strength of the empirical support cannot be fully assessed.

minor comments (1)

[§3.1] The description of 'slots' and their relation to ViT patches could be introduced with more precise notation or a diagram in the early method section to improve clarity for readers unfamiliar with object-centric representations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We address the major comments point by point below, outlining how we will strengthen the manuscript through targeted revisions.

Referee: [§3] §3 (Method): The claim that holistic slot optimization for class identity leaves the underlying patch-level semantic geometry of the frozen ViT untouched and recombinable for novel concepts is load-bearing for the central generalization argument, yet the manuscript provides no derivation, invariance analysis, or ablation demonstrating that the objective does not induce alignments that reduce flexibility for unseen object recombinations as tasks accumulate.

Authors: We agree that an explicit analysis would better support this load-bearing claim. In the revised manuscript, we will add a short theoretical derivation showing that holistic class-identity optimization on frozen self-supervised ViT patches preserves their original semantic geometry (due to the absence of part-level matching losses). We will also include a new ablation that measures slot recombinability on held-out novel concept compositions after sequential tasks, directly comparing against part-level baselines to quantify retained flexibility.

revision: yes

Referee: [§5] §5 (Experiments): The abstract and results sections assert state-of-the-art performance on unseen-concept generalization and minimal forgetting, but without specific quantitative numbers, baseline comparisons, or ablation tables isolating the contribution of holistic optimization versus the frozen backbone, the strength of the empirical support cannot be fully assessed.

Authors: The full manuscript already reports quantitative results and baseline comparisons in Section 5. To address the request for clearer isolation of contributions, we will expand the experiments with an additional ablation table that separately varies the holistic optimization objective and the frozen backbone, reporting exact numerical values for unseen-concept accuracy and forgetting on the standard benchmarks.

revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical validation of decoupled optimization rather than self-referential definitions or fitted predictions

The paper's core argument identifies a structural conflict between representation learning and compositional inference, then proposes a dual-phase strategy that optimizes slots solely for holistic class identity while freezing the ViT backbone. No equations appear that define a target quantity in terms of itself or rename a fitted parameter as a prediction. The preservation of patch-level geometry is presented as an empirical property of self-supervised ViTs rather than a derived result that reduces to the training objective by construction. Central claims are supported by benchmark experiments on unseen-concept generalization and forgetting, which are falsifiable outside any internal fit. Self-citations, if present, are not load-bearing for the uniqueness of the paradigm.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unverified assumption that self-supervised ViT patch geometry is inherently suitable for novel-concept composition after holistic optimization; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Self-supervised Vision Transformers possess inherent patch-level semantic geometry that can be preserved for novel concepts.
Invoked in the description of the dual-phase strategy that relies on this geometry remaining generalizable.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "During training, slot representations are optimized entirely toward holistic class identity, preserving highly generalizable, object-level geometries. At inference, preserved slots are dynamically composed to match novel scenes."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We escape this trap via strict phase separation: Holistic training, and compositional inference."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

[1]

Does Continual Learning Meet Compositionality?

Liao, Weiduo and Wei, Ying and Jiang, Mingchen and Zhang, Qingfu and Ishibuchi, Hisao , booktitle=. Does Continual Learning Meet Compositionality?

work page
[2]

Mark McDonnell and Dong Gong and Amin Parvaneh and Ehsan Abbasnejad and Anton van den Hengel , booktitle=. Ran. 2023 , url=

work page 2023
[3]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

work page
[4]

Weighted Ensemble Models Are Strong Continual Learners , year =

Marouf, Imad Eddine and Roy, Subhankar and Tartaglione, Enzo and Lathuili\`. Weighted Ensemble Models Are Strong Continual Learners , year =. Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXI , pages =. doi:10.1007/978-3-031-73209-6_18 , abstract =

work page doi:10.1007/978-3-031-73209-6_18 2024
[5]

Advances in Neural Information Processing Systems , volume=

Object-Centric Learning with Slot Attention , author=. Advances in Neural Information Processing Systems , volume=

work page
[6]

The Eleventh International Conference on Learning Representations , year=

Bridging the Gap to Real-World Object-Centric Learning , author=. The Eleventh International Conference on Learning Representations , year=

work page
[7]

Transactions on Machine Learning Research , issn=

Maxime Oquab and Timoth. Transactions on Machine Learning Research , issn=. 2024 , url=

work page 2024
[8]

Zhou, Jinghao and Wei, Chen and Wang, Huiyu and Shen, Wei and Xie, Cihang and Yuille, Alan and Kong, Tao , booktitle=. i

work page
[9]

Proceedings of the 38th International Conference on Machine Learning , pages=

Learning Transferable Visual Models From Natural Language Supervision , author=. Proceedings of the 38th International Conference on Machine Learning , pages=

work page
[10]

Proceedings of the 38th International Conference on Machine Learning , pages=

Barlow Twins: Self-Supervised Learning via Redundancy Reduction , author=. Proceedings of the 38th International Conference on Machine Learning , pages=

work page
[11]

Bardes, Adrien and Ponce, Jean and LeCun, Yann , booktitle=

work page
[12]

Learning robust global representations by penalizing local predictive power

SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning , author=. arXiv preprint arXiv:1911.04623 , year=

work page arXiv 1911
[13]

CVPR , year=

FEAT: Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions , author=. CVPR , year=

work page
[14]

NeurIPS , year=

Prototypical Networks for Few-Shot Learning , author=. NeurIPS , year=

work page
[15]

Preprint , year=

Compositional Few-Shot Class Incremental Learning , author=. Preprint , year=

work page
[16]

NeurIPS , year=

Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition , author=. NeurIPS , year=

work page
[17]

CVPR , year=

Learning Graph Embeddings for Compositional Zero-Shot Learning , author=. CVPR , year=

work page
[18]

Preprint , year=

On the Interaction of Variance Objectives and Classification Losses , author=. Preprint , year=

work page
[19]

Eva-02: A visual representation for neon genesis,

EVA-02: A Visual Representation for Neon Genesis , author=. arXiv preprint arXiv:2303.11331 , year=

work page arXiv
[20]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =

Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J\'egou, Herv\'e and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =. 2021 , pages =

work page 2021
[21]

2025 , eprint=

DINOv3 , author=. 2025 , eprint=

work page 2025
[22]

Proceedings of the 41st International Conference on Machine Learning , articleno =

Zou, Yixiong and Zhang, Shanghang and Zhou, Haichen and Li, Yuhua and Li, Ruixuan , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

work page 2024
[23]

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence,

Evolutionary Generalized Zero-Shot Learning , author =. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence,. 2024 , month =. doi:10.24963/ijcai.2024/70 , url =

work page doi:10.24963/ijcai.2024/70 2024
[24]

The Fourteenth International Conference on Learning Representations , year=

Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models , author=. The Fourteenth International Conference on Learning Representations , year=

work page
[25]

Bootstrap your own latent a new approach to self-supervised learning , year =

Grill, Jean-Bastien and Strub, Florian and Altch\'. Bootstrap your own latent a new approach to self-supervised learning , year =. Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =

work page
[26]

Proceedings of the 37th International Conference on Machine Learning , articleno =

Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey , title =. Proceedings of the 37th International Conference on Machine Learning , articleno =. 2020 , publisher =

work page 2020
[27]

2023 , eprint=

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture , author=. 2023 , eprint=

work page 2023
[28]

International Conference on Learning Representations , year=

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , author=. International Conference on Learning Representations , year=

work page
[29]

, title =

Rebuffi, Sylvestre-Alvise and Kolesnikov, Alexander and Sperl, Georg and Lampert, Christoph H. , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , month =

work page
[30]

Learning Multiple Layers of Features from Tiny Images , author=

work page
[31]

and Branson, S

Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S. , institution=. The Caltech-

work page
[32]

Advances in Neural Information Processing Systems , pages=

Matching Networks for One Shot Learning , author=. Advances in Neural Information Processing Systems , pages=

work page
[33]

The Many Faces of Robustness:

Hendrycks, Dan and Basart, Steven and Mu, Norman and Kadavath, Saurav and Wang, Frank and Dorundo, Evan and Desai, Rahul and Zhu, Tyler and Parajuli, Samyak and Guo, Mike and Song, Dawn and Steinhardt, Jacob and Gilmer, Justin , booktitle=. The Many Faces of Robustness:

work page
[34]

Masked Autoencoders Are Scalable Vision Learners , booktitle =

He, Kaiming and Chen, Xinlei and Xie, Saining and Li, Yanghao and Doll. Masked Autoencoders Are Scalable Vision Learners , booktitle =. 2022 , pages =

work page 2022
[35]

F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning , url =

Zhuang, Huiping and Liu, Yuchen and He, Run and Tong, Kai and Zeng, Ziqian and Chen, Cen and Wang, Yi and Chau, Lap-Pui , booktitle =. F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning , url =. doi:10.52202/079017-1314 , editor =

work page doi:10.52202/079017-1314
[36]

2024 , eprint=

Provable Compositional Generalization for Object-Centric Learning , author=. 2024 , eprint=

work page 2024
[37]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages =

Reproducible Scaling Laws for Contrastive Language-Image Learning , author =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages =

work page
[38]

, title =

Fan, Haoqiang and Su, Hao and Guibas, Leonidas J. , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , month =. 2017 , pages =

work page 2017
[39]

Vardan Papyan and X. Y. Han and David L. Donoho , title =. Proceedings of the National Academy of Sciences , volume =. 2020 , doi =. https://www.pnas.org/doi/pdf/10.1073/pnas.2015509117 , abstract =

work page doi:10.1073/pnas.2015509117 2020
[40]

2025 , eprint=

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics , author=. 2025 , eprint=

work page 2025
[41]

Computational Optimal Transport: With Applications to Data Science , publisher =

Peyr. Computational Optimal Transport: With Applications to Data Science , publisher =. 2019 , series =

work page 2019
[42]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Cuturi, Marco , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page
[43]

Learning Generative Models with

Genevay, Aude and Peyr. Learning Generative Models with. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) , series =. 2018 , publisher =

work page 2018
[44]

and Belanger, David and Linderman, Scott W

Mena, Gonzalo E. and Belanger, David and Linderman, Scott W. and Snoek, Jasper , title =. International Conference on Learning Representations (ICLR) , year =

work page
[45]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Luise, Giulia and Rudi, Alessandro and Pontil, Massimiliano and Ciliberto, Carlo , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page
[46]

Interpolating between Optimal Transport and

Feydy, Jean and S. Interpolating between Optimal Transport and. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) , series =. 2019 , publisher =

work page 2019
[47]

, title =

Clarke, Frank H. , title =. 1990 , series =

work page 1990
[48]

SIAM Journal on Optimization , volume =

Bolte, J. SIAM Journal on Optimization , volume =. 2007 , doi =

work page 2007
[49]

Proceedings of the 35th International Conference on Machine Learning (ICML) , pages =

Attention-based Deep Multiple Instance Learning , author =. Proceedings of the 35th International Conference on Machine Learning (ICML) , pages =. 2018 , publisher =

work page 2018
[50]

Dipam Goswami and Yuyang Liu and Bart. Fe. Thirty-seventh Conference on Neural Information Processing Systems , year=

work page
[51]

SCIENCE CHINA Information Sciences , year=

PILOT: A Pre-Trained Model-Based Continual Learning Toolbox , author=. SCIENCE CHINA Information Sciences , year=

work page
[52]

IJCAI , pages=

Continual learning with pre-trained models: A survey , author=. IJCAI , pages=

work page
[53]

IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

Zhou, Da-Wei and Wang, Qi-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan and Liu, Ziwei , title =. IEEE Transactions on Pattern Analysis and Machine Intelligence , volume=

work page

[1] [1]

Does Continual Learning Meet Compositionality?

Liao, Weiduo and Wei, Ying and Jiang, Mingchen and Zhang, Qingfu and Ishibuchi, Hisao , booktitle=. Does Continual Learning Meet Compositionality?

work page

[2] [2]

Mark McDonnell and Dong Gong and Amin Parvaneh and Ehsan Abbasnejad and Anton van den Hengel , booktitle=. Ran. 2023 , url=

work page 2023

[3] [3]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages=

work page

[4] [4]

Weighted Ensemble Models Are Strong Continual Learners , year =

Marouf, Imad Eddine and Roy, Subhankar and Tartaglione, Enzo and Lathuili\`. Weighted Ensemble Models Are Strong Continual Learners , year =. Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXI , pages =. doi:10.1007/978-3-031-73209-6_18 , abstract =

work page doi:10.1007/978-3-031-73209-6_18 2024

[5] [5]

Advances in Neural Information Processing Systems , volume=

Object-Centric Learning with Slot Attention , author=. Advances in Neural Information Processing Systems , volume=

work page

[6] [6]

The Eleventh International Conference on Learning Representations , year=

Bridging the Gap to Real-World Object-Centric Learning , author=. The Eleventh International Conference on Learning Representations , year=

work page

[7] [7]

Transactions on Machine Learning Research , issn=

Maxime Oquab and Timoth. Transactions on Machine Learning Research , issn=. 2024 , url=

work page 2024

[8] [8]

Zhou, Jinghao and Wei, Chen and Wang, Huiyu and Shen, Wei and Xie, Cihang and Yuille, Alan and Kong, Tao , booktitle=. i

work page

[9] [9]

Proceedings of the 38th International Conference on Machine Learning , pages=

Learning Transferable Visual Models From Natural Language Supervision , author=. Proceedings of the 38th International Conference on Machine Learning , pages=

work page

[10] [10]

Proceedings of the 38th International Conference on Machine Learning , pages=

Barlow Twins: Self-Supervised Learning via Redundancy Reduction , author=. Proceedings of the 38th International Conference on Machine Learning , pages=

work page

[11] [11]

Bardes, Adrien and Ponce, Jean and LeCun, Yann , booktitle=

work page

[12] [12]

Learning robust global representations by penalizing local predictive power

SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning , author=. arXiv preprint arXiv:1911.04623 , year=

work page arXiv 1911

[13] [13]

CVPR , year=

FEAT: Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions , author=. CVPR , year=

work page

[14] [14]

NeurIPS , year=

Prototypical Networks for Few-Shot Learning , author=. NeurIPS , year=

work page

[15] [15]

Preprint , year=

Compositional Few-Shot Class Incremental Learning , author=. Preprint , year=

work page

[16] [16]

NeurIPS , year=

Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition , author=. NeurIPS , year=

work page

[17] [17]

CVPR , year=

Learning Graph Embeddings for Compositional Zero-Shot Learning , author=. CVPR , year=

work page

[18] [18]

Preprint , year=

On the Interaction of Variance Objectives and Classification Losses , author=. Preprint , year=

work page

[19] [19]

Eva-02: A visual representation for neon genesis,

EVA-02: A Visual Representation for Neon Genesis , author=. arXiv preprint arXiv:2303.11331 , year=

work page arXiv

[20] [20]

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =

Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J\'egou, Herv\'e and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =. 2021 , pages =

work page 2021

[21] [21]

2025 , eprint=

DINOv3 , author=. 2025 , eprint=

work page 2025

[22] [22]

Proceedings of the 41st International Conference on Machine Learning , articleno =

Zou, Yixiong and Zhang, Shanghang and Zhou, Haichen and Li, Yuhua and Li, Ruixuan , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

work page 2024

[23] [23]

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence,

Evolutionary Generalized Zero-Shot Learning , author =. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence,. 2024 , month =. doi:10.24963/ijcai.2024/70 , url =

work page doi:10.24963/ijcai.2024/70 2024

[24] Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models. The Fourteenth International Conference on Learning Representations.

[25] Grill, Jean-Bastien and Strub, Florian et al. Bootstrap your own latent: a new approach to self-supervised learning. Proceedings of the 34th International Conference on Neural Information Processing Systems.

[26] Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey. Proceedings of the 37th International Conference on Machine Learning, 2020.

[27] Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. 2023.

[28] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations.

[29] Rebuffi, Sylvestre-Alvise and Kolesnikov, Alexander and Sperl, Georg and Lampert, Christoph H. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Learning Multiple Layers of Features from Tiny Images.

[31] Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S. The Caltech-

[32] Matching Networks for One Shot Learning. Advances in Neural Information Processing Systems.

[33] Hendrycks, Dan and Basart, Steven and Mu, Norman and Kadavath, Saurav and Wang, Frank and Dorundo, Evan and Desai, Rahul and Zhu, Tyler and Parajuli, Samyak and Guo, Mike and Song, Dawn and Steinhardt, Jacob and Gilmer, Justin. The Many Faces of Robustness:

[34] He, Kaiming and Chen, Xinlei and Xie, Saining and Li, Yanghao et al. Masked Autoencoders Are Scalable Vision Learners. 2022.

[35] Zhuang, Huiping and Liu, Yuchen and He, Run and Tong, Kai and Zeng, Ziqian and Chen, Cen and Wang, Yi and Chau, Lap-Pui. F-OAL: Forward-only Online Analytic Learning with Fast Training and Low Memory Footprint in Class Incremental Learning. doi:10.52202/079017-1314

[36] Provable Compositional Generalization for Object-Centric Learning. 2024.

[37] Reproducible Scaling Laws for Contrastive Language-Image Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Fan, Haoqiang and Su, Hao and Guibas, Leonidas J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[39] Vardan Papyan and X. Y. Han and David L. Donoho. Proceedings of the National Academy of Sciences, 2020. doi:10.1073/pnas.2015509117. https://www.pnas.org/doi/pdf/10.1073/pnas.2015509117

[40] LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics. 2025.

[41] Peyré et al. Computational Optimal Transport: With Applications to Data Science. 2019.

[42] Cuturi, Marco. Advances in Neural Information Processing Systems (NeurIPS).

[43] Genevay, Aude et al. Learning Generative Models with. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.

[44] Mena, Gonzalo E. and Belanger, David and Linderman, Scott W. and Snoek, Jasper. International Conference on Learning Representations (ICLR).

[45] Luise, Giulia and Rudi, Alessandro and Pontil, Massimiliano and Ciliberto, Carlo. Advances in Neural Information Processing Systems (NeurIPS).

[46] Feydy, Jean et al. Interpolating between Optimal Transport and. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

[47] Clarke, Frank H. 1990.

[48] Bolte, J. SIAM Journal on Optimization, 2007.

[49] Attention-based Deep Multiple Instance Learning. Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.

[50] Dipam Goswami and Yuyang Liu et al. Thirty-seventh Conference on Neural Information Processing Systems.

[51] PILOT: A Pre-Trained Model-Based Continual Learning Toolbox. SCIENCE CHINA Information Sciences.

[52] Continual learning with pre-trained models: A survey. IJCAI.

[53] Zhou, Da-Wei and Wang, Qi-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan and Liu, Ziwei. IEEE Transactions on Pattern Analysis and Machine Intelligence.