iGSP: Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models
Pith reviewed 2026-05-20 07:13 UTC · model grok-4.3
The pith
Vision-language models adapt continually by projecting new task gradients onto subspaces identified from early MoE router convergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
iGSP splits adaptation into two phases. The Subspace Identification phase expands the set of candidate experts, applies subspace-constrained regularization that projects incoming gradients onto the historical basis established by early MoE router convergence, and prunes redundant dimensions using routing probabilities as gradient-flow indicators. The Orthogonal Subspace Fine-Tuning phase then fixes the basis and drops the regularization to fit task residuals. The paper claims this process delivers state-of-the-art accuracy on the MTIL benchmark while cutting average trainable parameters by 42.7 percent and final total parameters by 86.9 percent relative to prior methods.
What carries the argument
Implicit gradient subspace projection that treats early-converged MoE routing probabilities as indicators for projecting new gradients onto and pruning within a shared historical low-rank subspace.
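For intuition, the geometric operation behind this mechanism can be written out explicitly. The sketch below is not the paper's method: iGSP is described as projecting implicitly through a regularizer, whereas this code performs a plain orthogonal projection. The basis vectors, the gradient, and the helper names (dot, orthonormalize, project) are all hypothetical.

```python
# Illustrative only: explicit orthogonal projection of a gradient onto a subspace.
# All vectors and helper names are hypothetical; iGSP is described as doing this
# implicitly via regularization rather than with an explicit projection step.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors that already lie in the span
            basis.append([wi / norm for wi in w])
    return basis

def project(grad, basis):
    """Component of `grad` that lies in the span of the orthonormal `basis`."""
    proj = [0.0] * len(grad)
    for b in basis:
        coeff = dot(grad, b)
        proj = [p + coeff * bi for p, bi in zip(proj, b)]
    return proj

# Hypothetical "historical" directions and a new-task gradient.
historical = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
grad = [2.0, 3.0, 4.0]

in_subspace = project(grad, historical)                 # reusable part
residual = [g - p for g, p in zip(grad, in_subspace)]   # orthogonal remainder
```

Here the in-subspace component stands in for what can be expressed with the existing basis, and the residual stands in for what a later fine-tuning phase would have to fit.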
If this is right
- State-of-the-art accuracy is reached on the MTIL continual learning benchmark for vision-language models.
- Average trainable parameters drop by 42.7 percent compared with current state-of-the-art continual learning methods.
- Final total parameter count falls by 86.9 percent relative to counterpart approaches that assign isolated modules per task.
- Negative transfer between visually similar yet logically distinct tasks is reduced by aligning on optimization trajectory overlap instead of surface similarity.
- Training becomes more efficient because the structural basis is identified once and then held fixed during rapid residual fitting.
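The two percentages above are relative figures, and this page does not give the absolute baseline counts. The short calculation below only converts the reported reductions into remaining fractions; the function name is hypothetical.

```python
def remaining_fraction(reduction_percent):
    """Fraction of a baseline that is left after a given percentage reduction."""
    return 1.0 - reduction_percent / 100.0

# Reported reductions: 42.7% (average trainable) and 86.9% (final total).
trainable_left = remaining_fraction(42.7)  # share of the baseline's trainable parameters
total_left = remaining_fraction(86.9)      # share of the counterpart's total parameters
shrink_factor = 1.0 / total_left           # how many times smaller the final count is
```

Read this way, an 86.9 percent reduction corresponds to a final parameter count between seven and eight times smaller than the counterpart's.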
Where Pith is reading between the lines
- The same early-router convergence signal could serve as a general cue for subspace construction in other routed or gated architectures beyond vision-language models.
- Monitoring gradient flow indicators in deployed systems might allow subspaces to expand or contract dynamically after the initial identification phase.
- Direct comparison on task sequences where visual appearance and logical structure are deliberately decorrelated would isolate the benefit of trajectory-based sharing over similarity-based baselines.
Load-bearing premise
Early convergence of MoE routers produces a stable subspace basis that captures all necessary historical information so that later gradient projection and pruning preserve task-specific details without loss.
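One way to make "early convergence" checkable is to track how much the routing distribution moves between epochs. The sketch below is an assumption about how such a check might look, not the paper's criterion: the total variation distance, the tolerance, the patience window, and the routing history are all hypothetical choices.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def first_stable_epoch(history, tol=0.01, patience=2):
    """Index of the first epoch in a run of `patience` consecutive epochs whose
    routing distribution differs from the previous epoch by less than `tol`.
    Returns None if no such run exists."""
    run = 0
    for epoch in range(1, len(history)):
        if total_variation(history[epoch - 1], history[epoch]) < tol:
            run += 1
            if run >= patience:
                return epoch - patience + 1
        else:
            run = 0
    return None

# Hypothetical per-epoch mean routing probabilities over three experts.
history = [
    [0.34, 0.33, 0.33],
    [0.50, 0.30, 0.20],
    [0.60, 0.30, 0.10],
    [0.605, 0.295, 0.10],
    [0.606, 0.294, 0.10],
]

stable_from = first_stable_epoch(history)
```

If the routers keep drifting after the chosen cutoff, a check of this kind would return None or a late epoch, which is the situation the premise rules out.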
What would settle it
Running the method on MTIL while fixing the subspace basis before the routers have converged, or replacing routing probabilities with uniform values during pruning, and checking whether accuracy falls below prior methods or the reported parameter reductions vanish.
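The uniform-probability ablation can be illustrated with a toy pruning rule. The rule below (keep an expert when its mean routing probability reaches a threshold) is an assumption made for illustration; the paper's exact pruning criterion is not given on this page, and the probabilities are hypothetical.

```python
def mean_routing(prob_rows):
    """Average routing probability per expert over a batch of samples."""
    n = len(prob_rows)
    return [sum(col) / n for col in zip(*prob_rows)]

def experts_to_keep(prob_rows, threshold):
    """Indices of experts whose mean routing probability reaches `threshold`."""
    return [i for i, p in enumerate(mean_routing(prob_rows)) if p >= threshold]

# Hypothetical routing probabilities: 3 samples x 4 experts.
learned = [
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
]
# Ablation: every expert receives the same probability for every sample.
uniform = [[0.25] * 4 for _ in learned]

kept_learned = experts_to_keep(learned, threshold=0.10)
kept_uniform = experts_to_keep(uniform, threshold=0.10)
```

With learned probabilities the rule separates used from unused experts. With uniform probabilities every expert scores the same, so the rule keeps all of them or none of them, and the pruning signal disappears.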
Original abstract
Vision-Language Models require efficient adaptation to continually emerging downstream tasks. While Parameter-Efficient Fine-Tuning mitigates catastrophic forgetting, assigning isolated modules per task leads to parameter explosion. Conversely, recent similarity-driven sharing mechanisms falsely equate superficial visual similarity with underlying alignment consistency. This fundamental mismatch triggers severe negative transfer between visually similar but logically distinct tasks and fails to exploit alignment reuse across visually diverse ones. We argue that alignment sharing is fundamentally a geometric problem of overlapping optimization trajectories within shared low-rank subspaces. Grounded in this insight, we propose iGSP, a novel framework that achieves efficient adaptation via implicit gradient subspace projection. Leveraging the early convergence of MoE routers to establish the subspace basis, iGSP bifurcates the adaptation process into two phases. First, the Subspace Identification phase introduces candidate experts via basis pre-expansion, applies a novel subspace-constrained regularization to implicitly project new task gradients onto the historical subspace, and precisely prunes redundant dimensions by treating routing probabilities as gradient flow indicators, ultimately to maximize knowledge reuse. Second, the Orthogonal Subspace Fine-Tuning phase fixes this structural basis and removes the regularization to rapidly fit the task-specific residual loss. Extensive experiments on the MTIL benchmark demonstrate that iGSP achieves state-of-the-art accuracy while significantly improving training efficiency, reducing the average trainable parameters by 42.7% compared to current SOTA methods, and decreasing the final total parameters by 86.9% relative to counterparts. The source code is available at https://github.com/GeoX-Lab/iGSP.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes iGSP for efficient continual adaptation of vision-language models. It argues that alignment sharing is a geometric problem of overlapping low-rank subspaces and introduces a two-phase procedure: Subspace Identification (basis pre-expansion, subspace-constrained regularization, and pruning via MoE routing probabilities treated as gradient-flow proxies) followed by Orthogonal Subspace Fine-Tuning (fixed basis, regularization removed). On the MTIL benchmark the method reports state-of-the-art accuracy together with a 42.7% reduction in average trainable parameters and an 86.9% reduction in final total parameters relative to prior SOTA.
Significance. If the central geometric claim and the stability of the early-MoE-router subspace basis hold, the work supplies a principled route to parameter-efficient continual VLM adaptation that avoids both parameter explosion and negative transfer between visually similar but semantically distinct tasks. The explicit two-phase separation and the use of routing probabilities for pruning constitute a concrete, testable contribution to the PEFT/continual-learning literature.
major comments (2)
- [Section 3.2] Subspace Identification phase (Section 3.2): The central claim that early MoE router convergence produces a fixed, reusable low-rank subspace basis whose pruning (via routing probabilities as gradient-flow indicators) preserves task-specific information is load-bearing for both the 42.7% trainable-parameter reduction and the absence of negative transfer. The manuscript provides no ablation on router stability after the identification cutoff, no sensitivity analysis to the cutoff epoch, and no targeted evaluation on MTIL task pairs that are visually similar yet semantically distinct; without these, the reported efficiency gains cannot be confidently attributed to the proposed projection rather than to incomplete bases.
- [Section 3.3] Orthogonal Subspace Fine-Tuning phase (Section 3.3): The transition from subspace-constrained regularization to unconstrained fine-tuning assumes the identified basis already captures all reusable alignment; if this assumption fails on later tasks, the orthogonal residual fitting may still incur negative transfer. The current MTIL results do not report per-task forgetting curves or gradient alignment metrics before versus after the phase switch.
minor comments (2)
- [Section 3.1] The abstract and Section 3.1 refer to 'basis pre-expansion' without an explicit equation or pseudocode showing how the candidate experts are added to the historical subspace; a compact algorithmic box would improve reproducibility.
- [Figure 2] Figure 2 (or equivalent) comparing parameter counts should include error bars or multiple random seeds; the 86.9% total-parameter reduction is stated as a single figure without variance.
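The variance reporting requested in the last comment amounts to a mean and a sample standard deviation over seeds. The sketch below uses Python's standard statistics module; the per-seed values are placeholders chosen for easy arithmetic and are not results from the paper.

```python
import statistics

def summarize(values):
    """Mean and sample standard deviation of a metric across random seeds."""
    return statistics.mean(values), statistics.stdev(values)

# Placeholder per-seed values, NOT measurements from the paper.
metric_per_seed = [10.0, 12.0, 14.0]
mean, std = summarize(metric_per_seed)
```

A parameter-count figure would then be reported as mean plus or minus standard deviation, with the number of seeds stated alongside it.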
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We appreciate the recognition of the geometric framing of alignment sharing and the potential contribution of the two-phase iGSP procedure. Below we respond point by point to the major comments, indicating the revisions we will incorporate to address the concerns raised.
-
Referee: [Section 3.2] Subspace Identification phase (Section 3.2): The central claim that early MoE router convergence produces a fixed, reusable low-rank subspace basis whose pruning (via routing probabilities as gradient-flow indicators) preserves task-specific information is load-bearing for both the 42.7% trainable-parameter reduction and the absence of negative transfer. The manuscript provides no ablation on router stability after the identification cutoff, no sensitivity analysis to the cutoff epoch, and no targeted evaluation on MTIL task pairs that are visually similar yet semantically distinct; without these, the reported efficiency gains cannot be confidently attributed to the proposed projection rather than to incomplete bases.
Authors: We agree that these analyses would strengthen the attribution of the reported gains to the implicit projection mechanism. In the revised manuscript we will add (i) an ablation monitoring router stability by extending training 10 epochs past the identification cutoff and reporting changes in routing probability distributions and downstream accuracy; (ii) a sensitivity study varying the cutoff epoch from 3 to 15 and tabulating the resulting trade-offs in parameter count versus final MTIL accuracy; and (iii) a targeted subsection evaluating negative transfer on visually similar yet semantically distinct MTIL pairs (e.g., different animal classes or vehicle subtypes), using both accuracy deltas and gradient cosine similarity as metrics. These additions will allow readers to assess whether the efficiency improvements stem from the subspace projection rather than incomplete bases. The existing MTIL results already include a broad range of task similarities, but we will make the supporting evidence more explicit. revision: yes
-
Referee: [Section 3.3] Orthogonal Subspace Fine-Tuning phase (Section 3.3): The transition from subspace-constrained regularization to unconstrained fine-tuning assumes the identified basis already captures all reusable alignment; if this assumption fails on later tasks, the orthogonal residual fitting may still incur negative transfer. The current MTIL results do not report per-task forgetting curves or gradient alignment metrics before versus after the phase switch.
Authors: We acknowledge the value of explicit per-task metrics to verify that the phase switch does not re-introduce negative transfer. In the revised version we will include (i) per-task forgetting curves showing accuracy on each prior task after every new-task adaptation and (ii) gradient alignment statistics (cosine similarity between pre- and post-switch gradients projected onto the identified subspace). The design of iGSP ensures that phase-one regularization projects new gradients onto the historical basis, so that phase-two residual fitting operates in the orthogonal complement; the observed SOTA accuracy together with the 86.9% reduction in final total parameters on MTIL is consistent with limited interference. Nevertheless, we will add the requested curves and metrics to make this claim directly verifiable. revision: yes
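The two diagnostics named in these responses, gradient cosine similarity and per-task forgetting, can be sketched as follows. The forgetting definition used here (best earlier accuracy minus final accuracy) is a common one in the continual-learning literature and is assumed; the page does not say which definition the authors would report. The accuracy values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def forgetting(acc):
    """acc[i][j] is the accuracy on task j after training on task i (j <= i).
    For each task except the last, return its best accuracy at any earlier
    step minus its accuracy after the final task."""
    last = len(acc) - 1
    result = []
    for j in range(last):
        best = max(acc[i][j] for i in range(j, last))
        result.append(best - acc[last][j])
    return result

# Hypothetical accuracy table for three sequential tasks.
acc = [
    [90.0],
    [88.0, 80.0],
    [85.0, 79.0, 70.0],
]
per_task_forgetting = forgetting(acc)

aligned = cosine_similarity([1.0, 2.0], [2.0, 4.0])       # same direction
conflicting = cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite direction
```

A cosine near -1 between gradients of two tasks indicates conflicting updates, which is the signature one would look for when testing for negative transfer across the phase switch.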
Circularity Check
No significant circularity detected in iGSP derivation
The paper's central derivation introduces a two-phase process (Subspace Identification with basis pre-expansion, subspace-constrained regularization, and routing-probability pruning, followed by Orthogonal Subspace Fine-Tuning) grounded in the geometric insight that alignment sharing is a problem of overlapping low-rank optimization trajectories. This is presented as a novel framework rather than a re-derivation of prior fitted quantities or self-citations. Experimental results on the MTIL benchmark are used to support the reported accuracy and parameter reductions; no equations or steps in the provided text reduce the claimed gains to inputs by construction, self-definition, or load-bearing self-citation chains. The method adds independent regularization and pruning mechanisms.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Alignment sharing is fundamentally a geometric problem of overlapping optimization trajectories within shared low-rank subspaces.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: Leveraging the early convergence of MoE routers to establish the subspace basis, iGSP bifurcates the adaptation process into Subspace Identification and Orthogonal Subspace Fine-Tuning phases
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Paper passage: Subspace-Constrained Regularization (SCR) ... implicitly forcing the optimizer to exhaust the expressive capacity of the existing expert subspace
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.