HERMAN: Hierarchical Representation Matching for CLIP-based Class-Incremental Learning

Da-Wei Zhou; De-Chuan Zhan; Han-Jia Ye; Lan Li; Yan Wang; Zhen-Hao Xie

arxiv: 2509.22645 · v2 · pith:JDVISAHTnew · submitted 2025-09-26 · 💻 cs.CV · cs.AI


Zhen-Hao Xie, Yan Wang, Lan Li, Han-Jia Ye, De-Chuan Zhan, Da-Wei Zhou


keywords hierarchical, representation, class-incremental, clip, clip-based, cues, descriptors, herman



Class-Incremental Learning (CIL) aims to endow models with the ability to continuously adapt to evolving data streams. Recent advances in pre-trained vision-language models (e.g., CLIP) provide a powerful foundation for this task. However, existing approaches often rely on simplistic templates, such as "a photo of a [CLASS]", which overlook the hierarchical nature of visual concepts. For example, recognizing "cat" versus "car" depends on coarse-grained cues, while distinguishing "cat" from "lion" requires fine-grained details. Similarly, the current feature mapping in CLIP relies solely on the representation from the last layer, neglecting the hierarchical information contained in earlier layers. In this work, we introduce HiErarchical Representation MAtchiNg (HERMAN) for CLIP-based CIL. Our approach leverages LLMs to recursively generate discriminative textual descriptors, thereby augmenting the semantic space with explicit hierarchical cues. These descriptors are matched to different levels of the semantic hierarchy and adaptively routed based on task-specific requirements, enabling precise discrimination while alleviating catastrophic forgetting in incremental tasks. Extensive experiments on multiple benchmarks demonstrate that our method consistently achieves state-of-the-art performance.



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AREA: Attribute Extraction and Aggregation for CLIP-Based Class-Incremental Learning
cs.CV 2026-05 unverdicted novelty 6.0

AREA stabilizes attribute extraction with principal geodesic analysis on hyperspherical space and aggregation with lightweight task experts plus variational bottleneck and optimal transport routing, outperforming SOTA...