Recognition: unknown
SECOS: Semantic Capture for Rigorous Classification in Open-World Semi-Supervised Learning
Pith reviewed 2026-05-07 08:40 UTC · model grok-4.3
The pith
SECOS uses external knowledge to align semantics across modalities so models can directly predict textual labels for novel classes in open-world semi-supervised learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SECOS extracts and aligns semantic representations for known and novel classes using external knowledge, supplying explicit supervisory signals that let the model directly output the most relevant textual label from a candidate set for every input without any post-processing stage.
What carries the argument
SECOS framework that extracts cross-modal semantic representations from external knowledge sources and uses the resulting alignments as explicit supervision for novel-class samples during training.
If this is right
- Practical OWSSL systems can perform rigorous classification by selecting labels directly from candidate sets at inference time.
- Novel classes receive explicit semantic supervision rather than relying solely on clustering or pseudo-labeling.
- Performance gains of up to 5.4 percent are observed even when competing methods are allowed post-hoc label matching.
- The need for separate post-processing modules is removed for both training and deployment.
Where Pith is reading between the lines
- The same alignment mechanism could be tested on open-world object detection or segmentation tasks where semantic consistency across modalities is also required.
- Replacing the external knowledge source with a larger multimodal model might further tighten the alignment for rare novel classes.
- If the external knowledge proves incomplete for certain domains, the performance gap between SECOS and post-processed baselines would shrink or reverse.
Load-bearing premise
External knowledge sources will supply semantic representations that reliably match the visual content of novel classes outside the training distribution.
What would settle it
A controlled test on a dataset whose novel classes have no usable semantic coverage in the chosen external knowledge base, measuring whether SECOS accuracy falls below that of post-processing baselines.
Figures
read the original abstract
In open-world semi-supervised learning (OWSSL), a model learns from labeled data and unlabeled data containing both known and novel classes. In practical OWSSL applications, models are expected to perform rigorous classification by directly selecting the most semantically relevant label from a candidate set for each sample. Existing OWSSL methods fail to achieve this because novel samples are trained without explicit supervision, and these methods lack mechanisms to extract latent semantic information, resulting in predicted labels that have no semantic correspondence to candidate textual labels. To address this, we introduce SEmantic Capture for Open-world Semi-supervised learning (SECOS), which directly predicts textual labels from the candidate set without post-processing, meeting the requirements of practical OWSSL applications. SECOS leverages external knowledge to extract and align semantic representations across modalities for both known and novel classes, providing explicit supervisory signals for training novel classes. Extensive experiments demonstrate that even when existing OWSSL methods are evaluated under the more lenient post-hoc matching setting, SECOS still surpasses them by up to 5.4\% without such assistance, highlighting its superior effectiveness. Code is available at https://github.com/ganchi-huanggua/OSSL-Classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SECOS for open-world semi-supervised learning (OWSSL). It claims that by using external knowledge sources to extract and align semantic representations across modalities for both known and novel classes, the method supplies explicit supervisory signals. This enables direct prediction of textual labels from a candidate set without any post-processing or post-hoc matching, unlike prior OWSSL approaches that lack mechanisms for latent semantic extraction. Experiments are reported to show that SECOS outperforms existing methods by up to 5.4% even when those baselines are evaluated under the more lenient post-hoc matching protocol.
Significance. If the central claims hold after proper validation, SECOS would address a practical gap in OWSSL by achieving rigorous, semantically grounded classification directly from candidate textual labels. Code availability is a positive factor for reproducibility. The approach's dependence on external knowledge for novel-class alignment, however, requires explicit testing of robustness to source quality and distributional mismatch before the reported gains can be considered generalizable.
major comments (3)
- [Experiments] Experiments section (and abstract): Performance gains of up to 5.4% are reported, yet no details are provided on experimental setup, error bars, data splits, number of runs, or the precise choice and preprocessing of external knowledge sources. Without these, the central claim of superiority under direct prediction cannot be verified or reproduced.
- [Method] Method section (description of semantic capture): The alignment of external knowledge representations for novel classes is treated as reliable and is used to generate explicit supervisory signals, but the manuscript contains no controlled ablation or degradation experiments on the external source (e.g., noisy embeddings, incomplete knowledge bases, or domain-shifted retrieval). This leaves the load-bearing assumption untested.
- [Method] §3 (or equivalent method description): The claim that SECOS 'directly predicts textual labels from the candidate set without post-processing' is central, yet the precise inference procedure, loss formulation for novel-class supervision, and how candidate-set selection is performed at test time are not specified with sufficient algorithmic detail or pseudocode to allow independent implementation.
minor comments (2)
- [Abstract] The abstract states that existing methods 'lack mechanisms to extract latent semantic information' but does not cite the specific prior works or sections where this limitation is demonstrated.
- [Method] Notation for semantic representations and alignment functions should be introduced earlier and used consistently; current description mixes 'external knowledge' with 'semantic representations' without clear definitions.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript introducing SECOS for open-world semi-supervised learning. We address each major comment point by point below, providing clarifications and indicating revisions made to strengthen the paper.
read point-by-point responses
-
Referee: [Experiments] Experiments section (and abstract): Performance gains of up to 5.4% are reported, yet no details are provided on experimental setup, error bars, data splits, number of runs, or the precise choice and preprocessing of external knowledge sources. Without these, the central claim of superiority under direct prediction cannot be verified or reproduced.
Authors: We agree that the initial submission lacked sufficient experimental details for full reproducibility. In the revised manuscript, we have added a new subsection to the Experiments section that specifies the full setup: dataset splits (with exact proportions for labeled/unlabeled/novel classes), number of runs (5 independent runs with mean and standard deviation), error bars on all reported metrics, and the precise external knowledge sources (e.g., specific embedding models and Wikipedia-derived corpora) along with preprocessing steps such as normalization and filtering. These additions directly support verification of the 5.4% gains under the direct-prediction protocol. revision: yes
-
Referee: [Method] Method section (description of semantic capture): The alignment of external knowledge representations for novel classes is treated as reliable and is used to generate explicit supervisory signals, but the manuscript contains no controlled ablation or degradation experiments on the external source (e.g., noisy embeddings, incomplete knowledge bases, or domain-shifted retrieval). This leaves the load-bearing assumption untested.
Authors: The referee correctly notes the absence of explicit robustness ablations on the external knowledge component. While our primary experiments rely on standard, domain-aligned sources, we have incorporated new controlled ablation experiments in the revised manuscript. These include tests with injected noise in embeddings and reduced knowledge-base coverage, demonstrating that performance remains competitive. We provide a brief discussion on domain shift but acknowledge that exhaustive cross-domain retrieval experiments were not feasible within the current scope; the added results nonetheless address the core concern. revision: partial
-
Referee: [Method] §3 (or equivalent method description): The claim that SECOS 'directly predicts textual labels from the candidate set without post-processing' is central, yet the precise inference procedure, loss formulation for novel-class supervision, and how candidate-set selection is performed at test time are not specified with sufficient algorithmic detail or pseudocode to allow independent implementation.
Authors: We thank the referee for highlighting the need for greater implementation detail. The revised manuscript expands §3 with a step-by-step description of the inference procedure for direct textual label selection, the complete loss formulation for novel-class supervision (including the specific cross-modal alignment and contrastive terms), and the exact mechanism for candidate-set construction and selection at test time. We have also inserted pseudocode for the training loop and inference pipeline. The publicly released code at the provided GitHub link already implements these components, but the paper is now self-contained for independent reproduction. revision: yes
Circularity Check
No significant circularity: SECOS relies on independent external knowledge for semantic alignment rather than self-referential definitions or fitted predictions
full rationale
The paper's central mechanism extracts and aligns semantic representations using external knowledge sources to supply explicit supervisory signals for novel classes, enabling direct textual label prediction from a candidate set. This does not reduce to a self-definitional loop, a fitted input renamed as prediction, or a load-bearing self-citation chain. No equations or steps in the abstract or description equate the claimed output (rigorous classification without post-processing) to the model's own parameters by construction. The 5.4% gain is presented as an empirical result against baselines under different evaluation settings, not a tautological consequence of the method's definition. The derivation remains self-contained because the external knowledge bases and alignment process are treated as independent inputs, not derived from or equivalent to the SECOS training objective itself.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Language models are few-shot learners
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakan- tan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. InNeurIPS, pages 1877–1901, 2020. 3
1901
-
[2]
Open-world semi-supervised learning
Kaidi Cao, Maria Brbic, and Jure Leskovec. Open-world semi-supervised learning. InICLR, 2022. 1, 2
2022
-
[3]
Emerg- ing properties in self-supervised vision transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. In ICCV, pages 9650–9660, 2021. 4
2021
-
[4]
Cgmatch: A different perspective of semi- supervised learning
Bo Cheng, Jueqing Lu, Yuan Tian, Haifeng Zhao, Yi Chang, and Lan Du. Cgmatch: A different perspective of semi- supervised learning. InCVPR, pages 15381–15391, 2025. 1
2025
-
[5]
Forensics adapter: Adapting clip for generalizable face forgery detection
Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, and Junyu Dong. Forensics adapter: Adapting clip for generalizable face forgery detection. InCVPR, pages 19207–19217, 2025. 3
2025
-
[6]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 4
work page internal anchor Pith review arXiv 2010
-
[7]
Learning textual prompts for open-world semi-supervised learning
Yuxin Fan, Junbiao Cui, and Jiye Liang. Learning textual prompts for open-world semi-supervised learning. InCVPR, pages 14756–14765, 2025. 2, 5, 6
2025
-
[8]
Adapter merging with centroid prototype mapping for scal- able class-incremental learning
Takuma Fukuda, Hiroshi Kera, and Kazuhiko Kawamoto. Adapter merging with centroid prototype mapping for scal- able class-incremental learning. InCVPR, pages 4884–4893,
-
[9]
Robust semi-supervised learning when not all classes have labels
Lan-Zhe Guo, Yi-Ge Zhang, Zhi-Fan Wu, Jie-Jing Shao, and Yu-Feng Li. Robust semi-supervised learning when not all classes have labels. InNeurIPS, pages 3305–3317, 2022. 2
2022
-
[10]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InCVPR, pages 770–778, 2016. 2, 4
2016
-
[11]
Parameter-efficient transfer learning for nlp
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-efficient transfer learning for nlp. InICML, pages 2790–2799, 2019. 2
2019
-
[12]
3d object representations for fine-grained categorization
Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3d object representations for fine-grained categorization. In ICCVW, pages 554–561, 2013. 5
2013
-
[13]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. 5
2009
-
[14]
Imagenet classification with deep convolutional neural net- works
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural net- works. InNeurIPS, 2012. 2, 5
2012
-
[15]
Fate: A prompt- tuning-based semi-supervised learning framework for ex- tremely limited labeled data
Hezhao Liu, Yang Lu, Mengke Li, Yiqun Zhang, Shreyank N Gowda, Chen Gong, and Hanzi Wang. Fate: A prompt- tuning-based semi-supervised learning framework for ex- tremely limited labeled data. InACM MM, page 3153–3162,
-
[16]
Open-world semi-supervised novel class discovery
Jiaming Liu, Yangqiming Wang, Tongze Zhang, Yulu Fan, Qinli Yang, and Junming Shao. Open-world semi-supervised novel class discovery. InIJCAI, pages 4002–4010, 2023. 2
2023
-
[17]
Mind the gap: Confidence discrepancy can guide federated semi-supervised learning across pseudo-mismatch
Yijie Liu, Xinyi Shang, Yiqun Zhang, Yang Lu, Chen Gong, Jing-Hao Xue, and Hanzi Wang. Mind the gap: Confidence discrepancy can guide federated semi-supervised learning across pseudo-mismatch. InCVPR, pages 10173–10182,
-
[18]
Enhancing clip with clip: Exploring pseudolabeling for limited-label prompt tuning
Cristina Menghini, Andrew Delworth, and Stephen Bach. Enhancing clip with clip: Exploring pseudolabeling for limited-label prompt tuning. InNeurIPS, pages 60984– 61007, 2023. 3
2023
-
[19]
Automated flower classification over a large number of classes
Maria-Elena Nilsback and Andrew Zisserman. Automated flower classification over a large number of classes. In2008 Sixth Indian conference on computer vision, graphics & im- age processing, pages 722–729, 2008. 5
2008
-
[20]
Ow- match: Conditional self-labeling with consistency for open- world semi-supervised learning
Shengjie Niu, Lifan Lin, Jian Huang, and Chao Wang. Ow- match: Conditional self-labeling with consistency for open- world semi-supervised learning. InNeurIPS, pages 99836– 99866, 2024. 2, 5, 6
2024
-
[21]
Rabah Ouldnoughi, Chia-Wen Kuo, and Zsolt Kira. Clip- gcd: Simple language guided generalized category discov- ery.arXiv preprint arXiv:2305.10420, 2023. 2
-
[22]
St-adapter: Parameter-efficient image-to-video transfer learning
Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, and Hong- sheng Li. St-adapter: Parameter-efficient image-to-video transfer learning. InNeurIPS, pages 26462–26477, 2022. 3
2022
-
[23]
Cats and dogs
Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, and CV Jawahar. Cats and dogs. InCVPR, pages 3498–3505,
-
[24]
Pytorch: An imperative style, high-performance deep learning library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019. 5
2019
-
[25]
AdapterFusion: Non-Destructive Task Composition for Transfer Learning , journal =
Jonas Pfeiffer, Aishwarya Kamath, Andreas R ¨uckl´e, Kyunghyun Cho, and Iryna Gurevych. Adapterfusion: Non- destructive task composition for transfer learning.arXiv preprint arXiv:2005.00247, 2020. 3
-
[26]
Dynamic conceptional contrastive learning for generalized category discovery
Nan Pu, Zhun Zhong, and Nicu Sebe. Dynamic conceptional contrastive learning for generalized category discovery. In CVPR, pages 7579–7588, 2023. 2, 5, 6
2023
-
[27]
Improving language understanding by gen- erative pre-training
Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by gen- erative pre-training. 2018. 3
2018
-
[28]
Learn- ing transferable visual models from natural language super- vision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learn- ing transferable visual models from natural language super- vision. InICML, pages 8748–8763, 2021. 2, 4
2021
-
[29]
Openldn: Learn- ing to discover novel classes for open-world semi-supervised learning
Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fa- had Shahbaz Khan, and Mubarak Shah. Openldn: Learn- ing to discover novel classes for open-world semi-supervised learning. InECCV, pages 382–401, 2022. 2
2022
-
[30]
Towards realistic semi-supervised learning
Mamshad Nayeem Rizve, Navid Kardan, and Mubarak Shah. Towards realistic semi-supervised learning. InECCV, pages 437–455, 2022. 2
2022
-
[31]
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan and Andrew Zisserman. Very deep convo- lutional networks for large-scale image recognition.arXiv preprint arXiv:1409.1556, 2014. 2
work page internal anchor Pith review arXiv 2014
-
[32]
Fixmatch: Simplifying semi-supervised learning with consistency and confidence
Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. InNeurIPS, pages 596–608, 2020. 5
2020
-
[33]
Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks
Yi-Lin Sung, Jaemin Cho, and Mohit Bansal. Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. InCVPR, pages 5227–5237, 2022. 3
2022
-
[34]
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. InCVPR, pages 1–9, 2015. 2
2015
-
[35]
Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InNeurIPS, 2017. 3
2017
-
[36]
Generalized category discovery
Sagar Vaze, Kai Han, Andrea Vedaldi, and Andrew Zisser- man. Generalized category discovery. InCVPR, pages 7492– 7501, 2022. 1, 2, 5, 6
2022
-
[37]
The caltech-ucsd birds-200-2011 dataset
Catherine Wah, Steve Branson, Peter Welinder, Pietro Per- ona, and Serge Belongie. The caltech-ucsd birds-200-2011 dataset. 2011. 5
2011
-
[38]
Get: Unlocking the multi-modal potential of clip for generalized category dis- covery
Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, and Ming-Ming Cheng. Get: Unlocking the multi-modal potential of clip for generalized category dis- covery. InCVPR, pages 20296–20306, 2025. 2
2025
-
[39]
Hongjun Wang, Sagar Vaze, and Kai Han. Sptnet: An efficient alternative framework for generalized category discovery with spatial prompt tuning.arXiv preprint arXiv:2403.13684, 2024. 5, 6
-
[40]
Discover and align taxonomic context priors for open- world semi-supervised learning
Yu Wang, Zhun Zhong, Pengchong Qiao, Xuxin Cheng, Xi- awu Zheng, Chang Liu, Nicu Sebe, Rongrong Ji, and Jie Chen. Discover and align taxonomic context priors for open- world semi-supervised learning. InNeurIPS, pages 19015– 19028, 2023. 1, 2, 6
2023
-
[41]
Parametric classification for generalized category discovery: A baseline study
Xin Wen, Bingchen Zhao, and Xiaojuan Qi. Parametric classification for generalized category discovery: A baseline study. InICCV, pages 16590–16600, 2023. 2, 6
2023
-
[42]
Targeted representation align- ment for open-world semi-supervised learning
Ruixuan Xiao, Lei Feng, Kai Tang, Junbo Zhao, Yixuan Li, Gang Chen, and Haobo Wang. Targeted representation align- ment for open-world semi-supervised learning. InCVPR, pages 23072–23082, 2024. 1, 2, 5, 6
2024
-
[43]
Mma: Multi-modal adapter for vision-language models
Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang, and Xiao- hua Xie. Mma: Multi-modal adapter for vision-language models. InCVPR, pages 23826–23837, 2024. 3
2024
-
[44]
Candidate pseudolabel learning: Enhancing vision-language models by prompt tuning with unlabeled data
Jiahan Zhang, Qi Wei, Feng Liu, and Lei Feng. Candidate pseudolabel learning: Enhancing vision-language models by prompt tuning with unlabeled data. InICML, pages 60004– 60020, 2024. 4, 5
2024
-
[45]
Prompt- cal: Contrastive affinity learning via auxiliary prompts for generalized novel category discovery
Sheng Zhang, Salman Khan, Zhiqiang Shen, Muzammal Naseer, Guangyi Chen, and Fahad Shahbaz Khan. Prompt- cal: Contrastive affinity learning via auxiliary prompts for generalized novel category discovery. InCVPR, pages 3479– 3488, 2023. 2, 5, 6
2023
-
[46]
Learning semi- supervised gaussian mixture models for generalized category discovery
Bingchen Zhao, Xin Wen, and Kai Han. Learning semi- supervised gaussian mixture models for generalized category discovery. InICCV, pages 16623–16633, 2023. 6
2023
-
[47]
Textual knowledge matters: Cross-modality co- teaching for generalized visual class discovery
Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, and Zhun Zhong. Textual knowledge matters: Cross-modality co- teaching for generalized visual class discovery. InECCV, pages 41–58, 2024. 1, 2, 5, 6
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.