Deep Multi Label Classification in Affine Subspaces

Pablo Marquez Neila; Raphael Sznitman; Sebastian Wolf; Thomas Kurmann

arXiv: 1907.04563 · v1 · pith:6ZBJXXMK · submitted 2019-07-10 · 💻 cs.CV · cs.LG

Thomas Kurmann, Pablo Marquez Neila, Sebastian Wolf, Raphael Sznitman

Pith reviewed 2026-05-25 00:06 UTC · model grok-4.3

Keywords: multi-label classification, affine subspaces, deep learning, medical imaging, loss function, feature embedding, end-to-end training

The pith

Standard multi-class losses are suboptimal for multi-label classification; pulling label features into distinct affine subspaces improves results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that common objective functions borrowed from multi-class classification do not suit multi-label settings and instead introduces a loss that maps each class's features toward its own affine subspace while pushing the subspaces apart. This matters for medical imaging, where images often carry several labels and such labels are far cheaper to annotate than full segmentations. The approach is presented as a drop-in replacement that trains end-to-end and is tested on two medical datasets, where it yields large gains over prior multi-label frameworks.

Core claim

A deep multi-label classifier can be trained by encouraging the feature vectors of different class labels to lie in separate affine subspaces while maximizing the distance between those subspaces; this objective replaces standard multi-class losses and produces higher performance on medical multi-label datasets.

What carries the argument

The affine-subspace separation loss, which pulls class-specific features toward distinct affine subspaces and maximizes inter-subspace distances.
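As a concrete reference point, the sketch below computes the Euclidean distance from a feature vector z to an affine subspace S = {c + Σ a_i u_i}, given an anchor point c and orthonormal direction vectors u_i, and averages the squared distances of one class's positive features as an attraction term. It is a minimal illustration under assumptions: the names `distance_to_affine_subspace` and `pull_loss`, the orthonormal parameterization and the squared-distance penalty are hypothetical and are not taken from the paper's equations, and the repulsion between subspaces is not modeled here.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def distance_to_affine_subspace(z: Vector, anchor: Vector, basis: Sequence[Vector]) -> float:
    """Distance from z to the affine subspace {anchor + sum_i a_i * basis[i]}.

    Assumes the basis vectors are orthonormal, so removing the component of
    (z - anchor) along each basis vector leaves the orthogonal residual.
    """
    diff = [zi - ci for zi, ci in zip(z, anchor)]
    residual = list(diff)
    for u in basis:
        coeff = dot(diff, u)
        residual = [r - coeff * ui for r, ui in zip(residual, u)]
    return math.sqrt(dot(residual, residual))


def pull_loss(features: Sequence[Vector], anchor: Vector, basis: Sequence[Vector]) -> float:
    """Hypothetical attraction term: mean squared distance of one class's positive features to its subspace."""
    dists = [distance_to_affine_subspace(z, anchor, basis) for z in features]
    return sum(d * d for d in dists) / len(dists)


# A line in 3-D: through (0, 0, 1), running along the x-axis.
ANCHOR = (0.0, 0.0, 1.0)
BASIS = [(1.0, 0.0, 0.0)]
print(distance_to_affine_subspace((2.0, 3.0, 1.0), ANCHOR, BASIS))  # 3.0
print(pull_loss([(5.0, 0.0, 1.0), (2.0, 3.0, 1.0)], ANCHOR, BASIS))  # 4.5
```

Keeping the basis orthonormal turns the projection into a sum of dot products, which is why the sketch assumes it; pulling every positive feature onto the line drives `pull_loss` to zero.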

If this is right

The method functions as a plug-in loss that can replace existing objectives in any deep multi-label architecture.
Performance gains appear on medical imaging tasks where multiple labels per image are required.
Training remains end-to-end differentiable because the subspace objective is formulated as a loss term.
The approach directly addresses the mismatch between multi-class loss design and multi-label label statistics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same subspace-separation idea could be tested on non-medical multi-label datasets such as scene or document classification.
If the subspaces remain low-dimensional, the method might reduce the number of parameters needed compared with per-class heads.
The geometric separation might also be applied to continual learning to keep class representations from interfering.

Load-bearing premise

The assumption that separating class features into affine subspaces will improve generalization in multi-label settings without hidden optimization or overfitting costs.

What would settle it

A controlled experiment on the same two medical datasets that shows the proposed subspace loss produces equal or lower performance than standard multi-label losses.

Figures

Figures reproduced from arXiv: 1907.04563 by Pablo Marquez Neila, Raphael Sznitman, Sebastian Wolf, Thomas Kurmann.

**Figure 1.** Left: Retinal Optical Coherence Tomography scan containing 11 possible biomarkers. In this example, Intraretinal Cysts and Fibrovascular PED are present. Right: Chest X-Ray scan [2] with Cardiomegaly and Emphysema present.

**Figure 2.** Illustration of MLC (left) and our proposed AS-MLC method (right). In this synthetic example, the feature space has size d = 2 with n = 2 labels: Red points = (00), Green = (01), Pink = (10) and Blue = (11). Note how the joint distribution of labels is clustered at the intersections of the hyperplanes.
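The caption's observation that label combinations cluster at hyperplane intersections suggests one possible reading, sketched here under explicit assumptions: each label k owns an affine hyperplane {z : w_k · z = b_k} in the feature space, its score decays with the distance of z from that hyperplane through a Gaussian kernel whose width plays the role of the bandwidth δ mentioned in Figure 3, and the scores are trained with binary cross-entropy. The kernel, the loss and all function names are hypothetical illustrations, not the paper's equations.

```python
import math
from typing import List, Sequence, Tuple

# A hyperplane {x : w . x = b} is stored as (w, b); one hyperplane per label.
Hyperplane = Tuple[Sequence[float], float]


def hyperplane_distance(z: Sequence[float], plane: Hyperplane) -> float:
    """Euclidean distance from z to the hyperplane {x : w . x = b}."""
    w, b = plane
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * zi for wi, zi in zip(w, z)) - b) / norm


def label_scores(z: Sequence[float], planes: Sequence[Hyperplane], delta: float) -> List[float]:
    """Hypothetical Gaussian-kernel score per label: 1 on the hyperplane, decaying with distance."""
    return [math.exp(-hyperplane_distance(z, p) ** 2 / (2.0 * delta ** 2)) for p in planes]


def multilabel_loss(z: Sequence[float], labels: Sequence[int],
                    planes: Sequence[Hyperplane], delta: float, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between 0/1 labels and the kernel scores."""
    total = 0.0
    for y, s in zip(labels, label_scores(z, planes, delta)):
        s = min(max(s, eps), 1.0 - eps)  # clip so both logarithms stay finite
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total / len(labels)


# Figure 2 setting: d = 2, n = 2; label 1 <-> line x = 0, label 2 <-> line y = 0.
PLANES = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]
print(multilabel_loss((0.0, 0.0), (1, 1), PLANES, delta=0.1))  # both labels: small loss at the intersection
print(multilabel_loss((0.0, 0.0), (0, 0), PLANES, delta=0.1))  # no labels: large loss at the intersection
```

In this reading, the two hyperplanes are lines: a sample carrying both labels scores highest where they cross, a sample with one label sits on one line away from the other, and a sample with no labels sits far from both, so co-occurring labels do not demand contradictory pulls.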

**Figure 3.** Left: 10-fold cross-validation of the bandwidth size using a fixed feature size e = 32. Right: Impact analysis for the feature dimension size e at a fixed bandwidth δ = 0.1.

| Method | Atelectasis | Cardiomegaly | Effusion | Infiltration | Mass |
|---|---|---|---|---|---|
| Original [2] | 0.7003 | 0.8100 | 0.7585 | 0.6614 | 0.6933 |
| Softmax | 0.7290 | 0.8514 | 0.7893 | 0.6692 | 0.7853 |
| AS-MLC | 0.7471 | 0.8481 | 0.8203 | 0.6647 | 0.7957 |

| Method | Nodule | Pneumonia | Pneumothorax | Consolidati… |
|---|---|---|---|---|
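As a quick check on the size of the gains, the snippet below averages the five per-class scores listed in the Figure 3 text and counts the classes on which AS-MLC beats the Softmax baseline. It covers only the five visible columns (the rest are truncated in the source), and the recovered text does not name the metric, so the means are an illustration rather than the paper's summary figure.

```python
# Per-class scores copied from the Figure 3 table (five visible classes only).
CLASSES = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass"]
SCORES = {
    "Original [2]": [0.7003, 0.8100, 0.7585, 0.6614, 0.6933],
    "Softmax": [0.7290, 0.8514, 0.7893, 0.6692, 0.7853],
    "AS-MLC": [0.7471, 0.8481, 0.8203, 0.6647, 0.7957],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wins(method, baseline):
    """Classes on which `method` scores strictly higher than `baseline`."""
    return [c for c, m, b in zip(CLASSES, SCORES[method], SCORES[baseline]) if m > b]


for name, values in SCORES.items():
    print(f"{name:13s} mean over 5 classes: {mean(values):.4f}")
print("AS-MLC beats Softmax on:", wins("AS-MLC", "Softmax"))
```

On these five classes AS-MLC averages about 0.775, against about 0.765 for Softmax and 0.725 for the original baseline [2], and it beats Softmax only on Atelectasis, Effusion and Mass; most of the margin over [2] is already captured by the Softmax baseline.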

Original abstract

Multi-label classification (MLC) problems are becoming increasingly popular in the context of medical imaging. This has in part been driven by the fact that acquiring annotations for MLC is far less burdensome than for semantic segmentation and yet provides more expressiveness than multi-class classification. However, to train MLCs, most methods have resorted to objective functions similar to those of traditional multi-class classification settings. We show in this work that such approaches are not optimal and instead propose a novel deep MLC method in affine subspaces. At its core, the method attempts to pull features of class-labels towards different affine subspaces while maximizing the distance between them. We evaluate the method using two MLC medical imaging datasets and show a large performance increase compared to previous multi-label frameworks. This method can be seen as a plug-in replacement loss function and is trainable in an end-to-end fashion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and what follows is the friction.

Desk Editor's Note (private letter to a colleague)

The affine-subspace loss is a concrete geometric alternative for multi-label classification, but the handling of co-occurring labels needs explicit verification.

The main takeaway is a new loss that maps per-class features into separate affine subspaces while pushing the subspaces apart, framed as better than standard multi-class objectives for multi-label medical imaging tasks. It is presented as a plug-in replacement that trains end-to-end. The claim of large gains on two medical datasets is the central empirical point. The geometric construction itself appears new in the MLC setting and avoids obvious reduction to prior losses. The medical focus fits the annotation-cost argument in the abstract. The formulation is straightforward from the description and shows no circular fitting or self-referential issues.

The stress-test concern about conflicting pulls for co-occurring labels is worth checking in the full text. If the loss applies attraction per positive label while the repulsion term is either relaxed for co-occurring classes or permits subspace intersections, the geometry can remain consistent; otherwise the advantage may be limited to datasets with low label overlap. The abstract does not spell this out, so the paper's results on real medical data will determine whether the constraint is handled or ignored. No other load-bearing assumptions stand out.

This work is for people building or tuning losses for multi-label vision problems, especially in medical imaging. A reader looking for alternatives to binary cross-entropy would get direct value from the formulation and the reported numbers. It is solid enough on its own terms to deserve peer review, with the main referee questions being the exact loss equations for multi-label samples and whether the gains hold after proper ablations and statistical checks.

Referee Report

2 major / 1 minor

Summary. The paper argues that standard multi-class objectives are suboptimal for multi-label classification (MLC) and proposes a new end-to-end trainable loss that attracts per-class features toward distinct affine subspaces while maximizing distances between those subspaces. It evaluates the approach on two MLC medical imaging datasets and claims a large performance increase relative to prior multi-label frameworks.

Significance. A geometrically motivated subspace-separation loss could, if shown to be compatible with label co-occurrence, offer a principled alternative to binary cross-entropy or ranking losses in MLC. The medical-imaging setting is appropriate, but the absence of any quantitative results, baseline specifications, or ablation evidence in the provided text prevents assessment of whether the claimed gains are real or generalizable.

major comments (2)

[Abstract] Abstract: the central claim of 'large performance increase' is asserted without any numerical metrics, baseline details, statistical tests, or dataset names, rendering the empirical support for superiority impossible to evaluate from the given text.
[Method] Method description (core loss): the formulation pulls features toward multiple distinct affine subspaces for co-occurring positive labels while simultaneously maximizing inter-subspace distances. No mechanism is indicated for reconciling the resulting geometric tension (a single feature vector cannot lie near several mutually distant subspaces), which directly threatens the applicability of the loss to realistic MLC data.
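Whether this tension is real depends on whether the subspaces can meet. The minimum distance between two affine subspaces c_1 + span(U_1) and c_2 + span(U_2) is the norm of the component of c_1 - c_2 orthogonal to span(U_1 ∪ U_2); if the combined directions span the whole feature space, that distance is zero and one vector can lie on both. The sketch below, with hypothetical helper names and plain Gram-Schmidt orthonormalization, checks intersecting, parallel and skew lines.

```python
import math
from typing import List, Sequence


def orthonormalize(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[List[float]]:
    """Gram-Schmidt: an orthonormal basis for span(vectors), dropping dependent vectors."""
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def subspace_distance(c1, dirs1, c2, dirs2) -> float:
    """Minimum distance between the affine subspaces c1 + span(dirs1) and c2 + span(dirs2)."""
    basis = orthonormalize(list(dirs1) + list(dirs2))
    r = [a - b for a, b in zip(c1, c2)]
    for u in basis:  # strip every component lying in the combined direction space
        c = sum(ri * ui for ri, ui in zip(r, u))
        r = [ri - c * ui for ri, ui in zip(r, u)]
    return math.sqrt(sum(ri * ri for ri in r))


print(subspace_distance((1.0, 0.0), [(0.0, 1.0)], (0.0, 2.0), [(1.0, 0.0)]))  # lines x = 1, y = 2: 0.0
print(subspace_distance((0.0, 0.0), [(0.0, 1.0)], (3.0, 0.0), [(0.0, 1.0)]))  # parallel lines: 3.0
print(subspace_distance((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)],
                        (0.0, 0.0, 1.0), [(0.0, 1.0, 0.0)]))                  # skew lines in 3-D: 1.0
```

Non-parallel hyperplanes therefore always intersect, and a feature carrying both labels can sit on both. The objection bites only if the repulsion drives subspaces toward parallel or skew configurations, which the abstract alone cannot confirm.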

minor comments (1)

[Abstract] Abstract: the two medical imaging datasets are referenced but never named or characterized (size, label cardinality, co-occurrence statistics).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address each major comment below and will make revisions to improve clarity and completeness.

Referee: [Abstract] Abstract: the central claim of 'large performance increase' is asserted without any numerical metrics, baseline details, statistical tests, or dataset names, rendering the empirical support for superiority impossible to evaluate from the given text.

Authors: We agree that the abstract should be self-contained and provide concrete support for the performance claims. The full manuscript reports results on two specific medical imaging datasets with comparisons against prior multi-label methods. We will revise the abstract to include key quantitative metrics, baseline names, and dataset identifiers. revision: yes
Referee: [Method] Method description (core loss): the formulation pulls features toward multiple distinct affine subspaces for co-occurring positive labels while simultaneously maximizing inter-subspace distances. No mechanism is indicated for reconciling the resulting geometric tension (a single feature vector cannot lie near several mutually distant subspaces), which directly threatens the applicability of the loss to realistic MLC data.

Authors: The method operates on class-specific features rather than a single shared image feature vector. For each positive label, a dedicated class feature (or projection) is pulled toward its own affine subspace; the inter-subspace repulsion acts between the subspaces themselves. This per-class formulation avoids placing one vector near multiple distant subspaces. We will expand the method section to explicitly describe this per-class mechanism and its handling of label co-occurrence. revision: yes
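The per-class mechanism described in this response can be sketched as follows, assuming (hypothetically) that a shared embedding h is mapped by one linear head A_k per label to a class-specific feature z_k = A_k h, and that only positive labels contribute an attraction term. To keep the example short, each class subspace is collapsed to a single anchor point (a 0-dimensional affine subspace); the head matrices, anchors and function names are illustrative assumptions, not the paper's architecture.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(a: Matrix, h: Sequence[float]) -> List[float]:
    """Matrix-vector product A h with plain lists."""
    return [sum(aij * hj for aij, hj in zip(row, h)) for row in a]


def class_features(h: Sequence[float], heads: Sequence[Matrix]) -> List[List[float]]:
    """Hypothetical per-class projections z_k = A_k h, one feature vector per label."""
    return [matvec(a_k, h) for a_k in heads]


def per_class_pull(h: Sequence[float], labels: Sequence[int],
                   heads: Sequence[Matrix], anchors: Sequence[Sequence[float]]) -> float:
    """Sum over positive labels of the squared distance from z_k to the anchor of class k."""
    total = 0.0
    for y, z, c in zip(labels, class_features(h, heads), anchors):
        if y == 1:
            total += sum((zi - ci) ** 2 for zi, ci in zip(z, c))
    return total


h = [1.0, 2.0, 3.0]
heads = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # A_1 keeps (h_1, h_2)
         [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]]   # A_2 keeps (h_3, 0)
anchors = [[1.0, 2.0], [0.0, 0.0]]
print(per_class_pull(h, [1, 1], heads, anchors))  # 9.0: z_1 sits on its anchor, z_2 is 3 away
```

Because each positive label pulls its own vector z_k, two far-apart anchors never compete for the same point; the pulls still interact through the shared h, so whether all of them can be satisfied at once depends on the capacity of the heads.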

Circularity Check

0 steps flagged

No circularity: novel loss introduced as independent geometric objective

The paper proposes a new multi-label loss that attracts per-class features to distinct affine subspaces while maximizing inter-subspace distances, presented as a plug-in replacement trainable end-to-end. No equations, fitted parameters, or self-citations are shown that would reduce the claimed superiority or the geometric construction to a tautology or prior result by the same authors. Evaluation on external medical imaging datasets supplies independent empirical content, so the derivation chain remains self-contained against the listed circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; none can be extracted.

pith-pipeline@v0.9.0 · 5679 in / 967 out tokens · 21132 ms · 2026-05-25T00:06:40.641091+00:00 · methodology

Reference graph

Works this paper leans on

[1] Gibaja, E., Ventura, S.: Multi-label learning: a review of the state of the art and ongoing research. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4(6) 411–444

[2] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: CVPR 2017

[3] Adeli, E., Kwon, D., Pohl, K.M.: Multi-label transduction for identifying disease comorbidity patterns. In: MICCAI 2018. (2018) 575–583

[4] Fauw, D., et al.: Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24(9) (2018) 1342–1350

[5] Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3) (2011) 333

[6] Zhang, M.L., Zhou, Z.H.: ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognit. 40(7) (2007) 2038–2048

[7] Li, Y., Song, Y., Luo, J.: Improving pairwise ranking for multi-label image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition. (2017) 3617–3625

[8] Li, X., Guo, Y.: Multi-label classification with feature-aware non-linear label space transformation. Twenty-Fourth International Joint Conference on (2015)

[9] Yeh, C.K., Wu, W.C., Ko, W.J., Wang, Y.C.F.: Learning deep latent spaces for Multi-Label classification. (2017)

[10] Yu, F., Koltun, V., Funkhouser, T.: Dilated residual networks. In: CVPR. (2017)

[11] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. (2016) 770–778

[12] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[13] Guendel, S., Grbic, S., Georgescu, B., Zhou, K., Ritschl, L., Meier, A., Comaniciu, D.: Learning to recognize abnormalities in chest X-Rays with Location-Aware dense networks. (2018)

[14] Tang, Y., Wang, X., Harrison, A.P., Lu, L., Xiao, J., Summers, R.M.: Attention-Guided curriculum learning for weakly supervised classification and localization of thoracic diseases on chest radiographs. In: Machine Learning in Medical Imaging, Springer International Publishing (2018) 249–258

[15] Guan, Q., Huang, Y.: Multi-label chest x-ray image classification via category-wise residual attention learning. Pattern Recognit. Lett. (2018)

[16] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. (2016)
