When Labels Have Structure: Improving Image Classification with Hierarchy-Aware Cross-Entropy

April Chan; Davide D'Ascenzo; Sebastiano Cultrera di Montesano

arxiv: 2605.06274 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-08 13:00 UTC · model grok-4.3

keywords hierarchy-aware cross-entropy · image classification · class hierarchy · label smoothing · prediction aggregation · CIFAR-100 · FGVC Aircraft · NABirds

The pith

Incorporating a class hierarchy into the loss function improves image classification accuracy over standard cross-entropy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Hierarchy-Aware Cross-Entropy (HACE) to replace standard cross-entropy by directly using a known class hierarchy. HACE aggregates model predictions upward so that parent classes receive probability mass from their children and applies ancestral label smoothing to spread the true label signal along the path from the correct class to the root. This approach yields higher accuracy than standard cross-entropy in 15 of 18 architecture-dataset combinations during end-to-end training and beats all tested baselines during linear probing on frozen features. A sympathetic reader would care because everyday classification problems involve classes that share semantic structure, and ignoring that structure forces models to treat every error as equally bad.

Core claim

HACE improves accuracy over standard cross-entropy in 15 out of 18 architecture-dataset pairs, with a mean gain of 4.66%. In linear probing on frozen DINOv2-Large features, HACE outperforms all competing methods on all three datasets, with a mean improvement of 2.18% over the next best baseline. The method combines prediction aggregation, which propagates probability mass upward through the class hierarchy, and ancestral label smoothing, which distributes the ground-truth signal along ancestry paths.

What carries the argument

Hierarchy-Aware Cross-Entropy (HACE) combines prediction aggregation, which accumulates each parent node's confidence from its children, with ancestral label smoothing, which distributes ground-truth probability along the path to the root.
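
The two components can be made concrete with a minimal sketch. It assumes one plausible reading of the description above: a node's aggregated probability is the sum of the softmax probabilities of the leaves beneath it, and the smoothed target puts equal weight on every node along the path from the true class to the root. The toy hierarchy, the function names (`aggregate`, `ancestral_targets`, `hace_like_loss`), and the uniform weighting are all illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical toy hierarchy (child -> parent); the root has parent None.
PARENT = {
    "animal": None,
    "mammal": "animal", "bird": "animal",
    "cat": "mammal", "dog": "mammal",
    "sparrow": "bird", "eagle": "bird",
}
LEAVES = ["cat", "dog", "sparrow", "eagle"]  # classes the model scores


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(leaf_probs):
    """Prediction aggregation: every node receives the mass of the leaves below it."""
    node_probs = {node: 0.0 for node in PARENT}
    for leaf, p in zip(LEAVES, leaf_probs):
        for node in path_to_root(leaf):
            node_probs[node] += p
    return node_probs


def ancestral_targets(true_leaf):
    """Ancestral label smoothing, assumed here to be uniform over the path to the root."""
    path = path_to_root(true_leaf)
    return {node: 1.0 / len(path) for node in path}


def hace_like_loss(logits, true_leaf):
    """Cross-entropy between smoothed targets and aggregated node probabilities."""
    node_probs = aggregate(softmax(logits))
    targets = ancestral_targets(true_leaf)
    return -sum(w * math.log(node_probs[n]) for n, w in targets.items())
```

Under these assumptions, uniform logits over the four leaves with true class `cat` give a loss of ln 2. Moving the model's mass to the sibling `dog` is penalized less than moving it to the unrelated `eagle`, because the shared parent `mammal` still receives high aggregated probability.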

If this is right

HACE functions as a drop-in replacement for cross-entropy and requires no change to model architecture.
Accuracy gains appear in 15 of 18 architecture-dataset pairs, spanning convolutional and attention-based networks on CIFAR-100, FGVC Aircraft, and NABirds.
The same loss also improves linear probes on frozen pre-trained features from DINOv2-Large.
By respecting semantic distances, the trained models make fewer errors between unrelated classes.
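
The last point could be checked by measuring how far apart, in the hierarchy, each wrong prediction lands from the true class. The sketch below is one assumed way to do that, using the number of tree edges between the two classes. The toy hierarchy and the function names are hypothetical, and the source does not say which severity metric, if any, the paper reports.

```python
# Hypothetical toy hierarchy (child -> parent); the root has parent None.
PARENT = {
    "animal": None,
    "mammal": "animal", "bird": "animal",
    "cat": "mammal", "dog": "mammal",
    "sparrow": "bird", "eagle": "bird",
}


def ancestors(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges on the path between a and b through their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a))}
    for steps_from_b, n in enumerate(ancestors(b)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a root")


def mean_mistake_distance(y_true, y_pred):
    """Average tree distance over misclassified examples only (0.0 if there are none)."""
    wrong = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
    if not wrong:
        return 0.0
    return sum(tree_distance(t, p) for t, p in wrong) / len(wrong)
```

A lower value at equal accuracy would indicate that the remaining errors stay closer to the true class in the tree.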

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same loss formulation could be applied to hierarchical label sets outside vision, such as product taxonomies or medical diagnosis codes.
Automatically inferring or refining the hierarchy from data might extend the benefits to datasets that lack an explicit tree.
Combining HACE with existing regularization methods could produce additive gains in generalization.
Scaling the approach to ImageNet-scale hierarchies would test whether the observed improvements hold when the tree becomes deeper and wider.

Load-bearing premise

The supplied class hierarchy accurately encodes the semantic distances that matter for distinguishing the classes in the task.

What would settle it

Training the same architectures on the same datasets with an independently verified accurate hierarchy and finding that HACE produces equal or lower accuracy than standard cross-entropy on average would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.06274 by April Chan, Davide D'Ascenzo, Sebastiano Cultrera di Montesano.

**Figure 1.** Illustration of the two components of HACE applied to a toy animal hierarchy.

**Figure 2.** Per-class accuracy at the family level of the FGVC Aircraft hierarchy, comparing HACE and ...

**Figure 3.** Per-class accuracy at the manufacturer level of the FGVC Aircraft hierarchy, comparing ...

Original abstract

Standard cross-entropy is the default classification loss across virtually all of machine learning, yet it treats all misclassifications equally, ignoring the semantic distances that a class hierarchy encodes. We propose Hierarchy-Aware Cross-Entropy (HACE), a drop-in replacement for standard cross-entropy that incorporates a known class hierarchy directly into the loss. HACE combines two components: prediction aggregation, which propagates the model's probability mass upward through the class hierarchy to ensure that parent nodes accumulate the confidence of their children; and ancestral label smoothing, which distributes the ground-truth signal along the path from the true class to the root. We evaluate HACE on CIFAR-100, FGVC Aircraft, and NABirds in two regimes: end-to-end training across six architectures spanning convolutional and attention-based designs, and linear probing on frozen DINOv2-Large features. In end-to-end training, HACE improves accuracy over standard cross-entropy in 15 out of 18 architecture-dataset pairs, with a mean gain of 4.66%. In linear probing on frozen DINOv2-Large features, HACE outperforms all competing methods on all three datasets, with a mean improvement of 2.18% over the next best baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The main takeaway is that HACE provides a practical drop-in replacement for cross-entropy that incorporates hierarchy through prediction aggregation and ancestral label smoothing, leading to consistent accuracy gains in the reported experiments. The authors define two operations: one aggregates the model's output probabilities by propagating them up the tree so that each parent receives the sum of its children, and the other smooths the target labels by distributing mass along the ancestral path. This pairing is what sets it apart from previous hierarchical classification losses.

The paper does a decent job of evaluating this on three datasets with different characteristics and across several model types. The end-to-end results show improvement in most cases, and the linear probing experiment on DINOv2 features is a nice addition because it isolates the effect of the loss from feature learning. Mean gains of 4.66% and 2.18% are not huge but are consistent enough to be interesting.

The soft spot is the lack of detail in the abstract on implementation choices, such as how exactly the aggregation is normalized or what temperature is used in smoothing. Without seeing ablations or a sensitivity analysis, it is difficult to know whether the method is as general as claimed or whether it relies on the specific hierarchies in these datasets. The assumption that the hierarchy encodes semantic distances is taken as given, which is fine for these experiments but could be a point for discussion.

This work is for people who train classifiers on data with natural taxonomies and want a low-effort way to leverage that structure. It would be useful for practitioners in computer vision who already have or can obtain hierarchies. The paper shows clear thinking in how it builds on standard cross-entropy and tests the idea in multiple regimes. It deserves a serious referee to verify the details and see whether the gains hold under closer scrutiny. I would recommend sending it to peer review rather than desk rejecting it.

Referee Report

2 major / 2 minor

Summary. The paper proposes Hierarchy-Aware Cross-Entropy (HACE) as a drop-in replacement for standard cross-entropy that incorporates a known class hierarchy via two components: prediction aggregation (upward propagation of model probabilities to parent nodes) and ancestral label smoothing (distributing ground-truth probability mass along the ancestry path to the root). It reports consistent accuracy gains over vanilla cross-entropy in end-to-end training on CIFAR-100, FGVC Aircraft, and NABirds across six architectures (15/18 pairs, mean +4.66%), and superior results in linear probing on frozen DINOv2-Large features (mean +2.18% over the next-best baseline).

Significance. If the empirical results hold under full scrutiny, HACE provides a simple, hierarchy-aware loss that could be adopted as a default when class taxonomies are available, particularly for fine-grained datasets. The gains in both full training and linear-probing regimes, plus the method's parameter-free nature relative to the hierarchy, represent a modest but practical advance over treating all misclassifications equally.

major comments (2)

§4 (Experiments), Table 1: the reported mean gain of 4.66% aggregates across 18 pairs without per-pair standard deviations or paired statistical tests; several individual improvements appear small enough that run-to-run variance could alter the 15/18 count.
§3.2 (Ancestral label smoothing): the smoothing distributes mass uniformly along the path to the root, but no ablation is shown on alternative weightings (e.g., exponential decay by depth) or on the sensitivity of final accuracy to the smoothing coefficient; this choice is load-bearing for the claimed generalization benefit.
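
As a rough illustration of the first comment, an exact sign test shows what the 15/18 count alone would support. The sketch below makes several assumptions that the source does not confirm: the three remaining pairs are counted as losses rather than ties, and the 18 pairs are treated as independent even though they share datasets and architectures. The function name `sign_test_p_value` is hypothetical. The test also says nothing about run-to-run variance within a pair, which is the referee's actual concern and would need multiple seeds.

```python
from math import comb


def sign_test_p_value(wins, n):
    """Two-sided exact sign test under the null that a win is as likely as a loss."""
    k = max(wins, n - wins)  # size of the more extreme side
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 15 improvements out of 18 architecture-dataset pairs, as reported in the summary.
p = sign_test_p_value(15, 18)
print(f"two-sided sign-test p-value: {p:.4f}")
```

Under these assumptions the p-value comes out at roughly 0.0075, so the count by itself is unlikely under a coin-flip null; whether individual small gains survive reseeding is a separate question.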

minor comments (2)

The abstract and §4 claim outperformance over 'all competing methods' in linear probing, but the exact list of baselines and their hyper-parameter tuning protocols should be stated explicitly for reproducibility.
[§2] Notation for the hierarchy (parent/child relations, depth) is introduced in §2 but used without a small illustrative diagram; adding one would clarify the upward aggregation step.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and the recommendation for minor revision. We address each of the major comments below and outline the changes we will make to the manuscript.

Referee: §4 (Experiments), Table 1: the reported mean gain of 4.66% aggregates across 18 pairs without per-pair standard deviations or paired statistical tests; several individual improvements appear small enough that run-to-run variance could alter the 15/18 count.

Authors: We agree that including standard deviations and statistical tests would enhance the rigor of our empirical evaluation. Due to the significant computational resources required to train six different architectures on three datasets, we performed single runs for each experiment. Nevertheless, the improvements are consistent across 15 out of 18 diverse settings, with several gains being substantial (over 5% in multiple cases). In the revised manuscript, we will expand Table 1 to list the per-pair accuracy differences explicitly and add a paragraph in §4 discussing the single-run limitation and the robustness suggested by the breadth of our experiments. We will also note that future work could include multi-seed evaluations for formal statistical testing.

Revision: partial

Referee: §3.2 (Ancestral label smoothing): the smoothing distributes mass uniformly along the path to the root, but no ablation is shown on alternative weightings (e.g., exponential decay by depth) or on the sensitivity of final accuracy to the smoothing coefficient; this choice is load-bearing for the claimed generalization benefit.

Authors: The uniform distribution was deliberately chosen to ensure the method remains simple, hyperparameter-free, and a true drop-in replacement for cross-entropy. We will revise the text in §3.2 to provide a clearer justification for this design decision, emphasizing its alignment with the goal of incorporating hierarchy without additional complexity. To address the sensitivity concern, we will include in the supplementary material a plot or table showing accuracy as a function of the smoothing coefficient on CIFAR-100 for one architecture. Regarding alternative weightings, we will add a discussion acknowledging that non-uniform schemes (such as depth-based decay) could be explored in future work and may yield further improvements, but that the uniform approach already delivers consistent gains.

Revision: partial
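
The difference between the uniform weighting and the depth-based decay raised in this exchange can be sketched in a few lines. This is an illustrative parameterization only: the `decay` argument and the function name `path_weights` are assumptions introduced here, and the source does not describe how the paper's smoothing coefficient is defined.

```python
def path_weights(path_length, decay=1.0):
    """Target weights for nodes ordered from the true class (index 0) up to the root.

    decay=1.0 gives the uniform scheme; 0 < decay < 1 puts geometrically less
    mass on each higher ancestor; decay=0 recovers a one-hot target.
    """
    if path_length < 1:
        raise ValueError("path must contain at least the true class")
    raw = [decay ** i for i in range(path_length)]
    total = sum(raw)
    return [w / total for w in raw]


# Example for a three-node path: true class, its parent, the root.
print(path_weights(3))             # uniform weights
print(path_weights(3, decay=0.5))  # geometric decay toward the root
```

An ablation of the kind the referee asks for would sweep `decay` (or an equivalent coefficient) and report accuracy for each setting.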

Circularity Check

0 steps flagged

No significant circularity in HACE derivation

The paper defines Hierarchy-Aware Cross-Entropy directly as a combination of prediction aggregation (upward probability propagation through the given hierarchy) and ancestral label smoothing (distributing ground-truth along ancestry paths), both constructed from the supplied class hierarchy and standard cross-entropy. No load-bearing equation reduces to a fitted parameter renamed as a prediction, no self-citation chain justifies a uniqueness claim, and no ansatz is smuggled in. The reported accuracy gains (15/18 pairs, mean +4.66%) are presented as empirical outcomes rather than mathematical derivations that collapse to the inputs by construction. The method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that a known class hierarchy exists and meaningfully captures semantic relationships; no free parameters or invented entities are introduced beyond the standard cross-entropy formulation.

axioms (1)

Domain assumption: A class hierarchy is available and correctly represents semantic relationships between classes.
HACE requires this hierarchy to perform prediction aggregation and ancestral label smoothing.
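
Part of this assumption can be checked mechanically before training. The sketch below assumes the hierarchy is supplied as a child-to-parent mapping (a representation chosen here for illustration, with the hypothetical helper name `validate_tree`) and verifies only that it forms a single rooted tree. It cannot tell whether the tree is semantically correct, which is the part of the assumption that actually carries the claim.

```python
def validate_tree(parent):
    """Check that a child -> parent map describes one rooted tree; return the root."""
    roots = [node for node, p in parent.items() if p is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    for start in parent:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                raise ValueError(f"cycle detected at {node!r}")
            if node not in parent:
                raise ValueError(f"unknown parent {node!r}")
            seen.add(node)
            node = parent[node]
    return roots[0]


# Hypothetical toy hierarchy used only to exercise the check.
toy = {"animal": None, "mammal": "animal", "cat": "mammal", "dog": "mammal"}
print(validate_tree(toy))
```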

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1]

Gradient-based learning applied to document recognition

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998

work page 1998
[2]

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012

work page 2012
[3]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[4]

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 448–456, Lille, France, 07–09 Jul 2015. PMLR

work page 2015
[5]

Dropout: A simple way to prevent neural networks from overfitting

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958, 2014

work page 2014
[6]

A survey on image data augmentation for deep learning

Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of big data, 6(1):1–48, 2019

work page 2019
[7]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021

work page 2021
[8]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021

work page 2021
[9]

A convnet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022

work page 2022
[10]

Improving atlas-scale single-cell annotation models with hierarchical cross-entropy loss

S. Cultrera di Montesano, D. D’Ascenzo, S. Raghavan, A.P. Amini, P.S. Winter, and L. Crawford. Improving atlas-scale single-cell annotation models with hierarchical cross-entropy loss. Nature Computational Science, 6:243–249, 2026

work page 2026
[11]

Learning multiple layers of features from tiny images

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009

work page 2009
[12]

Fine-grained visual classification of aircraft

Subhransu Maji, Juho Kannala, Esa Rahtu, Matthew Blaschko, and Andrea Vedaldi. Fine-grained visual classification of aircraft, 2013

work page 2013
[13]

Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection

Grant Van Horn, Steve Branson, Ryan Farrell, Scott Haber, Jessie Barry, Panos Ipeirotis, Pietro Perona, and Serge Belongie. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 595–604, 2015

work page 2015
[14]

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L...

work page 2024
[15]

Making better mistakes: Leveraging class hierarchies with deep networks

Luca Bertinetto, Romain Mueller, Konstantinos Tertikas, Sina Samangooei, and Nicholas A Lord. Making better mistakes: Leveraging class hierarchies with deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12506–12515, 2020

work page 2020
[16]

A survey of hierarchical classification across different application domains

Carlos N Silla Jr and Alex A Freitas. A survey of hierarchical classification across different application domains. Data mining and knowledge discovery, 22(1):31–72, 2011

work page 2011
[17]

What does classifying more than 10,000 image categories tell us?

Jia Deng, Alexander C. Berg, Kai Li, and Li Fei-Fei. What does classifying more than 10,000 image categories tell us? In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, pages 71–84. Springer Berlin Heidelberg, 2010. ISBN 978-3-642-15555-0

work page 2010
[18]

Rethinking the inception architecture for computer vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016

work page 2016
[19]

When does label smoothing help?

Rafael Müller, Simon Kornblith, and Geoffrey E Hinton. When does label smoothing help? Advances in neural information processing systems, 32, 2019

work page 2019
[20]

Simloss: Class similarities in cross entropy

Konstantin Kobs, Michael Steininger, Albin Zehe, Florian Lautenschlager, and Andreas Hotho. Simloss: Class similarities in cross entropy. In Foundations of Intelligent Systems: 25th International Symposium, ISMIS 2020, Graz, Austria, September 23–25, 2020, Proceedings, pages 431–439. Springer-Verlag, 2020

work page 2020
[21]

Human uncertainty makes classification more robust

Joshua C Peterson, Ruairidh M Battleday, Thomas L Griffiths, and Olga Russakovsky. Human uncertainty makes classification more robust. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9617–9626, 2019

work page 2019
[22]

Hierarchy-based image embeddings for semantic image retrieval

Björn Barz and Joachim Denzler. Hierarchy-based image embeddings for semantic image retrieval. In 2019 IEEE winter conference on applications of computer vision (WACV), pages 638–647. IEEE, 2019

work page 2019
[23]

Hyperbolic image embeddings

Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6418–6428, 2020

work page arXiv 2020

Appendix A.1, Extension to directed acyclic graphs: The description of HACE in Section 3 assumes that the class hierarchy ...
