Data Selection for training Semantic Segmentation CNNs with cross-dataset weak supervision

Gijs Dubbelman; Panagiotis Meletis; Rob Romijnders

arxiv: 1907.07023 · v1 · pith:Y5DX47CCnew · submitted 2019-07-16 · 💻 cs.CV · cs.LG

Panagiotis Meletis, Rob Romijnders, Gijs Dubbelman

Pith reviewed 2026-05-24 20:53 UTC · model grok-4.3

keywords: semantic segmentation · weak supervision · data selection · Gaussian mixture model · object diversity · Cityscapes · Open Images · automated driving

The pith

Selecting subsets of weakly labeled images lets semantic segmentation networks match full-set accuracy with up to 100 times fewer weakly labeled images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two methods for selecting the most useful images that carry only bounding-box labels when training per-pixel semantic segmentation networks. The first models the visual appearance of images with a Gaussian mixture to find similar examples without using any labels. The second uses the bounding-box annotations to count objects and favors scenes with high object diversity. Tests on Cityscapes driving scenes and Open Images show that networks trained on the chosen small subsets reach the same accuracy as those trained on the entire weak collection. This matters because pixel-level labels are costly to obtain, so trimming the volume of weak data lowers the overall supervision burden while preserving performance.

Core claim

Modeling image representations with a Gaussian Mixture Model finds visually similar images, while counting object instances from bounding boxes finds diverse images; both criteria select small subsets of weakly labeled data that train semantic segmentation CNNs to the same accuracy level as the full sets, enabling reductions of up to 100 times on Open Images and 20 times on Cityscapes.

What carries the argument

Gaussian Mixture Model fitted to image feature representations for similarity-based selection, together with object-count diversity measured from bounding boxes; these act as filters that reduce the weak training set before the segmentation network is trained.
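The GMM filter can be illustrated with a minimal NumPy sketch. It assumes that each image has already been passed through some feature extractor that yields an (H, W, C) feature map, treats that map as H·W feature vectors, fits a diagonal-covariance GMM to target-domain vectors with Expectation Maximization, and ranks candidate images by their mean per-vector log-likelihood. The backbone, number of components, covariance type, score aggregation, and all function names are assumptions; the paper's own similarity measure may be defined differently. Random vectors stand in for real features.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along an axis, keeping that axis."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))


def component_log_probs(x, means, variances):
    """Log density of each row of x under each diagonal Gaussian; shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])


def fit_diag_gmm(x, k, n_iter=100, seed=0):
    """Fit a k-component diagonal-covariance GMM to the rows of x with EM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, size=k, replace=False)]    # initialise means at random data points
    variances = np.tile(x.var(axis=0) + 1e-6, (k, 1))  # initialise with the global variance
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each vector
        log_p = component_log_probs(x, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp(log_p, axis=1))
        # M-step: re-estimate mixture weights, means and per-dimension variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ (x ** 2)) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances


def image_score(feature_map, gmm):
    """Mean log-likelihood of an image's H*W feature vectors under the fitted GMM."""
    weights, means, variances = gmm
    x = feature_map.reshape(-1, feature_map.shape[-1])
    log_p = component_log_probs(x, means, variances) + np.log(weights)
    return float(logsumexp(log_p, axis=1).mean())


def select_top_k(feature_maps, gmm, k):
    """Indices of the k candidate images most similar to the target domain, plus all scores."""
    scores = np.array([image_score(f, gmm) for f in feature_maps])
    return np.argsort(scores)[::-1][:k], scores


# Synthetic stand-ins: target-domain vectors and two groups of candidate feature maps.
rng = np.random.default_rng(0)
gmm = fit_diag_gmm(rng.normal(0.0, 1.0, size=(2000, 4)), k=3)
candidates = ([rng.normal(0.0, 1.0, size=(8, 8, 4)) for _ in range(5)]     # similar
              + [rng.normal(5.0, 1.0, size=(8, 8, 4)) for _ in range(5)])  # dissimilar
chosen, scores = select_top_k(candidates, gmm, k=5)
print(sorted(chosen.tolist()))  # expected: [0, 1, 2, 3, 4]
```

Averaging log-likelihoods over locations keeps scores comparable across images of different resolutions; summing them would favor larger feature maps.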

If this is right

The GMM method requires no labels at all, only raw image features.
The diversity method needs only the bounding-box annotations already present.
Accuracy stays level even after cutting the weak data volume by the reported factors on both datasets.
GMM fitting also yields direct descriptions of the underlying image distribution.
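The diversity criterion can be sketched in plain Python as a ranking over each image's bounding-box class labels. The set of classes of interest and the scoring rule below (number of distinct classes of interest, then the total number of such objects as a tie-breaker) are hypothetical stand-ins; Figure 2 indicates that the paper uses class scores, heuristic scores, and the number of objects of interest, but its exact heuristic is not reproduced here.

```python
from collections import Counter

# Hypothetical classes of interest for driving scenes (not the paper's exact list).
OBJECTS_OF_INTEREST = {"car", "person", "bicycle", "bus", "truck", "traffic light", "traffic sign"}


def diversity_score(box_labels):
    """Hypothetical score: (distinct classes of interest, total objects of interest)."""
    counts = Counter(label for label in box_labels if label in OBJECTS_OF_INTEREST)
    return len(counts), sum(counts.values())


def select_diverse(images, k):
    """Return the ids of the k images whose bounding boxes are most diverse."""
    ranked = sorted(images, key=lambda image_id: diversity_score(images[image_id]), reverse=True)
    return ranked[:k]


# Each image is represented only by the class labels of its bounding boxes.
images = {
    "a": ["car", "car", "tree"],
    "b": ["car", "person", "bicycle", "traffic sign"],
    "c": ["dog"],
    "d": ["person", "person", "bus"],
}
print(select_diverse(images, 2))  # ['b', 'd']
```

Sorting by a tuple makes the tie-breaking explicit; a weighted sum of the two counts would be an equally plausible design.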

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The two selection rules could be applied together to form even smaller yet still sufficient subsets.
The same filtering logic might transfer to other tasks that rely on bounding-box weak labels, such as object detection.
Lower data volume would also cut the compute time and memory needed for each training run.
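One hypothetical way to apply both rules together is to rank images by the sum of their positions under each criterion (a Borda-style rank aggregation) and keep the best k. This is an editorial illustration, not a method from the paper; the function name and the toy scores are assumptions.

```python
def combine_rankings(gmm_scores, diversity_scores, k):
    """Rank images by the sum of their ranks under two criteria (higher score = better)."""
    ids = list(gmm_scores)

    def ranks(scores):
        order = sorted(ids, key=lambda image_id: scores[image_id], reverse=True)
        return {image_id: position for position, image_id in enumerate(order)}

    r_gmm, r_div = ranks(gmm_scores), ranks(diversity_scores)
    # Ties in the rank sum are broken by image id so the result is deterministic.
    return sorted(ids, key=lambda image_id: (r_gmm[image_id] + r_div[image_id], image_id))[:k]


gmm_scores = {"a": -3.0, "b": -1.0, "c": -10.0, "d": -2.0}  # e.g. mean log-likelihoods
diversity_scores = {"a": 5, "b": 4, "c": 0, "d": 2}          # e.g. distinct classes of interest
print(combine_rankings(gmm_scores, diversity_scores, 2))  # ['b', 'a']
```

Rank aggregation sidesteps the fact that the two scores live on incomparable scales (log-likelihoods versus object counts).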

Load-bearing premise

The chosen small subsets still hold enough variety for the network to learn the same pixel-level class distinctions that the full weak collection would provide.

What would settle it

Train identical segmentation networks on the selected reduced sets versus the full weak sets and check whether mean intersection-over-union on a fixed test set drops below the full-set result.
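The comparison hinges on mean intersection-over-union (mIoU), where each class contributes IoU = TP / (TP + FP + FN). A minimal NumPy sketch of the standard computation from a confusion matrix follows; it assumes integer label maps, predictions in [0, num_classes), and an ignore label of 255 (the value commonly used for Cityscapes train IDs). Classes that appear in neither the ground truth nor the prediction are excluded from the mean.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Rows index ground-truth classes, columns index predicted classes."""
    valid = gt != ignore_index
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN) and its mean over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp   # predicted as c, labelled otherwise
    fn = conf.sum(axis=1) - tp   # labelled c, predicted otherwise
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou)), iou


gt = np.array([[0, 0, 1], [1, 2, 255]])    # 255 marks an ignored pixel
pred = np.array([[0, 1, 1], [1, 2, 0]])
miou, per_class = mean_iou(confusion_matrix(gt, pred, num_classes=3))
print(per_class, miou)  # per-class IoU 0.5, 2/3, 1.0; mIoU about 0.722
```

Accumulating the confusion matrix over the whole test set before computing IoU, rather than averaging per-image scores, matches how mIoU is usually reported; repeating the comparison over several training seeds would supply the error bars the referee asks for.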

Figures

Figures reproduced from arXiv: 1907.07023 by Gijs Dubbelman, Panagiotis Meletis, Rob Romijnders.

**Figure 2.** Examples of images selected from N = 1.74 million Open Images images by our data selection methods, in descending order. First row: visual similarity using the GMM; the sim_citys measure is shown. Second row: object diversity using class scores; the heuristic scores and the number of objects of interest are shown.

**Figure 3.** Performance (mIoU) on the Cityscapes validation set. The networks are trained on Cityscapes Dense and optionally on additional selected data from Cityscapes Coarse and Open Images. The dots mark the conducted experiments. The black horizontal line denotes the mIoU of training without weak supervision.

**Figure 4.** t-SNE plot of the image representations for a sample …

**Figure 5.** Empirical histogram of the log probabilities for the …

Original abstract

Training convolutional networks for semantic segmentation with strong (per-pixel) and weak (per-bounding-box) supervision requires a large amount of weakly labeled data. We propose two methods for selecting the most relevant data with weak supervision. The first method is designed for finding visually similar images without the need of labels and is based on modeling image representations with a Gaussian Mixture Model (GMM). As a byproduct of GMM modeling, we present useful insights on characterizing the data generating distribution. The second method aims at finding images with high object diversity and requires only the bounding box labels. Both methods are developed in the context of automated driving and experimentation is conducted on Cityscapes and Open Images datasets. We demonstrate performance gains by reducing the amount of employed weakly labeled images up to 100 times for Open Images and up to 20 times for Cityscapes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies a GMM to image features, plus bounding-box diversity counting, to cut the weakly labeled data used for segmentation training by 20 to 100 times, but the abstract gives no metrics or controls to verify that the gains hold.

The paper's main move is to take two off-the-shelf selection tricks and use them to shrink the pool of weakly labeled images needed for semantic segmentation in driving scenes. One fits a GMM to global image representations to pick visually similar examples without labels. The other simply tallies object variety from the bounding boxes. The authors test this on Cityscapes and Open Images and report that the reduced sets let them drop most of the weak data while still seeing performance gains. That is the concrete claim. The GMM part also yields some side observations about the data distribution, which is a minor plus.

The work sits squarely in the automated-driving corner of computer vision, where labeling cost matters. The methods are not invented here; what is new is the application: the cross-dataset weak-supervision framing and the specific reduction numbers.

The soft spot is exactly the load-bearing premise flagged above: nothing in the GMM step guarantees that the selected images keep the original frequencies of semantic classes or their spatial arrangements. Global embeddings often latch onto scene style or lighting instead, so rare classes could vanish from the subset and any accuracy drop would be misattributed. The abstract claims performance gains from the reductions but supplies no numbers, no random-selection baseline, no class-balance checks, and no error bars, so it is impossible to tell whether the premise actually holds. If the full experiments include those controls and show that the subsets really work, the paper is useful for practitioners who want cheaper training sets. If they do not, the reductions are just post-hoc selection that happened to look good on these runs.

This is worth sending to referees because the problem is practical, the proposed methods are simple to reproduce, and the claims are easy to test or falsify.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes two methods for selecting subsets of weakly labeled (bounding-box) images to train semantic segmentation CNNs: (1) GMM modeling of global image representations to identify visually similar images without using labels, and (2) a bounding-box-based selection for images with high object diversity. Experiments are performed in the automated-driving setting on Cityscapes and Open Images; the central claim is that these selections yield performance gains while reducing the weakly labeled training data by up to 20× (Cityscapes) and 100× (Open Images).

Significance. If the experimental results demonstrate that the reduced subsets maintain segmentation accuracy comparable to the full weak-supervision set, the work would be significant for reducing annotation and compute costs in large-scale semantic segmentation. The GMM byproduct insights on characterizing the data-generating distribution could also be useful for dataset analysis.

major comments (1)

[Abstract and Methods] The claim that GMM-selected subsets (and diversity-selected ones) allow a segmentation CNN to reach performance comparable to the full weak set is load-bearing, yet the GMM method operates solely on global image embeddings. Nothing in the selection guarantees preservation of semantic class frequencies or spatial contexts; in automated-driving data, global features often correlate with scene style or illumination rather than with object-class presence. If rare classes (e.g., traffic signs, cyclists) are under-represented, the reported gains cannot be attributed to the selection preserving information content.

minor comments (1)

[Abstract] The abstract reports performance gains but supplies no quantitative results, baselines, error bars, or dataset splits, so it is impossible to verify whether the claimed reductions actually preserve accuracy.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below and will revise the manuscript to incorporate additional analysis as outlined.

Referee: [Abstract and Methods] The claim that GMM-selected subsets (and diversity-selected ones) allow a segmentation CNN to reach performance comparable to the full weak set is load-bearing, yet the GMM method operates solely on global image embeddings. Nothing in the selection guarantees preservation of semantic class frequencies or spatial contexts; in automated-driving data, global features often correlate with scene style or illumination rather than with object-class presence. If rare classes (e.g., traffic signs, cyclists) are under-represented, the reported gains cannot be attributed to the selection preserving information content.

Authors: We agree that the GMM-based selection using global image embeddings provides no explicit guarantee of preserving semantic class frequencies or spatial contexts, and that global features in driving scenes may correlate more with style or illumination than with object presence. This is a substantive methodological limitation. Our defense rests on the empirical results: the selected subsets achieve segmentation performance comparable to the full weak-supervision set despite the large reductions (20× on Cityscapes, 100× on Open Images). These outcomes indicate that the visual similarity modeled by the GMM selects sufficiently informative images in practice for this task and these datasets. To directly address the concern, we will add an analysis of per-class frequencies (including rare classes such as traffic signs and cyclists) in the GMM-selected and diversity-selected subsets versus the full sets, to be included in the revised manuscript.

Revision: yes
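The promised per-class frequency analysis could start from the bounding-box labels alone, as in the plain-Python sketch below. It compares each class's share of boxes in a selected subset with its share in the full set; a ratio near zero flags a class, such as a rare one, that the selection has largely dropped. Box counts are only a proxy for pixel-level class frequencies, and the function names and toy data are hypothetical.

```python
from collections import Counter


def class_shares(images, ids):
    """Fraction of bounding boxes belonging to each class over the given image ids."""
    counts = Counter(label for image_id in ids for label in images[image_id])
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def share_ratios(images, selected_ids):
    """Subset share divided by full-set share, for every class in the full set."""
    full = class_shares(images, images.keys())
    subset = class_shares(images, selected_ids)
    return {c: subset.get(c, 0.0) / share for c, share in full.items()}


images = {
    "a": ["car", "car", "person"],
    "b": ["car", "bicycle"],
    "c": ["car", "traffic sign"],
    "d": ["person"],
}
ratios = share_ratios(images, selected_ids=["a", "d"])
dropped = sorted(c for c, r in ratios.items() if r == 0.0)
print(ratios)
print(dropped)  # ['bicycle', 'traffic sign']
```

Ratios above 1 mark classes the selection over-represents, which matters as much as dropped ones when the goal is to preserve the full set's class balance.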

Circularity Check

0 steps flagged

No circularity detected; empirical methods with no derivations

The paper describes two empirical data selection procedures (GMM modeling of image representations and bounding-box diversity counting) and reports experimental performance gains on Cityscapes and Open Images. No equations, derivations, predictions, or first-principles results are present in the provided text. Claims rest on standard statistical tools applied to external data rather than any self-definitional reduction, fitted-input renaming, or load-bearing self-citation chain. The work is therefore self-contained against external benchmarks with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This is an abstract-only review: the paper states no explicit free parameters, axioms, or invented entities. The two axioms below are implicit domain assumptions; for example, the GMM modeling assumes that visual similarity in feature space correlates with utility for segmentation training.

axioms (2)

domain assumption: Image representations modeled by a GMM capture the visual similarity that is relevant to semantic segmentation performance.
Invoked by the first selection method; no justification is supplied in the abstract.
domain assumption: Higher object diversity (measured from bounding boxes) improves the quality of training data for segmentation.
Invoked by the second selection method.
