Gradient-Discrepancy Acquisition for Pool-Based Active Learning

Mohamadsadegh Khosravani; Sandra Zilles

arxiv: 2605.02609 · v2 · pith:GUGLZKSUnew · submitted 2026-05-04 · 💻 cs.LG

Pith reviewed 2026-05-19 17:15 UTC · model grok-4.3

classification 💻 cs.LG

keywords: active learning, pool-based active learning, acquisition function, gradient discrepancy, generalization bound, uncertainty sampling

The pith

A gradient-discrepancy measure derived from a generalization bound serves as an effective acquisition criterion for pool-based active learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a gradient-based acquisition criterion drawn directly from an existing generalization bound. The measure can stand in for uncertainty scores during sampling or combine with diversity considerations that account for the spread of selected points. It targets data points whose addition would most alter the model's gradients in a way that tightens the bound on generalization error. Readers would care if this leads to fewer labels needed to achieve strong model performance compared with conventional uncertainty or diversity strategies.

Core claim

The authors propose a novel gradient-discrepancy acquisition criterion derived from the generalization bound of Luo et al. (2022). It can be applied in lieu of uncertainty measures in uncertainty sampling or incorporated into diversity-based methods. The claim is supported by a theoretical justification and an empirical evaluation of the criterion's effectiveness.
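The "in lieu of uncertainty measures" substitution can be pictured as swapping the scoring function inside an otherwise unchanged top-b selection step. The following is a minimal sketch under stated assumptions: `select_top_b` and `toy_score` are hypothetical names introduced here, the ranking is descending as in entropy-based uncertainty sampling, and `toy_score` is only a placeholder, not the paper's gradient-discrepancy score (this page does not state whether that score is maximized or minimized).

```python
import math
from typing import Callable, List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a predicted class distribution (standard uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_top_b(pool: Sequence, score: Callable[[object], float], b: int) -> List[int]:
    """Return the indices of the b pool items with the largest acquisition score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:b]


# Toy pool: each item is the predicted class distribution of one unlabeled point.
pool = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.70, 0.20, 0.10]]

# Uncertainty sampling ranks the pool by predictive entropy.
picked_by_entropy = select_top_b(pool, predictive_entropy, b=2)


# Replacing the criterion only changes the `score` argument.
# ASSUMPTION: toy_score is a hypothetical placeholder, not the paper's criterion.
def toy_score(probs: Sequence[float]) -> float:
    return 1.0 - max(probs)


picked_by_toy = select_top_b(pool, toy_score, b=2)
```

The design point is that the selection routine does not need to know which criterion it is ranking by, so any per-point score with the same signature can be dropped in.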

What carries the argument

The gradient-discrepancy acquisition criterion quantifies the discrepancy that candidate points induce in the model's gradients and guides selection toward the points expected to reduce the generalization bound the most.

Load-bearing premise

The generalization bound from Luo et al. (2022) can be directly leveraged to create an acquisition criterion that effectively identifies informative points beyond standard uncertainty or diversity measures.

What would settle it

The central claim would be falsified by experiments on standard benchmarks in which, after a fixed number of queries, the points selected by the gradient-discrepancy criterion yield model performance no better than random sampling or conventional uncertainty sampling.

Figures

Figures reproduced from arXiv: 2605.02609 by Mohamadsadegh Khosravani, Sandra Zilles.

**Figure 1.** MNIST sanity check. Geometry of selected points in input space (top row) and final-layer gradient space (bottom row), using a shared initial labeled set and one acquisition step per method. For gradient-based methods (DF and BADGE), gradients are computed with respect to the final linear layer.

**Figure 2.** Histograms of DF scores on CIFAR-10 (in-distribution) and SVHN (out-of-distribution), each with n=10,000 examples.

4.2. Active Learning. We consider the standard pool-based active learning (AL) protocol. At acquisition round t, given a labeled set D_L^(t) and an unlabeled pool D_U^(t), the learner selects a batch of b unlabeled points, queries their labels, augments the labeled set, and retrains the model …

Compared methods. We compare our proposed gradient-based acquisition strategy (grad) against four standard baselines: (i) uncertainty sampling via predictive entropy under the current model; (ii) BADGE, which forms a last-layer gradient embedding for each unlabeled point (using the model’s cu…
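The pool-based protocol quoted above (select a batch of b points, query their labels, augment the labeled set, retrain) can be sketched as a short loop. This is an illustration only: the nearest-centroid model, the `margin_score` criterion, the one-dimensional toy pool, and all function names are assumptions made for the example and are not taken from the paper.

```python
import random


def fit_centroids(labeled):
    """Train a toy nearest-centroid classifier: one mean per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def margin_score(model, x):
    """Toy acquisition score: negated gap between the two nearest centroids."""
    d = sorted(abs(x - c) for c in model.values())
    return -(d[1] - d[0]) if len(d) > 1 else 0.0


def run_active_learning(pool, oracle, initial, rounds, b, score):
    """Pool-based AL: score the pool, query b labels, augment, retrain."""
    labeled = [(x, oracle(x)) for x in initial]
    unlabeled = [x for x in pool if x not in initial]
    for _ in range(rounds):
        model = fit_centroids(labeled)                      # train on current labels
        unlabeled.sort(key=lambda x: score(model, x), reverse=True)
        batch, unlabeled = unlabeled[:b], unlabeled[b:]     # select a batch of b points
        labeled += [(x, oracle(x)) for x in batch]          # query labels and augment
    return fit_centroids(labeled), labeled, unlabeled       # final retraining


# Toy setup (hypothetical data): points in [0, 1), label 1 when x > 0.5.
rng = random.Random(0)
initial = [0.1, 0.9]
pool = initial + [rng.random() for _ in range(50)]


def oracle(x):
    return int(x > 0.5)


model, labeled, unlabeled = run_active_learning(
    pool, oracle, initial, rounds=3, b=5, score=margin_score
)
```

Any acquisition rule with the signature `score(model, x)` can be passed in place of `margin_score`, which is how the compared methods would differ within the same loop.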

**Figure 3.** Active learning test accuracy (mean over 5 seeds) for text/tabular benchmarks with step size 100. Panels: 20 Newsgroups, ISOLET, and OpenML 155.

**Figure 4.** Active learning test accuracy on image benchmarks (mean over five seeds; shaded regions indicate variability).

**Figure 5.** Overall comparison using the pairwise penalty matrix (PPM, top row) and the corresponding loss-score ranking (bottom row). (a,d) aggregate over all rounds; (b,e) early-stage rounds; (c,f) late-stage rounds. Larger PPM entries indicate more frequent statistically significant wins, while lower loss scores indicate stronger overall performance.
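A pairwise penalty matrix of the kind summarized in Figure 5 can be built from per-seed accuracies. The sketch below assumes the construction popularized by the BADGE paper (a paired t-test over seeds at each round, with each significant win adding 1/number-of-rounds to the entry) and assumes loss scores are column sums. The paper's exact variant is not shown on this page, and the accuracy values are made-up toy numbers.

```python
import math

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (5 seeds).
T_CRIT = 2.776


def paired_t(a, b):
    """Paired t-statistic for the per-seed accuracies of two methods."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    if var == 0.0:
        return math.copysign(math.inf, mean) if mean != 0.0 else 0.0
    return math.sqrt(n) * mean / math.sqrt(var)


def pairwise_penalty_matrix(acc):
    """acc[method][round] is a list of per-seed accuracies for one experiment."""
    methods = sorted(acc)
    n_rounds = len(acc[methods[0]])
    ppm = {i: {j: 0.0 for j in methods} for i in methods}
    for i in methods:
        for j in methods:
            if i == j:
                continue
            for r in range(n_rounds):
                # Method i scores a win over j when it is significantly better.
                if paired_t(acc[i][r], acc[j][r]) > T_CRIT:
                    ppm[i][j] += 1.0 / n_rounds
    return ppm


def loss_scores(ppm):
    """ASSUMED aggregation: column sums, i.e. how often each method is beaten."""
    return {j: sum(ppm[i][j] for i in ppm) for j in ppm}


# Toy accuracies (illustrative only): 2 methods, 2 rounds, 5 seeds each.
acc = {
    "A": [[0.80, 0.81, 0.82, 0.80, 0.81], [0.90, 0.88, 0.91, 0.89, 0.90]],
    "B": [[0.70, 0.72, 0.71, 0.70, 0.71], [0.90, 0.89, 0.90, 0.90, 0.89]],
}
ppm = pairwise_penalty_matrix(acc)
losses = loss_scores(ppm)
```

In this toy case method A wins significantly in the first round only, so its row entry against B is one half and B carries the larger loss score.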

**Figure 6.** DF values over training epochs for German, Gisette, and 20 Newsgroups. Across Gisette and 20 Newsgroups the discrepancy decreases over training and typically stabilizes (approaching a near-constant plateau) in later epochs. This behavior is consistent with the conclusion of Proposition A.1: once the iterates enter a stable neighborhood U where S1–S2 are approximately satisfied, the discrepancy should contract with an effective rate…
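One rough way to check a contraction trend like the one described for Figure 6 is to estimate the average ratio between consecutive discrepancy values. The sketch below is an assumption-laden illustration: `effective_rate` is a hypothetical helper, the geometric-mean estimator is a choice made here rather than the paper's definition of the rate, and `toy_df` contains made-up values, not measurements from the paper.

```python
import math


def effective_rate(df_values):
    """Geometric mean of DF[k+1] / DF[k]; a value below 1 indicates contraction."""
    ratios = [b / a for a, b in zip(df_values, df_values[1:]) if a > 0 and b > 0]
    if not ratios:
        raise ValueError("need at least two positive consecutive values")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Illustrative per-epoch discrepancy values that halve every epoch.
toy_df = [1.0, 0.5, 0.25, 0.125]
rate = effective_rate(toy_df)
```

A sequence that plateaus, as the caption describes for later epochs, would give ratios close to 1 over the plateau, so the estimate is most informative when computed on the early, decreasing part of the curve.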

**Figure 7.** Active learning test accuracy on image benchmarks (mean over five seeds; shaded regions indicate variability).

**Figure 8.** Active learning test accuracy on image benchmarks (mean over five seeds; shaded regions indicate variability). Appendix C. Pairwise comparisons. In Section 4.5, the comparison was made over all experiments and rounds, along with the first three and the last three rounds of each experiment. Here, the PPM and loss plots are separated by dataset or model. Appendix D. BADGE-like Acquisition Result. In the begin…

**Figure 9.** Overall comparison using the pairwise penalty matrix (top row) and the corresponding loss-score ranking (bottom row). (a,d) LeNet experiments; (b,e) ResNet experiments; (c,f) VGG-16 experiments. Larger PPM entries indicate more frequent statistically significant wins, while lower loss scores indicate stronger overall performance.

**Figure 10.** Overall comparison using the pairwise penalty matrix (top row) and the corresponding loss-score ranking (bottom row). (a,d) CIFAR-10 experiments; (b,e) SVHN experiments; (c,f) CINIC-10 experiments. Larger PPM entries indicate more frequent statistically significant wins, while lower loss scores indicate stronger overall performance.

**Figure 11.** Diversity comparison: accuracy on eight datasets with an initial training set size of 50 and an acquisition batch size of 200.

**Figure 12.** Diversity comparison: accuracy on SVHN and CIFAR-10, trained with three different models over 5 runs.

Original abstract

The effectiveness of active learning hinges on the choice of the acquisition criterion by which a learning algorithm selects potentially informative data points whose label is subsequently queried. This paper proposes a novel gradient-based acquisition criterion, derived from a generalization bound introduced by Luo et al. (2022). This criterion can be applied in lieu of uncertainty measures in uncertainty sampling, or incorporated into diversity-based methods that consider the spread of sampled points in addition to the uncertainty of their labels. We provide a theoretical justification of the proposed acquisition criterion, and demonstrate its effectiveness in an empirical evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper turns a prior generalization bound into a gradient-discrepancy acquisition function for active learning, but the iterative validity of that bound is the open question.

The main point is that this work derives a new acquisition criterion called gradient-discrepancy from the generalization bound in Luo et al. (2022) and shows how to use it in place of uncertainty sampling or alongside diversity methods. The derivation gives the criterion a direct theoretical link rather than treating it as another ad-hoc score. That connection is the clearest novelty here, and the paper includes an empirical evaluation to test whether it selects more informative points than standard baselines. The empirical part at least demonstrates practical effectiveness on the datasets they chose, which is better than papers that stop at the derivation.

The soft spot is the iterative setting. The original bound is stated for a fixed dataset after one training run, yet active learning grows the labeled set round by round. The paper does not appear to verify that the bound stays non-vacuous or that minimizing the discrepancy at each step actually reduces true risk faster than simpler alternatives once the model updates. If that link is only assumed rather than checked, the central claim rests on weaker ground than the abstract suggests.

This is the sort of targeted method paper that active learning researchers might want to read for the new criterion and the way it reuses the bound. A reader already working on acquisition functions could extract the idea and test it themselves without much trouble. It has enough of a distinct angle and some empirical backing to merit a serious referee rather than a desk reject, though any review would likely press on the bound's behavior across rounds.

Referee Report

2 major / 2 minor

Summary. The paper proposes a novel gradient-discrepancy acquisition criterion for pool-based active learning, derived from a generalization bound introduced by Luo et al. (2022). This criterion is intended to replace uncertainty measures in uncertainty sampling or to be combined with diversity-based methods. The authors provide a theoretical justification for the criterion and demonstrate its effectiveness through empirical evaluation on standard benchmarks.

Significance. If the derivation is valid and the empirical gains are robust, the work would offer a theoretically grounded alternative to standard acquisition functions by directly leveraging an external generalization bound, which could improve sample efficiency in active learning settings where uncertainty or diversity heuristics fall short.

major comments (2)

[Method section (derivation of gradient-discrepancy criterion)] The derivation of the gradient-discrepancy acquisition function from the Luo et al. (2022) bound (detailed in the method section) does not establish that the bound remains informative or non-vacuous once the labeled set is iteratively expanded by the proposed criterion. The original bound applies to a single training run on a fixed dataset; no argument or analysis is supplied showing that minimization of the derived acquisition function preserves the bound's utility for reducing true risk across multiple AL rounds.
[Experiments section] The empirical evaluation does not include an analysis of how the tightness or validity of the underlying generalization bound evolves over successive acquisition rounds under the paper's training regime, which is required to support the central claim that the criterion identifies points that reduce risk faster than baselines.

minor comments (2)

[Method section] Notation for the gradient discrepancy term and its relation to the bound could be introduced with an explicit equation early in the method section to improve readability.
[Abstract] The abstract states that the criterion 'can be applied in lieu of uncertainty measures' but does not specify the exact substitution rule or hyper-parameters involved in the replacement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their constructive comments, which have helped us identify areas where the manuscript can be improved. We respond to each major comment in turn and outline the revisions we plan to make.

Referee: [Method section (derivation of gradient-discrepancy criterion)] The derivation of the gradient-discrepancy acquisition function from the Luo et al. (2022) bound (detailed in the method section) does not establish that the bound remains informative or non-vacuous once the labeled set is iteratively expanded by the proposed criterion. The original bound applies to a single training run on a fixed dataset; no argument or analysis is supplied showing that minimization of the derived acquisition function preserves the bound's utility for reducing true risk across multiple AL rounds.

Authors: We appreciate this observation. Our derivation extracts the gradient-discrepancy term as a key component of the generalization bound from Luo et al. (2022), which we then use as an acquisition function to select points likely to minimize this term. In the pool-based active learning setting, the model is retrained from scratch or fine-tuned on the augmented labeled set after each round. By choosing points that reduce the discrepancy at the current model state, we aim to iteratively tighten the bound. That said, we did not provide a formal inductive argument showing the bound stays non-vacuous over rounds. In the revised manuscript, we will expand the method section with a discussion of this point, explaining that the per-round minimization targets the same term appearing in the bound and is therefore expected to maintain its relevance for risk reduction.

Revision: yes

Referee: [Experiments section] The empirical evaluation does not include an analysis of how the tightness or validity of the underlying generalization bound evolves over successive acquisition rounds under the paper's training regime, which is required to support the central claim that the criterion identifies points that reduce risk faster than baselines.

Authors: We agree that such an analysis would provide valuable additional evidence. Our current experiments demonstrate superior performance in terms of test accuracy and label efficiency on standard benchmarks. To directly address the referee's concern, we will add to the experiments section an evaluation of the generalization bound's value (or a proxy such as the gradient discrepancy) computed at each acquisition round for the proposed method and the baselines. This will illustrate how the bound evolves under our training regime and support the claim that our criterion leads to faster risk reduction.

Revision: yes

Circularity Check

0 steps flagged

Derivation from external Luo et al. (2022) bound provides independent grounding

The paper's central acquisition criterion is explicitly derived from the generalization bound of Luo et al. (2022), an independent prior result with no author overlap. No equations reduce the proposed gradient-discrepancy measure to a fitted parameter, self-defined quantity, or self-citation chain. The theoretical justification and empirical claims rest on this external bound rather than tautological re-expression of the paper's own inputs or assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests primarily on the external generalization bound from Luo et al. (2022) as the source for the new criterion; no free parameters or invented entities are indicated in the abstract.

axioms (1)

domain assumption: The generalization bound introduced by Luo et al. (2022) is valid and applicable for deriving acquisition criteria.
The proposed criterion is explicitly derived from this bound, per the abstract.

pith-pipeline@v0.9.0 · 5613 in / 1051 out tokens · 37676 ms · 2026-05-19T17:15:07.399804+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We define a gradient-discrepancy acquisition score inspired by the theoretical study of Luo et al. (2022)... s_t(x) = ||DF_{θ_t}(D_L ∪ Ŝ(x), Ŝ(x))||₂ (eq. 11, Algorithm 1)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: A key term in equation 6 is the cumulative gradient-discrepancy sum... Assumption 1 (Monotone contraction of DF)
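The acquisition score quoted above (eq. 11) can be made concrete for a linear softmax layer, but only under assumptions this page does not confirm. The sketch assumes that DF(A, B) is the difference between the mean cross-entropy gradients over the sets A and B, that Ŝ(x) is the single candidate x paired with the model's own predicted label, and that gradients are taken with respect to the final linear layer. `df_score` and the other helper names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, x):
    """Class probabilities of a linear softmax layer; W holds one weight row per class."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])


def grad(W, x, y):
    """Flattened cross-entropy gradient w.r.t. W for one example: (p - onehot(y)) outer x."""
    p = predict(W, x)
    return [(p[k] - (1.0 if k == y else 0.0)) * xi for k in range(len(W)) for xi in x]


def mean_grad(W, data):
    grads = [grad(W, x, y) for x, y in data]
    return [sum(col) / len(grads) for col in zip(*grads)]


def df_score(W, labeled, x):
    """ASSUMED form: || mean_grad(D_L ∪ Ŝ(x)) - mean_grad(Ŝ(x)) ||_2, Ŝ(x) = {(x, ŷ)}."""
    p = predict(W, x)
    y_hat = p.index(max(p))          # pseudo-label from the current model (assumption)
    s_hat = [(x, y_hat)]
    a = mean_grad(W, labeled + s_hat)
    b = mean_grad(W, s_hat)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


# Toy check with zero weights, two classes, and two labeled points.
W = [[0.0, 0.0], [0.0, 0.0]]
labeled = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
score = df_score(W, labeled, [1.0, 0.0])
```

With zero weights the predicted distribution is uniform, the pseudo-label defaults to class 0, and the score for this candidate works out to 1/3. With an empty labeled set the two mean gradients coincide and the score is zero.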

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

