Submodular Batch Selection for Training Deep Neural Networks

K J Joseph; Krishnakant Singh; Vamshi Teja R; Vineeth N Balasubramanian

arxiv: 1906.08771 · v1 · pith:TS33M47Ynew · submitted 2019-06-20 · 💻 cs.LG · stat.ML

Submodular Batch Selection for Training Deep Neural Networks

K J Joseph , Vamshi Teja R , Krishnakant Singh , Vineeth N Balasubramanian This is my paper

Pith reviewed 2026-05-25 19:28 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords submodular optimizationbatch selectionmini-batch gradient descentdeep neural networksgeneralizationinformativenessdiversitygreedy algorithm

0 comments

The pith

Submodular batch selection improves generalization of deep neural networks over standard stochastic gradient descent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a way to choose mini-batches for neural network training by maximizing a submodular function that scores both how informative each individual sample is and how diverse the whole batch is. It supplies an efficient greedy algorithm to find good batches even though the exact optimization problem is NP-hard. Experiments on standard datasets show that networks trained with these batches reach higher test accuracy than networks trained with random batches under stochastic gradient descent, and the gains appear across multiple learning rates, batch sizes, and distance measures between samples. A reader would care because almost all modern deep learning relies on random mini-batch selection, so replacing that step with a better rule could raise final model quality without any change to the network architecture or the optimizer itself.

Core claim

The authors introduce a novel submodular formulation for mini-batch selection that captures both the informativeness of individual samples and the diversity of the selected subset. They design an efficient greedy algorithm for this NP-hard problem and show through extensive experiments that deep models trained with this strategy generalize better than those trained with standard SGD or a popular baseline sampling strategy, across various learning rates, batch sizes, and distance metrics.

What carries the argument

A submodular function that scores candidate batches by jointly rewarding sample informativeness and subset diversity, solved approximately by a greedy algorithm.

If this is right

Models reach higher test accuracy than with random batch selection in SGD.
The improvement appears across different learning rates and batch sizes.
The gains persist when different distance metrics are used to compare samples.
The greedy algorithm supplies a practical way to solve the otherwise intractable selection problem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same selection rule might let a network reach a target accuracy level after fewer total gradient steps.
Because the method is defined only in terms of a scoring function on samples, it could be tried on other iterative training loops that also use batches, such as in reinforcement learning or language model fine-tuning.
If the greedy step remains fast on very large datasets, the approach could become a standard preprocessing stage before each epoch rather than a one-time choice.

Load-bearing premise

The chosen submodular function actually captures both the informativeness of individual samples and the diversity of the selected subset.

What would settle it

Train identical network architectures on identical data using the submodular batch selector versus ordinary random selection and check whether final test accuracy is higher for the submodular method.

Figures

Figures reproduced from arXiv: 1906.08771 by K J Joseph, Krishnakant Singh, Vamshi Teja R, Vineeth N Balasubramanian.

**Figure 1.** Figure 1: Comparison of the proposed SMDL, with SGD and loss based sampling scheme on SVHN, CIFAR-10 and CIFAR-100 datasets. [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

**Figure 2.** Figure 2: The figure shows the ablation results on SVHN dataset. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 4.** Figure 4: The figure shows the ablation results on SVHN dataset. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 3.** Figure 3: The graph brings out the importance of each of the term in [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 5.** Figure 5: The figure shows how the training error and loss varies over epochs on different datasets. It is very interesting to note that on all [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

read the original abstract

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The submodular batch selection formulation is a clean extension of existing ideas, but the generalization claims rest on experiments whose details are too thin to evaluate.

read the letter

The paper's main move is to cast mini-batch selection as a submodular maximization problem whose objective trades off per-sample informativeness against diversity of the chosen subset. They give a greedy algorithm for the NP-hard problem and report that models trained this way generalize better than plain SGD and one baseline across learning rates, batch sizes, and distance metrics. The formulation itself is straightforward and directly targets two properties that matter for batch quality, and the greedy solver is the standard efficient approximation for this class of functions. That part of the work is easy to follow and builds on established submodular machinery without obvious circularity. The experiments are the soft spot. The abstract asserts consistent gains but supplies no information on number of random seeds, variance across runs, error bars, or significance tests. Neural net training is stochastic enough that modest reported improvements can easily come from seed variation rather than the selection method, so the central claim is hard to assess from what is shown. The stress-test concern about missing replication controls looks accurate on the available description. This is the sort of paper that might interest people already working on data selection or curriculum strategies in deep learning. A reader looking for a new batching heuristic could extract the objective and algorithm and test it themselves, but they would have to supply the missing controls. I would send it to peer review because the method is clearly stated and the idea is grounded enough to be worth a referee's time, even if the current evidence needs strengthening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a submodular function maximization approach for mini-batch selection during DNN training. The formulation is designed to jointly capture per-sample informativeness and subset diversity; an efficient greedy algorithm is presented for the resulting NP-hard problem. Extensive experiments on standard datasets are reported to show that models trained with the proposed selection strategy achieve better generalization than vanilla SGD and a popular baseline sampler, with the advantage holding across learning rates, batch sizes, and distance metrics.

Significance. If the empirical claims are robust, the work would supply a principled, optimization-based alternative to uniform random sampling that could improve training outcomes without increasing computational cost. The submodular framing is attractive because it inherits standard greedy approximation guarantees, which is a concrete strength relative to purely heuristic batch-selection heuristics.

major comments (2)

[Experiments section] Experiments section (and abstract): the central claim of consistent outperformance in generalization is presented without any report of multiple random seeds, error bars, variance across runs, or statistical significance tests. Because DNN training is highly stochastic, single-run accuracy differences across LR/batch-size settings cannot reliably support the claim that the submodular strategy is superior.
[Section 3] Section 3 (submodular formulation): the weakest modeling assumption—that the chosen submodular objective simultaneously encodes both informativeness and diversity—is asserted but not supported by any ablation that isolates the contribution of each term or compares against variants that omit one component.

minor comments (2)

[Section 3] Notation for the submodular function and the distance metric used inside it should be introduced once and used consistently; several symbols appear to be redefined in passing.
[Abstract] The abstract states 'extensive experiments' yet supplies no concrete dataset names, model architectures, or evaluation metrics; these details belong in the abstract or a dedicated experimental summary paragraph.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate the changes planned for the revised manuscript.

read point-by-point responses

Referee: [Experiments section] Experiments section (and abstract): the central claim of consistent outperformance in generalization is presented without any report of multiple random seeds, error bars, variance across runs, or statistical significance tests. Because DNN training is highly stochastic, single-run accuracy differences across LR/batch-size settings cannot reliably support the claim that the submodular strategy is superior.

Authors: We agree that the absence of multiple random seeds, error bars, and statistical tests weakens the reliability of the generalization claims, given the stochasticity of DNN training. In the revision we will rerun the main experiments (across the reported learning rates and batch sizes) with at least five independent random seeds, report mean test accuracies together with standard deviations, and include paired t-tests or Wilcoxon tests to assess statistical significance of the observed differences. revision: yes
Referee: [Section 3] Section 3 (submodular formulation): the weakest modeling assumption—that the chosen submodular objective simultaneously encodes both informativeness and diversity—is asserted but not supported by any ablation that isolates the contribution of each term or compares against variants that omit one component.

Authors: The objective is explicitly defined as the sum of a coverage (diversity) term and a modular (informativeness) term; this additive construction is the standard mechanism by which submodular functions jointly encode both properties. Nevertheless, we acknowledge that an explicit ablation isolating each component would strengthen the modeling justification. We will add this ablation study in the revised Section 3, comparing the full objective against the diversity-only and informativeness-only variants on the same datasets. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method and claims are independently formulated and empirically validated.

full rationale

The paper presents a submodular objective that combines per-sample informativeness with subset diversity, solved via a standard greedy algorithm for the NP-hard problem. No equations or claims reduce by construction to fitted parameters renamed as predictions, self-definitions, or load-bearing self-citations. The derivation chain (formulation → algorithm → experiments) is self-contained against external benchmarks; generalization improvements are asserted via direct comparison to SGD and a baseline on standard datasets, without circular reduction. This is the expected honest non-finding for an applied methods paper whose core contribution is a new objective rather than a derived identity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach assumes standard properties of submodular functions allow effective capture of sample properties and that the greedy algorithm yields high-quality solutions for the NP-hard problem.

axioms (1)

standard math Submodular functions admit efficient greedy maximization with approximation guarantees.
Invoked implicitly to justify the efficient algorithm design.

pith-pipeline@v0.9.0 · 5633 in / 926 out tokens · 18597 ms · 2026-05-25T19:28:00.518415+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

F(S) = ∑ λ1U(xi)+λ2R(xi)+λ3MC(xi)+λ4FM(xi) ... Lemma 1. The score function F(.) ... is submodular. ... Theorem 1. F(S) ≥ (1−1/e)F(S*)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We formulate batch selection as solving a submodular optimization problem

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 9 internal anchors

[1]

Variance Reduction in SGD by Distributed Importance Sampling

Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in sgd by distributed importance sampling. arXiv preprint arXiv:1511.06481 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[2]

Katyusha: The first direct acceleration of stochastic gradient methods

Zeyuan Allen-Zhu. Katyusha: The first direct acceleration of stochastic gradient methods. The Journal of Machine Learning Research , 18(1):8194--8244, 2017

work page 2017
[3]

Subset replay based continual learning for scalable improvement of autonomous systems

Pratik Prabhanjan Brahma and Adrienne Othon. Subset replay based continual learning for scalable improvement of autonomous systems. In CVPR Workshops , pages 1179--11798. IEEE, 2018

work page 2018
[4]

Adaptive batch mode active learning

Shayok Chakraborty, Vineeth Balasubramanian, and Sethuraman Panchanathan. Adaptive batch mode active learning. IEEE TNNLS , 26(8):1747--1760, 2015

work page 2015
[5]

Active bias: Training more accurate neural networks by emphasizing high variance samples

Haw-Shiuan Chang, Erik Learned-Miller, and Andrew McCallum. Active bias: Training more accurate neural networks by emphasizing high variance samples. In Advances in Neural Information Processing Systems , pages 1002--1012, 2017

work page 2017
[6]

Algorithms for subset selection in linear regression

Abhimanyu Das and David Kempe. Algorithms for subset selection in linear regression. In ACM STOC , pages 45--54. ACM, 2008

work page 2008
[7]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR , pages 770--778, 2016

work page 2016
[8]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[9]

Accelerating stochastic gradient descent using predictive variance reduction

Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS , pages 315--323, 2013

work page 2013
[10]

Biased Importance Sampling for Deep Neural Network Training

Angelos Katharopoulos and Fran c ois Fleuret. Biased importance sampling for deep neural network training. arXiv preprint arXiv:1706.00043 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[11]

Not all samples are created equal: Deep learning with importance sampling

Angelos Katharopoulos and Fran c ois Fleuret. Not all samples are created equal: Deep learning with importance sampling. arXiv:1803.00942 , 2018

work page arXiv 2018
[12]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[13]

Learning multiple layers of features from tiny images

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009

work page 2009
[14]

Efficient mini-batch training for stochastic optimization

Mu Li, Tong Zhang, Yuqiang Chen, and Alexander J Smola. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , pages 661--670. ACM, 2014

work page 2014
[15]

Fast DPP Sampling for Nystr\"om with Application to Kernel Methods

Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast dpp sampling for nystr " om with application to kernel methods. arXiv preprint arXiv:1603.06052 , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[16]

A class of submodular functions for document summarization

Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In ACL , pages 510--520. Association for Computational Linguistics, 2011

work page 2011
[17]

Online batch selection for faster training of neural networks

Ilya Loshchilov and Frank Hutter. Online batch selection for faster training of neural networks. ICLR Workshops , 2015

work page 2015
[18]

Accelerated greedy algorithms for maximizing submodular set functions

Michel Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization techniques , pages 234--243. Springer, 1978

work page 1978
[19]

Distributed submodular maximization: Identifying representative elements in massive data

Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems , pages 2049--2057, 2013

work page 2049
[20]

Lazier than lazy greedy

Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondr \'a k, and Andreas Krause. Lazier than lazy greedy. In AAAI , 2015

work page 2015
[21]

An analysis of approximations for maximizing submodular set functions—i

George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—i. Mathematical programming , 14(1):265--294, 1978

work page 1978
[22]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop , page 5, 2011

work page 2011
[23]

Automatic differentiation in pytorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W , 2017

work page 2017
[24]

Greedy sensor selection: Leveraging submodularity

Manohar Shamaiah, Siddhartha Banerjee, and Haris Vikalo. Greedy sensor selection: Leveraging submodularity. In 49th IEEE conference on decision and control (CDC) , pages 2572--2577. IEEE, 2010

work page 2010
[25]

Submodular importance sampling for neural network training

Krishna Kant Singh and Vineeth N Balasubramanian. Submodular importance sampling for neural network training. Master's thesis, Indian Institute of Technology Hyderabad, 2018

work page 2018
[26]

Self-Paced Learning with Adaptive Deep Visual Embeddings

Vithursan Thangarasa and Graham W Taylor. Self-paced learning with adaptive deep visual embeddings. arXiv preprint arXiv:1807.09200 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[27]

Submodular subset selection for large-scale speech training data

Kai Wei, Yuzong Liu, Katrin Kirchhoff, Chris Bartels, and Jeff Bilmes. Submodular subset selection for large-scale speech training data. In ICASSP , pages 3311--3315. IEEE, 2014

work page 2014
[28]

Submodularity in data subset selection and active learning

Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In ICML , pages 1954--1963, 2015

work page 1954
[29]

Determinantal Point Processes for Mini-Batch Diversification

Cheng Zhang, Hedvig Kjellstrom, and Stephan Mandt. Determinantal point processes for mini-batch diversification. arXiv preprint arXiv:1705.00607 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[30]

Active Mini-Batch Sampling using Repulsive Point Processes

Cheng Zhang, Cengiz \"O ztireli, Stephan Mandt, and Giampiero Salvi. Active mini-batch sampling using repulsive point processes. arXiv:1804.02772 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[31]

Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Peilin Zhao and Tong Zhang. Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv preprint arXiv:1405.3080 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[32]

Stochastic optimization with importance sampling for regularized loss minimization

Peilin Zhao and Tong Zhang. Stochastic optimization with importance sampling for regularized loss minimization. In ICML , pages 1--9, 2015

work page 2015
[33]

Minimax curriculum learning: Machine teaching with desirable difficulties and scheduled diversity

Tianyi Zhou and Jeff Bilmes. Minimax curriculum learning: Machine teaching with desirable difficulties and scheduled diversity. In ICLR , 2018

work page 2018
[34]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page

[1] [1]

Variance Reduction in SGD by Distributed Importance Sampling

Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in sgd by distributed importance sampling. arXiv preprint arXiv:1511.06481 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[2] [2]

Katyusha: The first direct acceleration of stochastic gradient methods

Zeyuan Allen-Zhu. Katyusha: The first direct acceleration of stochastic gradient methods. The Journal of Machine Learning Research , 18(1):8194--8244, 2017

work page 2017

[3] [3]

Subset replay based continual learning for scalable improvement of autonomous systems

Pratik Prabhanjan Brahma and Adrienne Othon. Subset replay based continual learning for scalable improvement of autonomous systems. In CVPR Workshops , pages 1179--11798. IEEE, 2018

work page 2018

[4] [4]

Adaptive batch mode active learning

Shayok Chakraborty, Vineeth Balasubramanian, and Sethuraman Panchanathan. Adaptive batch mode active learning. IEEE TNNLS , 26(8):1747--1760, 2015

work page 2015

[5] [5]

Active bias: Training more accurate neural networks by emphasizing high variance samples

Haw-Shiuan Chang, Erik Learned-Miller, and Andrew McCallum. Active bias: Training more accurate neural networks by emphasizing high variance samples. In Advances in Neural Information Processing Systems , pages 1002--1012, 2017

work page 2017

[6] [6]

Algorithms for subset selection in linear regression

Abhimanyu Das and David Kempe. Algorithms for subset selection in linear regression. In ACM STOC , pages 45--54. ACM, 2008

work page 2008

[7] [7]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR , pages 770--778, 2016

work page 2016

[8] [8]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[9] [9]

Accelerating stochastic gradient descent using predictive variance reduction

Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In NIPS , pages 315--323, 2013

work page 2013

[10] [10]

Biased Importance Sampling for Deep Neural Network Training

Angelos Katharopoulos and Fran c ois Fleuret. Biased importance sampling for deep neural network training. arXiv preprint arXiv:1706.00043 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[11] [11]

Not all samples are created equal: Deep learning with importance sampling

Angelos Katharopoulos and Fran c ois Fleuret. Not all samples are created equal: Deep learning with importance sampling. arXiv:1803.00942 , 2018

work page arXiv 2018

[12] [12]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[13] [13]

Learning multiple layers of features from tiny images

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009

work page 2009

[14] [14]

Efficient mini-batch training for stochastic optimization

Mu Li, Tong Zhang, Yuqiang Chen, and Alexander J Smola. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining , pages 661--670. ACM, 2014

work page 2014

[15] [15]

Fast DPP Sampling for Nystr\"om with Application to Kernel Methods

Chengtao Li, Stefanie Jegelka, and Suvrit Sra. Fast dpp sampling for nystr " om with application to kernel methods. arXiv preprint arXiv:1603.06052 , 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[16] [16]

A class of submodular functions for document summarization

Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In ACL , pages 510--520. Association for Computational Linguistics, 2011

work page 2011

[17] [17]

Online batch selection for faster training of neural networks

Ilya Loshchilov and Frank Hutter. Online batch selection for faster training of neural networks. ICLR Workshops , 2015

work page 2015

[18] [18]

Accelerated greedy algorithms for maximizing submodular set functions

Michel Minoux. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization techniques , pages 234--243. Springer, 1978

work page 1978

[19] [19]

Distributed submodular maximization: Identifying representative elements in massive data

Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Advances in Neural Information Processing Systems , pages 2049--2057, 2013

work page 2049

[20] [20]

Lazier than lazy greedy

Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondr \'a k, and Andreas Krause. Lazier than lazy greedy. In AAAI , 2015

work page 2015

[21] [21]

An analysis of approximations for maximizing submodular set functions—i

George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher. An analysis of approximations for maximizing submodular set functions—i. Mathematical programming , 14(1):265--294, 1978

work page 1978

[22] [22]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop , page 5, 2011

work page 2011

[23] [23]

Automatic differentiation in pytorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W , 2017

work page 2017

[24] [24]

Greedy sensor selection: Leveraging submodularity

Manohar Shamaiah, Siddhartha Banerjee, and Haris Vikalo. Greedy sensor selection: Leveraging submodularity. In 49th IEEE conference on decision and control (CDC) , pages 2572--2577. IEEE, 2010

work page 2010

[25] [25]

Submodular importance sampling for neural network training

Krishna Kant Singh and Vineeth N Balasubramanian. Submodular importance sampling for neural network training. Master's thesis, Indian Institute of Technology Hyderabad, 2018

work page 2018

[26] [26]

Self-Paced Learning with Adaptive Deep Visual Embeddings

Vithursan Thangarasa and Graham W Taylor. Self-paced learning with adaptive deep visual embeddings. arXiv preprint arXiv:1807.09200 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[27] [27]

Submodular subset selection for large-scale speech training data

Kai Wei, Yuzong Liu, Katrin Kirchhoff, Chris Bartels, and Jeff Bilmes. Submodular subset selection for large-scale speech training data. In ICASSP , pages 3311--3315. IEEE, 2014

work page 2014

[28] [28]

Submodularity in data subset selection and active learning

Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In ICML , pages 1954--1963, 2015

work page 1954

[29] [29]

Determinantal Point Processes for Mini-Batch Diversification

Cheng Zhang, Hedvig Kjellstrom, and Stephan Mandt. Determinantal point processes for mini-batch diversification. arXiv preprint arXiv:1705.00607 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[30] [30]

Active Mini-Batch Sampling using Repulsive Point Processes

Cheng Zhang, Cengiz \"O ztireli, Stephan Mandt, and Giampiero Salvi. Active mini-batch sampling using repulsive point processes. arXiv:1804.02772 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[31] [31]

Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Peilin Zhao and Tong Zhang. Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv preprint arXiv:1405.3080 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[32] [32]

Stochastic optimization with importance sampling for regularized loss minimization

Peilin Zhao and Tong Zhang. Stochastic optimization with importance sampling for regularized loss minimization. In ICML , pages 1--9, 2015

work page 2015

[33] [33]

Minimax curriculum learning: Machine teaching with desirable difficulties and scheduled diversity

Tianyi Zhou and Jeff Bilmes. Minimax curriculum learning: Machine teaching with desirable difficulties and scheduled diversity. In ICLR , 2018

work page 2018

[34] [34]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page