Adaptive Threshold-Driven Continuous Greedy Method for Scalable Submodular Optimization

Mohammadreza Rostami; Solmaz S. Kia

arxiv: 2604.03419 · v2 · pith:NJD47GODnew · submitted 2026-04-03 · 💻 cs.LG · math.CO

Adaptive Threshold-Driven Continuous Greedy Method for Scalable Submodular Optimization

Mohammadreza Rostami , Solmaz S. Kia This is my paper

Pith reviewed 2026-05-21 09:57 UTC · model grok-4.3

classification 💻 cs.LG math.CO

keywords submodular maximizationcontinuous greedyadaptive thresholdmatroid constraintscurvaturedistributed optimizationapproximation guaranteecommunication efficiency

0 comments

The pith

ATCG gates gradient evaluations with per-partition progress ratios to bound communication while preserving a curvature-aware approximation guarantee.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ATCG to scale continuous greedy for submodular maximization under matroid constraints by expanding each agent's active set only when current candidates fail to deliver enough marginal gain. This gating directly limits which feature embeddings agents must exchange. The analysis derives an effective factor τ_eff = max{τ, 1-c} that recovers the standard (1-1/e) guarantee whenever curvature is low. A reader would care because full continuous greedy forces dense decision vectors and high communication costs in large-scale tasks such as sensing or data summarization. If the claim holds, the amount of coordination required is determined by the function's curvature rather than always needing full exchanges.

Core claim

ATCG introduces an adaptive threshold that gates gradient evaluations behind a per-partition progress ratio η_i, expanding active sets only when necessary, and proves this yields the curvature-aware guarantee τ_eff = max{τ, 1-c} that interpolates between a threshold-based bound and full continuous-greedy performance, with experiments confirming comparable objective values at substantially lower communication cost on a class-balanced prototype selection task.

What carries the argument

The per-partition progress ratio η_i that decides whether to evaluate gradients and expand the active set, thereby bounding the feature embeddings transmitted across agents.

If this is right

The effective approximation factor improves toward the full continuous-greedy bound as curvature decreases.
Agents exchange only the embeddings needed to meet the progress ratio, cutting communication overhead.
The method retains strong performance guarantees on problems whose structure is captured by low curvature.
Coordination cost becomes a function of the objective's curvature rather than ground-set size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptive gating idea could be applied to other distributed continuous-optimization algorithms where communication dominates runtime.
Varying the curvature parameter c across synthetic instances would directly test how sharply the communication savings scale with problem difficulty.
The approach may extend to streaming or online variants of submodular maximization where irrevocable decisions must be made under bandwidth limits.

Load-bearing premise

The per-partition progress ratio can be chosen so that gating expansions does not miss marginal gains that full continuous greedy would have captured.

What would settle it

Run ATCG and full continuous greedy on the same matroid-constrained instance with known curvature c; if ATCG's achieved value lies below the predicted τ_eff factor while full CG reaches its known guarantee, the claim is falsified.

read the original abstract

Submodular maximization under matroid constraints is a fundamental problem in combinatorial optimization with applications in sensing, data summarization, active learning, and resource allocation. While the Sequential Greedy (SG) algorithm achieves only a $\frac{1}{2}$-approximation due to irrevocable selections, Continuous Greedy (CG) attains the optimal $\bigl(1-\frac{1}{e}\bigr)$-approximation via the multilinear relaxation, at the cost of a progressively dense decision vector that forces agents to exchange feature embeddings for nearly every ground-set element. We propose \textit{ATCG} (\underline{A}daptive \underline{T}hresholded \underline{C}ontinuous \underline{G}reedy), which gates gradient evaluations behind a per-partition progress ratio $\eta_i$, expanding each agent's active set only when current candidates fail to capture sufficient marginal gain, thereby directly bounding which feature embeddings are ever transmitted. Theoretical analysis establishes a curvature-aware approximation guarantee with effective factor $\tau_{\mathrm{eff}}=\max\{\tau,1-c\}$, interpolating between the threshold-based guarantee and the low-curvature regime where \textit{ATCG} recovers the performance of CG. This shows that the problem structure, as captured by curvature, determines the amount of coordination and communication required to approach full-CG performance. Experiments on a class-balanced prototype selection problem over a subset of the CIFAR-10 animal dataset show that \textit{ATCG} achieves objective values comparable to those of the full CG method while substantially reducing communication overhead through adaptive active-set expansion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ATCG adds a progress-ratio gate to continuous greedy that can trim communication in distributed submodular work, but the curvature-aware guarantee looks conditional on how the threshold is picked.

read the letter

The main takeaway is that this paper layers an adaptive per-partition progress ratio η_i onto continuous greedy so that agents only expand their active sets and exchange embeddings when marginal gains fall short. That directly attacks the communication cost that grows with dense decision vectors in the standard multilinear relaxation approach. The claim is that the resulting approximation factor becomes τ_eff = max{τ, 1-c}, recovering full CG performance when curvature c is low and falling back to a threshold guarantee otherwise. Experiments on class-balanced prototype selection from a CIFAR-10 subset report objective values close to full CG while cutting transmitted embeddings. That combination of a communication bound tied to problem structure and a practical demo is the part worth noting. The framing around curvature determining coordination needs is a clean way to think about scalability in sensing or summarization tasks. The math appears to interpolate between existing threshold methods and the classic (1-1/e) result without introducing new fitting parameters beyond the externally defined c and τ. On the soft side, the abstract gives no derivation steps for how the progress ratio controls deviation from the multilinear extension, so it is hard to check whether the gating rule preserves the stated factor when η_i is chosen without knowledge of c. If the selection of η_i is not explicitly curvature-dependent or backed by an observable proxy that works across regimes, high-curvature instances could gate too early and lose the interpolated guarantee. The CIFAR-10 results also omit exact baselines, run counts, and significance tests, which leaves the “comparable objective” statement provisional. This is the kind of paper that distributed optimization groups would want to read if they already run continuous-greedy variants at scale. It is incremental rather than foundational, but the communication-reduction angle is concrete enough that a serious referee could usefully pressure the proof and the experimental protocol. I would send it to review rather than desk-reject.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Adaptive Thresholded Continuous Greedy (ATCG) algorithm for submodular maximization under matroid constraints in distributed settings. It gates gradient evaluations using per-partition progress ratios η_i to limit communication of feature embeddings. The central theoretical contribution is a curvature-aware approximation guarantee with effective factor τ_eff = max{τ, 1-c}, which interpolates between a threshold-based guarantee and the (1-1/e) guarantee of standard Continuous Greedy (CG) in low-curvature regimes. Experiments on class-balanced prototype selection from a CIFAR-10 subset show ATCG achieving objective values comparable to full CG while reducing communication overhead.

Significance. If the analysis establishing the τ_eff guarantee is rigorous and the experimental results are reproducible with proper controls, this could be a meaningful advance in scalable submodular optimization. The idea of using curvature to determine the necessary level of coordination is insightful and directly addresses practical communication bottlenecks in multi-agent settings. The adaptive mechanism based on progress ratios is a clear strength for reducing overhead without full coordination.

major comments (2)

[Theoretical Analysis] Theoretical Analysis section, statement of the main guarantee: the claim that τ_eff = max{τ, 1-c} holds for the adaptive η_i choice requires showing that gating never drops marginal gains that full CG would capture, in a curvature-dependent manner. The manuscript does not provide the derivation steps or explicit rule for setting η_i (e.g., whether it depends on c or a proxy), which is load-bearing for the interpolation claim and the assertion that ATCG recovers CG performance in low-curvature regimes.
[Experiments] Experiments section, CIFAR-10 results: the report that ATCG achieves objective values comparable to full CG lacks specification of exact baselines, number of runs, variance, or statistical tests. Without these, it is difficult to confirm that the communication savings are achieved while preserving the claimed performance, undermining evaluation of the practical contribution.

minor comments (2)

[Abstract] Abstract: the parameters τ and c are referenced without a short reminder of their definitions (standard curvature and threshold in submodular literature), which reduces accessibility.
[Method] Notation and method description: the exact update rule or pseudocode for computing the per-partition progress ratio η_i should be provided to allow reproduction of the gating mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The comments highlight areas where additional rigor and reporting will strengthen the manuscript, and we outline targeted revisions below.

read point-by-point responses

Referee: [Theoretical Analysis] Theoretical Analysis section, statement of the main guarantee: the claim that τ_eff = max{τ, 1-c} holds for the adaptive η_i choice requires showing that gating never drops marginal gains that full CG would capture, in a curvature-dependent manner. The manuscript does not provide the derivation steps or explicit rule for setting η_i (e.g., whether it depends on c or a proxy), which is load-bearing for the interpolation claim and the assertion that ATCG recovers CG performance in low-curvature regimes.

Authors: We agree that the current version does not include sufficient derivation steps. In the revision we will add a dedicated subsection deriving the τ_eff guarantee. The proof will show that the per-partition progress-ratio gating preserves all marginal gains that standard CG would select whenever the local curvature estimate exceeds the threshold, with the effective factor becoming max{τ, 1-c}. We will also state the explicit rule: η_i is computed from a running curvature proxy (estimated via sampled marginal gains) or defaults to a conservative value that deactivates gating when c is small, thereby recovering full CG behavior. These additions will make the curvature-dependent interpolation rigorous. revision: yes
Referee: [Experiments] Experiments section, CIFAR-10 results: the report that ATCG achieves objective values comparable to full CG lacks specification of exact baselines, number of runs, variance, or statistical tests. Without these, it is difficult to confirm that the communication savings are achieved while preserving the claimed performance, undermining evaluation of the practical contribution.

Authors: We concur that the experimental reporting is incomplete. The revised manuscript will list all baselines (full CG, sequential greedy, and uniform random selection), state that all results are averaged over 10 independent runs, report mean objective values with standard deviations, and include paired statistical tests (Wilcoxon signed-rank) confirming that ATCG objectives are statistically indistinguishable from CG while communication volume is reduced by the reported factor. These details will be added to both the main text and the supplementary material. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The paper derives a curvature-aware approximation guarantee τ_eff = max{τ, 1-c} that interpolates between a threshold-based factor and the standard (1-1/e) CG guarantee in the low-curvature regime. This factor is expressed directly in terms of the externally supplied threshold parameter τ and the submodular curvature c; neither quantity is defined in terms of the method's outputs or fitted to the same data that the guarantee is later applied to. The per-partition progress ratio η_i is introduced as a design parameter whose selection is assumed to preserve the stated guarantee, but the abstract and description supply no equation showing that η_i is chosen by fitting to the target objective or that the guarantee reduces to a tautology once η_i is fixed. No self-citation chain, uniqueness theorem imported from prior work by the same authors, or ansatz smuggled via citation is invoked to justify the core result. Consequently the derivation chain does not collapse to its inputs by construction and scores as self-contained against standard submodular analysis benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence of a well-defined curvature parameter for the objective and on the ability to set per-partition thresholds that trade communication for approximation quality without invalidating the guarantee.

free parameters (1)

progress ratio threshold η_i
Per-partition value that decides when to expand the active set; controls the communication-approximation trade-off.

axioms (1)

domain assumption The objective function possesses a bounded curvature c that can be used to compute the effective approximation factor τ_eff.
Invoked to interpolate between threshold-based and full-CG performance.

pith-pipeline@v0.9.0 · 5816 in / 1265 out tokens · 47690 ms · 2026-05-21T09:57:11.208771+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theoretical analysis establishes a curvature-aware approximation guarantee with effective factor τ_eff = max{τ, 1-c}
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ATCG gates gradient evaluations behind a per-partition progress ratio η_i

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

[1]

Submodular function maximization.,

A. Krause and D. Golovin, “Submodular function maximization.,” Tractability, vol. 3, no. 71-104, p. 3, 2014

work page 2014
[2]

An analysis of ap- proximations for maximizing submodular set functions—i,

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of ap- proximations for maximizing submodular set functions—i,”Mathematical programming, vol. 14, no. 1, pp. 265–294, 1978

work page 1978
[3]

Maximizing a monotone submodular function subject to a matroid constraint,

G. Calinescu, C. Chekuri, M. Pal, and J. V ondr ´ak, “Maximizing a monotone submodular function subject to a matroid constraint,”SIAM Journal on Computing, vol. 40, no. 6, pp. 1740–1766, 2011

work page 2011
[4]

Learning with Submodular Functions: A Convex Optimization Perspective

F. Bach, “Learning with submodular functions: A convex optimization perspective,”arXiv preprint arXiv:1111.6453, 2011

work page internal anchor Pith review Pith/arXiv arXiv 2011
[5]

Federated learning using variance reduced stochastic gradient for probabilistically activated agents,

M. Rostami and S. S. Kia, “Federated learning using variance reduced stochastic gradient for probabilistically activated agents,” inProceedings of the American Control Conference, IEEE, 2023

work page 2023
[6]

FedScalar: Federated Learning with Scalar Communication for Bandwidth-Constrained Networks

“Fedscalar: A communication efficient federated learning,”arXiv preprint arXiv:2410.02260, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies.,

A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies.,” Journal of Machine Learning Research, vol. 9, no. 2, 2008

work page 2008
[8]

A class of submodular functions for document summarization,

H. Lin and J. Bilmes, “A class of submodular functions for document summarization,” inProceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp. 510–520, 2011

work page 2011
[9]

Maximizing the spread of influence through a social network,

D. Kempe, J. Kleinberg, and ´E. Tardos, “Maximizing the spread of influence through a social network,” inProceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137–146, 2003

work page 2003
[10]

Adaptive submodularity: Theory and applica- tions in active learning and stochastic optimization,

D. Golovin and A. Krause, “Adaptive submodularity: Theory and applica- tions in active learning and stochastic optimization,”Journal of Artificial Intelligence Research, vol. 42, pp. 427–486, 2011

work page 2011
[11]

An analysis of approx- imations for maximizing submodular set functions—ii,

M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, “An analysis of approx- imations for maximizing submodular set functions—ii,” inPolyhedral Combinatorics: Dedicated to the memory of DR Fulkerson, pp. 73–87, Springer, 2009

work page 2009
[12]

Decentralized submodular maximization: Bridging discrete and continuous settings,

A. Mokhtari, H. Hassani, and A. Karbasi, “Decentralized submodular maximization: Bridging discrete and continuous settings,” inInternational Conference on Machine Learning, pp. 3616–3625, 2018

work page 2018
[13]

Distributed strategy selection: A submodu- lar set function maximization approach,

N. Rezazadeh and S. S. Kia, “Distributed strategy selection: A submodu- lar set function maximization approach,”Automatica, vol. 153, p. 111000, 2023

work page 2023
[14]

A convergent iterative hard thresholding for nonnegative sparsity optimization,

L. Pan, S. Zhou, N. Xiu, and H.-D. Qi, “A convergent iterative hard thresholding for nonnegative sparsity optimization,”Pacific Journal of Optimization, vol. 13, no. 2, pp. 325–353, 2017

work page 2017
[15]

Gradient hard thresholding pursuit for sparsity-constrained optimization,

X. Yuan, P. Li, and T. Zhang, “Gradient hard thresholding pursuit for sparsity-constrained optimization,” inInternational Conference on Machine Learning, pp. 127–135, PMLR, 2014

work page 2014
[16]

Hard thresholding pursuit: an algorithm for compressive sensing,

S. Foucart, “Hard thresholding pursuit: an algorithm for compressive sensing,”SIAM Journal on numerical analysis, vol. 49, no. 6, pp. 2543– 2563, 2011

work page 2011
[17]

Between hard and soft thresholding: optimal iterative thresholding algorithms,

H. Liu and R. Foygel Barber, “Between hard and soft thresholding: optimal iterative thresholding algorithms,”Information and Inference: A Journal of the IMA, vol. 9, no. 4, pp. 899–933, 2020

work page 2020
[18]

Accelerated greedy algorithms for maximizing submodular set functions,

M. Minoux, “Accelerated greedy algorithms for maximizing submodular set functions,” inProceedings of the 8th IFIP Conference on Optimization Techniques, pp. 234–243, Springer, 1978

work page 1978
[19]

Lazier than lazy greedy,

B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. V ondr ´ak, and A. Krause, “Lazier than lazy greedy,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 29, 2015

work page 2015
[20]

Distributed submodular maximization: Identifying representative elements in massive data,

B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, “Distributed submodular maximization: Identifying representative elements in massive data,”Advances in Neural Information Processing Systems, vol. 26, 2013

work page 2013
[21]

Fast greedy algorithms in mapreduce and streaming,

R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani, “Fast greedy algorithms in mapreduce and streaming,”ACM Transactions on Parallel Computing (TOPC), vol. 2, no. 3, pp. 1–22, 2015

work page 2015
[22]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inArtificial intelligence and statistics, pp. 1273–1282, Pmlr, 2017

work page 2017
[23]

Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem,

M. Conforti and G. Cornu ´ejols, “Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem,”Discrete applied mathematics, vol. 7, no. 3, pp. 251–274, 1984

work page 1984
[24]

Submodularity and curvature: the optimal algorithm,

J. V ondr ´ak, “Submodularity and curvature: the optimal algorithm,”Ann. Discrete Math, vol. 2, pp. 65–74, 1978

work page 1978
[25]

H. K. Khalil,Nonlinear Systems. Prentice Hall, 3rd ed., 2002. APPENDIX This section presents the proofs of the results in the paper. Proof of Theorem III.1.Letv ATCG(t)denote the ascent direc- tion selected byATCGat timet, and let ¯v(t)∈arg max v∈M v⊤∇F(x(t)) denote the exact continuous greedy direction. Recall that this oracle decomposes partition-wise v...

work page 2002

[1] [1]

Submodular function maximization.,

A. Krause and D. Golovin, “Submodular function maximization.,” Tractability, vol. 3, no. 71-104, p. 3, 2014

work page 2014

[2] [2]

An analysis of ap- proximations for maximizing submodular set functions—i,

G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of ap- proximations for maximizing submodular set functions—i,”Mathematical programming, vol. 14, no. 1, pp. 265–294, 1978

work page 1978

[3] [3]

Maximizing a monotone submodular function subject to a matroid constraint,

G. Calinescu, C. Chekuri, M. Pal, and J. V ondr ´ak, “Maximizing a monotone submodular function subject to a matroid constraint,”SIAM Journal on Computing, vol. 40, no. 6, pp. 1740–1766, 2011

work page 2011

[4] [4]

Learning with Submodular Functions: A Convex Optimization Perspective

F. Bach, “Learning with submodular functions: A convex optimization perspective,”arXiv preprint arXiv:1111.6453, 2011

work page internal anchor Pith review Pith/arXiv arXiv 2011

[5] [5]

Federated learning using variance reduced stochastic gradient for probabilistically activated agents,

M. Rostami and S. S. Kia, “Federated learning using variance reduced stochastic gradient for probabilistically activated agents,” inProceedings of the American Control Conference, IEEE, 2023

work page 2023

[6] [6]

FedScalar: Federated Learning with Scalar Communication for Bandwidth-Constrained Networks

“Fedscalar: A communication efficient federated learning,”arXiv preprint arXiv:2410.02260, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies.,

A. Krause, A. Singh, and C. Guestrin, “Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies.,” Journal of Machine Learning Research, vol. 9, no. 2, 2008

work page 2008

[8] [8]

A class of submodular functions for document summarization,

H. Lin and J. Bilmes, “A class of submodular functions for document summarization,” inProceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp. 510–520, 2011

work page 2011

[9] [9]

Maximizing the spread of influence through a social network,

D. Kempe, J. Kleinberg, and ´E. Tardos, “Maximizing the spread of influence through a social network,” inProceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137–146, 2003

work page 2003

[10] [10]

Adaptive submodularity: Theory and applica- tions in active learning and stochastic optimization,

D. Golovin and A. Krause, “Adaptive submodularity: Theory and applica- tions in active learning and stochastic optimization,”Journal of Artificial Intelligence Research, vol. 42, pp. 427–486, 2011

work page 2011

[11] [11]

An analysis of approx- imations for maximizing submodular set functions—ii,

M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, “An analysis of approx- imations for maximizing submodular set functions—ii,” inPolyhedral Combinatorics: Dedicated to the memory of DR Fulkerson, pp. 73–87, Springer, 2009

work page 2009

[12] [12]

Decentralized submodular maximization: Bridging discrete and continuous settings,

A. Mokhtari, H. Hassani, and A. Karbasi, “Decentralized submodular maximization: Bridging discrete and continuous settings,” inInternational Conference on Machine Learning, pp. 3616–3625, 2018

work page 2018

[13] [13]

Distributed strategy selection: A submodu- lar set function maximization approach,

N. Rezazadeh and S. S. Kia, “Distributed strategy selection: A submodu- lar set function maximization approach,”Automatica, vol. 153, p. 111000, 2023

work page 2023

[14] [14]

A convergent iterative hard thresholding for nonnegative sparsity optimization,

L. Pan, S. Zhou, N. Xiu, and H.-D. Qi, “A convergent iterative hard thresholding for nonnegative sparsity optimization,”Pacific Journal of Optimization, vol. 13, no. 2, pp. 325–353, 2017

work page 2017

[15] [15]

Gradient hard thresholding pursuit for sparsity-constrained optimization,

X. Yuan, P. Li, and T. Zhang, “Gradient hard thresholding pursuit for sparsity-constrained optimization,” inInternational Conference on Machine Learning, pp. 127–135, PMLR, 2014

work page 2014

[16] [16]

Hard thresholding pursuit: an algorithm for compressive sensing,

S. Foucart, “Hard thresholding pursuit: an algorithm for compressive sensing,”SIAM Journal on numerical analysis, vol. 49, no. 6, pp. 2543– 2563, 2011

work page 2011

[17] [17]

Between hard and soft thresholding: optimal iterative thresholding algorithms,

H. Liu and R. Foygel Barber, “Between hard and soft thresholding: optimal iterative thresholding algorithms,”Information and Inference: A Journal of the IMA, vol. 9, no. 4, pp. 899–933, 2020

work page 2020

[18] [18]

Accelerated greedy algorithms for maximizing submodular set functions,

M. Minoux, “Accelerated greedy algorithms for maximizing submodular set functions,” inProceedings of the 8th IFIP Conference on Optimization Techniques, pp. 234–243, Springer, 1978

work page 1978

[19] [19]

Lazier than lazy greedy,

B. Mirzasoleiman, A. Badanidiyuru, A. Karbasi, J. V ondr ´ak, and A. Krause, “Lazier than lazy greedy,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 29, 2015

work page 2015

[20] [20]

Distributed submodular maximization: Identifying representative elements in massive data,

B. Mirzasoleiman, A. Karbasi, R. Sarkar, and A. Krause, “Distributed submodular maximization: Identifying representative elements in massive data,”Advances in Neural Information Processing Systems, vol. 26, 2013

work page 2013

[21] [21]

Fast greedy algorithms in mapreduce and streaming,

R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani, “Fast greedy algorithms in mapreduce and streaming,”ACM Transactions on Parallel Computing (TOPC), vol. 2, no. 3, pp. 1–22, 2015

work page 2015

[22] [22]

Communication-efficient learning of deep networks from decentralized data,

B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” inArtificial intelligence and statistics, pp. 1273–1282, Pmlr, 2017

work page 2017

[23] [23]

Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem,

M. Conforti and G. Cornu ´ejols, “Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem,”Discrete applied mathematics, vol. 7, no. 3, pp. 251–274, 1984

work page 1984

[24] [24]

Submodularity and curvature: the optimal algorithm,

J. V ondr ´ak, “Submodularity and curvature: the optimal algorithm,”Ann. Discrete Math, vol. 2, pp. 65–74, 1978

work page 1978

[25] [25]

H. K. Khalil,Nonlinear Systems. Prentice Hall, 3rd ed., 2002. APPENDIX This section presents the proofs of the results in the paper. Proof of Theorem III.1.Letv ATCG(t)denote the ascent direc- tion selected byATCGat timet, and let ¯v(t)∈arg max v∈M v⊤∇F(x(t)) denote the exact continuous greedy direction. Recall that this oracle decomposes partition-wise v...

work page 2002