RA-DCA: A Randomized Active-Set DCA for Directional Stationarity in Max-Structured DC Programs

Yi-Shuai Niu

arxiv: 2605.23550 · v1 · pith:W7MEQOPXnew · submitted 2026-05-22 · 🧮 math.OC · cs.AI· cs.NA· math.NA

RA-DCA: A Randomized Active-Set DCA for Directional Stationarity in Max-Structured DC Programs

Yi-Shuai Niu This is my paper

Pith reviewed 2026-05-25 04:20 UTC · model grok-4.3

classification 🧮 math.OC cs.AIcs.NAmath.NA

keywords difference-of-convex programmingdirectional stationarityactive-set methodsrandomized algorithmsnonsmooth optimizationmax-structured DC programsDCA

0 comments

The pith

RA-DCA ensures every accumulation point of the safeguarded method is directionally stationary with probability one under regularity, active-set consistency, and random-embedding assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses nonsmooth difference-of-convex programs in which the subtracted convex piece is a finite maximum of smooth convex functions. Standard DCA iterations can reach critical points that fail to be directionally stationary, while exact active-vertex screening becomes expensive for large or combinatorial active sets. RA-DCA therefore performs vertex-first randomized screening by projecting active gradients onto sampled directions, checking a sampled vertex residual, and invoking a small linear program only as a low-residual fallback. The algorithm keeps the descent property of classical DCA and replaces most of the screening work with matrix multiplications. Under the paper's stated regularity, numerical active-set consistency, and random-embedding conditions, every accumulation point of the safeguarded iteration is directionally stationary with probability one.

Core claim

Under the stated regularity, numerical active-set consistency, and random-embedding assumptions, every accumulation point generated by the safeguarded RA-DCA method is directionally stationary with probability one.

What carries the argument

Randomized active-set DCA that projects active gradients onto sampled directions, checks sampled vertex residuals, and falls back to a small linear program only for low-residual convex combinations.

If this is right

The method preserves the descent structure of classical DCA.
Screening cost reduces to matrix multiplications rather than repeated large linear programs.
The safeguard prevents convergence to nonstationary critical points on the tested degenerate max-affine, max-quadratic, and sparse support-function models.
The same sampling idea remains effective for block top-k problems whose exact aggregate enumeration is combinatorial.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The randomized screening layer could be inserted into other DC algorithms that rely on active-set information.
The separation of cases where active-set selection helps from those dominated by multistart or the DC split supplies a practical diagnostic for choosing solvers.
Empirical checks of the random-embedding condition on new problem families would give concrete guidance on when the probability-one guarantee applies.

Load-bearing premise

The random-embedding assumptions together with numerical active-set consistency hold for the instances on which the method is run.

What would settle it

An explicit instance, together with its random embeddings, on which an accumulation point produced by the safeguarded method is not directionally stationary.

Figures

Figures reproduced from arXiv: 2605.23550 by Yi-Shuai Niu.

**Figure 2.** Figure 2: Objective histories on one signed-pair max-affine instance with [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗

**Figure 3.** Figure 3: Sampling effect on the selected active vertex. The ratio compares the true norm of the [PITH_FULL_IMAGE:figures/full_fig_p021_3.png] view at source ↗

**Figure 4.** Figure 4: Residual and objective histories on one max-quadratic instance with [PITH_FULL_IMAGE:figures/full_fig_p022_4.png] view at source ↗

**Figure 5.** Figure 5: QUBO budget sensitivity on the OR-Library [PITH_FULL_IMAGE:figures/full_fig_p029_5.png] view at source ↗

**Figure 6.** Figure 6: QUBO gap profiles over all thirty OR-Library instances. Each curve reports the fraction [PITH_FULL_IMAGE:figures/full_fig_p035_6.png] view at source ↗

**Figure 7.** Figure 7: Starts and backend sensitivity on bqp250.8. Left: percentage gap versus the number of starts. Right: speedup relative to the serial CPU implementation; the horizontal line marks parity. The batched implementations solve all starts as a matrix of iterates, while the serial CPU implementation follows the default per-start loop used in the main QUBO table. For each split and start budget, all three backends r… view at source ↗

read the original abstract

We study nonsmooth difference-of-convex programs whose subtracted convex term is a finite maximum of smooth convex functions. In this setting, standard DCA iterations may converge to critical points that are not directionally stationary, whereas exact active-vertex screening can be expensive when active sets are large or combinatorial. We propose RA-DCA, a vertex-first randomized active-set DCA that projects active gradients onto sampled directions, checks a sampled vertex residual, and uses a small linear program only as a low-residual convex-combination fallback. The method preserves the descent structure of DCA and reduces the randomized screening layer to matrix multiplications. Under the stated regularity, numerical active-set consistency, and random-embedding assumptions, every accumulation point generated by the safeguarded method is directionally stationary with probability one. MATLAB experiments first test the theorem on degenerate max-affine, max-quadratic, and sparse support-function models, where the safeguard avoids nonstationary critical points and closely tracks a full active-vertex scan. Block top-k tests then show that the same screening idea remains useful when exact aggregate enumeration is combinatorial. Trimmed-regression, complementarity, and QUBO diagnostics separate cases where active-set selection helps from cases dominated by multistart search, the DC split, or other problem-specific features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RA-DCA adds randomized screening to DCA for max-structured problems but its directional stationarity guarantee rests on unverified numerical consistency and random-embedding assumptions.

read the letter

The main takeaway is that this randomized active-set DCA preserves the basic DCA descent while adding a vertex-first randomized projection, sampled residual check, and small LP fallback to target directional stationarity instead of just criticality. The combination looks new relative to the cited prior work on DCA for max-structured DC programs. Experiments on degenerate max-affine, max-quadratic, and sparse support-function cases show the safeguard avoiding nonstationary points and tracking full active-vertex scans, while block top-k tests indicate the screening stays practical when exact enumeration is combinatorial. The trimmed-regression, complementarity, and QUBO diagnostics usefully separate when active-set selection matters from when multistart or the DC split dominates. The method reduces screening to matrix multiplications most of the time, which is a practical plus for large active sets. The central claim of probability-one directional stationarity holds only under regularity plus numerical active-set consistency and random-embedding assumptions. These last two are non-standard and the abstract gives no indication they are verified or guaranteed beyond the fallback mechanism. If either fails on a positive-measure set of instances, the conclusion does not follow. The experiments are limited to specific models, so it remains unclear how often the assumptions hold in broader use. This paper is aimed at specialists already working on nonsmooth DC programming and related areas like complementarity or combinatorial optimization. A reader in that niche would get value from the algorithmic idea and the case-separation experiments. The work shows clear engagement with the literature and is coherent on its own terms, so it deserves referee time to check the proof details and whether the assumptions can be justified or relaxed.

Referee Report

2 major / 0 minor

Summary. The paper introduces RA-DCA, a randomized active-set variant of the DC algorithm for nonsmooth DC programs in which the subtracted convex term is a finite max of smooth convex functions. Standard DCA may converge only to critical points that fail directional stationarity; the proposed method performs randomized vertex screening via projected gradients and sampled residuals, falling back to a small LP when needed, while preserving the DCA descent property. The central theorem asserts that, under regularity conditions together with numerical active-set consistency and random-embedding assumptions, every accumulation point of the safeguarded iterates is directionally stationary with probability one. MATLAB experiments on max-affine, max-quadratic, sparse support-function, block top-k, trimmed-regression, complementarity, and QUBO instances are used to illustrate that the safeguard prevents nonstationary limits and that the screening remains useful when exact enumeration is combinatorial.

Significance. If the stated convergence result holds, the work supplies a computationally lighter route to directional stationarity for a practically relevant subclass of DC programs whose active-set structure is combinatorial. The explicit preservation of the DCA descent property and the reduction of screening to matrix multiplications are concrete algorithmic strengths; the experiments separate regimes in which active-set randomization helps from those dominated by multistart or the DC decomposition itself.

major comments (2)

[Abstract] Abstract: the probability-one directional-stationarity claim is conditioned on three classes of assumptions, two of which (numerical active-set consistency and random-embedding assumptions) are non-standard and appear only as hypotheses; no argument is supplied showing that the randomized screening plus safeguard actually enforces these conditions on a set of full measure, so the theorem does not yet establish the claimed guarantee for the algorithm as implemented.
[Abstract] Abstract (final paragraph): the experiments are described as testing the theorem on degenerate models and showing that the safeguard avoids nonstationary critical points, yet no quantitative measure (e.g., frequency of safeguard activation, distance to stationarity, or comparison against plain DCA) is reported; without such data it is impossible to judge whether the observed behavior supports the theorem or merely reflects the safeguard fallback.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the two major comments on the abstract. We respond point by point below, proposing targeted revisions to the abstract where they strengthen clarity without altering the technical claims.

read point-by-point responses

Referee: [Abstract] Abstract: the probability-one directional-stationarity claim is conditioned on three classes of assumptions, two of which (numerical active-set consistency and random-embedding assumptions) are non-standard and appear only as hypotheses; no argument is supplied showing that the randomized screening plus safeguard actually enforces these conditions on a set of full measure, so the theorem does not yet establish the claimed guarantee for the algorithm as implemented.

Authors: The theorem (Theorem 4.1) and the abstract explicitly condition the probability-one directional-stationarity result on the three listed assumption classes; the manuscript does not claim that RA-DCA enforces numerical active-set consistency or the random-embedding property with probability one. The randomized screening and safeguard are presented as practical mechanisms whose behavior is consistent with the assumptions in the reported experiments. Establishing that the implemented procedure satisfies the non-standard assumptions almost surely would require a separate technical development. We will revise the abstract to emphasize the conditional character of the guarantee. revision: partial
Referee: [Abstract] Abstract (final paragraph): the experiments are described as testing the theorem on degenerate models and showing that the safeguard avoids nonstationary critical points, yet no quantitative measure (e.g., frequency of safeguard activation, distance to stationarity, or comparison against plain DCA) is reported; without such data it is impossible to judge whether the observed behavior supports the theorem or merely reflects the safeguard fallback.

Authors: The abstract is a high-level summary. Section 5 of the manuscript already contains per-instance comparisons of safeguard activation, stationarity residuals, and performance versus plain DCA on the max-affine, max-quadratic, and block top-k suites. To make this evidence visible at the abstract level we will insert concise quantitative indicators (e.g., average safeguard invocation rate and median stationarity gap) in the revised abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: theorem is conditional on explicit non-derived assumptions; descent property is the standard DCA one.

full rationale

The paper defines RA-DCA by modifying standard DCA with a randomized screening layer that reduces to matrix multiplications plus a fallback LP. The central claim is that, under three listed assumptions (regularity, numerical active-set consistency, random-embedding), accumulation points are directionally stationary with probability one. These assumptions are stated as hypotheses in the abstract and are not shown to follow from the algorithm; the descent structure is explicitly the inherited DCA property rather than a new fitted quantity. No self-definitional loop, fitted-input prediction, or load-bearing self-citation chain appears in the provided derivation outline.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The convergence result rests on three classes of assumptions explicitly named in the abstract; no free parameters or new entities are introduced.

axioms (3)

domain assumption Regularity conditions on the DC program
Required for the directional stationarity claim (abstract).
domain assumption Numerical active-set consistency
Assumed so that sampled vertices correctly identify the active set (abstract).
domain assumption Random-embedding assumptions
Needed for the probability-one guarantee (abstract).

pith-pipeline@v0.9.0 · 5764 in / 1418 out tokens · 29034 ms · 2026-05-25T04:20:45.229001+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under the stated regularity, numerical active-set consistency, and random-embedding assumptions, every accumulation point generated by the safeguarded method is directionally stationary with probability one.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

RA-DCA is a vertex-first randomized active-set DCA that projects active gradients onto sampled directions, checks a sampled vertex residual, and uses a small linear program only as a low-residual convex-combination fallback.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

[1]

Mathematical Programming169(1), 69–94 (2018)

Ahmadi, A.A., Hall, G.: DC decomposition of nonconvex polynomials with algebraic techniques. Mathematical Programming169(1), 69–94 (2018). DOI10.1007/s10107-017-1144-5

work page 2018
[2]

The Annals of Applied Statistics7(1), 226–248 (2013)

Alfons, A., Croux, C., Gelper, S.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics7(1), 226–248 (2013). DOI10.1214/12-AOAS575

work page 2013
[3]

Cooperative exploration of level surfaces of three dimensional scalar ﬁelds,

Bagirov, A.M., Taheri, S., Ugon, J.: Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems. Pattern Recognition53, 12–24 (2016). DOI 10.1016/j. patcog.2015.11.011

work page doi:10.1016/j 2016
[4]

Journal of the Operational Research Society41(11), 1069–1072 (1990)

Beasley, J.E.: OR-Library: distributing test problems by electronic mail. Journal of the Operational Research Society41(11), 1069–1072 (1990). DOI10.1057/jors.1990.166

work page 1990
[5]

Princeton University Press, Princeton (2009)

Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009). DOI10.1515/9781400831050 35 Table 14: Starts and backend sensitivity on bqp250.8. The table compares serial CPU, batched CPU, and batched GPU prototypes using the same initial multistart pool. The speedup columns are relative to the matching seri...

work page 2009
[6]

SIAM Review53(3), 464–501 (2011)

Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Review53(3), 464–501 (2011). DOI10.1137/080734510

work page 2011
[7]

The Annals of Statistics44(2), 813–852 (2016)

Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. The Annals of Statistics44(2), 813–852 (2016). DOI10.1214/15-AOS1388

work page 2016
[8]

ACM Transactions on Intelligent Systems and Technology2(3), 27:1–27:27 (2011)

Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology2(3), 27:1–27:27 (2011). DOI 10.1145/1961189.1961199. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

work page doi:10.1145/1961189.1961199 2011
[9]

Default” uses the parameters of the reported experiments; “LP-forced

Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983) 36 0 50 100 150 200 250 300 starts 0 0.5 1 1.5 2 2.5 3 3.5gap (%) Solution quality shift gap spectral gap 0 50 100 150 200 250 300 starts 100 101 102 speedup vs serial CPU Backend speedup shift / batched CPU shift / batched GPU spectral / batched CPU spectral / batched GPU Figure 7:...

work page 1983
[10]

SIAM Journal on Optimization28(4), 3344–3374 (2018)

Cui, Y., Pang, J.S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM Journal on Optimization28(4), 3344–3374 (2018). DOI 10.1137/ 18M117337X

work page 2018
[11]

Optimization Methods and Software19(1), 15–40 (2004)

Fletcher, R., Leyffer, S., Ralph, D., Scholtes, S.: Solving mathematical programs with com- plementarity constraints as nonlinear programs. Optimization Methods and Software19(1), 15–40 (2004). DOI10.1080/10556780410001654241

work page 2004
[12]

The exact residual is computed using the exact active set, not the initial ε0-active set

Gleixner, A., Hendel, G., Gamrath, G., et al.: MIPLIB 2017: data-driven compilation of the 6th 37 Table 16: A near-active diagnostic for the LP fallback. The exact residual is computed using the exact active set, not the initial ε0-active set. The LP branch prevents a spurious step caused by nearly active but exactly inactive affine pieces. methodxobjecti...

work page 2017
[13]

https://docs

Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2026). https://docs. gurobi.com/

work page 2026
[14]

SIAM Review53(2), 217–288 (2011)

Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review53(2), 217–288 (2011). DOI10.1137/090771806

work page 2011
[15]

Journal of Optimization Theory and Applications103(1), 1–43 (1999)

Horst, R., Thoai, N.V.: DC programming: overview. Journal of Optimization Theory and Applications103(1), 1–43 (1999). DOI10.1023/A:1021765131316

work page 1999
[16]

The unconstrained binary quadratic programming problem: a survey.Journal of Combinatorial Optimization, 28:58–81, 2014

Kochenberger, G., Hao, J.K., Glover, F., Lewis, M., L¨ u, Z., Wang, H., Wang, Y.: The unconstrained binary quadratic programming problem: a survey. Journal of Combinatorial Optimization28(1), 58–81 (2014). DOI10.1007/s10878-014-9734-0

work page 2014
[17]

Finding directional stationary points of DC programs

Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: A unified approach for finding directional stationary points of DC programs (2026). ArXiv:2605.15838,https://arxiv.org/abs/2605.15838

work page internal anchor Pith review Pith/arXiv arXiv 2026
[18]

Pattern Recognition47(1), 388–401 (2014)

Le Thi, H.A., Le, H.M., Pham Dinh, T.: New and efficient DCA based algorithms for minimum sum-of-squares clustering. Pattern Recognition47(1), 388–401 (2014). DOI 10.1016/j.patcog. 2013.07.012

work page doi:10.1016/j.patcog 2014
[19]

Annals of Operations Research133(1–4), 23–46 (2005)

Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Annals of Operations Research133(1–4), 23–46 (2005). DOI10.1007/s10479-004-5022-1

work page 2005
[20]

Mathematical Programming169(1), 5–68 (2018)

Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of developments. Mathematical Programming169(1), 5–68 (2018). DOI10.1007/s10107-018-1235-y 38

work page 2018
[21]

SIAM Journal on Optimization29(4), 2725–2752 (2019)

Lu, Z., Zhou, Z.: Nonmonotone enhanced proximal DC algorithms for a class of structured nonsmooth DC programming. SIAM Journal on Optimization29(4), 2725–2752 (2019). DOI10.1137/18M1214342

work page 2019
[22]

Mathematical Programming176(1–2), 369–401 (2019)

Lu, Z., Zhou, Z., Sun, Z.: Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Mathematical Programming176(1–2), 369–401 (2019). DOI10.1007/s10107-018-1318-9

work page 2019
[23]

Mathematical Programming75(1), 19–76 (1996)

Luo, Z.Q., Pang, J.S., Ralph, D., Wu, S.: Exact penalization and stationarity conditions of mathematical programs with equilibrium constraints. Mathematical Programming75(1), 19–76 (1996). DOI10.1007/BF02592205

work page 1996
[24]

Optimization and Engineering10(1), 1–17 (2009)

Magnani, A., Boyd, S.P.: Convex piecewise-linear fitting. Optimization and Engineering10(1), 1–17 (2009). DOI10.1007/s11081-008-9045-3

work page 2009
[25]

ArXiv:2211.10942, https://arxiv

Niu, Y.S.: On the convergence analysis of DCA (2022). ArXiv:2211.10942, https://arxiv. org/abs/2211.10942

work page arXiv 2022
[26]

Journal of Scientific Computing92(2), 46 (2022)

Niu, Y.S., Glowinski, R.: Discrete dynamical system approaches for boolean polyno- mial optimization. Journal of Scientific Computing92(2), 46 (2022). DOI 10.1007/ s10915-022-01882-z

work page 2022
[27]

SIAM Journal on Optimization34(2), 1852–1878 (2024)

Niu, Y.S., Le Thi, H.A., Pham Dinh, T.: On difference-of-SOS and difference-of-convex-SOS decompositions for polynomials. SIAM Journal on Optimization34(2), 1852–1878 (2024). DOI10.1137/22M1495524

work page 2024
[28]

Optimization pp

Niu, Y.S., You, Y., Benammour, M.F., Wang, Y.: A parallel difference-of-convex cutting plane algorithm for mixed-binary linear programs. Optimization pp. 1–38 (2025). DOI 10.1080/ 02331934.2025.2588422

work page arXiv 2025
[29]

Mathematics of Operations Research42(1), 95–118 (2017)

Pang, J.S., Razaviyayn, M., Alvarado, A.: Computing B-stationary points of nonsmooth DC programs. Mathematics of Operations Research42(1), 95–118 (2017). DOI 10.1287/moor. 2016.0795

work page doi:10.1287/moor 2017
[30]

SIAM Journal on Optimization28(2), 1640–1669 (2018)

Pang, J.S., Tao, M.: Decomposition methods for computing directional stationary solutions of a class of nonsmooth nonconvex optimization problems. SIAM Journal on Optimization28(2), 1640–1669 (2018). DOI10.1137/17M1110249

work page 2018
[31]

programming: theory, algorithms and applications

Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Mathematica Vietnamica22(1), 289–355 (1997)

work page 1997
[32]

In: Proceedings of The 33rd International Conference on Machine Learning,Proceedings of Machine Learning Research, vol

Reddi, S.J., Hefny, A., Sra, S., P´ oczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: Proceedings of The 33rd International Conference on Machine Learning,Proceedings of Machine Learning Research, vol. 48, pp. 314–323. PMLR, New York, New York, USA (2016).https://proceedings.mlr.press/v48/reddi16.html

work page 2016
[33]

Princeton University Press, Princeton (1970)

Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

work page 1970
[34]

In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp

Sarl´ os, T.: Improved approximation algorithms for large matrices via random projections. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 143–152 (2006). DOI10.1109/FOCS.2006.37 39

work page 2006
[35]

Computational Optimization and Applications69(2), 297–324 (2018)

Wen, B., Chen, X., Pong, T.K.: A proximal difference-of-convex algorithm with extrapola- tion. Computational Optimization and Applications69(2), 297–324 (2018). DOI 10.1007/ s10589-017-9954-1

work page 2018
[36]

Wiegele, A.: Biq Mac Library: a collection of max-cut and quadratic 0–1 programming instances of medium size. Tech. rep., Alpen-Adria-Universit¨ at Klagenfurt (2007). https: //biqmac.aau.at/biqmaclib.html

work page 2007
[37]

Foundations and Trends in Theoretical Computer Science10(1–2), 1–157 (2014)

Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science10(1–2), 1–157 (2014). DOI10.1561/0400000060 40

work page 2014

[1] [1]

Mathematical Programming169(1), 69–94 (2018)

Ahmadi, A.A., Hall, G.: DC decomposition of nonconvex polynomials with algebraic techniques. Mathematical Programming169(1), 69–94 (2018). DOI10.1007/s10107-017-1144-5

work page 2018

[2] [2]

The Annals of Applied Statistics7(1), 226–248 (2013)

Alfons, A., Croux, C., Gelper, S.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics7(1), 226–248 (2013). DOI10.1214/12-AOAS575

work page 2013

[3] [3]

Cooperative exploration of level surfaces of three dimensional scalar ﬁelds,

Bagirov, A.M., Taheri, S., Ugon, J.: Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems. Pattern Recognition53, 12–24 (2016). DOI 10.1016/j. patcog.2015.11.011

work page doi:10.1016/j 2016

[4] [4]

Journal of the Operational Research Society41(11), 1069–1072 (1990)

Beasley, J.E.: OR-Library: distributing test problems by electronic mail. Journal of the Operational Research Society41(11), 1069–1072 (1990). DOI10.1057/jors.1990.166

work page 1990

[5] [5]

Princeton University Press, Princeton (2009)

Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009). DOI10.1515/9781400831050 35 Table 14: Starts and backend sensitivity on bqp250.8. The table compares serial CPU, batched CPU, and batched GPU prototypes using the same initial multistart pool. The speedup columns are relative to the matching seri...

work page 2009

[6] [6]

SIAM Review53(3), 464–501 (2011)

Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Review53(3), 464–501 (2011). DOI10.1137/080734510

work page 2011

[7] [7]

The Annals of Statistics44(2), 813–852 (2016)

Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. The Annals of Statistics44(2), 813–852 (2016). DOI10.1214/15-AOS1388

work page 2016

[8] [8]

ACM Transactions on Intelligent Systems and Technology2(3), 27:1–27:27 (2011)

Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology2(3), 27:1–27:27 (2011). DOI 10.1145/1961189.1961199. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

work page doi:10.1145/1961189.1961199 2011

[9] [9]

Default” uses the parameters of the reported experiments; “LP-forced

Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983) 36 0 50 100 150 200 250 300 starts 0 0.5 1 1.5 2 2.5 3 3.5gap (%) Solution quality shift gap spectral gap 0 50 100 150 200 250 300 starts 100 101 102 speedup vs serial CPU Backend speedup shift / batched CPU shift / batched GPU spectral / batched CPU spectral / batched GPU Figure 7:...

work page 1983

[10] [10]

SIAM Journal on Optimization28(4), 3344–3374 (2018)

Cui, Y., Pang, J.S., Sen, B.: Composite difference-max programs for modern statistical estimation problems. SIAM Journal on Optimization28(4), 3344–3374 (2018). DOI 10.1137/ 18M117337X

work page 2018

[11] [11]

Optimization Methods and Software19(1), 15–40 (2004)

Fletcher, R., Leyffer, S., Ralph, D., Scholtes, S.: Solving mathematical programs with com- plementarity constraints as nonlinear programs. Optimization Methods and Software19(1), 15–40 (2004). DOI10.1080/10556780410001654241

work page 2004

[12] [12]

The exact residual is computed using the exact active set, not the initial ε0-active set

Gleixner, A., Hendel, G., Gamrath, G., et al.: MIPLIB 2017: data-driven compilation of the 6th 37 Table 16: A near-active diagnostic for the LP fallback. The exact residual is computed using the exact active set, not the initial ε0-active set. The LP branch prevents a spurious step caused by nearly active but exactly inactive affine pieces. methodxobjecti...

work page 2017

[13] [13]

https://docs

Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2026). https://docs. gurobi.com/

work page 2026

[14] [14]

SIAM Review53(2), 217–288 (2011)

Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review53(2), 217–288 (2011). DOI10.1137/090771806

work page 2011

[15] [15]

Journal of Optimization Theory and Applications103(1), 1–43 (1999)

Horst, R., Thoai, N.V.: DC programming: overview. Journal of Optimization Theory and Applications103(1), 1–43 (1999). DOI10.1023/A:1021765131316

work page 1999

[16] [16]

The unconstrained binary quadratic programming problem: a survey.Journal of Combinatorial Optimization, 28:58–81, 2014

Kochenberger, G., Hao, J.K., Glover, F., Lewis, M., L¨ u, Z., Wang, H., Wang, Y.: The unconstrained binary quadratic programming problem: a survey. Journal of Combinatorial Optimization28(1), 58–81 (2014). DOI10.1007/s10878-014-9734-0

work page 2014

[17] [17]

Finding directional stationary points of DC programs

Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: A unified approach for finding directional stationary points of DC programs (2026). ArXiv:2605.15838,https://arxiv.org/abs/2605.15838

work page internal anchor Pith review Pith/arXiv arXiv 2026

[18] [18]

Pattern Recognition47(1), 388–401 (2014)

Le Thi, H.A., Le, H.M., Pham Dinh, T.: New and efficient DCA based algorithms for minimum sum-of-squares clustering. Pattern Recognition47(1), 388–401 (2014). DOI 10.1016/j.patcog. 2013.07.012

work page doi:10.1016/j.patcog 2014

[19] [19]

Annals of Operations Research133(1–4), 23–46 (2005)

Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Annals of Operations Research133(1–4), 23–46 (2005). DOI10.1007/s10479-004-5022-1

work page 2005

[20] [20]

Mathematical Programming169(1), 5–68 (2018)

Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of developments. Mathematical Programming169(1), 5–68 (2018). DOI10.1007/s10107-018-1235-y 38

work page 2018

[21] [21]

SIAM Journal on Optimization29(4), 2725–2752 (2019)

Lu, Z., Zhou, Z.: Nonmonotone enhanced proximal DC algorithms for a class of structured nonsmooth DC programming. SIAM Journal on Optimization29(4), 2725–2752 (2019). DOI10.1137/18M1214342

work page 2019

[22] [22]

Mathematical Programming176(1–2), 369–401 (2019)

Lu, Z., Zhou, Z., Sun, Z.: Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Mathematical Programming176(1–2), 369–401 (2019). DOI10.1007/s10107-018-1318-9

work page 2019

[23] [23]

Mathematical Programming75(1), 19–76 (1996)

Luo, Z.Q., Pang, J.S., Ralph, D., Wu, S.: Exact penalization and stationarity conditions of mathematical programs with equilibrium constraints. Mathematical Programming75(1), 19–76 (1996). DOI10.1007/BF02592205

work page 1996

[24] [24]

Optimization and Engineering10(1), 1–17 (2009)

Magnani, A., Boyd, S.P.: Convex piecewise-linear fitting. Optimization and Engineering10(1), 1–17 (2009). DOI10.1007/s11081-008-9045-3

work page 2009

[25] [25]

ArXiv:2211.10942, https://arxiv

Niu, Y.S.: On the convergence analysis of DCA (2022). ArXiv:2211.10942, https://arxiv. org/abs/2211.10942

work page arXiv 2022

[26] [26]

Journal of Scientific Computing92(2), 46 (2022)

Niu, Y.S., Glowinski, R.: Discrete dynamical system approaches for boolean polyno- mial optimization. Journal of Scientific Computing92(2), 46 (2022). DOI 10.1007/ s10915-022-01882-z

work page 2022

[27] [27]

SIAM Journal on Optimization34(2), 1852–1878 (2024)

Niu, Y.S., Le Thi, H.A., Pham Dinh, T.: On difference-of-SOS and difference-of-convex-SOS decompositions for polynomials. SIAM Journal on Optimization34(2), 1852–1878 (2024). DOI10.1137/22M1495524

work page 2024

[28] [28]

Optimization pp

Niu, Y.S., You, Y., Benammour, M.F., Wang, Y.: A parallel difference-of-convex cutting plane algorithm for mixed-binary linear programs. Optimization pp. 1–38 (2025). DOI 10.1080/ 02331934.2025.2588422

work page arXiv 2025

[29] [29]

Mathematics of Operations Research42(1), 95–118 (2017)

Pang, J.S., Razaviyayn, M., Alvarado, A.: Computing B-stationary points of nonsmooth DC programs. Mathematics of Operations Research42(1), 95–118 (2017). DOI 10.1287/moor. 2016.0795

work page doi:10.1287/moor 2017

[30] [30]

SIAM Journal on Optimization28(2), 1640–1669 (2018)

Pang, J.S., Tao, M.: Decomposition methods for computing directional stationary solutions of a class of nonsmooth nonconvex optimization problems. SIAM Journal on Optimization28(2), 1640–1669 (2018). DOI10.1137/17M1110249

work page 2018

[31] [31]

programming: theory, algorithms and applications

Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Mathematica Vietnamica22(1), 289–355 (1997)

work page 1997

[32] [32]

In: Proceedings of The 33rd International Conference on Machine Learning,Proceedings of Machine Learning Research, vol

Reddi, S.J., Hefny, A., Sra, S., P´ oczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: Proceedings of The 33rd International Conference on Machine Learning,Proceedings of Machine Learning Research, vol. 48, pp. 314–323. PMLR, New York, New York, USA (2016).https://proceedings.mlr.press/v48/reddi16.html

work page 2016

[33] [33]

Princeton University Press, Princeton (1970)

Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

work page 1970

[34] [34]

In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp

Sarl´ os, T.: Improved approximation algorithms for large matrices via random projections. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 143–152 (2006). DOI10.1109/FOCS.2006.37 39

work page 2006

[35] [35]

Computational Optimization and Applications69(2), 297–324 (2018)

Wen, B., Chen, X., Pong, T.K.: A proximal difference-of-convex algorithm with extrapola- tion. Computational Optimization and Applications69(2), 297–324 (2018). DOI 10.1007/ s10589-017-9954-1

work page 2018

[36] [36]

Wiegele, A.: Biq Mac Library: a collection of max-cut and quadratic 0–1 programming instances of medium size. Tech. rep., Alpen-Adria-Universit¨ at Klagenfurt (2007). https: //biqmac.aau.at/biqmaclib.html

work page 2007

[37] [37]

Foundations and Trends in Theoretical Computer Science10(1–2), 1–157 (2014)

Woodruff, D.P.: Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science10(1–2), 1–157 (2014). DOI10.1561/0400000060 40

work page 2014