Faster than Fast-LTS: Robust Regression and Outlier Detection with DC Programming

Alain B. Zemkoho; Marah-Lisanne Thormann; Phan Tu Vuong; Tri-Dung Nguyen

arxiv: 2606.28974 · v1 · pith:ROYP2SSZnew · submitted 2026-06-27 · 🧮 math.OC · cs.NA· math.NA· math.ST· stat.TH

Faster than Fast-LTS: Robust Regression and Outlier Detection with DC Programming

Marah-Lisanne Thormann , Phan Tu Vuong , Alain B. Zemkoho , Tri-Dung Nguyen This is my paper

Pith reviewed 2026-06-30 08:31 UTC · model grok-4.3

classification 🧮 math.OC cs.NAmath.NAmath.STstat.TH

keywords robust regressionleast trimmed squaresDC programmingoutlier detectionconcave minimizationdifference of convex functionspreconditioningoptimization algorithms

0 comments

The pith

Reformulating Least Trimmed Squares as a DC program enables the sBDCA algorithm to solve robust regression faster and more accurately than Fast-LTS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the combinatorial Least Trimmed Squares problem can be recast exactly as a concave minimization over a capped simplex. It proposes the successive Boosted Difference of Convex Functions Algorithm (sBDCA) to solve this formulation and proves linear convergence to local solutions using the Lojasiewicz property. A custom preconditioning matrix is derived to achieve robust results from a single starting point. Experiments on synthetic and real datasets demonstrate that this method runs up to 3.25 times faster than the standard Fast-LTS heuristic while producing objective values up to 90 percent lower, especially when the number of dimensions is large. The work includes open Python code to make the approach practical.

Core claim

The LTS problem can be exactly recast as a concave minimization subject to a capped simplex constraint. The sBDCA algorithm solves this reformulation and, when combined with a derived preconditioning matrix, converges to a local solution with linear rate in the fastest case while delivering robust performance from a single initialization.

What carries the argument

The successive Boosted Difference of Convex Functions Algorithm (sBDCA) applied to the DC reformulation of the LTS estimator under a capped simplex constraint, augmented by a problem-specific preconditioning matrix.

If this is right

sBDCA converges linearly to local solutions via the Lojasiewicz property.
The preconditioning matrix enables robustness from one initialization without loss of solution quality.
The method runs up to 3.25 times faster than Fast-LTS on tested instances.
Objective function values are up to 90% lower than those from Fast-LTS, especially in high dimensions.
Open Python code is provided for practical use in robust regression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The DC reformulation approach could be tested on other robust estimators that involve combinatorial subset selection.
Preconditioning strategies derived here might improve convergence for similar DC programs in statistics.
High-dimensional performance gains suggest potential for scaling robust methods to large datasets where Fast-LTS struggles.

Load-bearing premise

The combinatorial LTS problem admits an exact reformulation as a concave minimization over a capped simplex whose solutions match the original problem, and the preconditioning matrix ensures single-start robustness.

What would settle it

Running the method on a high-dimensional dataset where the preconditioned sBDCA returns a worse objective value or requires multiple starts to match Fast-LTS performance would falsify the practical claims.

Figures

Figures reproduced from arXiv: 2606.28974 by Alain B. Zemkoho, Marah-Lisanne Thormann, Phan Tu Vuong, Tri-Dung Nguyen.

**Figure 1.** Figure 1: Masking, Swamping, and Types of Outliers in Regression Analysis. In regression analysis, different types of outliers can be distinguished. If an observation deviates significantly from the true underlying regression line, it is classified as a regression outlier [cf. Rousseeuw and van Zomeren (1990, p. 636)]. In contrast, an instance is considered a leverage 4 [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: Influence of ρk on Average Trimmed Distance (ATD) and Computation Time. The results for the ten toy examples are summarized in [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗

**Figure 3.** Figure 3: Influence of α on the PDF of the Dirichlet Distribution. Since the inspection of the gradient ∇f at the beginning of this subsection suggested that starting points should ideally not be too far from the global solution – which corresponds to a vertex – we next consider an initialization strategy based on an extreme point of the feasible set. Specifically, we denote by v0 ∈ {0, 1} n a vertex satisfying v ⊤ … view at source ↗

**Figure 4.** Figure 4: Comparison of different Initialization Strategies for sBDCA. Notably, Fast-LTS does not suffer from the drawback of the p-subset initialization. As explained in Subsection 1.4, the heuristic internally generates 500 random p-subsets during each initialization, which are processed separately through the internal sorting procedure. After a few iterations, only the most promising results are retained and run … view at source ↗

**Figure 5.** Figure 5: Influence of Preconditioner on ATD and Computation Time. 4. Applications to Robust Regression In the theoretical part of this paper, preliminary numerical results were already presented in Subsection 3.2 and Subsection 3.3 to showcase the influence of the successive DC decompositions, the choice of starting points, and the proposed preconditioning matrix. To provide a more comprehensive evaluation of the … view at source ↗

**Figure 6.** Figure 6: Simple Linear Regression Examples. The regression lines in [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

**Figure 7.** Figure 7: compares the fraction of infeasible solutions (left y-axis, bar plots) and the ATD of all feasible outcomes (right y-axis, box plots) across the selected settings. Overall, the subplots confirm that sBDCA with preconditioning achieves superior performance in terms of both solution quality and algorithmic reliability. Compared to Fast-LTS, the results of the proposed DC programming algorithm are less sensi… view at source ↗

**Figure 8.** Figure 8: Performance comparison between sBDCA with preconditioning and Fast-LTS using the OSCM and focusing on algorithmic efficiency. Each subplot displays the median number of iterations / c-steps (left y-axis, bar plots) and the computation time of all feasible solutions (right y-axis, box plots) across different numbers of input variables on the x-axis. Results are grouped by solver type and initialization stra… view at source ↗

**Figure 9.** Figure 9: Performance comparison between sBDCA with preconditioning and Fast-LTS using the DMLP and focusing on algorithmic reliability and output quality. Each subplot displays the fraction of infeasible solutions (left y-axis, bar plots) and the ATD of all feasible outcomes (right y-axis, box plots) across different numbers of input variables on the x-axis. Results are grouped by solver type and initialization str… view at source ↗

**Figure 10.** Figure 10: Optimization Paths of Fast-LTS and sBDCA with Preconditioning. To provide the final part of the performance comparison using the DMLP, [PITH_FULL_IMAGE:figures/full_fig_p029_10.png] view at source ↗

**Figure 11.** Figure 11: Performance comparison between sBDCA with preconditioning and Fast-LTS using the DMLP and focusing on algorithmic efficiency. Each subplot displays the median number of iterations / c-steps (left y-axis, bar plots) and the computation time of all feasible solutions (right y-axis, box plots) across different numbers of input variables on the x-axis. Results are grouped by solver type and initialization str… view at source ↗

**Figure 12.** Figure 12: Performance comparison between sBDCA with preconditioning and Fast-LTS using the MSD / SCD and focusing on algorithmic reliability and output quality. Each subplot displays the fraction of infeasible solutions (left y-axis, bar plots) and the ATD of all feasible outcomes (right y-axis, box plots) across different numbers of input variables on the x-axis. Results are grouped by solver type and initializati… view at source ↗

**Figure 13.** Figure 13: Performance comparison between sBDCA with preconditioning and Fast-LTS using the MSD / SCD and focusing on algorithmic efficiency. Each subplot displays the median number of iterations / c-steps (left y-axis, bar plots) and the computation time of all feasible solutions (right y-axis, box plots) across different numbers of input variables on the x-axis. Results are grouped by solver type and initializatio… view at source ↗

**Figure 14.** Figure 14: Specifically, the three subplots are based on three variables z1, z2, z3 ∈ [0, 1], which satisfy either z1 + z2 + z3 = 2 (green triangle), or z1 + z2 + z3 = 1 (blue triangle). The latter triangle is also known as the probability or unit simplex. From these representations, it can be observed that the feasible set is closed, bounded, and convex, as it corresponds to a hypercube (due to the box constraints)… view at source ↗

**Figure 15.** Figure 15: Performance comparison between sBDCA with preconditioning and Fast-LTS using the DMLP and focusing on algorithmic reliability and output quality. Each subplot displays the fraction of infeasible solutions (left y-axis, bar plots) and the ATD of all feasible outcomes (right y-axis, box plots) across different numbers of input variables on the x-axis. Results are grouped by solver type and initialization st… view at source ↗

read the original abstract

When datasets contain outliers, robust regression is a well-established alternative to Ordinary Least Squares. A commonly employed robust estimator is Least Trimmed Squares (LTS), which computes the regression coefficients from a subset of observations. Determining the exact solution corresponds to a combinatorial problem with prohibitive computational costs, even for instances of moderate dimension. Thus, the most prevalent approach in practice remains a heuristic known as Fast-LTS. Although the heuristic often performs effectively, certain elements of the approach remain open to improvement. In particular, its core procedure provides robust results only when initialized with a large number of starting points. To address the heuristic's limitations, this paper reformulates the LTS problem as a concave minimization problem subject to a capped simplex constraint, and proposes the successive Boosted Difference of Convex Functions Algorithm (sBDCA) as a solution method. Theoretically, we establish via the \L ojasiewicz property that sBDCA converges to a local solution with a linear rate in the fastest case. To ensure robustness from a single initialization in practice, we derive and integrate a problem-specific preconditioning matrix into the algorithmic setup. Building on this theoretical foundation, we conduct numerical studies on various synthetic and real-world datasets to demonstrate the effectiveness of sBDCA with preconditioning. Specifically, we show that our approach is up to 3.25 times faster than Fast-LTS and achieves up to 90% lower objective function values, particularly in high-dimensional settings. As all code is openly available, this paper further provides a practical guide to robust regression in Python.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The DC reformulation of LTS with preconditioned sBDCA delivers faster runtimes and lower objectives than Fast-LTS from a single start, and the math checks out.

read the letter

The main point is that this paper turns the LTS combinatorial problem into a concave minimization over a capped simplex and solves it with successive boosted DCA plus a derived preconditioner. That setup produces linear convergence via the Lojasiewicz property and, in the experiments, runs up to 3.25 times faster than Fast-LTS while reaching noticeably lower objective values, especially in higher dimensions.

The reformulation is explicit and the preconditioner is constructed to support single-start robustness without apparent degradation. The numerical comparisons use the same objective on identical instances, and the open code lets anyone verify the implementation. The stress-test confirms the equivalence holds and there is no hidden fitting or unsupported step in the convergence claim.

The only soft spot worth noting is that the preconditioner is problem-specific, so anyone wanting to apply the same trick to a different robust estimator would need to redo that derivation. That is a normal limitation rather than a flaw in the current work.

This is for people who implement or extend robust regression routines and for optimization researchers interested in DC methods on combinatorial problems. A reader who cares about practical speed-ups over Fast-LTS or reproducible nonconvex solvers will get direct value. The paper is grounded enough to deserve peer review.

Referee Report

0 major / 2 minor

Summary. The manuscript reformulates the combinatorial Least Trimmed Squares (LTS) problem exactly as a concave minimization over a capped simplex constraint and introduces the successive Boosted Difference of Convex Functions Algorithm (sBDCA) together with a derived problem-specific preconditioning matrix. It establishes linear convergence to a local solution via the Łojasiewicz property and reports that sBDCA with preconditioning is up to 3.25 times faster than Fast-LTS while attaining up to 90% lower objective values on synthetic and real data, with all code released openly.

Significance. If the DC equivalence and preconditioner construction hold, the paper supplies a theoretically grounded, single-start robust alternative to Fast-LTS that is especially advantageous in high dimensions. The explicit reformulation, the Lojasiewicz-based rate, the open reproducible code, and the fair numerical comparisons (identical objective, same instances) are concrete strengths.

minor comments (2)

The abstract states convergence 'in the fastest case'; a brief clarification of the precise Łojasiewicz exponent range that yields the linear rate would improve readability.
Notation for the capped simplex and the preconditioning matrix could be introduced once in a dedicated preliminary section rather than inline in the algorithmic description.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of the manuscript, the recognition of its theoretical and practical contributions, and the recommendation to accept.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's derivation begins with an explicit combinatorial-to-DC reformulation of the LTS objective (concave minimization over capped simplex) and derives a preconditioning matrix from the problem geometry; both steps are presented as direct mathematical constructions rather than fits. Convergence follows from the external Łojasiewicz inequality with a stated linear rate. Numerical claims compare runtime and attained objective values on identical instances against Fast-LTS; these are empirical measurements, not quantities forced by internal fitting or self-citation. No load-bearing step reduces a reported result to its own inputs by construction, and the chain is self-contained against external benchmarks and open code.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central modeling step is the assumption that LTS admits an exact DC reformulation; convergence relies on the Lojasiewicz property (standard in nonsmooth optimization). No free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption The LTS objective admits an exact reformulation as concave minimization subject to a capped simplex constraint.
This modeling choice is the load-bearing step that enables the DC algorithm.

pith-pipeline@v0.9.1-grok · 5837 in / 1155 out tokens · 36187 ms · 2026-06-30T08:31:27.731252+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

139 extracted references · 93 canonical work pages · 2 internal anchors

[1]

Aggarwal, C. C. (2015).Data Mining: The Textbook, 1st edn, Springer, Cham.https: //doi.org/10.1007/978-3-319-14142-8

work page doi:10.1007/978-3-319-14142-8 2015
[2]

Agulló, J. (2001). New algorithms for computing the least trimmed squares regression estima- tor,Computational Statistics & Data Analysis36(4): 425–439.https://doi.org/10.1016/ S0167-9473(00)00056-6

2001
[3]

Ahipaşaoğlu, S. D. (2015). Fast algorithms for the minimum volume estimator,Journal of Global Optimization62: 351–370.https://doi.org/10.1007/s10898-014-0233-8

work page doi:10.1007/s10898-014-0233-8 2015
[4]

Al-Noor, N. H. and Mohammad, A. A. (2013). Model of Robust Regression with Parametric and Nonparametric Methods,Mathematical Theory and Modeling3(5): 27–39

2013
[5]

and Gelper, S

Alfons, A., Croux, C. and Gelper, S. (2013). Sparse least trimmed squares regression for analyzing high-dimensional large data sets,The Annals of Applied Statistics7(1): 226–248. https://doi.org/10.1214/12-AOAS575

work page doi:10.1214/12-aoas575 2013
[6]

Alma, Ö. G. (2011). Comparison of robust regression methods in linear regression,Interna- tional Journal of Contemporary Mathematical Sciences6(9): 409–421

2011
[7]

and Wang, Y

Ang, A., Ma, J., Liu, N., Huang, K. and Wang, Y. (2021). Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics, arXiv preprint, arXiv:2110.08471 [math.OC].https://doi.org/10.48550/arXiv.2110.08471

work page doi:10.48550/arxiv.2110.08471 2021
[8]

Anjos, M. F. and Lasserre, J. B. (2011).Handbook on Semidefinite, Conic and Polynomial Optimization, International Series in Operations Research & Management Science, 1st edn, Springer, New York.https://doi.org/10.1007/978-1-4614-0769-0

work page doi:10.1007/978-1-4614-0769-0 2011
[9]

and Lu, W.-S

Antoniou, A. and Lu, W.-S. (2021).Practical Optimization: Algorithms and Engineering Applications, Texts in Computer Science, 2nd edn, Springer, New York.https://doi.org/ 10.1007/978-1-0716-0843-2

work page doi:10.1007/978-1-0716-0843-2 2021
[10]

J., Campoy, R

Aragón-Artacho, F. J., Campoy, R. and Vuong, P. T. (2022). The Boosted DC Algorithm for Linearly Constrained DC Programming,Set-Valued and Variational Analysis30: 1265–1289. https://doi.org/10.1007/s11228-022-00656-x

work page doi:10.1007/s11228-022-00656-x 2022
[11]

AcceleratingtheDCalgorithm for smooth functions,Mathematical Programming169: 95–118.https://doi.org/10.1007/ s10107-017-1180-1

Aragón-Artacho, F.J., Fleming, R.M.andVuong, P.T.(2018). AcceleratingtheDCalgorithm for smooth functions,Mathematical Programming169: 95–118.https://doi.org/10.1007/ s10107-017-1180-1

2018
[12]

Aragón-Artacho, F. J. and Vuong, P. T. (2020). The Boosted Difference of Convex Functions Algorithm For Nonsmooth Functions,SIAM Journal on Optimization30(1): 980–1006.https: //doi.org/10.1137/18M123339X

work page doi:10.1137/18m123339x 2020
[13]

Armstrong, R. A. (2014). When to use the Bonferroni correction,Ophthalmic and Physiological Optics34(5): 502–508.https://doi.org/10.1111/opo.12131

work page doi:10.1111/opo.12131 2014
[14]

Atkinson, A. C. and Cheng, T.-C. (1999). Computing least trimmed squares regression with the forward search,Statistics and Computing9: 251–263.https://doi.org/10.1023/A: 1008942604045

work page doi:10.1023/a: 1999
[15]

and Vanthienen, J

Baesens, B., Mues, C., Martens, D. and Vanthienen, J. (2009). 50 years of data mining and OR: upcoming trends and challenges,Journal of the Operational Research Society60(sup1): S16– S23.https://doi.org/10.1057/jors.2008.171. 32 Thormann et al. Robust Regression with DC Programming

work page doi:10.1057/jors.2008.171 2009
[16]

and Ceselli, A

Barbato, M. and Ceselli, A. (2024). Mathematical programming for simultaneous feature selection and outlier detection under l1 norm,European Journal of Operational Research 316(3): 1070–1084.https://doi.org/10.1016/j.ejor.2024.03.035

work page doi:10.1016/j.ejor.2024.03.035 2024
[17]

and Wilson, S

Becher, H., Hall, P. and Wilson, S. R. (1993). Bootstrap hypothesis testing procedures,Bio- metrics49(4): 1268–1272.https://doi.org/10.2307/2532271

work page doi:10.2307/2532271 1993
[18]

and Lucet, Y

Beiranvand, V., Hare, W. and Lucet, Y. (2017). Best practices for comparing optimiza- tion algorithms,Optimization and Engineering18: 815–848.https://doi.org/10.1007/ s11081-017-9366-1

2017
[19]

and Nahavandi, S

Beliakov, G., Johnstone, M. and Nahavandi, S. (2012). Computing of high breakdown re- gression estimators without sorting on graphics processing units,Computing94: 433–447. https://doi.org/10.1007/s00607-011-0183-7

work page doi:10.1007/s00607-011-0183-7 2012
[20]

Bernholt, T. (2006). Robust Estimators are Hard to Compute, Technical Report, No. 2005/52, Universität Dortmund, Sonderforschungsbereich 475 - Komplexitätsreduktion in Multivari- aten Datenstrukturen, Dortmund.https://www.econstor.eu/bitstream/10419/22645/1/ tr52-05.pdf

2006
[21]

P., Whitman, B

Bertin-Mahieux, T., Ellis, D. P., Whitman, B. and Lamere, P. (2011). The Million Song Dataset. UCI Machine Learning Repository. Available athttps://archive.ics.uci.edu/ dataset/203/yearpredictionmsd

2011
[22]

Bertsekas, D. P. (1999).Nonlinear Programming, 2nd edn, Athena Scientific, Belmont (Mas- sachusetts). ISBN 1-886529-00-0

1999
[23]

and King, A

Bertsimas, D. and King, A. (2016). OR Forum—An Algorithmic Approach to Linear Regres- sion,Operations Research64(1): 2–16.https://doi.org/10.1287/opre.2015.1436

work page doi:10.1287/opre.2015.1436 2016
[24]

Bonferroni, C. (1936). Teoria statistica delle classi e calcolo delle probabilita,Pubblicazioni del R istituto superiore di scienze economiche e commericiali di firenze8: 3–62

1936
[25]

and Vandenberghe, L

Boyd, S. and Vandenberghe, L. (2009).Convex Optimization, Cambridge University Press, Cambridge.https://doi.org/10.1017/CBO9780511804441

work page doi:10.1017/cbo9780511804441 2009
[26]

Brualdi, R. A. and Ryser, H. J. (1991).Combinatorial Matrix Theory, Encyclopedia of Mathematics and its Applications, 1st edn, Cambridge University Press, Cambridge.https: //doi.org/10.1017/CBO9781107325708

work page doi:10.1017/cbo9781107325708 1991
[27]

and Abur, A

Çelik, M. and Abur, A. (1992). A robust WLAV state estimator using transformations,IEEE Transactions on Power Systems7(1): 106–113.https://doi.org/10.1109/59.141693

work page doi:10.1109/59.141693 1992
[28]

and Simonoff, J

Chatterjee, S. and Simonoff, J. S. (2020).Handbook of Regression Analysis With Applications in R, Wiley Series in Probability and Statistics, 2nd edn, John Wiley & Sons, Hoboken (NJ). https://doi.org/10.1002/9781119392491

work page doi:10.1002/9781119392491 2020
[29]

Chave, A. D. and Thomson, D. J. (2003). A Bounded Influence Regression Estimator Based on the Statistics of the Hat Matrix,Journal of the Royal Statistical Society Series C: Applied Statistics52(3): 307–322.https://doi.org/10.1111/1467-9876.00406

work page doi:10.1111/1467-9876.00406 2003
[30]

and Paschalidis, I

Chen, R. and Paschalidis, I. C. (2018). A Robust Learning Approach for Regression Mod- els Based on Distributionally Robust Optimization,Journal of Machine Learning Research 19(13): 1–48.http://jmlr.org/papers/v19/17-295.html

2018
[31]

and Gondzio, J

Cipolla, S. and Gondzio, J. (2024). Proximal-stabilized semidefinite programming, Computational Optimization and Applications,pp. 1–44.https://doi.org/10.1007/ s10589-024-00614-3

2024
[32]

Critchley, F., Schyns, M., Haesbroeck, G., Fauconnier, C., Lu, G., Atkinson, R. A. and Wang, D. Q. (2010). A relaxed approach to combinatorial problems in robustness and diagnostics, Statistics and Computing20: 99–115.https://doi.org/10.1007/s11222-009-9119-x

work page doi:10.1007/s11222-009-9119-x 2010
[33]

and Massart, D

De Maesschalck, R., Jouan-Rimbaud, D. and Massart, D. L. (2000). The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems50(1): 1–18.https://doi.org/10.1016/ 33 Thormann et al. Robust Regression with DC Programming S0169-7439(99)00047-7

2000
[34]

de Oliveira, W. (2020). The ABC of DC programming,Set-Valued and Variational Analysis 28: 679–706.https://doi.org/10.1007/s11228-020-00566-w

work page doi:10.1007/s11228-020-00566-w 2020
[35]

Doğru, F. Z. and Arslan, O. (2018). Robust mixture regression modeling using the least trimmed squares (LTS)-estimation method,Communications in Statistics - Simulation and Computation47(7): 2184–2196.https://doi.org/10.1080/03610918.2017.1341528

work page doi:10.1080/03610918.2017.1341528 2018
[36]

Efron, B. (1979). Bootstrap Methods: Another look at the Jackknife,The Annals of Statistics 7(1): 1–26.https://www.jstor.org/stable/2958830

work page arXiv 1979
[37]

and Marx, B

Fahrmeir, L., Kneib, T., Lang, S. and Marx, B. D. (2021).Regression: Models, Methods and Applications, 2nd edn, Springer, Berlin.https://doi.org/10.1007/978-3-662-63882-8

work page doi:10.1007/978-3-662-63882-8 2021
[38]

Fernandes, A. A. A., Koehler, M., Konstantinou, N., Pankin, P. and Paton, N. P. (2023). Data preparation: A technological perspective and review,SN Computer Science4(425): 1– 20.https://doi.org/10.1007/s42979-023-01828-8

work page doi:10.1007/s42979-023-01828-8 2023
[39]

Flores, S. (2015). SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression,European Journal of Operational Research246(1): 44–50.https: //doi.org/10.1016/j.ejor.2015.04.024

work page doi:10.1016/j.ejor.2015.04.024 2015
[40]

(2015).Applied Regression Analysis and Generalized Linear Models, 3rd edn, SAGE Publications, Thousand Oaks (California)

Fox, J. (2015).Applied Regression Analysis and Generalized Linear Models, 3rd edn, SAGE Publications, Thousand Oaks (California)

2015
[41]

and Weisberg, S

Fox, J. and Weisberg, S. (2019).An R companion to applied regression, 3rd edn, SAGE Publications, Thousand Oaks (California)

2019
[42]

Gafni, E. M. and Bertsekas, D. P. (1984). Two-Metric Projection Methods for Constrained Optimization,SIAM Journal on Control and Optimization22(6): 936–964.https://doi. org/10.1137/0322061

work page doi:10.1137/0322061 1984
[43]

and Herrera, F

García, S., Luengo, J. and Herrera, F. (2015).Data Preprocessing in Data Mining, Intel- ligent Systems Reference Library, 1st edn, Springer, Cham.https://doi.org/10.1007/ 978-3-319-10247-4

2015
[44]

and Padberg, M

Giloni, A. and Padberg, M. (2002). Least Trimmed Squares Regression, Least Median Squares Regression, and Mathematical Programming,Mathematical and Computer Modelling35(9– 10): 1043–1060.https://doi.org/10.1016/S0895-7177(02)00069-9

work page doi:10.1016/s0895-7177(02)00069-9 2002
[45]

and Novo, V

Giorgi, G., Jimenéz, B. and Novo, V. (2023).Basic Mathematical Programming Theory, International Series in Operations Research & Management Science, 1st edn, Springer, Cham. https://doi.org/10.1007/978-3-031-30324-1

work page doi:10.1007/978-3-031-30324-1 2023
[46]

Habshah, M., Norazan, M. R. and Rahmatullah Imon, A. H. (2009). The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression,Journal of Applied Statistics36(5): 507–520.https://doi.org/10.1080/ 02664760802553463

2009
[47]

Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the Identification of Multiple Outliers in Linear Models,Journal of the American Statistical Association88(424): 1264–1272.https: //doi.org/10.1080/01621459.1993.10476407

work page doi:10.1080/01621459.1993.10476407 1993
[48]

Hamidieh, K. (2018). A data-driven statistical model for predicting the critical temperature of a superconductor,Computational Materials Science154: 346–354.https://doi.org/10. 1016/j.commatsci.2018.07.052

2018
[49]

Hampel, F. R. (1973). Robust estimation: A condensed partial survey,Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete27: 87–104.https://doi.org/10.1007/ BF00536619

1973
[50]

R., Ronchetti, E

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986).Robust Statistics: The Approach Based on Influence Functions, Wiley Series in Probability and Statistics, 1st edn, John Wiley & Sons, New York.https://doi.org/10.1002/9781118186435. 34 Thormann et al. Robust Regression with DC Programming

work page doi:10.1002/9781118186435 1986
[51]

and Salibián-Barrera, M

Harrington, J. and Salibián-Barrera, M. (2010). Finding approximate solutions to combinato- rial problems with very large data sets using BIRCH,Computational Statistics & Data Analysis 54(3): 655–667.https://doi.org/10.1016/j.csda.2008.08.001

work page doi:10.1016/j.csda.2008.08.001 2010
[52]

Hartman, P. (1959). On functions representable as a difference of convex functions,Pacific Journal of Mathematics9(3): 707–713.https://doi.org/10.2140/pjm.1959.9.707

work page doi:10.2140/pjm.1959.9.707 1959
[53]

and Friedman, J

Hastie, T., Tibshirani, R. and Friedman, J. H. (2009).The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn, Springer, New York.https://doi.org/10. 1007/b94608

2009
[54]

Hawkins, D. M. (1980).Identification of Outliers, Monographs on Statistics and Applied Probability, 1st edn, Springer, Dordrecht.https://doi.org/10.1007/978-94-015-3994-4

work page doi:10.1007/978-94-015-3994-4 1980
[55]

Hawkins, D. M. (1994). The feasible solution algorithm for least trimmed squares regres- sion,Computational Statistics & Data Analysis17(2): 185–196.https://doi.org/10.1016/ 0167-9473(92)00070-8

1994
[56]

M., Bradu, D

Hawkins, D. M., Bradu, D. and Kass, G. V. (1984). Location of Several Outliers in Multiple- Regression Data Using Elemental Sets,Technometrics26(3): 197–208.https://doi.org/10. 1080/00401706.1984.10487956

work page arXiv 1984
[57]

Hawkins, D. M. and Olive, D. J. (1999). Improved feasible solution algorithms for high break- down estimation,Computational Statistics & Data Analysis30(1): 1–11.https://doi.org/ 10.1016/S0167-9473(98)00082-6

work page doi:10.1016/s0167-9473(98)00082-6 1999
[58]

Hawkins, D. M. and Olive, D. J. (2002). Inconsistency of Resampling Algorithms for High- Breakdown Regression Estimators and a New Algorithm,Journal of the American Statistical Association97(457): 136–159.https://doi.org/10.1198/016214502753479293

work page doi:10.1198/016214502753479293 2002
[59]

and Lange, K

Heng, Q. and Lange, K. (2025). Bootstrap estimation of the proportion of outliers in robust regression,Statistics and Computing35(3): 1–14.https://doi.org/10.1007/ s11222-024-10526-1

2025
[60]

T., Le Thi, H

Ho, V. T., Le Thi, H. A. and Pham Dinh, T. (2020). DCA with Successive DC Decomposition for Convex Piecewise-Linear Fitting,inH. A. Le Thi, H. M. Le, T. Pham Dinh and N. T. Nguyen (eds),Advanced Computational Methods for Knowledge Engineering, Springer Inter- national Publishing, Cham, pp. 39–51.https://doi.org/10.1007/978-3-030-38364-0_4

work page doi:10.1007/978-3-030-38364-0_4 2020
[61]

T., Le Thi, H

Ho, V. T., Le Thi, H. A. and Pham Dinh, T. (2021). DCA-based algorithms for DC fitting, Journal of Computational and Applied Mathematics389:113353.https://doi.org/10.1016/ j.cam.2020.113353

work page arXiv 2021
[62]

Hoaglin, D. C. and Welsch, R. E. (1978). The Hat Matrix in Regression and ANOVA,The American Statistician32(1): 17–22.https://doi.org/10.1080/00031305.1978.10479237

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1080/00031305.1978.10479237 1978
[63]

Hocking, R. R. (2003).Methods and Applications of Linear Models: Regression and the Anal- ysis of Variance, 2nd edn, John Wiley & Sons, Hoboken (New Jersey).https://doi.org/10. 1002/0471434159

2003
[64]

and Kontoghiorghes, E

Hofmann, M., Gatu, C. and Kontoghiorghes, E. J. (2010). An Exact Least Trimmed Squares Algorithm for a Range of Coverage Values,Journal of Computational and Graphical Statistics 19(1): 191–204.https://doi.org/10.1198/jcgs.2009.07091

work page doi:10.1198/jcgs.2009.07091 2010
[65]

and Pardalos, P

Horst, R. and Pardalos, P. M. (1995).Handbook of Global Optimization: Volume 2, Nonconvex Optimization and Its Applications, 1st edn, Springer, New York.https://doi.org/10.1007/ 978-1-4757-5362-2

1995
[66]

Hössjer, O. (1995). Exact computation of the least trimmed squares estimate in simple linear regression,Computational Statistics & Data Analysis19(3): 265–282.https://doi.org/10. 1016/0167-9473(95)92697-V

1995
[67]

and Torre, F

Huang, D., Cabral, R. and Torre, F. D. l. (2016). Robust regression,IEEE Transactions on Pattern Analysis and Machine Intelligence38(2): 363–375.https://doi.org/10.1109/ TPAMI.2015.2448091. 35 Thormann et al. Robust Regression with DC Programming

work page arXiv 2016
[68]

Huber, P. J. and Ronchetti, E. M. (2009).Robust Statistics, Wiley Series in Probabil- ity and Statistics, 2nd edn, John Wiley & Sons, Hoboken.https://doi.org/10.1002/ 9780470434697

2009
[69]

Ibe, O. C. (2013).Markov Processes for Stochastic Modeling, 2nd edn, Elsevier, London. https://doi.org/10.1016/C2012-0-06106-6

work page doi:10.1016/c2012-0-06106-6 2013
[70]

and Tibshirani, R

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021).An Introduction to Statistical Learning with Applications in R, Springer Texts in Statistics, 2nd edn, Springer, New York. https://doi.org/10.1007/978-1-0716-1418-1

work page doi:10.1007/978-1-0716-1418-1 2021
[71]

and Yazıcı, B

Kan, B., Alpu, Ö. and Yazıcı, B. (2013). Robust ridge and robust Liu estimator for regression based on the LTS estimator,Journal of Applied Statistics40(3): 644–655.https://doi.org/ 10.1080/02664763.2012.750285

work page doi:10.1080/02664763.2012.750285 2013
[72]

Lasserre, J. B. (2015).An Introduction to Polynomial and Semi-Algebraic Optimization, Cambridge Texts in Applied Mathematics, 1st edn, Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781107447226

work page doi:10.1017/cbo9781107447226 2015
[73]

Le Thi, H. A. (2000). An efficient algorithm for globally minimizing a quadratic function under convex quadratic constraints,Mathematical Programming87: 401–426.https://doi. org/10.1007/s101070050003

work page doi:10.1007/s101070050003 2000
[74]

A., Ho, V

Le Thi, H. A., Ho, V. T. and Pham Dinh, T. (2019). A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning,Journal of Global Optimization73: 279–310.https://doi.org/10.1007/s10898-018-0698-y

work page doi:10.1007/s10898-018-0698-y 2019
[75]

Le Thi, H. A. and Pham Dinh, T. (2005). The DC (Difference of Convex Functions) Program- ming and DCA Revisited with DC Models of Real World Nonconvex Optimization Problems, Annals of Operations Research133: 23–46.https://doi.org/10.1007/s10479-004-5022-1

work page doi:10.1007/s10479-004-5022-1 2005
[76]

Le Thi, H. A. and Pham Dinh, T. (2018). DC programming and DCA: thirty years of developments,Mathematical Programming169: 5–68.https://doi.org/10.1007/ s10107-018-1235-y

2018
[77]

Le Thi, H. A. and Pham Dinh, T. (2024). Open issues and recent advances in DC program- ming and DCA,Journal of Global Optimization88: 533–590.https://doi.org/10.1007/ s10898-023-01272-1

2024
[78]

Lewis, A. D. (2023).Geometric Analysis on Real Analytic Manifolds, Lecture Notes in Math- ematics, 1st edn, Springer, Cham.https://doi.org/10.1007/978-3-031-37913-0

work page doi:10.1007/978-3-031-37913-0 2023
[79]

Liu, T., Pong, T. K. and Takeda, A. (2019). A refined convergence analysis ofpDCAe with ap- plications to simultaneous sparse recovery and outlier detection,Computational Optimization and Applications73: 69–100.https://doi.org/10.1007/s10589-019-00067-z

work page doi:10.1007/s10589-019-00067-z 2019
[80]

(1965).Ensembles Semi-Analytiques, Institute des Hautes Etudes Scientifiques, Bures-sur-Yvette (Seine-et-Oise), France

Łojasiewicz, S. (1965).Ensembles Semi-Analytiques, Institute des Hautes Etudes Scientifiques, Bures-sur-Yvette (Seine-et-Oise), France

1965

Showing first 80 references.

[1] [1]

Aggarwal, C. C. (2015).Data Mining: The Textbook, 1st edn, Springer, Cham.https: //doi.org/10.1007/978-3-319-14142-8

work page doi:10.1007/978-3-319-14142-8 2015

[2] [2]

Agulló, J. (2001). New algorithms for computing the least trimmed squares regression estima- tor,Computational Statistics & Data Analysis36(4): 425–439.https://doi.org/10.1016/ S0167-9473(00)00056-6

2001

[3] [3]

Ahipaşaoğlu, S. D. (2015). Fast algorithms for the minimum volume estimator,Journal of Global Optimization62: 351–370.https://doi.org/10.1007/s10898-014-0233-8

work page doi:10.1007/s10898-014-0233-8 2015

[4] [4]

Al-Noor, N. H. and Mohammad, A. A. (2013). Model of Robust Regression with Parametric and Nonparametric Methods,Mathematical Theory and Modeling3(5): 27–39

2013

[5] [5]

and Gelper, S

Alfons, A., Croux, C. and Gelper, S. (2013). Sparse least trimmed squares regression for analyzing high-dimensional large data sets,The Annals of Applied Statistics7(1): 226–248. https://doi.org/10.1214/12-AOAS575

work page doi:10.1214/12-aoas575 2013

[6] [6]

Alma, Ö. G. (2011). Comparison of robust regression methods in linear regression,Interna- tional Journal of Contemporary Mathematical Sciences6(9): 409–421

2011

[7] [7]

and Wang, Y

Ang, A., Ma, J., Liu, N., Huang, K. and Wang, Y. (2021). Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics, arXiv preprint, arXiv:2110.08471 [math.OC].https://doi.org/10.48550/arXiv.2110.08471

work page doi:10.48550/arxiv.2110.08471 2021

[8] [8]

Anjos, M. F. and Lasserre, J. B. (2011).Handbook on Semidefinite, Conic and Polynomial Optimization, International Series in Operations Research & Management Science, 1st edn, Springer, New York.https://doi.org/10.1007/978-1-4614-0769-0

work page doi:10.1007/978-1-4614-0769-0 2011

[9] [9]

and Lu, W.-S

Antoniou, A. and Lu, W.-S. (2021).Practical Optimization: Algorithms and Engineering Applications, Texts in Computer Science, 2nd edn, Springer, New York.https://doi.org/ 10.1007/978-1-0716-0843-2

work page doi:10.1007/978-1-0716-0843-2 2021

[10] [10]

J., Campoy, R

Aragón-Artacho, F. J., Campoy, R. and Vuong, P. T. (2022). The Boosted DC Algorithm for Linearly Constrained DC Programming,Set-Valued and Variational Analysis30: 1265–1289. https://doi.org/10.1007/s11228-022-00656-x

work page doi:10.1007/s11228-022-00656-x 2022

[11] [11]

AcceleratingtheDCalgorithm for smooth functions,Mathematical Programming169: 95–118.https://doi.org/10.1007/ s10107-017-1180-1

Aragón-Artacho, F.J., Fleming, R.M.andVuong, P.T.(2018). AcceleratingtheDCalgorithm for smooth functions,Mathematical Programming169: 95–118.https://doi.org/10.1007/ s10107-017-1180-1

2018

[12] [12]

Aragón-Artacho, F. J. and Vuong, P. T. (2020). The Boosted Difference of Convex Functions Algorithm For Nonsmooth Functions,SIAM Journal on Optimization30(1): 980–1006.https: //doi.org/10.1137/18M123339X

work page doi:10.1137/18m123339x 2020

[13] [13]

Armstrong, R. A. (2014). When to use the Bonferroni correction,Ophthalmic and Physiological Optics34(5): 502–508.https://doi.org/10.1111/opo.12131

work page doi:10.1111/opo.12131 2014

[14] [14]

Atkinson, A. C. and Cheng, T.-C. (1999). Computing least trimmed squares regression with the forward search,Statistics and Computing9: 251–263.https://doi.org/10.1023/A: 1008942604045

work page doi:10.1023/a: 1999

[15] [15]

and Vanthienen, J

Baesens, B., Mues, C., Martens, D. and Vanthienen, J. (2009). 50 years of data mining and OR: upcoming trends and challenges,Journal of the Operational Research Society60(sup1): S16– S23.https://doi.org/10.1057/jors.2008.171. 32 Thormann et al. Robust Regression with DC Programming

work page doi:10.1057/jors.2008.171 2009

[16] [16]

and Ceselli, A

Barbato, M. and Ceselli, A. (2024). Mathematical programming for simultaneous feature selection and outlier detection under l1 norm,European Journal of Operational Research 316(3): 1070–1084.https://doi.org/10.1016/j.ejor.2024.03.035

work page doi:10.1016/j.ejor.2024.03.035 2024

[17] [17]

and Wilson, S

Becher, H., Hall, P. and Wilson, S. R. (1993). Bootstrap hypothesis testing procedures,Bio- metrics49(4): 1268–1272.https://doi.org/10.2307/2532271

work page doi:10.2307/2532271 1993

[18] [18]

and Lucet, Y

Beiranvand, V., Hare, W. and Lucet, Y. (2017). Best practices for comparing optimiza- tion algorithms,Optimization and Engineering18: 815–848.https://doi.org/10.1007/ s11081-017-9366-1

2017

[19] [19]

and Nahavandi, S

Beliakov, G., Johnstone, M. and Nahavandi, S. (2012). Computing of high breakdown re- gression estimators without sorting on graphics processing units,Computing94: 433–447. https://doi.org/10.1007/s00607-011-0183-7

work page doi:10.1007/s00607-011-0183-7 2012

[20] [20]

Bernholt, T. (2006). Robust Estimators are Hard to Compute, Technical Report, No. 2005/52, Universität Dortmund, Sonderforschungsbereich 475 - Komplexitätsreduktion in Multivari- aten Datenstrukturen, Dortmund.https://www.econstor.eu/bitstream/10419/22645/1/ tr52-05.pdf

2006

[21] [21]

P., Whitman, B

Bertin-Mahieux, T., Ellis, D. P., Whitman, B. and Lamere, P. (2011). The Million Song Dataset. UCI Machine Learning Repository. Available athttps://archive.ics.uci.edu/ dataset/203/yearpredictionmsd

2011

[22] [22]

Bertsekas, D. P. (1999).Nonlinear Programming, 2nd edn, Athena Scientific, Belmont (Mas- sachusetts). ISBN 1-886529-00-0

1999

[23] [23]

and King, A

Bertsimas, D. and King, A. (2016). OR Forum—An Algorithmic Approach to Linear Regres- sion,Operations Research64(1): 2–16.https://doi.org/10.1287/opre.2015.1436

work page doi:10.1287/opre.2015.1436 2016

[24] [24]

Bonferroni, C. (1936). Teoria statistica delle classi e calcolo delle probabilita,Pubblicazioni del R istituto superiore di scienze economiche e commericiali di firenze8: 3–62

1936

[25] [25]

and Vandenberghe, L

Boyd, S. and Vandenberghe, L. (2009).Convex Optimization, Cambridge University Press, Cambridge.https://doi.org/10.1017/CBO9780511804441

work page doi:10.1017/cbo9780511804441 2009

[26] [26]

Brualdi, R. A. and Ryser, H. J. (1991).Combinatorial Matrix Theory, Encyclopedia of Mathematics and its Applications, 1st edn, Cambridge University Press, Cambridge.https: //doi.org/10.1017/CBO9781107325708

work page doi:10.1017/cbo9781107325708 1991

[27] [27]

and Abur, A

Çelik, M. and Abur, A. (1992). A robust WLAV state estimator using transformations,IEEE Transactions on Power Systems7(1): 106–113.https://doi.org/10.1109/59.141693

work page doi:10.1109/59.141693 1992

[28] [28]

and Simonoff, J

Chatterjee, S. and Simonoff, J. S. (2020).Handbook of Regression Analysis With Applications in R, Wiley Series in Probability and Statistics, 2nd edn, John Wiley & Sons, Hoboken (NJ). https://doi.org/10.1002/9781119392491

work page doi:10.1002/9781119392491 2020

[29] [29]

Chave, A. D. and Thomson, D. J. (2003). A Bounded Influence Regression Estimator Based on the Statistics of the Hat Matrix,Journal of the Royal Statistical Society Series C: Applied Statistics52(3): 307–322.https://doi.org/10.1111/1467-9876.00406

work page doi:10.1111/1467-9876.00406 2003

[30] [30]

and Paschalidis, I

Chen, R. and Paschalidis, I. C. (2018). A Robust Learning Approach for Regression Mod- els Based on Distributionally Robust Optimization,Journal of Machine Learning Research 19(13): 1–48.http://jmlr.org/papers/v19/17-295.html

2018

[31] [31]

and Gondzio, J

Cipolla, S. and Gondzio, J. (2024). Proximal-stabilized semidefinite programming, Computational Optimization and Applications,pp. 1–44.https://doi.org/10.1007/ s10589-024-00614-3

2024

[32] [32]

Critchley, F., Schyns, M., Haesbroeck, G., Fauconnier, C., Lu, G., Atkinson, R. A. and Wang, D. Q. (2010). A relaxed approach to combinatorial problems in robustness and diagnostics, Statistics and Computing20: 99–115.https://doi.org/10.1007/s11222-009-9119-x

work page doi:10.1007/s11222-009-9119-x 2010

[33] [33]

and Massart, D

De Maesschalck, R., Jouan-Rimbaud, D. and Massart, D. L. (2000). The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems50(1): 1–18.https://doi.org/10.1016/ 33 Thormann et al. Robust Regression with DC Programming S0169-7439(99)00047-7

2000

[34] [34]

de Oliveira, W. (2020). The ABC of DC programming,Set-Valued and Variational Analysis 28: 679–706.https://doi.org/10.1007/s11228-020-00566-w

work page doi:10.1007/s11228-020-00566-w 2020

[35] [35]

Doğru, F. Z. and Arslan, O. (2018). Robust mixture regression modeling using the least trimmed squares (LTS)-estimation method,Communications in Statistics - Simulation and Computation47(7): 2184–2196.https://doi.org/10.1080/03610918.2017.1341528

work page doi:10.1080/03610918.2017.1341528 2018

[36] [36]

Efron, B. (1979). Bootstrap Methods: Another look at the Jackknife,The Annals of Statistics 7(1): 1–26.https://www.jstor.org/stable/2958830

work page arXiv 1979

[37] [37]

and Marx, B

Fahrmeir, L., Kneib, T., Lang, S. and Marx, B. D. (2021).Regression: Models, Methods and Applications, 2nd edn, Springer, Berlin.https://doi.org/10.1007/978-3-662-63882-8

work page doi:10.1007/978-3-662-63882-8 2021

[38] [38]

Fernandes, A. A. A., Koehler, M., Konstantinou, N., Pankin, P. and Paton, N. P. (2023). Data preparation: A technological perspective and review,SN Computer Science4(425): 1– 20.https://doi.org/10.1007/s42979-023-01828-8

work page doi:10.1007/s42979-023-01828-8 2023

[39] [39]

Flores, S. (2015). SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression,European Journal of Operational Research246(1): 44–50.https: //doi.org/10.1016/j.ejor.2015.04.024

work page doi:10.1016/j.ejor.2015.04.024 2015

[40] [40]

(2015).Applied Regression Analysis and Generalized Linear Models, 3rd edn, SAGE Publications, Thousand Oaks (California)

Fox, J. (2015).Applied Regression Analysis and Generalized Linear Models, 3rd edn, SAGE Publications, Thousand Oaks (California)

2015

[41] [41]

and Weisberg, S

Fox, J. and Weisberg, S. (2019).An R companion to applied regression, 3rd edn, SAGE Publications, Thousand Oaks (California)

2019

[42] [42]

Gafni, E. M. and Bertsekas, D. P. (1984). Two-Metric Projection Methods for Constrained Optimization,SIAM Journal on Control and Optimization22(6): 936–964.https://doi. org/10.1137/0322061

work page doi:10.1137/0322061 1984

[43] [43]

and Herrera, F

García, S., Luengo, J. and Herrera, F. (2015).Data Preprocessing in Data Mining, Intel- ligent Systems Reference Library, 1st edn, Springer, Cham.https://doi.org/10.1007/ 978-3-319-10247-4

2015

[44] [44]

and Padberg, M

Giloni, A. and Padberg, M. (2002). Least Trimmed Squares Regression, Least Median Squares Regression, and Mathematical Programming,Mathematical and Computer Modelling35(9– 10): 1043–1060.https://doi.org/10.1016/S0895-7177(02)00069-9

work page doi:10.1016/s0895-7177(02)00069-9 2002

[45] [45]

and Novo, V

Giorgi, G., Jimenéz, B. and Novo, V. (2023).Basic Mathematical Programming Theory, International Series in Operations Research & Management Science, 1st edn, Springer, Cham. https://doi.org/10.1007/978-3-031-30324-1

work page doi:10.1007/978-3-031-30324-1 2023

[46] [46]

Habshah, M., Norazan, M. R. and Rahmatullah Imon, A. H. (2009). The performance of diagnostic-robust generalized potentials for the identification of multiple high leverage points in linear regression,Journal of Applied Statistics36(5): 507–520.https://doi.org/10.1080/ 02664760802553463

2009

[47] [47]

Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the Identification of Multiple Outliers in Linear Models,Journal of the American Statistical Association88(424): 1264–1272.https: //doi.org/10.1080/01621459.1993.10476407

work page doi:10.1080/01621459.1993.10476407 1993

[48] [48]

Hamidieh, K. (2018). A data-driven statistical model for predicting the critical temperature of a superconductor,Computational Materials Science154: 346–354.https://doi.org/10. 1016/j.commatsci.2018.07.052

2018

[49] [49]

Hampel, F. R. (1973). Robust estimation: A condensed partial survey,Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete27: 87–104.https://doi.org/10.1007/ BF00536619

1973

[50] [50]

R., Ronchetti, E

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986).Robust Statistics: The Approach Based on Influence Functions, Wiley Series in Probability and Statistics, 1st edn, John Wiley & Sons, New York.https://doi.org/10.1002/9781118186435. 34 Thormann et al. Robust Regression with DC Programming

work page doi:10.1002/9781118186435 1986

[51] [51]

and Salibián-Barrera, M

Harrington, J. and Salibián-Barrera, M. (2010). Finding approximate solutions to combinato- rial problems with very large data sets using BIRCH,Computational Statistics & Data Analysis 54(3): 655–667.https://doi.org/10.1016/j.csda.2008.08.001

work page doi:10.1016/j.csda.2008.08.001 2010

[52] [52]

Hartman, P. (1959). On functions representable as a difference of convex functions,Pacific Journal of Mathematics9(3): 707–713.https://doi.org/10.2140/pjm.1959.9.707

work page doi:10.2140/pjm.1959.9.707 1959

[53] [53]

and Friedman, J

Hastie, T., Tibshirani, R. and Friedman, J. H. (2009).The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn, Springer, New York.https://doi.org/10. 1007/b94608

2009

[54] [54]

Hawkins, D. M. (1980).Identification of Outliers, Monographs on Statistics and Applied Probability, 1st edn, Springer, Dordrecht.https://doi.org/10.1007/978-94-015-3994-4

work page doi:10.1007/978-94-015-3994-4 1980

[55] [55]

Hawkins, D. M. (1994). The feasible solution algorithm for least trimmed squares regres- sion,Computational Statistics & Data Analysis17(2): 185–196.https://doi.org/10.1016/ 0167-9473(92)00070-8

1994

[56] [56]

M., Bradu, D

Hawkins, D. M., Bradu, D. and Kass, G. V. (1984). Location of Several Outliers in Multiple- Regression Data Using Elemental Sets,Technometrics26(3): 197–208.https://doi.org/10. 1080/00401706.1984.10487956

work page arXiv 1984

[57] [57]

Hawkins, D. M. and Olive, D. J. (1999). Improved feasible solution algorithms for high break- down estimation,Computational Statistics & Data Analysis30(1): 1–11.https://doi.org/ 10.1016/S0167-9473(98)00082-6

work page doi:10.1016/s0167-9473(98)00082-6 1999

[58] [58]

Hawkins, D. M. and Olive, D. J. (2002). Inconsistency of Resampling Algorithms for High- Breakdown Regression Estimators and a New Algorithm,Journal of the American Statistical Association97(457): 136–159.https://doi.org/10.1198/016214502753479293

work page doi:10.1198/016214502753479293 2002

[59] [59]

and Lange, K

Heng, Q. and Lange, K. (2025). Bootstrap estimation of the proportion of outliers in robust regression,Statistics and Computing35(3): 1–14.https://doi.org/10.1007/ s11222-024-10526-1

2025

[60] [60]

T., Le Thi, H

Ho, V. T., Le Thi, H. A. and Pham Dinh, T. (2020). DCA with Successive DC Decomposition for Convex Piecewise-Linear Fitting,inH. A. Le Thi, H. M. Le, T. Pham Dinh and N. T. Nguyen (eds),Advanced Computational Methods for Knowledge Engineering, Springer Inter- national Publishing, Cham, pp. 39–51.https://doi.org/10.1007/978-3-030-38364-0_4

work page doi:10.1007/978-3-030-38364-0_4 2020

[61] [61]

T., Le Thi, H

Ho, V. T., Le Thi, H. A. and Pham Dinh, T. (2021). DCA-based algorithms for DC fitting, Journal of Computational and Applied Mathematics389:113353.https://doi.org/10.1016/ j.cam.2020.113353

work page arXiv 2021

[62] [62]

Hoaglin, D. C. and Welsch, R. E. (1978). The Hat Matrix in Regression and ANOVA,The American Statistician32(1): 17–22.https://doi.org/10.1080/00031305.1978.10479237

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1080/00031305.1978.10479237 1978

[63] [63]

Hocking, R. R. (2003).Methods and Applications of Linear Models: Regression and the Anal- ysis of Variance, 2nd edn, John Wiley & Sons, Hoboken (New Jersey).https://doi.org/10. 1002/0471434159

2003

[64] [64]

and Kontoghiorghes, E

Hofmann, M., Gatu, C. and Kontoghiorghes, E. J. (2010). An Exact Least Trimmed Squares Algorithm for a Range of Coverage Values,Journal of Computational and Graphical Statistics 19(1): 191–204.https://doi.org/10.1198/jcgs.2009.07091

work page doi:10.1198/jcgs.2009.07091 2010

[65] [65]

and Pardalos, P

Horst, R. and Pardalos, P. M. (1995).Handbook of Global Optimization: Volume 2, Nonconvex Optimization and Its Applications, 1st edn, Springer, New York.https://doi.org/10.1007/ 978-1-4757-5362-2

1995

[66] [66]

Hössjer, O. (1995). Exact computation of the least trimmed squares estimate in simple linear regression,Computational Statistics & Data Analysis19(3): 265–282.https://doi.org/10. 1016/0167-9473(95)92697-V

1995

[67] [67]

and Torre, F

Huang, D., Cabral, R. and Torre, F. D. l. (2016). Robust regression,IEEE Transactions on Pattern Analysis and Machine Intelligence38(2): 363–375.https://doi.org/10.1109/ TPAMI.2015.2448091. 35 Thormann et al. Robust Regression with DC Programming

work page arXiv 2016

[68] [68]

Huber, P. J. and Ronchetti, E. M. (2009).Robust Statistics, Wiley Series in Probabil- ity and Statistics, 2nd edn, John Wiley & Sons, Hoboken.https://doi.org/10.1002/ 9780470434697

2009

[69] [69]

Ibe, O. C. (2013).Markov Processes for Stochastic Modeling, 2nd edn, Elsevier, London. https://doi.org/10.1016/C2012-0-06106-6

work page doi:10.1016/c2012-0-06106-6 2013

[70] [70]

and Tibshirani, R

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2021).An Introduction to Statistical Learning with Applications in R, Springer Texts in Statistics, 2nd edn, Springer, New York. https://doi.org/10.1007/978-1-0716-1418-1

work page doi:10.1007/978-1-0716-1418-1 2021

[71] [71]

and Yazıcı, B

Kan, B., Alpu, Ö. and Yazıcı, B. (2013). Robust ridge and robust Liu estimator for regression based on the LTS estimator,Journal of Applied Statistics40(3): 644–655.https://doi.org/ 10.1080/02664763.2012.750285

work page doi:10.1080/02664763.2012.750285 2013

[72] [72]

Lasserre, J. B. (2015).An Introduction to Polynomial and Semi-Algebraic Optimization, Cambridge Texts in Applied Mathematics, 1st edn, Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781107447226

work page doi:10.1017/cbo9781107447226 2015

[73] [73]

Le Thi, H. A. (2000). An efficient algorithm for globally minimizing a quadratic function under convex quadratic constraints,Mathematical Programming87: 401–426.https://doi. org/10.1007/s101070050003

work page doi:10.1007/s101070050003 2000

[74] [74]

A., Ho, V

Le Thi, H. A., Ho, V. T. and Pham Dinh, T. (2019). A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning,Journal of Global Optimization73: 279–310.https://doi.org/10.1007/s10898-018-0698-y

work page doi:10.1007/s10898-018-0698-y 2019

[75] [75]

Le Thi, H. A. and Pham Dinh, T. (2005). The DC (Difference of Convex Functions) Program- ming and DCA Revisited with DC Models of Real World Nonconvex Optimization Problems, Annals of Operations Research133: 23–46.https://doi.org/10.1007/s10479-004-5022-1

work page doi:10.1007/s10479-004-5022-1 2005

[76] [76]

Le Thi, H. A. and Pham Dinh, T. (2018). DC programming and DCA: thirty years of developments,Mathematical Programming169: 5–68.https://doi.org/10.1007/ s10107-018-1235-y

2018

[77] [77]

Le Thi, H. A. and Pham Dinh, T. (2024). Open issues and recent advances in DC program- ming and DCA,Journal of Global Optimization88: 533–590.https://doi.org/10.1007/ s10898-023-01272-1

2024

[78] [78]

Lewis, A. D. (2023).Geometric Analysis on Real Analytic Manifolds, Lecture Notes in Math- ematics, 1st edn, Springer, Cham.https://doi.org/10.1007/978-3-031-37913-0

work page doi:10.1007/978-3-031-37913-0 2023

[79] [79]

Liu, T., Pong, T. K. and Takeda, A. (2019). A refined convergence analysis ofpDCAe with ap- plications to simultaneous sparse recovery and outlier detection,Computational Optimization and Applications73: 69–100.https://doi.org/10.1007/s10589-019-00067-z

work page doi:10.1007/s10589-019-00067-z 2019

[80] [80]

(1965).Ensembles Semi-Analytiques, Institute des Hautes Etudes Scientifiques, Bures-sur-Yvette (Seine-et-Oise), France

Łojasiewicz, S. (1965).Ensembles Semi-Analytiques, Institute des Hautes Etudes Scientifiques, Bures-sur-Yvette (Seine-et-Oise), France

1965