L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing

Hans-Andrea Loeliger; Yun-Peng Li

arxiv: 2604.16949 · v1 · submitted 2026-04-18 · 💻 cs.LG · eess.SP· stat.ME

L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing

Yun-Peng Li , Hans-Andrea Loeliger This is my paper

Pith reviewed 2026-05-10 07:19 UTC · model grok-4.3

classification 💻 cs.LG eess.SPstat.ME

keywords L1 regularizationregularization pathsfactor graphsGaussian message passingKalman smoothingLASSOstate-space modelslinear models

0 comments

The pith

L1 regularization paths in state-space linear models can be computed by parametric Gaussian message passing on factor graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to obtain entire L1 regularization paths for linear models in a state-space setting by representing the problem as a factor graph and running Kalman-style forward-backward recursions. Two dual algorithms arise from this construction: one handles L1 penalties on independent variables and the other on dependent variables. The approach covers problems such as L1-regularized Kalman smoothing, linear support-vector machines, and LASSO. A reader would care because the method promises to deliver the full path using mainly matrix multiplications whose cost can compete with existing techniques.

Core claim

The authors establish that L1 regularization paths in the state-space setting can be computed by parametric Gaussian message passing in the pertinent factor graphs. This yields two dual algorithms, one for L1 regularization of independent variables and one for dependent variables. The recursions are of Kalman type and are presented as broadly applicable while requiring only matrix multiplications in most cases, with complexity competitive with prior methods for some problems.

What carries the argument

Parametric Gaussian message passing, i.e., Kalman-type forward-backward recursions that operate on factor graphs encoding the linear model together with the L1 regularization terms; the messages carry the dependence on the regularization strength so that the entire path is obtained in one pass.

If this is right

The same machinery applies directly to L1-regularized Kalman smoothing, linear SVM training, and LASSO.
Separate algorithms exist for regularizing independent versus dependent variables.
Most steps reduce to matrix multiplications.
Computational cost can be competitive with earlier path algorithms in selected settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The recursive structure suggests the method could be adapted to streaming or online data arrival without recomputing from scratch.
Because the messages remain parametric, the same framework might accommodate other penalties whose effect on the quadratic terms can be tracked analytically.
For very high-dimensional states the matrix operations may still dominate unless the factor graph admits further factorization.

Load-bearing premise

L1 regularization terms can be placed in the factor graph so that the required messages stay exactly Gaussian and parametric without extra approximations that would lose the structure of the regularization path.

What would settle it

Run the proposed recursions on a small LASSO instance whose exact regularization path is already known from a standard solver and check whether the breakpoints and coefficient values match exactly.

Figures

Figures reproduced from arXiv: 2604.16949 by Hans-Andrea Loeliger, Yun-Peng Li.

**Figure 1.** Figure 1: Factor graph of (10) and (5). The variables 𝑋 ′ 𝑛, 𝑋′′ 𝑛 , 𝑋′′′ 𝑛 are used in Algorithms 1 and 3. where 𝑥1, . . . , 𝑥𝑁 and 𝑦 △ = (𝑦1, . . . , 𝑦𝑁 ) T are determined by 𝑥0 and 𝑢1, . . . , 𝑢𝑁 according to (3) and (4), and with 𝑝(𝑥0) ∝ exp − 1 2𝜎 2 (𝑥0 − 𝑥˘0) T𝑄0 (𝑥0 − 𝑥˘0) (11) or with fixed initial state 𝑥0 = 𝑥˘0, with 𝑝(𝑥˘𝑁 |𝑥𝑁 ) ∝ exp − 1 2𝜎 2 (𝑥𝑁 − 𝑥˘𝑁 ) T𝑄𝑁 (𝑥𝑁 − 𝑥˘𝑁 ) , (12) and either with 𝑝(𝑢𝑛) … view at source ↗

**Figure 2.** Figure 2: Factor graph of (7). exp − ∥𝑥0 ∥ 2 2𝜎 2 𝑋0 . . . = 𝑋𝑛−1 𝑐 T 𝑛 𝑋 ′ 𝑛 𝑒 −𝜅 (𝑦𝑛 −𝑦˘𝑛 ) 𝑌𝑛 𝑋𝑛 . . . 𝑋𝑁 [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Factor graph of (9). 3 Regularized Input Path by Parametric Backward Filtering Forward Deciding 3.1 MAP Estimation in Linear Gaussian Models with Backward Filtering Forward Deciding (BFFD) We first consider the statistical model (10)–(13) for the case where both 𝑝(𝑢𝑛) and 𝑝(𝑦˘𝑛 |𝑦𝑛) are Gaussian in 𝑢𝑛 and 𝑦𝑛, respectively. In this case, the statistical model is linear Gaussian. For fixed 𝜎 2 , the joint MA… view at source ↗

**Figure 4.** Figure 4: Example 5.1: Snapshots of trend filtering on the global warming datasets computed with PathBFFD (top row) and genLASSO [1] (bottom row). The ordinate is 𝑦ˆ(𝜎 2 ). 5 Numerical Examples 5.1 Filtering a Global Warming Dataset We demonstrate PathBFFD and PathFFBDD for [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Example 5.2: Snapshots of median-Kalman smoothing on the global warming datasets computed with PathFFBDD (top row) and the corresponding results from iterated FFBDD [11] (bottom row). The ordinate is 𝑦ˆ(𝜎 2 ). 0 0.2 0.4 0.6 0.8 1 −500 0 500 1 2 3 4 5 6 7 8 9 10 L1 arc length uˆn(σ 2 ) PathBFFD 0 0.2 0.4 0.6 0.8 1 −500 0 500 1 2 3 4 5 6 7 8 9 10 L1 arc length uˆn(σ 2 ) LARS [PITH_FULL_IMAGE:figures/full_fi… view at source ↗

**Figure 6.** Figure 6: Example 5.3: Regularization path of L1 regularized least squares for the diabetes dataset. The L1 arc length is the normalized sum Í𝑁 𝑛=1 |𝑢ˆ𝑛 (𝜎 2 ) |. The labels on the right side are the indices 𝑛 of the coefficients 𝑢ˆ𝑛. The vertical lines indicate the knots. Example 5.3 (L1 Regularized Least Squares). We use the state space model of (7) and the factor graph of [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Example 5.4: Regularized path of linear SVM regression for the diabetes dataset. Top row: moderate values of 𝜎 2 ; bottom row: large values of 𝜎 2 . The labels on the right side are the indices of the components of 𝑥ˆ0. The vertical lines indicate the knots. Example 5.4 (Linear Support Vector Regression). We use the state space model of (9) and the factor graph of [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

read the original abstract

The paper considers the computation of L1 regularization paths in a state space setting, which includes L1 regularized Kalman smoothing, linear SVM, LASSO, and more. The paper proposes two new algorithms, which are duals of each other; the first algorithm applies to L1 regularization of independent variables while the second applies to L1 regularization of dependent variables. The heart of the proposed algorithms is parametric Gaussian message passing (i.e., Kalman-type forward-backward recursions) in the pertinent factor graphs. The proposed methods are broadly applicable, they (usually) require only matrix multiplications, and their complexity can be competitive with prior methods in some cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Two dual algorithms using parametric Gaussian message passing for L1 regularization paths in state-space models form the core contribution, but exact handling of the non-Gaussian L1 factors remains the key question to verify.

read the letter

The paper introduces two dual algorithms for L1 regularization paths in state-space models, one for independent variables and one for dependent variables, both built around parametric Gaussian message passing with Kalman-style forward-backward recursions in factor graphs. This covers L1-regularized Kalman smoothing, LASSO, and linear SVMs, and the methods are positioned as often reducing to matrix multiplications with competitive complexity in some settings. The framing in factor graphs and the dual structure are the clearest new elements, giving a structured alternative to generic solvers for tracing paths over varying regularization strength. The approach builds directly on existing message-passing techniques rather than starting from scratch, which keeps the contribution focused. The main soft spot is whether the parametric messages can represent the absolute-value factors exactly enough to preserve the full piecewise-linear path structure without losing breakpoints or slipping into approximations. L1 terms break the quadratic closure that standard Gaussian messages rely on, so any parameterization has to track the necessary changes as lambda varies. If that step is not fully closed or requires case-by-case handling, the computed paths could deviate from the true solution. The abstract and available description do not include the explicit message-update rules or error analysis, so this needs checking in the derivations. The stress-test concern about preserving exact path structure is therefore worth testing against the full text rather than dismissed. This work is for readers already comfortable with factor graphs, Kalman smoothing, and regularization paths in structured linear models. Someone looking for new algorithmic tools in signal processing or sparse state-space estimation will find the most direct value. It deserves peer review because the algorithmic idea is concrete and the framework is established enough that referees can evaluate the message-passing details and any supporting experiments.

Referee Report

2 major / 3 minor

Summary. The paper proposes two dual algorithms for computing L1 regularization paths in linear state-space models (including L1-regularized Kalman smoothing, LASSO, and linear SVMs). The algorithms use parametric Gaussian message passing (Kalman-style forward-backward recursions) on factor graphs, with one variant for L1 penalties on independent variables and the dual for dependent variables. The methods are claimed to require primarily matrix multiplications and to be broadly applicable with competitive complexity.

Significance. If the parametric Gaussian messages for the non-quadratic L1 factors remain closed under the recursions and exactly recover the piecewise-linear solution paths for all regularization strengths without missing breakpoints or implicit approximations, the work would provide a unified, efficient framework for path computation in dynamic linear models that avoids explicit active-set enumeration.

major comments (2)

[Section 3 (message passing derivations)] The section deriving the message updates for absolute-value (L1) factors must demonstrate that the chosen parametric form remains closed under forward-backward passes as the regularization parameter varies continuously; otherwise the computed paths cannot be guaranteed to match the true piecewise-linear solutions.
[Section 5 (experiments)] The experimental section comparing the proposed methods to standard homotopy or active-set solvers should include quantitative checks (e.g., maximum deviation along the path) on instances with known dense breakpoint structure to confirm that no path segments are lost due to the parametric approximation.

minor comments (3)

The abstract's claim that the methods 'usually require only matrix multiplications' would be strengthened by an explicit operation count or complexity table relative to prior path algorithms.
[Figure 1] Factor-graph diagrams for the two dual cases would benefit from explicit annotation of which nodes carry the parametric Gaussian messages and how the L1 potentials are represented.
Notation distinguishing the two dual algorithms (e.g., variable naming for the independent vs. dependent L1 cases) could be made more consistent across the text and pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable suggestions. Below we provide point-by-point responses to the major comments.

read point-by-point responses

Referee: [Section 3 (message passing derivations)] The section deriving the message updates for absolute-value (L1) factors must demonstrate that the chosen parametric form remains closed under forward-backward passes as the regularization parameter varies continuously; otherwise the computed paths cannot be guaranteed to match the true piecewise-linear solutions.

Authors: Our derivations in Section 3 establish that the parametric form of the Gaussian messages is closed under the forward-backward message passing operations for varying regularization strengths. The L1 factor updates produce messages whose parameters (means and variances) are piecewise linear or affine in the regularization parameter, with changes occurring only at breakpoints that are explicitly identified. This guarantees that the computed paths match the true piecewise-linear solutions. To make this more transparent, we will add a dedicated subsection or proposition formalizing the closure property. revision: yes
Referee: [Section 5 (experiments)] The experimental section comparing the proposed methods to standard homotopy or active-set solvers should include quantitative checks (e.g., maximum deviation along the path) on instances with known dense breakpoint structure to confirm that no path segments are lost due to the parametric approximation.

Authors: We concur that quantitative checks on problems with dense breakpoints would provide stronger evidence. We will expand the experimental section to include comparisons on small instances where the exact path can be computed by enumeration or standard solvers, reporting the maximum deviation along the entire path and verifying that all breakpoints are captured by our algorithm. revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic construction is self-contained

full rationale

The paper proposes two dual algorithms for L1 regularization paths via parametric Gaussian message passing on factor graphs for state-space models. No derivation step reduces a claimed result to a fitted parameter, self-defined quantity, or load-bearing self-citation chain. The core is the construction of forward-backward recursions that handle L1 factors parametrically; this is presented as a new algorithmic method rather than a tautological renaming or prediction forced by prior fits. Standard message-passing foundations are external and not invoked as an internal uniqueness theorem from the same authors that forbids alternatives. The derivation chain remains independent of the target paths themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the approach rests on standard properties of Gaussian message passing in factor graphs and linear state-space models; no free parameters, invented entities, or ad-hoc axioms are mentioned.

axioms (1)

domain assumption Parametric Gaussian messages can be passed exactly in factor graphs for linear models with L1 terms
Invoked implicitly as the heart of the algorithms for computing regularization paths.

pith-pipeline@v0.9.0 · 5410 in / 1175 out tokens · 42781 ms · 2026-05-10T07:19:22.240967+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

Taylor B Arnold and Ryan J Tibshirani. 2016. Efficient implementations of the generalized lasso dual path algorithm.Journal of Computational and Graphical Statistics25, 1 (2016), 1–27

work page 2016
[2]

1982.Non-linear parametric optimization

Bernd Bank, Jürgen Guddat, Diethard Klatte, Bernd Kummer, and Klaus Tammer. 1982.Non-linear parametric optimization. Vol. 58. Walter de Gruyter GmbH & Co KG

work page 1982
[3]

Alberto Bemporad. 2021. Explicit model predictive control. InEncyclopedia of Systems and Control. Springer, 744–751

work page 2021
[4]

Alberto Bemporad, Manfred Morari, Vivek Dua, and Efstratios N Pistikopoulos. 2002. The explicit linear quadratic regulator for constrained systems. Automatica38, 1 (2002), 3–20

work page 2002
[5]

Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. Least angle regression.The Annals of Statistics32, 2 (2004), 407 – 499. https://doi.org/10.1214/009053604000000067

work page doi:10.1214/009053604000000067 2004
[6]

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification.Journal of Machine Learning Research9 (2008), 1871–1874

work page 2008
[7]

2023.Climate at a Glance: Global Time Series, published October 2023

NOAA National Centers for Environmental information. 2023.Climate at a Glance: Global Time Series, published October 2023. https://www.ncei. noaa.gov/access/monitoring/climate-at-a-glance/global/time-series

work page 2023
[8]

Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. 2004. The entire regularization path for the support vector machine.Journal of Machine Learning Research5, Oct (2004), 1391–1415

work page 2004
[9]

Holger Hoefling. 2010. A path algorithm for the fused lasso signal approximator.Journal of Computational and Graphical Statistics19, 4 (2010), 984–1006

work page 2010
[10]

Raphael Keusch, Hans-Andrea Loeliger, and Tobias Geyer. 2023. Long-horizon direct model predictive control for power converters with state constraints.IEEE Transactions on Control Systems Technology32, 2 (2023), 340–350

work page 2023
[11]

Yun-Peng Li and Hans-Andrea Loeliger. 2024. Backward filtering forward deciding in linear non-Gaussian state space models. InInternational Conference on Artificial Intelligence and Statistics. PMLR, 2287–2295

work page 2024
[12]

Yun-Peng Li and Hans-Andrea Loeliger. 2025. Dual NUP Representations and Min-Maximization in Factor Graphs. In2025 IEEE International Symposium on Information Theory (ISIT). 1–6

work page 2025
[13]

Hans-Andrea Loeliger, Lukas Bruderer, Hampus Malmberg, Federico Wadehn, and Nour Zalmai. 2016. On sparsity by NUV-EM, Gaussian message passing, and Kalman smoothing. In2016 Information Theory and Applications Workshop (ITA). IEEE, 1–10

work page 2016
[14]

Hans-Andrea Loeliger, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R Kschischang. 2007. The factor graph approach to model-based signal processing.Proc. IEEE95, 6 (2007), 1295–1322

work page 2007
[15]

1988.Linear complementarity, linear and nonlinear programming

Katta G Murty and Feng-Tien Yu. 1988.Linear complementarity, linear and nonlinear programming. Vol. 3. Heldermann Berlin

work page 1988
[16]

Michael R Osborne, Brett Presnell, and Berwin A Turlach. 2000. A new approach to variable selection in least squares problems.IMA journal of numerical analysis20, 3 (2000), 389–403

work page 2000
[17]

1981.On parametric linear and quadratic programming problems

Klaus ’Ritter. 1981.On parametric linear and quadratic programming problems. MRC Technical Summary Report No. 2197. Mathematics Research Center, University of Wisconsin-Madison, 610 Walnut Street, Madison, Wisconsin 53706

work page 1981
[18]

Saharon Rosset and Ji Zhu. 2007. Piecewise linear regularized solution paths.The Annals of Statistics(2007), 1012–1030

work page 2007
[19]

Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society Series B: Statistical Methodology58, 1 (1996), 267–288

work page 1996
[20]

Tibshirani and Jonathan Taylor

Ryan J. Tibshirani and Jonathan Taylor. 2011. The solution path of the generalized lasso.The Annals of Statistics39, 3 (2011), 1335 – 1371

work page 2011
[21]

Hua Zhou and Kenneth Lange. 2013. A path algorithm for constrained estimation.Journal of Computational and Graphical Statistics22, 2 (2013), 261–283

work page 2013

[1] [1]

Taylor B Arnold and Ryan J Tibshirani. 2016. Efficient implementations of the generalized lasso dual path algorithm.Journal of Computational and Graphical Statistics25, 1 (2016), 1–27

work page 2016

[2] [2]

1982.Non-linear parametric optimization

Bernd Bank, Jürgen Guddat, Diethard Klatte, Bernd Kummer, and Klaus Tammer. 1982.Non-linear parametric optimization. Vol. 58. Walter de Gruyter GmbH & Co KG

work page 1982

[3] [3]

Alberto Bemporad. 2021. Explicit model predictive control. InEncyclopedia of Systems and Control. Springer, 744–751

work page 2021

[4] [4]

Alberto Bemporad, Manfred Morari, Vivek Dua, and Efstratios N Pistikopoulos. 2002. The explicit linear quadratic regulator for constrained systems. Automatica38, 1 (2002), 3–20

work page 2002

[5] [5]

Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. 2004. Least angle regression.The Annals of Statistics32, 2 (2004), 407 – 499. https://doi.org/10.1214/009053604000000067

work page doi:10.1214/009053604000000067 2004

[6] [6]

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification.Journal of Machine Learning Research9 (2008), 1871–1874

work page 2008

[7] [7]

2023.Climate at a Glance: Global Time Series, published October 2023

NOAA National Centers for Environmental information. 2023.Climate at a Glance: Global Time Series, published October 2023. https://www.ncei. noaa.gov/access/monitoring/climate-at-a-glance/global/time-series

work page 2023

[8] [8]

Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. 2004. The entire regularization path for the support vector machine.Journal of Machine Learning Research5, Oct (2004), 1391–1415

work page 2004

[9] [9]

Holger Hoefling. 2010. A path algorithm for the fused lasso signal approximator.Journal of Computational and Graphical Statistics19, 4 (2010), 984–1006

work page 2010

[10] [10]

Raphael Keusch, Hans-Andrea Loeliger, and Tobias Geyer. 2023. Long-horizon direct model predictive control for power converters with state constraints.IEEE Transactions on Control Systems Technology32, 2 (2023), 340–350

work page 2023

[11] [11]

Yun-Peng Li and Hans-Andrea Loeliger. 2024. Backward filtering forward deciding in linear non-Gaussian state space models. InInternational Conference on Artificial Intelligence and Statistics. PMLR, 2287–2295

work page 2024

[12] [12]

Yun-Peng Li and Hans-Andrea Loeliger. 2025. Dual NUP Representations and Min-Maximization in Factor Graphs. In2025 IEEE International Symposium on Information Theory (ISIT). 1–6

work page 2025

[13] [13]

Hans-Andrea Loeliger, Lukas Bruderer, Hampus Malmberg, Federico Wadehn, and Nour Zalmai. 2016. On sparsity by NUV-EM, Gaussian message passing, and Kalman smoothing. In2016 Information Theory and Applications Workshop (ITA). IEEE, 1–10

work page 2016

[14] [14]

Hans-Andrea Loeliger, Justin Dauwels, Junli Hu, Sascha Korl, Li Ping, and Frank R Kschischang. 2007. The factor graph approach to model-based signal processing.Proc. IEEE95, 6 (2007), 1295–1322

work page 2007

[15] [15]

1988.Linear complementarity, linear and nonlinear programming

Katta G Murty and Feng-Tien Yu. 1988.Linear complementarity, linear and nonlinear programming. Vol. 3. Heldermann Berlin

work page 1988

[16] [16]

Michael R Osborne, Brett Presnell, and Berwin A Turlach. 2000. A new approach to variable selection in least squares problems.IMA journal of numerical analysis20, 3 (2000), 389–403

work page 2000

[17] [17]

1981.On parametric linear and quadratic programming problems

Klaus ’Ritter. 1981.On parametric linear and quadratic programming problems. MRC Technical Summary Report No. 2197. Mathematics Research Center, University of Wisconsin-Madison, 610 Walnut Street, Madison, Wisconsin 53706

work page 1981

[18] [18]

Saharon Rosset and Ji Zhu. 2007. Piecewise linear regularized solution paths.The Annals of Statistics(2007), 1012–1030

work page 2007

[19] [19]

Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society Series B: Statistical Methodology58, 1 (1996), 267–288

work page 1996

[20] [20]

Tibshirani and Jonathan Taylor

Ryan J. Tibshirani and Jonathan Taylor. 2011. The solution path of the generalized lasso.The Annals of Statistics39, 3 (2011), 1335 – 1371

work page 2011

[21] [21]

Hua Zhou and Kenneth Lange. 2013. A path algorithm for constrained estimation.Journal of Computational and Graphical Statistics22, 2 (2013), 261–283

work page 2013