arxiv: 2604.06278 · v4 · pith:YSJGLHDDnew · submitted 2026-04-07 · 📊 stat.ME · cs.CY · stat.AP

Predictive Volatility of Machine Learning in Micro-Samples: A Regularised Assessment of Regional Poverty

Pith reviewed 2026-05-21 10:39 UTC · model grok-4.3

classification 📊 stat.ME · cs.CY · stat.AP
keywords poverty · regional analysis · regularization · machine learning · small samples · Indonesia · ICT skills · cross-validation

The pith

Regularised linear shrinkage models outperform complex machine learning when identifying poverty drivers in small, collinear regional datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper addresses the challenge of unstable results in poverty analysis when datasets are limited to a few dozen observations and variables are highly interrelated. It systematically compares standard linear regression, penalised shrinkage methods, Bayesian approaches, a spatial model, and machine learning ensembles on Indonesian provincial data. The evaluation uses leave-one-out cross-validation to test true predictive ability rather than fit to the observed sample. Simple regularised linear models prove more reliable at avoiding overfitting than complex ensembles, and they consistently point to ICT skills as the strongest stable signal associated with lower poverty rates. A reader would care because policy decisions based on fragile statistical patterns can misdirect resources in data-poor regions.

Core claim

In data-constrained regional analysis, parametrically regularised linear shrinkage provides a more reliable mathematical foundation for isolating structural development priorities, such as ICT skills, than either naive OLS or unconstrained machine learning. This is shown by the superior out-of-sample performance of Ridge, Elastic Net, and LASSO models over complex ensembles such as BART, which suffer severe overfitting, when all are assessed via strict leave-one-out cross-validation on the n=34 provincial observations.

What carries the argument

Parametrically regularised linear shrinkage estimators, which stabilise coefficient estimates under multicollinearity by penalising model complexity during fitting.
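
The shrinkage mechanism can be illustrated with a minimal sketch. The closed-form ridge solution used below is standard; the synthetic data, the penalty value `lam=1.0`, and the helper name `ridge_coefficients` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^{-1} X'y on centred data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Synthetic stand-in data (hypothetical, not the provincial dataset).
rng = np.random.default_rng(0)
n = 34
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + rng.normal(scale=0.5, size=n)

beta_ols = ridge_coefficients(X, y, lam=0.0)    # lam = 0 reduces to OLS
beta_ridge = ridge_coefficients(X, y, lam=1.0)  # illustrative penalty

print("OLS  :", beta_ols)
print("ridge:", beta_ridge)
```

For any positive penalty the ridge coefficient vector has a Euclidean norm no larger than the OLS one, which is the stabilising effect the paragraph above describes.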

If this is right

  • Simple linear shrinkage models achieve better out-of-sample prediction than complex ensembles like BART in small regional samples.
  • ICT skills emerge as the most stable proxy for lower provincial poverty across all successful regularised models.
  • Unconstrained machine learning carries a high risk of severe overfitting when applied to datasets with n around 34 and high collinearity.
  • Regularised linear methods supply a stronger basis than naive OLS for identifying structural priorities in such constrained settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pattern of shrinkage models outperforming ensembles may appear in other small-sample regional studies of economic or social outcomes facing multicollinearity.
  • A direct test could apply the identical model-comparison framework to poverty or development indicators from provinces in neighbouring countries with comparable data sizes.
  • If ICT skills remain the dominant stable factor, targeted digital training programs could be examined as one concrete lever among the identified priorities.

Load-bearing premise

Leave-one-out cross-validation on the 34 provincial observations is assumed to reliably estimate true out-of-sample predictive performance despite high collinearity and without any external hold-out data or further robustness checks.
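
To make the premise concrete, here is a minimal sketch of leave-one-out evaluation for a ridge model. It assumes a fixed penalty per run and preprocessing refitted inside each fold; the function name `loocv_rmse`, the synthetic data, and the penalty grid are hypothetical and do not reproduce the paper's pipeline.

```python
import numpy as np

def loocv_rmse(X, y, lam):
    """Leave-one-out RMSE for ridge with penalty lam, refitting on each fold."""
    n, p = X.shape
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        X_tr, y_tr = X[train], y[train]
        # Centre and scale using training rows only, so the held-out row
        # never leaks into preprocessing.
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        Z_tr = (X_tr - mu) / sd
        y_bar = y_tr.mean()
        beta = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(p),
                               Z_tr.T @ (y_tr - y_bar))
        z_test = (X[i] - mu) / sd
        errors[i] = y[i] - (y_bar + z_test @ beta)
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic stand-in data (hypothetical): 34 rows, 6 correlated predictors.
rng = np.random.default_rng(1)
n, p = 34, 6
base = rng.normal(size=(n, 1))
X = base + 0.3 * rng.normal(size=(n, p))   # shared factor induces collinearity
y = X[:, 0] + rng.normal(scale=0.5, size=n)

scores = {lam: loocv_rmse(X, y, lam) for lam in (0.0, 1.0, 10.0)}
for lam, rmse in scores.items():
    print(f"lambda={lam:>5}: LOOCV RMSE = {rmse:.3f}")
```

Each fold's prediction uses only the other 33 rows, which is what separates this estimate from in-sample fit. It is still computed from the same 34 observations, so it cannot replace the external hold-out data the premise notes is missing.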

What would settle it

An independent test on data from additional provinces or a later time period in which the regularised linear models no longer show better predictive accuracy than the complex machine learning ensembles.

Figures

Figures reproduced from arXiv: 2604.06278 by A. H. Jamaluddin, A. T. R. Dani, N. I. Mahat, S. S. M. Fauzi, V. Ratnasari.

Figure 1: Predictor correlation matrix ordered via hierarchical clustering. The clustering …
Figure 2: Spatial distribution of provincial poverty in Indonesia (…
Figure 3: Posterior means and 95% credible intervals for the Bayesian Horseshoe model …
Figure 4: Posterior predictive check (PPC) for the Bayesian Horseshoe model (M8). The …
Figure 5: Sensitivity analysis demonstrating the impact of prior variance on the posterior …
Figure 6: Relative variable importance from Bayesian Additive Regression Trees (BART) …
Figure 7: SHAP (SHapley Additive exPlanations) summary beeswarm plot for the XGBoost …
Figure 8: Leave-One-Out RMSE across all evaluated model frameworks. Simple linear …
Original abstract

Small regional datasets pose a dual statistical problem: correlated predictors inflate estimation variance, while flexible learners can become unstable because the available information per adaptive degree of freedom is limited. We examine this issue through predictive volatility, defined as the cross-sample dispersion and upper-tail behaviour of out-of-sample loss. Using simulation evidence reported for sparse linear, near-linear and heavy-tailed settings, we compare ordinary least squares, frequentist penalties, Bayesian shrinkage models, bounded-response and spatial specifications, and flexible machine-learning procedures. In the reported simulation results, regularised linear estimators generally dominate in the linear high-collinearity micro-sample settings and remain the most reliable overall, whereas tree-based methods become more competitive only when the signal is weakly nonlinear and the sample size is larger. In the empirical application to 34 Indonesian provinces, ridge yields the best leave-one-out performance, followed by elastic net and lasso. Across the Bayesian shrinkage specifications, ICT skills show the most consistent negative association with poverty, with the strongest support under horseshoe and spike-and-slab formulations. These results suggest that, in micro-sample regional modelling, the main constraint is limited information per effective degree of freedom rather than insufficient algorithmic flexibility.
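
The abstract defines predictive volatility only in words. One possible operationalisation is sketched below, assuming dispersion is summarised by the sample standard deviation and the upper tail by the 95th percentile of the loss. Both choices, the function name `predictive_volatility`, and the two loss series are assumptions for illustration, not the paper's stated definitions or results.

```python
import numpy as np

def predictive_volatility(losses, upper_q=0.95):
    """Summarise out-of-sample losses collected across repeated samples."""
    losses = np.asarray(losses, dtype=float)
    return {
        "mean": float(losses.mean()),
        "sd": float(losses.std(ddof=1)),                      # cross-sample dispersion
        "upper_quantile": float(np.quantile(losses, upper_q)),  # upper-tail behaviour
    }

# Hypothetical loss series for two methods over five replications.
stable = [1.0, 1.1, 0.9, 1.0, 1.05]
volatile = [0.6, 0.7, 0.8, 0.9, 4.0]

summary_stable = predictive_volatility(stable)
summary_volatile = predictive_volatility(volatile)
print(summary_stable)
print(summary_volatile)
```

In this toy case the second method wins on most replications, yet its dispersion and upper quantile are far worse. A comparison of mean loss alone would understate that gap.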

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates statistical and machine learning approaches for identifying drivers of provincial poverty in Indonesia using a small sample (n=34) with high collinearity. It compares OLS, regularized linear models (Ridge, LASSO, Elastic Net), Bayesian shrinkage, a spatial ICAR model, and ensembles such as BART, assessing them via leave-one-out cross-validation (LOOCV). The central claims are that regularized linear shrinkage yields superior out-of-sample predictive performance and that ICT skills consistently rank as the most stable proxy for lower poverty across successful models, providing a more reliable basis for isolating structural priorities than naive OLS or unconstrained ML.

Significance. If the results hold, the work supplies useful evidence on the risks of complex ML in micro-samples and the advantages of parametric regularization for stable inference under collinearity, relevant to regional development statistics. Credit is due for the explicit model-comparison framework tailored to small n and for employing LOOCV rather than in-sample metrics.

major comments (2)
  1. [Abstract and Evaluation Framework] The claim that ICT skills emerge as the most stable proxy across all successful regularised models is load-bearing for the primary contribution, yet no sensitivity checks (e.g., repeated random splits, bootstrap resampling of the n=34 observations, or simulation-based assessment of false-positive rates for variable selection) are reported to establish that this ranking is robust rather than an artifact of collinearity and single-point omissions in LOOCV.
  2. [Methods] While LOOCV is a reasonable choice for n=34, the manuscript provides no details on how the regularization hyperparameter is selected in the presence of high collinearity (e.g., whether cross-validation paths were examined for stability or whether condition numbers/variance inflation factors were monitored), which directly affects the reliability of the reported coefficient stability for ICT.
minor comments (2)
  1. [Abstract] No information is given on variable coding, the exact grid or selection procedure for tuning parameters, or the quantitative criteria used to label models as 'successful'.
  2. [Results] A summary table of LOOCV performance metrics (RMSE, MAE, or R²) across all compared models would improve clarity and allow readers to assess the magnitude of the reported superiority of regularized linear models.
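
The collinearity diagnostics named in the second major comment can be sketched as follows. This is a minimal illustration on synthetic data: the helper name `collinearity_diagnostics` and the three-predictor design are hypothetical. The VIFs are read off the diagonal of the inverse predictor correlation matrix, which is a standard identity.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (VIFs, condition number) for a predictor matrix X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardised predictors
    R = np.corrcoef(X, rowvar=False)           # predictor correlation matrix
    vif = np.diag(np.linalg.inv(R))            # VIF_j is the j-th diagonal of R^{-1}
    cond = np.linalg.cond(Z)                   # largest / smallest singular value
    return vif, float(cond)

# Synthetic stand-in data (hypothetical): two near-duplicate predictors
# and one unrelated predictor.
rng = np.random.default_rng(3)
n = 34
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vif, cond = collinearity_diagnostics(X)
print("VIF:", np.round(vif, 1))
print("condition number:", round(cond, 1))
```

Reporting these values alongside the selected penalty would let readers judge how much of the reported coefficient stability is due to regularisation rather than to a well-conditioned design.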

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the robustness and transparency of the analysis.

Point-by-point responses
  1. Referee: [Abstract and Evaluation Framework] The claim that ICT skills emerge as the most stable proxy across all successful regularised models is load-bearing for the primary contribution, yet no sensitivity checks (e.g., repeated random splits, bootstrap resampling of the n=34 observations, or simulation-based assessment of false-positive rates for variable selection) are reported to establish that this ranking is robust rather than an artifact of collinearity and single-point omissions in LOOCV.

    Authors: We agree that the stability of the ICT skills ranking is central to our contribution and that additional sensitivity analyses would provide stronger evidence against potential artifacts from collinearity or LOOCV. While the consistency of this result across Ridge, LASSO, and Elastic Net already offers some reassurance, we will incorporate bootstrap resampling of the n=34 observations in the revised manuscript. This will allow us to report the proportion of bootstrap samples in which ICT skills ranks as the top or near-top predictor, directly addressing concerns about robustness.
    revision: yes

  2. Referee: [Methods] While LOOCV is a reasonable choice for n=34, the manuscript provides no details on how the regularization hyperparameter is selected in the presence of high collinearity (e.g., whether cross-validation paths were examined for stability or whether condition numbers/variance inflation factors were monitored), which directly affects the reliability of the reported coefficient stability for ICT.

    Authors: We thank the referee for highlighting this transparency gap. Hyperparameters for the regularized models were selected via the default LOOCV procedure in the glmnet implementation, but we did not report condition numbers, VIF values, or stability of the CV paths. In the revised Methods section we will add these details, including the selected lambda values, a brief description of the CV path behavior, and VIF diagnostics computed on the original predictor matrix to quantify the degree of collinearity.
    revision: yes
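
The bootstrap check promised in the first response could look roughly like the sketch below. It assumes a ridge fit with a fixed penalty and ranks predictors by absolute standardised coefficient. The function name `top_predictor_frequency`, the penalty, and the synthetic data (where column 0 plays the role of a dominant predictor such as ICT skills) are hypothetical, and nothing here reproduces the authors' planned analysis.

```python
import numpy as np

def top_predictor_frequency(X, y, lam, n_boot=500, seed=0):
    """Share of bootstrap resamples in which each predictor has the
    largest absolute standardised ridge coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        Z = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)
        beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p),
                               Z.T @ (yb - yb.mean()))
        counts[np.argmax(np.abs(beta))] += 1
    return counts / n_boot

# Synthetic stand-in data (hypothetical): column 0 carries a strong negative
# effect, column 1 is correlated with it, column 2 has a weak effect.
rng = np.random.default_rng(2)
n = 34
x0 = rng.normal(size=n)
x1 = 0.6 * x0 + 0.8 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = -2.0 * x0 + 0.3 * x2 + 0.5 * rng.normal(size=n)

freq = top_predictor_frequency(X, y, lam=1.0)
print("share of resamples ranked first:", freq)
```

A share close to one for a single column would support the stability claim. A share split across correlated columns would suggest the ranking is sensitive to which observations happen to be in the sample.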

Circularity Check

0 steps flagged

No circularity: performance claims rest on independent LOOCV evaluation

Full rationale

The paper evaluates regularised linear models against OLS and ML ensembles by computing predictive performance via strict Leave-One-Out Cross-Validation on the n=34 sample. This LOOCV procedure generates out-of-sample error estimates that are not algebraically equivalent to the fitted coefficients or hyperparameters themselves. The emergence of ICT skills as the most stable proxy is reported as an empirical outcome of the cross-validated coefficient paths rather than a definitional or self-referential step. No self-citation load-bearing arguments, ansatz smuggling, or renaming of known results appear in the derivation chain; the central comparison between shrinkage and complex models is therefore self-contained against the external benchmark of LOOCV.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard domain assumptions about the validity of LOOCV for small collinear samples and the instability of OLS under those conditions; no new entities or free parameters are introduced beyond routine regularization tuning.

axioms (2)
  • Domain assumption: High multidimensional collinearity and small sample size (n=34) render standard OLS unstable and misleading.
    Invoked in the opening paragraph as the core statistical hazard motivating the model comparison.
  • Domain assumption: LOOCV provides a reliable estimate of predictive performance for model selection in this setting.
    Used as the strict evaluation criterion for all models.

pith-pipeline@v0.9.0 · 5780 in / 1374 out tokens · 54756 ms · 2026-05-21T10:39:51.838222+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.