{"paper":{"title":"Predictive Volatility of Machine Learning in Micro-Samples: A Regularised Assessment of Regional Poverty","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Penalized linear models outperform complex ensembles in small-sample provincial poverty analysis and flag ICT as a stable factor.","cross_cats":["cs.CY","stat.AP"],"primary_cat":"stat.ME","authors_text":"A. H. Jamaluddin, A. T. R. Dani, N. I. Mahat, S. S. M. Fauzi, V. Ratnasari","submitted_at":"2026-04-07T09:41:12Z","abstract_excerpt":"Identifying the structural drivers of poverty in regional datasets is frequently hindered by small sample sizes and high multidimensional collinearity, which can result in unstable and misleading policy advice. This paper evaluates the provincial causes of poverty in Indonesia by addressing these specific statistical hazards. We employ a rigorous model-comparison framework designed for small samples ($n=34$) with high collinearity, comparing standard linear models with frequentist penalisation, Bayesian shrinkage priors, an adjusted spatial intrinsic conditionally autoregressive (ICAR) model, "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"simple linear shrinkage models (Ridge, Elastic Net, LASSO) achieve the superior out-of-sample prediction, whereas complex ensembles like BART suffer from severe overfitting. ... parametrically regularised linear shrinkage provides a more reliable mathematical foundation for isolating structural development priorities, such as ICT, than either naive OLS or unconstrained machine learning.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That leave-one-out cross-validation on n=34 sufficiently demonstrates general superiority of penalized linear models and that ICT emerges as the key stable proxy without being an artifact of variable selection or collinearity handling in the specific dataset.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Penalized linear shrinkage models outperform complex ensembles in LOOCV for small-sample high-collinearity Indonesian provincial poverty data, with ICT skills as the stable predictor.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Penalized linear models outperform complex ensembles in small-sample provincial poverty analysis and flag ICT as a stable factor.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e3f6010944cf126c837eef024911320ce76177c3d1e1f22c07e5e70447785984"},"source":{"id":"2604.06278","kind":"arxiv","version":2},"verdict":{"id":"5fe2cd6e-0328-4154-b170-7bf3eaf0c904","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T18:47:44.799169Z","strongest_claim":"simple linear shrinkage models (Ridge, Elastic Net, LASSO) achieve the superior out-of-sample prediction, whereas complex ensembles like BART suffer from severe overfitting. ... parametrically regularised linear shrinkage provides a more reliable mathematical foundation for isolating structural development priorities, such as ICT, than either naive OLS or unconstrained machine learning.","one_line_summary":"Penalized linear shrinkage models outperform complex ensembles in LOOCV for small-sample high-collinearity Indonesian provincial poverty data, with ICT skills as the stable predictor.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That leave-one-out cross-validation on n=34 sufficiently demonstrates general superiority of penalized linear models and that ICT emerges as the key stable proxy without being an artifact of variable selection or collinearity handling in the specific dataset.","pith_extraction_headline":"Penalized linear models outperform complex ensembles in small-sample provincial poverty analysis and flag ICT as a stable factor."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.06278/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"253ef0bce6e11fac5e362700096a6d53b4423625deb04adf071ce45e439edd1b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}