{"paper":{"title":"Measures of predictive accuracy, miscalibration and discrimination","license":"http://creativecommons.org/licenses/by/4.0/","headline":"ABC, ABC² and Gini scores rely on predictor-dependent weights that break alignment with mean-consistent loss functions and can produce dishonest model evaluations for point predictions.","cross_cats":[],"primary_cat":"stat.ME","authors_text":"{\\L}ukasz Delong, Mario W\\\"uthrich","submitted_at":"2026-05-12T19:29:55Z","abstract_excerpt":"We study the evaluation of real-valued point predictors under the decision-theoretic framework of mean-consistent loss functions given by the Bregman divergences. We first derive a new version of Murphy's decomposition of the expected loss which does not directly include the response itself but only its predictors. We then relate the miscalibration and the discrimination component of the Murphy's decomposition to Lorenz-curve-based accuracy measures that are widely used in practice. Besides the usual area between the concentration and Lorenz curves, ABC, we introduce a mean-squared version ABC"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ABC, ABC² and Gini scores may lead to dishonest evaluation of point predictions when used for model selection; this gives support to use mean-consistent loss functions as well as the miscalibration and the discrimination measure from the Murphy's decomposition of the expected loss for model evaluation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the class of mean-consistent scoring functions is the appropriate reference class for honest evaluation, and that predictor-dependent weights in ABC necessarily violate alignment with this class for all relevant Bregman divergences.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ABC, ABC squared, and Gini scores for predictive accuracy rely on predictor-dependent weights that misalign with mean-consistent loss functions, supporting instead the miscalibration and discrimination components from Murphy's decomposition.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ABC, ABC² and Gini scores rely on predictor-dependent weights that break alignment with mean-consistent loss functions and can produce dishonest model evaluations for point predictions.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"bb86335482814bcacc17f4275a4af4c829a9542ab2bfa0461c97ae7dad5f05f9"},"source":{"id":"2605.12679","kind":"arxiv","version":1},"verdict":{"id":"e66888ee-4494-4681-a5ad-04a4ff377cde","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:05:30.478079Z","strongest_claim":"ABC, ABC² and Gini scores may lead to dishonest evaluation of point predictions when used for model selection; this gives support to use mean-consistent loss functions as well as the miscalibration and the discrimination measure from the Murphy's decomposition of the expected loss for model evaluation.","one_line_summary":"ABC, ABC squared, and Gini scores for predictive accuracy rely on predictor-dependent weights that misalign with mean-consistent loss functions, supporting instead the miscalibration and discrimination components from Murphy's decomposition.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the class of mean-consistent scoring functions is the appropriate reference class for honest evaluation, and that predictor-dependent weights in ABC necessarily violate alignment with this class for all relevant Bregman divergences.","pith_extraction_headline":"ABC, ABC² and Gini scores rely on predictor-dependent weights that break alignment with mean-consistent loss functions and can produce dishonest model evaluations for point predictions."},"references":{"count":42,"sample":[{"doi":"","year":2009,"title":"Aaberge, R. (2009). Ranking intersecting Lorenz curves. Social Choice and Welfare 33 , 235-259","work_id":"72f1c18e-0bfb-40e0-b187-3cfef6fa8d4f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1969,"title":"Atkinson, A. (1969). On the measurement of inequality. Journal of Economic Theory 2 , 244-263","work_id":"003ef195-42b7-41b8-8df2-819666104ed0","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1989,"title":"Bendel, R.B., Higgins, S.S., Teberg, J.E., Pyke, D.A. (1989). Comparison of skewness coefficient, coefficient of variation, and Gini coefficient as inequality measures within populations. Oecologia 78","work_id":"b3b40b3d-e82c-4e8e-bff0-19296533fefa","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"(2025) Gini score under ties and case weights","work_id":"6a202968-4a03-4645-8311-9469face860a","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1988,"title":"Dardanoni, V., Lambert, P. (1988). Welfare rankings of income distributions: A role for the variance and some insights for tax reform Social Choice and Welfare 5 , 1-17","work_id":"847f9fbe-3d21-4e6c-84be-adce544c0edd","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":42,"snapshot_sha256":"41a49f367f9be86464c61270b3cf2ab9a76153edbf74b075509a1dad77701d31","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}