Leakage and the reproducibility crisis in machine-learning-based science

2023 · arXiv 2023.100804

7 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Neural Point-Forms

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Neural point-forms are introduced as permutation-invariant neural layers that output learned form-comparison matrices for point clouds, with a claimed consistency proof under sampling and manifold assumptions and competitive results on synthetic and biological data.

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.

Fusion or Confusion? Multimodal Complexity Is Not All You Need

cs.LG · 2025-12-28 · unverdicted · novelty 6.0

Complex multimodal architectures do not reliably outperform unimodal baselines or a simple multimodal baseline under standardized evaluation.

SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis

cs.LG · 2025-11-14 · accept · novelty 5.0

SurvBench supplies a configurable, open-source preprocessing pipeline that standardizes multi-modal EHR data from four critical-care databases for single-risk and competing-risk survival analysis.

Towards a more realistic evaluation of machine learning models for bearing fault diagnosis

cs.LG · 2025-09-26 · unverdicted · novelty 5.0

Proposes bearing-wise data partitioning to remove leakage in ML bearing fault diagnosis, reformulates as multi-label classification, and shows training bearing count drives generalization on four public datasets.

Predicting Forecast Error for the HRRR Using LSTM Neural Networks: A Comparative Study Using New York and Oklahoma State Mesonets

physics.ao-ph · 2025-12-16 · conditional · novelty 4.0

LSTM networks predict HRRR forecast errors with average improvements of 48% for precipitation, 25% for temperature, and 15% for wind using mesonet ground truth.

fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R

stat.CO · 2026-04-06 · unverdicted · novelty 3.0

fastml is an R package that enforces leakage-free preprocessing through guarded resampling and provides a unified interface for safer automated ML including survival analysis.

citing papers explorer


Neural Point-Forms · cs.LG · 2026-05-15 · unverdicted · none · ref 63
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability · cs.LG · 2026-04-20 · unverdicted · none · ref 96
Fusion or Confusion? Multimodal Complexity Is Not All You Need · cs.LG · 2025-12-28 · unverdicted · none · ref 22
SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis · cs.LG · 2025-11-14 · accept · none · ref 110
Towards a more realistic evaluation of machine learning models for bearing fault diagnosis · cs.LG · 2025-09-26 · unverdicted · none · ref 3
Predicting Forecast Error for the HRRR Using LSTM Neural Networks: A Comparative Study Using New York and Oklahoma State Mesonets · physics.ao-ph · 2025-12-16 · conditional · none · ref 38
fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R · stat.CO · 2026-04-06 · unverdicted · none · ref 17
