Pandora's Regret is a closed-form pairwise scoring rule, derived from expected optimal search costs, that elicits true probabilities and outperforms log loss, accuracy, and F1 at predicting diagnostic costs on MedMNIST models.
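A scoring rule "elicits true probabilities" (is proper) when a forecaster minimizes their expected score by reporting the probability they actually believe. The formula for Pandora's Regret is not given here, so the sketch below only illustrates the property itself, using two standard proper rules (Brier score and log loss) and, for contrast, the improper absolute error. The function names are hypothetical, and the grid search is an illustrative stand-in for an analytic argument.

```python
import math

def expected_brier(p, q):
    """Expected Brier score when the event has true probability p and q is reported."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2

def expected_log_loss(p, q):
    """Expected log loss (negative log-likelihood) when reporting q under true probability p."""
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))

def expected_abs_error(p, q):
    """Expected absolute error |y - q|: an improper rule, shown for contrast."""
    return p * (1.0 - q) + (1.0 - p) * q

def best_report(expected_score, p, steps=1000):
    """Return the report q on the grid 1/steps, ..., (steps-1)/steps minimizing expected score."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda q: expected_score(p, q))

if __name__ == "__main__":
    for p in (0.1, 0.37, 0.8):
        print(p,
              best_report(expected_brier, p),      # proper: recovers p
              best_report(expected_log_loss, p),   # proper: recovers p
              best_report(expected_abs_error, p))  # improper: pushed to an extreme
```

Under the two proper rules the best report equals the true probability, while absolute error rewards exaggerating toward 0 or 1. That failure is what "eliciting true probabilities" rules out.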
Citation behavior is mixed; the most common citation role is background (60%).
representative citing papers
CopFITi is the first marginalization-consistent copula for irregular multivariate time series; it uses normalizing flows for the marginals and a Gaussian mixture copula for the dependencies, achieving new state-of-the-art results in joint density modeling.
Individually calibrated predictors become collectively miscalibrated under Brier-optimal strategic responses with positive belief correlations, but VCG aggregation restores dominant-strategy incentive compatibility and near-optimal performance.
Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.
A Bayesian data-fusion model combines AI predictions and manual labels from camera traps to yield improved ecological inference and uncertainty quantification for white-tailed deer body condition.
A kernel-based regression model, combined with scenario generation from forecast errors and a new Support Vector Sorting step, produces ensemble price trajectories that improve both statistical accuracy and trading profits over benchmarks on German continuous intraday market data.
The Q-SRDRN multi-quantile network, trained with pinball loss and equipped with per-quantile heads, detects extreme precipitation events up to 18 times more effectively than deterministic baselines while preserving the benefits of data augmentation for the median.
Non-affine approval functions create unavoidable miscalibration in proper scoring rules for strategic agents, but step-function thresholds enable first-best screening without it, uniquely for the Brier score.
A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.
Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
For heterogeneous power-p pseudospherical scoring rules with d ≤ 4, the True-KL0 property R(M,p,d) < 1 holds for all M > 1, establishing unconditional DSIC via a Prékopa-based log-concavity argument on the loss integral.
CERBERUS uses a three-headed encoder-decoder to predict zero-inflated probabilistic vertical radar reflectivity profiles from satellite and meteorological inputs.
HealDA supplies ML-based initial conditions for AI weather models that produce forecasts trailing ERA5-initialized runs by less than one day of effective lead time, with the skill gap arising mainly from initial error size.
A kernel-based regularized learning framework for FDR control that unifies arbitrary structures and supplies provably valid decision rules with likelihood-based tuning.
Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.
Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
Bayesian deep learning method rankings are unstable at small sample sizes, dataset-dependent, and require uncertainty-aware evaluation using hierarchical models and minimum detectable difference curves.
Systematic benchmarking reveals that regression calibration metrics frequently disagree on recalibration quality, with ENCE and CWC identified as more consistent performers.
A post-processing pipeline applied to ECMWF subseasonal ensembles produces calibrated daily wind power forecasts for France that improve on climatology by 5-15% in CRPS up to 16 days ahead.
Standard count time series models with pandemic break indicators applied to US and Italian transplant data capture COVID deviations, show deceased-donor recovery to baselines, and find auxiliary COVID covariates add negligible predictive value beyond autoregressive and calendar terms.
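Several summaries above score probabilistic forecasts with the pinball (quantile) loss (Q-SRDRN) or the CRPS (the limited-data BDL study and the French wind-power pipeline). The following is a minimal sketch of both, assuming scalar outcomes and a finite ensemble. The function names are hypothetical, and the CRPS uses the standard kernel form E|X - y| - 0.5 E|X - X'| rather than any paper-specific estimator.

```python
def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss for predicting quantile q_pred at level tau when y is observed."""
    u = y - q_pred
    # Under-prediction costs tau per unit; over-prediction costs (1 - tau) per unit.
    return tau * u if u >= 0 else (tau - 1.0) * u

def crps_ensemble(members, y):
    """Empirical CRPS of an ensemble forecast: mean |x_i - y| - 0.5 * mean |x_i - x_j|."""
    m = len(members)
    spread_to_obs = sum(abs(x - y) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (m * m)
    return spread_to_obs - 0.5 * spread_within

if __name__ == "__main__":
    print(pinball_loss(10.0, 8.0, 0.9))        # under-predicting a high quantile is costly
    print(crps_ensemble([1.0, 2.0, 3.0], 2.0))  # lower is better; 0 for a perfect point mass
```

With a single ensemble member the CRPS reduces to the absolute error, which is why it is often described as a probabilistic generalization of MAE.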
citing papers explorer
- Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models
- Achieving Skilled and Reliable Daily Probabilistic Forecasts of Wind Power at Subseasonal-to-Seasonal Timescales over France