Mixed citations
Mixed citation behavior. Most common role is background (60%).
citing papers explorer
- Pandora's Regret: A Proper Scoring Rule for Evaluating Sequential Search
  Pandora's Regret is a closed-form pairwise scoring rule derived from expected optimal search costs that elicits true probabilities and outperforms log loss, accuracy, and F1 at predicting diagnostic costs on MedMNIST models.
- Valid and Expressive Copulas for Irregular Multivariate Time Series
  CopFITi is the first marginalization-consistent copula for irregular multivariate time series, using normalizing flows for marginals and a Gaussian mixture copula for dependencies to reach new state-of-the-art joint density modeling.
- When Individually Calibrated Models Become Collectively Miscalibrated
  Individually calibrated predictors become collectively miscalibrated under Brier-optimal strategic responses with positive belief correlations, but VCG aggregation restores dominant-strategy incentive compatibility and near-optimal performance.
- Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context
  Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.
- Improving ecological inference and uncertainty quantification from camera trap data through the fusion of AI confidences and manual annotations
  A Bayesian data-fusion model combines AI predictions and manual labels from camera traps to yield improved ecological inference and uncertainty quantification for white-tailed deer body condition.
- Scenario generation of intraday electricity price paths for optimal trading in continuous markets
  A kernel-based regression model plus scenario generation from forecast errors and a new Support Vector Sorting step produces ensemble price trajectories that improve both statistical accuracy and trading profits over benchmarks on German intraday continuous market data.
- Multi-Quantile Regression for Extreme Precipitation Downscaling
  Q-SRDRN multi-quantile network with pinball loss and per-quantile heads detects extreme precipitation events up to 18 times more effectively than deterministic baselines while preserving augmentation benefits for the median.
- The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting
  Non-affine approval functions create unavoidable miscalibration in proper scoring rules for strategic agents, but step-function thresholds enable first-best screening without it, uniquely for the Brier score.
- Bayesian Modeling and Prediction of Generalized Contact Matrices
  A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.
- Perturbation is All You Need for Extrapolating Language Models
  Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
- Honest Reporting in Scored Oversight: True-KL0 Property via the Prekopa Principle
  For heterogeneous power-p pseudospherical scoring rules with d ≤ 4, the True-KL0 property R(M,p,d) < 1 holds for all M > 1, establishing unconditional DSIC via a Prekopa-based log-concavity argument on the loss integral.
- CERBERUS: A Three-Headed Decoder for Vertical Cloud Profiles
  CERBERUS uses a three-headed encoder-decoder to predict zero-inflated probabilistic vertical radar reflectivity profiles from satellite and meteorological inputs.
- HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts
  HealDA supplies ML-based initial conditions for AI weather models that produce forecasts trailing ERA5-initialized runs by less than one day of effective lead time, with the skill gap arising mainly from initial error size.
- Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels
  A kernel-based regularized learning framework for FDR control that unifies arbitrary structures and supplies provably valid decision rules with likelihood-based tuning.
- Soft Learning
  Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.
- A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning
  Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
- Unstable Rankings in Bayesian Deep Learning Evaluation
  Bayesian deep learning method rankings are unstable at small sample sizes, dataset-dependent, and require uncertainty-aware evaluation using hierarchical models and minimum detectable difference curves.
- Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models
  Systematic benchmarking reveals that regression calibration metrics frequently disagree on recalibration quality, with ENCE and CWC identified as more consistent performers.
- Achieving Skilled and Reliable Daily Probabilistic Forecasts of Wind Power at Subseasonal-to-Seasonal Timescales over France
  A post-processing pipeline applied to ECMWF subseasonal ensembles produces calibrated daily wind power forecasts for France that improve on climatology by 5-15% in CRPS up to 16 days ahead.
- Scalable model selection for count time series with structural breaks: application to solid-organ transplantation during and after COVID-19 in the USA and Italy
  Standard count time series models with pandemic break indicators applied to US and Italian transplant data capture COVID deviations, show deceased-donor recovery to baselines, and find auxiliary COVID covariates add negligible predictive value beyond autoregressive and calendar terms.
- ECUAS$_n$: A family of metrics for principled evaluation of uncertainty-augmented systems
- A Penalty-Free Pipeline for Direct Quantum-Annealer Portfolio Optimization