Mixed citations

Title resolution pending

Tilmann Gneiting, Adrian E Raftery · 2007 · Journal of the American Statistical Association · DOI 10.1198/016214506000001437

Mixed citation behavior. Most common role is background (60%).

58 Pith papers citing it

3,941 external citations · Crossref

Background 60% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3 · method 2

citation-polarity summary

background 3 · use method 2

representative citing papers

Pandora's Regret: A Proper Scoring Rule for Evaluating Sequential Search

cs.LG · 2026-05-03 · conditional · novelty 8.0

Pandora's Regret is a closed-form pairwise scoring rule derived from expected optimal search costs that elicits true probabilities and outperforms log loss, accuracy, and F1 at predicting diagnostic costs on MedMNIST models.

Calibrated Probability Forecast Sequences and Measure-Valued Martingales

math.ST · 2026-06-30 · unverdicted · novelty 7.0

Auto-calibration of forecast sequences equals measure-valued martingales, enabling a statistical test for calibration of updating predictions.

Measuring Judgment Quality in Natural-Language Explanations: Evidence from Forecasting Tournaments

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

EQMs, sixty LLM-scored reasoning patterns, predict forecast accuracy at both item and person levels and outperform prior text-analysis methods in a large pre-registered tournament dataset.

MACROCAST: A Vintage-Consistent Time Series Foundation Model for Real-Time Macroeconomic Forecasting

econ.EM · 2026-06-27 · unverdicted · novelty 7.0

MACROCAST is the first leakage-free time series foundation model for real-time macroeconomic forecasting, trained exclusively on synthetic series and vintage data, outperforming AR(1), Chronos-2, BVAR, and DFM benchmarks on FRED-MD.

ForecastBench-Sim: A Simulated-World Forecasting Benchmark

cs.AI · 2026-06-17 · unverdicted · novelty 7.0

ForecastBench-Sim is a simulated-world benchmark using Freeciv game rollouts to generate resolvable forecasting questions at arbitrary horizons with paired intervention worlds.

Expected Free Energy-based Planning as Variational Inference

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.

Logistic Credibility with Temporal Decay: Extending Bühlmann-Straub for Commercial Lines

stat.AP · 2026-06-07 · conditional · novelty 7.0

A logistic credibility model with data-driven temporal decay restores calibration slope to 1.00 and reduces exposure-weighted error by 38% versus standard Bühlmann-Straub on US commercial auto held-out data.

Proper Scoring Rules for Right-Censored Survival Data

cs.LG · 2026-06-04 · conditional · novelty 7.0

A mapping of predictive distributions through the censoring mechanism yields proper right-censored versions of the CRPS, Brier score, energy score and other losses, with the marginalized form proven proper under conditional independent censoring.

What Type of Inference is Active Inference?

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.

FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance

q-fin.CP · 2026-06-02 · conditional · novelty 7.0

FinStressTS is a parametric synthetic benchmark with 30 environments across six mechanism families for evaluating point and probabilistic forecasting models on financial time series.

Stabilizing distribution-free probabilistic forecasts

cs.LG · 2026-05-27 · unverdicted · novelty 7.0

Neural network-parameterized regression splines enable joint optimization of forecast quality and stability in distribution-free probabilistic time series models by penalizing dissimilarities from forecast updates.

Proper Scoring Rules for Agentic Uncertainty Quantification

cs.AI · 2026-05-23 · unverdicted · novelty 7.0

Introduces Trajectory Proper Score (TPS) as a strictly proper family of trajectory-level scoring rules that elicits the complete prefix-conditioned success probability process.

Valid and Expressive Copulas for Irregular Multivariate Time Series

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

CopFITi is the first marginalization-consistent copula for irregular multivariate time series, using normalizing flows for marginals and a Gaussian mixture copula for dependencies to reach new state-of-the-art joint density modeling.

When Individually Calibrated Models Become Collectively Miscalibrated

cs.LG · 2026-05-14 · conditional · novelty 7.0

Individually calibrated predictors become collectively miscalibrated under Brier-optimal strategic responses with positive belief correlations, but VCG aggregation restores dominant-strategy incentive compatibility and near-optimal performance.

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.

Decision-Aligned Evaluation of Uncertainty Quantification

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

Introduces decision-alignment to evaluate uncertainty metrics against downstream decision utilities and proposes prior-weighted proper scoring rules that align better in benchmarks and case studies.

Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers

cs.GT · 2026-06-24 · unverdicted · novelty 6.0

Designs a leave-one-out contrastive scoring rule penalty to restore incentive compatibility for prosumers in two-stage energy markets under linear preferences.

Learning Dynamical Systems from Multiple Sparse Datasets: A Hierarchical Bayesian Modeling Approach

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

A hierarchical Bayesian framework pools information across sparse dynamical system datasets via a shared population distribution to improve parameter inference and prediction over unpooled approaches.

The Degeneracy Distillery

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

A method called the degeneracy distillery uses symbolic transformations to flatten the Fisher information matrix globally from simulations alone, identifying independent parameter combinations and reducing neural posterior estimation simulation budgets by up to 10x.

Hierarchical Bayes meets hierarchical forecasting: A flexible framework for level-focused forecasts

stat.ME · 2026-06-22 · unverdicted · novelty 6.0

A Bayesian hierarchical model integrates coherence penalization and level-specific focus into forecasting estimation, yielding improved predictive accuracy on simulated and Australian tourism data.

To select or not to select: predictively consistent priors instead of model selection

stat.ME · 2026-06-22 · unverdicted · novelty 6.0

Predictively consistent priors let complex Bayesian models match or beat the out-of-sample performance of selected simpler models across linear, logistic, and nonlinear examples without explicit selection.

Temporal Coarse-Graining of Multi-Sector Default Count Data Generates Posterior-Implied Copulas

q-fin.RM · 2026-06-20 · unverdicted · novelty 6.0 · 2 refs

A low-rank dynamic factor model with AR(1) latent states and binomial observations, when aggregated over time, generates horizon-dependent posterior-implied copulas that reproduce annual eigenvalue amplification on S&P sector default data and improve some forecast scores.

On the QUEST for Uncertainty Quantification via Highest Density Regions

cs.LG · 2026-06-17 · unverdicted · novelty 6.0

QUEST measures uncertainty via the Lebesgue volume of highest-density regions of a distribution's support, evaluated at robustness parameter alpha, and claims to satisfy UQ axioms while outperforming variance and differential entropy on selective prediction tasks.

Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting

cs.LG · 2026-06-07 · unverdicted · novelty 6.0

Tyan-WP is a pretrained wind power foundation model that outperforms site-specific TSMs and generic LTSMs in zero-shot ultra-short-term probabilistic forecasting on U.S. and U.K. sites via static embeddings and PAMF module.

citing papers explorer

Showing 50 of 56 citing papers after filters.

Pandora's Regret: A Proper Scoring Rule for Evaluating Sequential Search cs.LG · 2026-05-03 · conditional · none · ref 192
Pandora's Regret is a closed-form pairwise scoring rule derived from expected optimal search costs that elicits true probabilities and outperforms log loss, accuracy, and F1 at predicting diagnostic costs on MedMNIST models.
Calibrated Probability Forecast Sequences and Measure-Valued Martingales math.ST · 2026-06-30 · unverdicted · none · ref 7
Auto-calibration of forecast sequences equals measure-valued martingales, enabling a statistical test for calibration of updating predictions.
Measuring Judgment Quality in Natural-Language Explanations: Evidence from Forecasting Tournaments cs.CL · 2026-06-29 · unverdicted · none · ref 91
EQMs, sixty LLM-scored reasoning patterns, predict forecast accuracy at both item and person levels and outperform prior text-analysis methods in a large pre-registered tournament dataset.
MACROCAST: A Vintage-Consistent Time Series Foundation Model for Real-Time Macroeconomic Forecasting econ.EM · 2026-06-27 · unverdicted · none · ref 180
MACROCAST is the first leakage-free time series foundation model for real-time macroeconomic forecasting, trained exclusively on synthetic series and vintage data, outperforming AR(1), Chronos-2, BVAR, and DFM benchmarks on FRED-MD.
ForecastBench-Sim: A Simulated-World Forecasting Benchmark cs.AI · 2026-06-17 · unverdicted · none · ref 3
ForecastBench-Sim is a simulated-world benchmark using Freeciv game rollouts to generate resolvable forecasting questions at arbitrary horizons with paired intervention worlds.
Expected Free Energy-based Planning as Variational Inference cs.AI · 2026-06-09 · unverdicted · none · ref 103
EFE-based planning is formulated as variational free energy minimization with epistemic priors, decomposing into expected plan costs plus a complexity term.
Logistic Credibility with Temporal Decay: Extending Bühlmann-Straub for Commercial Lines stat.AP · 2026-06-07 · conditional · none · ref 5
A logistic credibility model with data-driven temporal decay restores calibration slope to 1.00 and reduces exposure-weighted error by 38% versus standard Bühlmann-Straub on US commercial auto held-out data.
Proper Scoring Rules for Right-Censored Survival Data cs.LG · 2026-06-04 · conditional · none · ref 1
A mapping of predictive distributions through the censoring mechanism yields proper right-censored versions of the CRPS, Brier score, energy score and other losses, with the marginalized form proven proper under conditional independent censoring.
What Type of Inference is Active Inference? cs.AI · 2026-06-03 · unverdicted · none · ref 114
EFE-based active inference planning is characterized as VFE on an augmented model plus entropy and planning corrections, with a derived message-passing implementation and grid-world validation.
FinStressTS: A Parametric Synthetic Benchmark for Time-Series Forecasting in Finance q-fin.CP · 2026-06-02 · conditional · none · ref 25
FinStressTS is a parametric synthetic benchmark with 30 environments across six mechanism families for evaluating point and probabilistic forecasting models on financial time series.
Stabilizing distribution-free probabilistic forecasts cs.LG · 2026-05-27 · unverdicted · none · ref 13
Neural network-parameterized regression splines enable joint optimization of forecast quality and stability in distribution-free probabilistic time series models by penalizing dissimilarities from forecast updates.
Proper Scoring Rules for Agentic Uncertainty Quantification cs.AI · 2026-05-23 · unverdicted · none · ref 6
Introduces Trajectory Proper Score (TPS) as a strictly proper family of trajectory-level scoring rules that elicits the complete prefix-conditioned success probability process.
Valid and Expressive Copulas for Irregular Multivariate Time Series cs.LG · 2026-05-22 · unverdicted · none · ref 44
CopFITi is the first marginalization-consistent copula for irregular multivariate time series, using normalizing flows for marginals and a Gaussian mixture copula for dependencies to reach new state-of-the-art joint density modeling.
When Individually Calibrated Models Become Collectively Miscalibrated cs.LG · 2026-05-14 · conditional · none · ref 60
Individually calibrated predictors become collectively miscalibrated under Brier-optimal strategic responses with positive belief correlations, but VCG aggregation restores dominant-strategy incentive compatibility and near-optimal performance.
Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context cs.CL · 2026-04-22 · unverdicted · none · ref 45
Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.
Decision-Aligned Evaluation of Uncertainty Quantification cs.LG · 2026-06-25 · unverdicted · none · ref 10
Introduces decision-alignment to evaluate uncertainty metrics against downstream decision utilities and proposes prior-weighted proper scoring rules that align better in benchmarks and case studies.
Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers cs.GT · 2026-06-24 · unverdicted · none · ref 11
Designs a leave-one-out contrastive scoring rule penalty to restore incentive compatibility for prosumers in two-stage energy markets under linear preferences.
Learning Dynamical Systems from Multiple Sparse Datasets: A Hierarchical Bayesian Modeling Approach cs.LG · 2026-06-23 · unverdicted · none · ref 22
A hierarchical Bayesian framework pools information across sparse dynamical system datasets via a shared population distribution to improve parameter inference and prediction over unpooled approaches.
The Degeneracy Distillery cs.LG · 2026-06-22 · unverdicted · none · ref 15
A method called the degeneracy distillery uses symbolic transformations to flatten the Fisher information matrix globally from simulations alone, identifying independent parameter combinations and reducing neural posterior estimation simulation budgets by up to 10x.
Hierarchical Bayes meets hierarchical forecasting: A flexible framework for level-focused forecasts stat.ME · 2026-06-22 · unverdicted · none · ref 22
A Bayesian hierarchical model integrates coherence penalization and level-specific focus into forecasting estimation, yielding improved predictive accuracy on simulated and Australian tourism data.
To select or not to select: predictively consistent priors instead of model selection stat.ME · 2026-06-22 · unverdicted · none · ref 167
Predictively consistent priors let complex Bayesian models match or beat the out-of-sample performance of selected simpler models across linear, logistic, and nonlinear examples without explicit selection.
Temporal Coarse-Graining of Multi-Sector Default Count Data Generates Posterior-Implied Copulas q-fin.RM · 2026-06-20 · unverdicted · none · ref 8 · 2 links
A low-rank dynamic factor model with AR(1) latent states and binomial observations, when aggregated over time, generates horizon-dependent posterior-implied copulas that reproduce annual eigenvalue amplification on S&P sector default data and improve some forecast scores.
On the QUEST for Uncertainty Quantification via Highest Density Regions cs.LG · 2026-06-17 · unverdicted · none · ref 17
QUEST measures uncertainty via the Lebesgue volume of highest-density regions of a distribution's support, evaluated at robustness parameter alpha, and claims to satisfy UQ axioms while outperforming variance and differential entropy on selective prediction tasks.
Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting cs.LG · 2026-06-07 · unverdicted · none · ref 49
Tyan-WP is a pretrained wind power foundation model that outperforms site-specific TSMs and generic LTSMs in zero-shot ultra-short-term probabilistic forecasting on U.S. and U.K. sites via static embeddings and PAMF module.
Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining cs.LG · 2026-06-06 · unverdicted · none · ref 3
Equal-weight mixture of synthetic generators matches or exceeds best single generator for time series foundation model pretraining and strengthens further with real data.
Scalable Uncertainty Quantification for Extreme Weather Forecasting via Empirical Neural Tangent Kernels cs.LG · 2026-06-01 · unverdicted · none · ref 12
NTK-UQ produces 31-37% sharper 90% prediction intervals than split conformal prediction for extreme weather forecasts, with adaptive scaling via architecture-dependent eigenvalue truncation and ICA decomposition of last-layer features.
Probabilistic storyline attribution using machine learning stat.AP · 2026-06-01 · unverdicted · none · ref 42
Distributional autoencoders trained on climate model simulations model full conditional distributions of European temperature fields to enable probabilistic storyline attribution, illustrated by higher intensities and probability ratios for a 2003-like heatwave in 2028 and 2053.
Escaping the Mode Lottery: Multi-Response Training Improves Language Model Generalization cs.LG · 2026-05-30 · unverdicted · none · ref 19
Multi-response training retains multiple responses per prompt to reduce uncertainty about the conditional output distribution, yielding improved distributional generalization especially in high response-diversity and low prompt-redundancy regimes.
Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP astro-ph.IM · 2026-05-26 · unverdicted · none · ref 38
Attentive Neural Processes outperform Gaussian Processes and neural networks on light curve interpolation quality, feature recovery, calibration, and speed for 15 transient classes under realistic Rubin cadences.
Rashomon-Seeded Annealing for Robust Bayesian Inference in Factorial Designs stat.ME · 2026-05-21 · unverdicted · none · ref 12
Rashomon-seeded annealing repurposes Rashomon sets as warm starts for annealed importance sampling to enable full posterior inference in factorial designs without exhaustive enumeration.
ECUAS$_n$: A family of metrics for principled evaluation of uncertainty-augmented systems cs.AI · 2026-05-19 · unverdicted · none · ref 32 · 3 links
ECUAS_n is a parameterized family of proper scoring rules for jointly assessing prediction accuracy and uncertainty quality in automated decision systems.
A Penalty-Free Pipeline for Direct Quantum-Annealer Portfolio Optimization quant-ph · 2026-05-17 · conditional · none · ref 12
A penalty-free pipeline samples an objective-only QUBO on D-Wave hardware and enforces cardinality classically, cutting chain-break fractions from 71-92% to at most 0.04% across tested equity and betting instances.
Improving ecological inference and uncertainty quantification from camera trap data through the fusion of AI confidences and manual annotations stat.AP · 2026-05-13 · unverdicted · none · ref 4
A Bayesian data-fusion model combines AI predictions and manual labels from camera traps to yield improved ecological inference and uncertainty quantification for white-tailed deer body condition.
Scenario generation of intraday electricity price paths for optimal trading in continuous markets stat.AP · 2026-05-13 · unverdicted · none · ref 7
A kernel-based regression model plus scenario generation from forecast errors and a new Support Vector Sorting step produces ensemble price trajectories that improve both statistical accuracy and trading profits over benchmarks on German intraday continuous market data.
Multi-Quantile Regression for Extreme Precipitation Downscaling cs.LG · 2026-05-12 · unverdicted · none · ref 11
Q-SRDRN multi-quantile network with pinball loss and per-quantile heads detects extreme precipitation events up to 18 times more effectively than deterministic baselines while preserving augmentation benefits for the median.
The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting cs.GT · 2026-05-08 · unverdicted · none · ref 27
Non-affine approval functions create unavoidable miscalibration in proper scoring rules for strategic agents, but step-function thresholds enable first-best screening without it, uniquely for the Brier score.
Bayesian Modeling and Prediction of Generalized Contact Matrices stat.ME · 2026-05-07 · unverdicted · none · ref 97
A Bayesian model for multi-feature contact matrices that uses tensor structures and contingency table theory to satisfy structural constraints and impute missing contact features, validated on simulations and US/German survey data.
Perturbation is All You Need for Extrapolating Language Models stat.ML · 2026-05-05 · unverdicted · none · ref 59
Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
Honest Reporting in Scored Oversight: True-KL0 Property via the Prekopa Principle cs.GT · 2026-05-05 · conditional · none · ref 16
For heterogeneous power-p pseudospherical scoring rules with d ≤ 4, the True-KL0 property R(M,p,d) < 1 holds for all M > 1, establishing unconditional DSIC via a Prekopa-based log-concavity argument on the loss integral.
CERBERUS: A Three-Headed Decoder for Vertical Cloud Profiles physics.ao-ph · 2026-04-09 · unverdicted · none · ref 1
CERBERUS uses a three-headed encoder-decoder to predict zero-inflated probabilistic vertical radar reflectivity profiles from satellite and meteorological inputs.
HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts physics.ao-ph · 2026-01-25 · conditional · none · ref 58
HealDA supplies ML-based initial conditions for AI weather models that produce forecasts trailing ERA5-initialized runs by less than one day of effective lead time, with the skill gap arising mainly from initial error size.
Otter Weather: Skillful and Computationally Efficient Medium-Range Weather Forecasting cs.LG · 2026-06-24 · unverdicted · none · ref 10
Otter Weather is a spatiotemporal model that outperforms NWP baselines by 9.6% at 24h lead with under 3.5 A100-days training and extends efficiency gains to probabilistic forecasting via CRPS.
Reliability of Probabilistic Emulation of Physical Systems cs.LG · 2026-06-11 · unverdicted · none · ref 6
CRPS-trained ensembles achieve better uncertainty reliability and speed than latent generative models for probabilistic emulation of 2D physical systems.
Variational Proximal Policy Optimization stat.ML · 2026-06-06 · unverdicted · none · ref 115
VP2O maps PPO to SVGD in a MoE architecture using functional kernels and expert orthogonalization, claiming +179 ELO on Codeforces and 32% token reduction on AIME for a 33B/4B model.
When Should Forecasting Models Be Re-Specified? A Cost-Sensitive Trigger for Adaptive Model-Form Updating stat.AP · 2026-06-04 · unverdicted · none · ref 6
A cost-sensitive trigger using specification debt for deciding when to re-specify forecasting model forms, shown on M4 data to match full-update accuracy at 28% of the compute cost.
Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels stat.ME · 2026-05-17 · unverdicted · none · ref 12
A kernel-based regularized learning framework for FDR control that unifies arbitrary structures and supplies provably valid decision rules with likelihood-based tuning.
Soft Learning cs.LG · 2026-05-16 · unverdicted · none · ref 55
Soft Learning optimally combines heterogeneous ML specialists via cross-validated non-negative least squares, achieving top performance on 70% of 37 datasets with formal guarantees and 72-435x CPU speedups over deep networks.
A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning cs.LG · 2026-04-25 · unverdicted · none · ref 6
Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
Unstable Rankings in Bayesian Deep Learning Evaluation cs.LG · 2026-04-25 · unverdicted · none · ref 7
Bayesian deep learning method rankings are unstable at small sample sizes, dataset-dependent, and require uncertainty-aware evaluation using hierarchical models and minimum detectable difference curves.
Adaptive COVID-19 Trajectory Forecasting Using MAB-Inspired Ensemble Weighting q-bio.QM · 2026-06-17 · unverdicted · none · ref 43
EXP3-based adaptive ensembles achieved the lowest mean weighted interval scores for COVID-19 incidence forecasts compared with individual models and simple ensemble baselines.
