Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1):1–3
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
representative citing papers
A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.
MERIT achieves 81.65% F1 on MMFakeBench for multimodal misinformation detection via a four-module framework, outperforming zero-shot baselines like GPT-4V with MMD-Agent at 74.0% F1, with gains attributed to architectural design.
Hierarchical multinomial logistic regression models outperform a baseline in probabilistic and point accuracy for SARS-CoV-2 variant nowcasts and forecasts, performing best in high-data locations.
citing papers explorer
- Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport
Conditional optimal transport is used to turn raw PRM outputs into monotonic quantile functions that improve calibration and downstream Best-of-N performance on MATH-500 and AIME.
- Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.
- MERIT: Modular Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning
MERIT achieves 81.65% F1 on MMFakeBench for multimodal misinformation detection via a four-module framework, outperforming zero-shot baselines like GPT-4V with MMD-Agent at 74.0% F1, with gains attributed to architectural design.
- Comparison of probabilistic nowcasts and forecasts of SARS-CoV-2 variant proportions made by hierarchical multinomial linear regression models
Hierarchical multinomial logistic regression models outperform a baseline in probabilistic and point accuracy for SARS-CoV-2 variant nowcasts and forecasts, performing best in high-data locations.
- DynMuon: A Dynamic Spectral Shaping View of Muon