Testing for outliers with conformal p-values
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED (2)
citing papers explorer
- Provable Joint Decontamination for Benchmarking Multiple Large Language Models
JECS aggregates per-model conformal p-values via their maximum and reconstructs a conservative envelope of the max-p null distribution to select benchmarks with global contamination rate control.
- Feedback-Enhanced Online Multiple Testing with Applications to Conformal Selection
GAIF dynamically adjusts testing thresholds with feedback for finite-sample FDR control in sequential settings and extends to conformal selection via feedback-driven model selection.
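
The JECS summary above mentions aggregating per-model conformal p-values via their maximum. As a rough illustration only, the sketch below computes the standard marginal conformal p-value, (1 + number of calibration scores at least as large as the test score) / (n + 1), for each model and then takes the maximum. The function names, the toy scores, and the convention that larger scores are more atypical are assumptions made for this example. This is not the JECS procedure itself, which per the summary also reconstructs a conservative envelope of the max-p null distribution (not shown here).

```python
import numpy as np


def conformal_p_value(calibration_scores, test_score):
    """Marginal conformal p-value for one test point.

    Assumes larger nonconformity scores mean "more atypical".
    """
    calibration_scores = np.asarray(calibration_scores, dtype=float)
    n = calibration_scores.size
    # Count calibration scores at least as extreme as the test score.
    n_at_least = int(np.sum(calibration_scores >= test_score))
    return (1 + n_at_least) / (n + 1)


def max_p_aggregate(per_model_calibration, per_model_test_scores):
    """Hypothetical helper: maximum of the per-model conformal p-values."""
    p_values = [
        conformal_p_value(cal, score)
        for cal, score in zip(per_model_calibration, per_model_test_scores)
    ]
    return max(p_values)


# Toy data (made up for illustration): two models, four calibration scores each.
calibration = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
test_scores = [5.0, 2.5]  # one test score per model for the same item

p_max = max_p_aggregate(calibration, test_scores)
```

With these toy numbers the per-model p-values are 1/5 and 3/5, so the aggregate is 3/5. A small maximum requires every model's p-value to be small, which is why the maximum is a conservative way to combine evidence across models.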