Derives simultaneous high-probability upper bounds on realized FDP for conformal p-values that hold for arbitrary post-hoc thresholds via an envelope on the null empirical distribution function.
hub
Advances in neural information processing systems , volume=
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
years
2026 10roles
method 2polarities
use method 2representative citing papers
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
A regime theory selects the optimal controller class for LLM action decisions from a nested lattice of four classes using three data-estimable bottlenecks, with a Bernstein-tight threshold and empirical matches on multiple benchmarks.
Structural dependency graphs and staged pre-execution verification raise LLM-based EDA code pass rates to 82.5% (single-step) and 70-84% (multi-step) while halving tool calls by catching dependency violations before runtime.
BalanceRAG uses sequential graphical testing on a 2D lattice of threshold pairs to certify safe operating points that meet target risk levels in cascaded RAG while increasing coverage.
SCA applies the Information Bottleneck principle via NIBS and GIBS methods to identify erroneous steps in black-box LLM reasoning and boosts self-correction success by up to 13.5%.
Conflicting biomedical evidence triggers order-dependent prediction flips in RAG LLMs, and a new abstention score combining confidence with conflict detection raises selective accuracy by 7-33 points in the hardest conditions.
ReSIDe generalizes logit-based confidence scores to intermediate layers of synthetic image detectors and uses preference optimization to aggregate them, cutting area under the risk-coverage curve by up to 69.55% under covariate shifts.
CHASE improves selective prediction under ambiguity by optimizing a ranking-aware selector over margins between competing temporal hypotheses, yielding up to 11% better alignment and 8.8% higher three-way accuracy than baselines on GUV-inspired tasks.
Mainstream UQ for LLMs reduces to unsupervised clustering of internal generation consistency and therefore cannot detect confident hallucinations or provide reliable safety signals.
citing papers explorer
-
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
-
Post-hoc Selective Classification for Reliable Synthetic Image Detection
ReSIDe generalizes logit-based confidence scores to intermediate layers of synthetic image detectors and uses preference optimization to aggregate them, cutting area under the risk-coverage curve by up to 69.55% under covariate shifts.