MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.
hub
Conformal language modeling
15 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
Introduces object-level semantic uncertainty for VLM memory, the UQ-DAAAM refinement system, and probabilistic guarantees that selected high-quality views reduce uncertainty more effectively.
Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.
Differentiable nonconformity scores induce flows that sample conformal prediction set boundaries, and mixing flows across levels produces conformal predictive distributions whose quantiles match the sets.
A decomposition-based modular conformal prediction method for two-stage models with FWER-controlled stage-wise scaling and adaptive extension for non-stationary data.
Introduces policy-coupled coverage for conformal prediction in counterfactual decisions and the PC-RACP procedure that achieves higher utility with finite-sample coverage guarantees.
A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.
Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.
Empirical Bayes conformal prediction converts score variability into r-value nonconformity scores that preserve target coverage while reducing inclusion of high-variance false candidates in image classification, CLIP VLMs, and LLMs.
Introduces a trust-boundary architecture in Lean 4 with three certificate families and two operators that deliver sorry-free, axiom-audited assurances for LLM pipeline components.
Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.
The paper introduces an optimization framework for AI agents to strategically seek support, proving a threshold policy on support value and providing an online algorithm to control missed-support error without distributional assumptions.
Reinforcement learning teaches LLMs to assess their own capabilities more effectively than supervised fine-tuning, preserves original skills, generalizes out of distribution, and aids local-cloud routing and data selection.
Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.
Simple thresholding on an external verifier signal, calibrated by risk control, performs competitively with sequential hypothesis testing monitors on math reasoning and red-teaming datasets.
citing papers explorer
-
Adaptive Stopping for Multi-Turn LLM Reasoning
MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.
-
Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees
Introduces object-level semantic uncertainty for VLM memory, the UQ-DAAAM refinement system, and probabilistic guarantees that selected high-quality views reduce uncertainty more effectively.
-
Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems
Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.
-
Flow-Based Conformal Predictive Distributions
Differentiable nonconformity scores induce flows that sample conformal prediction set boundaries, and mixing flows across levels produces conformal predictive distributions whose quantiles match the sets.
-
Decomposition-Based Modular Conformal Prediction for Two-Stage Modeling
A decomposition-based modular conformal prediction method for two-stage models with FWER-controlled stage-wise scaling and adaptive extension for non-stationary data.
-
Prediction Sets for Counterfactual Decisions: Coverage, Optimality, and Conformal Prediction
Introduces policy-coupled coverage for conformal prediction in counterfactual decisions and the PC-RACP procedure that achieves higher utility with finite-sample coverage guarantees.
-
Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds
A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.
-
Entropy Distribution as a Fingerprint for Hallucinations in Generative Models
Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.
-
Empirical Bayes Conformal Prediction for Vision and Language Models
Empirical Bayes conformal prediction converts score variability into r-value nonconformity scores that preserve target coverage while reducing inclusion of high-variance false candidates in image classification, CLIP VLMs, and LLMs.
-
Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture
Introduces a trust-boundary architecture in Lean 4 with three certificate families and two operators that deliver sorry-free, axiom-audited assurances for LLM pipeline components.
-
Geometry-Calibrated Conformal Abstention for Language Models
Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.
-
Strategic Decision Support for AI Agents
The paper introduces an optimization framework for AI agents to strategically seek support, proving a threshold policy on support value and providing an online algorithm to control missed-support error without distributional assumptions.
-
Capability Self-Assessment: Teaching LLMs to Know Their Limits
Reinforcement learning teaches LLMs to assess their own capabilities more effectively than supervised fine-tuning, preserves original skills, generalizes out of distribution, and aids local-cloud routing and data selection.
-
Online Safety Monitoring for LLMs
Simple thresholding on an external verifier signal, calibrated by risk control, performs competitively with sequential hypothesis testing monitors on math reasoning and red-teaming datasets.