Language models with conformal factuality guarantees

URL https: //arxiv · 2024 · arXiv 2402.10978

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems

cs.CL · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.

PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

PASC converts multi-stage joint coverage into a single scalar conformal problem on the joint max nonconformity score, delivering finite-sample distribution-free guarantees and higher empirical coverage than Bonferroni or independent calibration.

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.

IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation

cs.CL · 2026-04-16 · unverdicted · novelty 5.0

IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

cs.CL · 2026-05-12

citing papers explorer

Showing 7 of 7 citing papers.

Adaptive Stopping for Multi-Turn LLM Reasoning cs.CL · 2026-04-01 · unverdicted · none · ref 15
MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.
Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems cs.CL · 2026-04-19 · unverdicted · none · ref 11 · 2 links
Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.
PASC: Pipeline-Aware Conformal Prediction with Joint Coverage Guarantees for Multi-Stage NLP and LLM Pipelines cs.LG · 2026-05-12 · unverdicted · none · ref 8
PASC converts multi-stage joint coverage into a single scalar conformal problem on the joint max nonconformity score, delivering finite-sample distribution-free guarantees and higher empirical coverage than Bonferroni or independent calibration.
Geometry-Calibrated Conformal Abstention for Language Models cs.CL · 2026-04-30 · unverdicted · none · ref 27
Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement cs.LG · 2026-05-14 · unverdicted · none · ref 17
Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation cs.CL · 2026-04-16 · unverdicted · none · ref 30
IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations cs.CL · 2026-05-12 · unreviewed · ref 175

Language models with conformal factuality guarantees

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer