Conformal language modeling

Victor Quach, Adam Fisch, Tal Schuster, Adam Yala, Jae Ho Sohn, Tommi S Jaakkola, Regina Barzilay · 2023 · arXiv 2306.10193

10 Pith papers cite this work.



citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems

cs.CL · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.

Flow-Based Conformal Predictive Distributions

stat.ML · 2026-02-07 · unverdicted · novelty 7.0

Differentiable nonconformity scores induce flows that sample conformal prediction set boundaries, and mixing flows across levels produces conformal predictive distributions whose quantiles match the sets.

Decomposition-Based Modular Conformal Prediction for Two-Stage Modeling

stat.ML · 2025-10-06 · unverdicted · novelty 7.0

A decomposition-based modular conformal prediction method for two-stage models with FWER-controlled stage-wise scaling and adaptive extension for non-stationary data.

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.

Empirical Bayes Conformal Prediction for Vision and Language Models

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Empirical Bayes conformal prediction converts score variability into r-value nonconformity scores that preserve target coverage while reducing inclusion of high-variance false candidates in image classification, CLIP VLMs, and LLMs.

Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture

cs.LO · 2026-05-13 · unverdicted · novelty 6.0

Introduces a trust-boundary architecture in Lean 4 with three certificate families and two operators that deliver sorry-free, axiom-audited assurances for LLM pipeline components.

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

Capability Self-Assessment: Teaching LLMs to Know Their Limits

cs.AI · 2026-05-29 · unverdicted · novelty 5.0

Reinforcement learning teaches LLMs to assess their own capabilities more effectively than supervised fine-tuning, preserves original skills, generalizes out of distribution, and aids local-cloud routing and data selection.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

cs.AI · 2023-08-10 · accept · novelty 5.0

Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

citing papers explorer


Adaptive Stopping for Multi-Turn LLM Reasoning cs.CL · 2026-04-01 · unverdicted · none · ref 17
Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems cs.CL · 2026-04-19 · unverdicted · none · ref 12 · 2 links
Flow-Based Conformal Predictive Distributions stat.ML · 2026-02-07 · unverdicted · none · ref 35
Decomposition-Based Modular Conformal Prediction for Two-Stage Modeling stat.ML · 2025-10-06 · unverdicted · none · ref 21
Entropy Distribution as a Fingerprint for Hallucinations in Generative Models cs.AI · 2026-05-27 · unverdicted · none · ref 38
Empirical Bayes Conformal Prediction for Vision and Language Models cs.LG · 2026-05-22 · unverdicted · none · ref 28
Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture cs.LO · 2026-05-13 · unverdicted · partial · ref 49
Geometry-Calibrated Conformal Abstention for Language Models cs.CL · 2026-04-30 · unverdicted · none · ref 28
Capability Self-Assessment: Teaching LLMs to Know Their Limits cs.AI · 2026-05-29 · unverdicted · none · ref 50
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment cs.AI · 2023-08-10 · accept · none · ref 117
