Conformal language modeling

Quach, V · 2024 · arXiv 2306.10193

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Adaptive Stopping for Multi-Turn LLM Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 8.0

MiCP is the first conformal prediction method for multi-turn LLM pipelines that allocates per-turn error budgets to enable adaptive stopping with an overall coverage guarantee, shown to reduce turns and cost on RAG and ReAct benchmarks.

Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees

cs.CV · 2026-06-06 · unverdicted · novelty 7.0

Introduces object-level semantic uncertainty for VLM memory, the UQ-DAAAM refinement system, and probabilistic guarantees that selected high-quality views reduce uncertainty more effectively.

Answer Only as Precisely as Justified: Calibrated Claim-Level Specificity Control for Agentic Systems

cs.CL · 2026-04-19 · unverdicted · novelty 7.0 · 2 refs

Compositional selective specificity (CSS) decomposes generated answers into claims and emits each at the most specific level supported by evidence, raising overcommitment-aware utility from 0.846 to 0.913 on LongFact while retaining 0.938 specificity.

Flow-Based Conformal Predictive Distributions

stat.ML · 2026-02-07 · unverdicted · novelty 7.0

Differentiable nonconformity scores induce flows that sample conformal prediction set boundaries, and mixing flows across levels produces conformal predictive distributions whose quantiles match the sets.

Decomposition-Based Modular Conformal Prediction for Two-Stage Modeling

stat.ML · 2025-10-06 · unverdicted · novelty 7.0

A decomposition-based modular conformal prediction method for two-stage models with FWER-controlled stage-wise scaling and adaptive extension for non-stationary data.

Prediction Sets for Counterfactual Decisions: Coverage, Optimality, and Conformal Prediction

stat.ML · 2026-07-02 · unverdicted · novelty 6.0

Introduces policy-coupled coverage for conformal prediction in counterfactual decisions and the PC-RACP procedure that achieves higher utility with finite-sample coverage guarantees.

Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.

Empirical Bayes Conformal Prediction for Vision and Language Models

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Empirical Bayes conformal prediction converts score variability into r-value nonconformity scores that preserve target coverage while reducing inclusion of high-variance false candidates in image classification, CLIP VLMs, and LLMs.

Proof-Carrying Certificates for LLM Pipelines: A Trust-Boundary Architecture

cs.LO · 2026-05-13 · unverdicted · novelty 6.0

Introduces a trust-boundary architecture in Lean 4 with three certificate families and two operators that deliver sorry-free, axiom-audited assurances for LLM pipeline components.

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

Strategic Decision Support for AI Agents

cs.AI · 2026-06-10 · unverdicted · novelty 5.0

The paper introduces an optimization framework for AI agents to strategically seek support, proving a threshold policy on support value and providing an online algorithm to control missed-support error without distributional assumptions.

Capability Self-Assessment: Teaching LLMs to Know Their Limits

cs.AI · 2026-05-29 · unverdicted · novelty 5.0

Reinforcement learning teaches LLMs to assess their own capabilities more effectively than supervised fine-tuning, preserves original skills, generalizes out of distribution, and aids local-cloud routing and data selection.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

cs.AI · 2023-08-10 · accept · novelty 5.0

Survey organizes LLM trustworthiness into seven categories and 29 sub-categories, measures eight sub-categories on popular models, and finds that more aligned models generally score higher but with varying effectiveness.

Online Safety Monitoring for LLMs

cs.AI · 2026-07-02 · unverdicted · novelty 3.0

Simple thresholding on an external verifier signal, calibrated by risk control, performs competitively with sequential hypothesis testing monitors on math reasoning and red-teaming datasets.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds

cs.AI · 2026-06-28 · unverdicted · none · ref 34

A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

cs.AI · 2026-05-27 · unverdicted · none · ref 38

Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.

Strategic Decision Support for AI Agents

cs.AI · 2026-06-10 · unverdicted · none · ref 66

The paper introduces an optimization framework for AI agents to strategically seek support, proving a threshold policy on support value and providing an online algorithm to control missed-support error without distributional assumptions.

Capability Self-Assessment: Teaching LLMs to Know Their Limits

cs.AI · 2026-05-29 · unverdicted · none · ref 50

Reinforcement learning teaches LLMs to assess their own capabilities more effectively than supervised fine-tuning, preserves original skills, generalizes out of distribution, and aids local-cloud routing and data selection.

Online Safety Monitoring for LLMs

cs.AI · 2026-07-02 · unverdicted · none · ref 23

Simple thresholding on an external verifier signal, calibrated by risk control, performs competitively with sequential hypothesis testing monitors on math reasoning and red-teaming datasets.
