RISC reformulates self-consistency answer selection as a ranking task solved by a lightweight LambdaRank model with five hand-designed features, yielding better accuracy-efficiency trade-offs than majority voting on QA benchmarks.
LM -Polygraph: Uncertainty Estimation for Language Models
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CL 6roles
extension 1polarities
extend 1representative citing papers
A regression model using attention features and recurrent uncertainty scores improves selective generation in LLMs over unsupervised and supervised baselines on ten datasets and three models.
A factorized study finds raw hidden states and attention features hard to beat in-domain for LLM uncertainty probes, but structured compressed features are more robust under distribution shift, with pretrained probes transferring to open-ended generation.
ThinkBooster supplies a modular library, joint performance-efficiency benchmark, and deployable proxy for test-time compute scaling of LLM reasoning on math and coding tasks.
SIVR detects LLM hallucinations by learning from token-wise and layer-wise variance patterns in internal hidden states, outperforming baselines with better generalization and less training data.
IUQ quantifies claim-level uncertainty in long-form LLM generation by combining inter-sample consistency and intra-sample faithfulness through an interrogate-then-respond approach and outperforms baselines on two datasets.
citing papers explorer
-
From Signals to Transfer: A Factorised Study of Probe-Based Uncertainty Estimation in Large Language Models
A factorized study finds raw hidden states and attention features hard to beat in-domain for LLM uncertainty probes, but structured compressed features are more robust under distribution shift, with pretrained probes transferring to open-ended generation.