Exploration-Commitment Decoupling instantiated as Calibration-Aware Generation improves long-form factuality by up to 13% and reduces decoding time by up to 37% on five benchmarks.
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
verdicts: UNVERDICTED (2)
representative citing papers
SCAN is a framework for fine-grained LLM capability assessment via automatic taxonomy construction from queries, query synthesis for coverage, visualization tools, and a PC2-enhanced LLM-as-a-judge method, applied to 21 models showing intra-family variations.
citing papers explorer
- Only Say What You Know: Calibration-Aware Generation for Long-Form Factuality
Exploration-Commitment Decoupling instantiated as Calibration-Aware Generation improves long-form factuality by up to 13% and reduces decoding time by up to 37% on five benchmarks.
- SCAN: Structured Capability Assessment and Navigation for LLMs
SCAN is a framework for fine-grained LLM capability assessment via automatic taxonomy construction from queries, query synthesis for coverage, visualization tools, and a PC2-enhanced LLM-as-a-judge method, applied to 21 models showing intra-family variations.