TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
method 2polarities
use method 2representative citing papers
Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.
citing papers explorer
-
Efficient Test-Time Scaling via Temporal Reasoning Aggregation
TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.
-
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples
Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
-
Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning
Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.