Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective
Reasoning-tuned LLMs that use long Chain-of-Thought (CoT) excel at single-answer tasks, yet their ability to model Human Label Variation (which requires capturing probabilistic ambiguity rather than resolving it) remains underexplored. We investigate this through systematic disentanglement experiments on distribution-based tasks, employing Cross-CoT experiments to isolate the effect of reasoning text from intrinsic model priors. We observe a distinct "decoupled mechanism": while CoT improves distributional alignment, final accuracy is dictated by CoT content (99% variance contribution), whereas distributional ranking is governed by model priors (over 80%). Step-wise analysis further shows that while CoT's influence on accuracy grows monotonically over the course of the reasoning process, distributional structure is largely determined by the LLM's intrinsic priors. These findings suggest that long CoT serves as a decisive decision-maker for the top option but fails to function as a granular distribution calibrator for ambiguous tasks.
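The Cross-CoT setup described above can be illustrated with a small sketch. This is a hypothetical reconstruction, not the paper's actual protocol or data: we assume a grid of outcomes indexed by (prior model, CoT source), where each model answers using CoT text generated by every model, and a two-way ANOVA-style decomposition attributes outcome variance to the CoT content versus the model prior.

```python
# Hypothetical sketch of a Cross-CoT variance decomposition.
# The accuracy grid below is illustrative, not from the paper.
import numpy as np

# Rows: the model whose intrinsic priors produce the final answer.
# Cols: the model whose CoT text is swapped in (cross-CoT).
acc = np.array([
    [0.72, 0.61, 0.55],
    [0.74, 0.60, 0.54],
    [0.71, 0.63, 0.56],
])

grand = acc.mean()
row_effect = acc.mean(axis=1) - grand   # model-prior main effect
col_effect = acc.mean(axis=0) - grand   # CoT-content main effect

# Sums of squares for each factor and in total.
ss_total = ((acc - grand) ** 2).sum()
ss_rows = acc.shape[1] * (row_effect ** 2).sum()
ss_cols = acc.shape[0] * (col_effect ** 2).sum()

print(f"prior share of variance: {ss_rows / ss_total:.2f}")
print(f"CoT share of variance:   {ss_cols / ss_total:.2f}")
```

With this toy grid, nearly all accuracy variance falls along the CoT-source axis, mirroring the paper's finding that final accuracy is dictated by CoT content; a ranking metric computed on the same grid would be expected to vary along the prior axis instead.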
Forward citations
Cited by 1 Pith paper
- Annotation Entropy Predicts Per-Example Learning Dynamics in LoRA Fine-Tuning
Annotation entropy from contested labels predicts increasing loss during LoRA fine-tuning on NLI tasks, unlike full fine-tuning.