LLMs prioritize task-appropriate reasoning over conflicting instructions, but reasoning types are linearly encoded in middle-to-late layers, allowing activation steering to raise instruction compliance by up to 29%.
In Findings of the Association for Computational Linguistics: ACL 2024, pages 16063–16077
5 Pith papers cite this work, all from 2026 and all currently unverdicted. Polarity classification is still indexing.
Representative citing papers
Overthinking in medical QA is linearly decodable at 71.6% accuracy, yet fixed residual-stream steering yields no correction across 29 configurations; the decoded signal does enable selective abstention with an AUROC of 0.610.
A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.
Normalization plus deterministic parsing reduces content effects in LLM syllogistic reasoning and delivers top-5 performance on a multilingual SemEval benchmark.
SEF-CLGC with SLMs trained on natural and symbolic languages achieves 27.80% content score while lowering content bias on SemEval-2026 Task 11 Subtask 1.