In Findings of the Association for Computational Linguistics: ACL 2024, pages 16063–16077
5 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (5)
Verdicts: UNVERDICTED (5)
citing papers explorer
- Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models
  LLMs prioritize task-appropriate reasoning over conflicting instructions, but reasoning types are linearly encoded in middle-to-late layers, allowing activation steering to raise instruction compliance by up to 29%.
- Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
  Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.
- FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction
  A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.
- ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs
  Normalization plus deterministic parsing reduces content effects in LLM syllogistic reasoning and delivers top-5 performance on a multilingual SemEval benchmark.
- SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance
  SEF-CLGC with SLMs trained on natural and symbolic languages achieves 27.80% content score while lowering content bias on SemEval-2026 Task 11 Subtask 1.