Mitigating content effects on reasoning in language models through fine-grained activation steering
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
Citing papers
- Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes
  Overthinking in medical QA is linearly decodable at 71.6% accuracy, yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.
- FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction
  A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.
- ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs
  Normalization plus deterministic parsing reduces content effects in LLM syllogistic reasoning and delivers top-5 performance on a multilingual SemEval benchmark.
- Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models