Mitigating content effects on reasoning in language models through fine-grained activation steering

Exploring reasoning biases in large language models through syllogism: Insights from the NeuBAROCO dataset · 2024 · arXiv 2505.12189

4 Pith papers cite this work. Polarity classification is still being indexed.



representative citing papers

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Overthinking in medical QA is linearly decodable at 71.6% accuracy yet fixed residual-stream steering yields no correction across 29 configurations, while enabling selective abstention with AUROC 0.610.

FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.

ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs

cs.CL · 2026-03-03 · unverdicted · novelty 3.0

Normalization plus deterministic parsing reduces content effects in LLM syllogistic reasoning and delivers top-5 performance on a multilingual SemEval benchmark.

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

cs.CL · 2026-04-29

citing papers explorer


Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

cs.AI · 2026-05-07 · unverdicted · none · ref 65

FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

cs.CL · 2026-04-20 · unverdicted · none · ref 11

ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs

cs.CL · 2026-03-03 · unverdicted · none · ref 3

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

cs.CL · 2026-04-29 · unreviewed · ref 38
