In Findings of the Association for Computational Linguistics: ACL 2024, pages 16063–16077

2024 · arXiv 2505.12189

5 Pith papers cite this work.

representative citing papers

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

cs.CL · 2026-04-29 · unverdicted · novelty 6.0

LLMs prioritize task-appropriate reasoning over conflicting instructions, but reasoning types are linearly encoded in middle-to-late layers, allowing activation steering to raise instruction compliance by up to 29%.

Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Overthinking in medical QA is linearly decodable at 71.6% accuracy, yet fixed residual-stream steering yields no correction across 29 configurations; the decodable signal does, however, enable selective abstention with AUROC 0.610.

FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

A neuro-symbolic system using LLM disagreement to trigger Z3 formal verification achieves 94.3% accuracy and a combined score of 41.88 on syllogistic validity prediction, improving on the pure ensemble by reducing content effects.

ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs

cs.CL · 2026-03-03 · unverdicted · novelty 3.0

Normalization plus deterministic parsing reduces content effects in LLM syllogistic reasoning and delivers top-5 performance on a multilingual SemEval benchmark.

SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance

cs.CL · 2026-06-08 · unverdicted · novelty 2.0

SEF-CLGC, using SLMs trained on natural and symbolic languages, achieves a content score of 27.80% while lowering content bias on SemEval-2026 Task 11 Subtask 1.

citing papers explorer

Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models cs.CL · 2026-04-29 · unverdicted · none · ref 38
Decodable but Not Corrected by Fixed Residual-Stream Linear Steering: Evidence from Medical LLM Failure Regimes cs.AI · 2026-05-07 · unverdicted · none · ref 65
FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction cs.CL · 2026-04-20 · unverdicted · none · ref 11
ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs cs.CL · 2026-03-03 · unverdicted · none · ref 3
SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance cs.CL · 2026-06-08 · unverdicted · none · ref 18