ActivationReasoning: Logical Reasoning in Latent Activation Spaces

· 2025 · cs.LG · arXiv 2510.18184

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Large language models (LLMs) excel at generating fluent text, but their internal reasoning remains opaque and difficult to control. Sparse autoencoders (SAEs) make hidden activations more interpretable by exposing latent features that often align with human concepts. Yet these features are fragile and passive, offering no mechanism for systematic reasoning or model control. To address this, we introduce ActivationReasoning (AR), a framework that embeds explicit logical reasoning into the latent space of LLMs. It proceeds in three stages: (1) Finding latent representations: latent concept representations are first identified (e.g., via SAEs) and organized into a dictionary; (2) Activating propositions: at inference time, AR detects activating concepts and maps them to logical propositions; and (3) Logical reasoning: logical rules are applied over these propositions to infer higher-order structures, compose new concepts, and steer model behavior. We evaluate AR on multi-hop reasoning (PrOntoQA), abstraction and robustness to indirect concept cues (Rail2Country), reasoning over natural and diverse language (ProverQA), and context-sensitive safety (BeaverTails). Across all tasks, AR scales robustly with reasoning complexity, generalizes to abstract and context-sensitive tasks, and transfers across model backbones. These results demonstrate that grounding logical structure in latent activations not only improves transparency but also enables structured reasoning, reliable control, and alignment with desired behaviors, providing a path toward more reliable and auditable AI.
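The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the concept dictionary, the dot-product detection with a fixed threshold, the rule format, and every name below (`activate_propositions`, `forward_chain`, `concept_dictionary`, the example concepts) are hypothetical choices made only for illustration. The steering part of stage 3 is omitted.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Vector = List[float]
Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def activate_propositions(
    hidden: Vector, dictionary: Dict[str, Vector], threshold: float = 0.5
) -> Set[str]:
    """Stage 2 (assumed form): a concept becomes a true proposition
    when the hidden state projects onto its direction above a threshold."""
    return {
        name
        for name, direction in dictionary.items()
        if dot(hidden, direction) > threshold
    }


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Stage 3 (assumed form): apply rules repeatedly until no new
    proposition can be derived (simple forward chaining)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# Stage 1 (assumed form): a hand-written dictionary of concept directions.
# In practice these would come from a method such as an SAE.
concept_dictionary: Dict[str, Vector] = {
    "cat": [1.0, 0.0, 0.0],
    "striped": [0.0, 1.0, 0.0],
    "dog": [0.0, 0.0, 1.0],
}

# Hypothetical rules, including a two-hop chain (cat -> mammal -> animal).
rules: List[Rule] = [
    (frozenset({"cat"}), "mammal"),
    (frozenset({"mammal"}), "animal"),
    (frozenset({"cat", "striped"}), "tabby"),
]

hidden_state: Vector = [0.9, 0.7, 0.1]  # made-up activation vector
propositions = activate_propositions(hidden_state, concept_dictionary)
conclusions = forward_chain(propositions, rules)
print(sorted(conclusions))
```

In this sketch, "animal" is reached only through the intermediate proposition "mammal", which is the kind of multi-hop inference the abstract refers to, and "tabby" shows a new concept composed from two detected ones.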

representative citing papers

Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transcoders decompose MLP layers in Gemma 3-4B-IT to trace visual grounding more effectively than SAEs and predict hallucinations from circuit graph features at AUC 0.68.

citing papers explorer

Showing 1 of 1 citing paper.

Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models

cs.LG · 2026-05-21 · unverdicted · none · ref 11 · internal anchor
