Neural natural language inference models partially embed theories of lexical entailment and negation

Atticus Geiger, Kyle Richardson, Christopher Potts · 2020 · DOI 10.18653/v1/2020.blackboxnlp-1.16

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open at publisher browse 4 citing papers

representative citing papers

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction

cs.AI · 2026-05-04 · unverdicted · novelty 7.0

A four-step recipe partitions the input space using interchange intervention behavior to diagnose where causal abstractions hold and to guide improvements, demonstrated by recovering a full hypothesis from scratch in a toy logic task.

Linear Representations of Sentiment in Large Language Models

cs.LG · 2023-10-23 · unverdicted · novelty 6.0

Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

cs.AI · 2023-10-10 · unverdicted · novelty 6.0

At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.

citing papers explorer

Showing 4 of 4 citing papers.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior cs.LG · 2026-05-06 · unverdicted · none · ref 181
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction cs.AI · 2026-05-04 · unverdicted · none · ref 6
A four-step recipe partitions the input space using interchange intervention behavior to diagnose where causal abstractions hold and to guide improvements, demonstrated by recovering a full hypothesis from scratch in a toy logic task.
Linear Representations of Sentiment in Large Language Models cs.LG · 2023-10-23 · unverdicted · none · ref 87
Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets cs.AI · 2023-10-10 · unverdicted · none · ref 72
At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.

Neural natural language inference models partially embed theories of lexical entailment and negation

fields

years

verdicts

representative citing papers

citing papers explorer