Interpreting neural networks through the polytope lens · arXiv 2211.12312

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LLM tasks are supported by multiple distinct circuits rather than unique mechanisms, demonstrated via Overlap-Aware Sheaf Repulsion and the Distributive Dense Circuit Hypothesis.

Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

Prediction agreement between open and closed LLMs substantially overstates agreement on attributions and causal reasons.

Understanding the Mechanism of Altruism in Large Language Models

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.

citing papers explorer

Showing 3 of 3 citing papers.

All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs
cs.CL · 2026-05-12 · unverdicted · none · ref 86
Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?
cs.LG · 2026-06-30 · unverdicted · none · ref 57
Understanding the Mechanism of Altruism in Large Language Models
econ.GN · 2026-04-21 · unverdicted · none · ref 225
