Which isomorphism or approximation of a neural network (or of parts of it) best expresses it for the purpose of interpreting it?

How should we decompose networks into more interpretable constituent parts?



Cited by: Open Problems in Mechanistic Interpretability (cs.LG, 2025-01-27), which cites this work as ref. 17.
A review paper that organizes conceptual, practical, and socio-technical open problems in mechanistic interpretability.