Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.
arXiv preprint arXiv:2204.10965 , year=
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
BabyMind improves forced-choice word grounding accuracy by 2.6 points over CVCL on SAYCam-S by using offline object masks, short-term tracking into object files, and prototype-space multiple-instance contrastive learning.
Introduces synthetic benchmarks for concept bottleneck models that control data modality, concept choice, annotation quality, and completeness to evaluate performance in decision support and automation.
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
Natural language descriptions generated via a closed-loop pipeline with digital twins capture the selectivity of most neurons in macaque V1 and V4, with synthesized images driving 96% of V4 neurons into the top or bottom 5% of natural-image response distributions.
HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.
SALSA aligns social features and adds future-risk signals in VLA models to cut near-collisions by 86.4% and raise social accuracy from 53% to 93% on SCAND and real robots.
citing papers explorer
No citing papers match the current filters.