Oikarinen, Tuomas; Weng, Tsui-Wei · 2022 · arXiv preprint arXiv:2204.10965

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

Cross-Layer Transcoders decompose ViT activations into sparse, depth-aware layer contributions that maintain zero-shot accuracy and enable faithful attribution of the final representation.

Objects Before Words: Object-First Inductive Biases for Grounding Language in Child-View Video

cs.CV · 2026-06-11 · unverdicted · novelty 6.0

BabyMind improves forced-choice word grounding accuracy by 2.6 points over CVCL on SAYCam-S by using offline object masks, short-term tracking into object files, and prototype-space multiple-instance contrastive learning.

Measuring What Matters: Synthetic Benchmarks for Concept Bottleneck Models

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

Introduces synthetic benchmarks for concept bottleneck models that control data modality, concept choice, annotation quality, and completeness to evaluate performance in decision support and automation.

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

Letting the neural code speak: Automated characterization of monkey visual neurons through human language

q-bio.NC · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Natural language descriptions generated via a closed-loop pipeline with digital twins capture the selectivity of most neurons in macaque V1 and V4, with synthesized images driving 96% of V4 neurons into the top or bottom 5% of natural-image response distributions.

Hierarchical, Interpretable, Label-Free Concept Bottleneck Model

cs.CV · 2026-04-02 · unverdicted · novelty 6.0

HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.

Act on What You See: Unlocking Safe Social Navigation in Vision-Language-Action Models

cs.RO · 2026-06-09 · unverdicted · novelty 5.0

SALSA aligns social features and adds future-risk signals in VLA models to cut near-collisions by 86.4% and raise social accuracy from 53% to 93% on SCAND and real robots.

Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

cs.CY · 2026-02-27

Beyond Interpretability: When, Why, and How Sparse Autoencoders Enable Label-Free Visual Steering

cs.CV · 2025-06-02
