Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.
arXiv preprint arXiv:2506.09890 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.
citing papers explorer
-
From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.
-
Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance
Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.