[RPG+21] Ron Ross, Victoria Pillitteri, Richard Graubart, Deborah Bodeau, and Rosalie McQuaid
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (1)
citation-polarity summary: background (1)
citing papers explorer
- Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces
  A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.
- Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition
  The LLM approaches ExArch and ArTEMiS reach F1 scores of 0.86 and 0.81 for architecture entity recognition and traceability, matching or approaching baselines that require manual models.
- User Reviews as a Source for Usability Requirements: A Precursor Study on Using Large Language Models
  LLMs can detect usability content in user reviews with F-scores comparable to those of humans, though performance depends strongly on prompt design.
- Explainable Fall Detection for Elderly Monitoring via Temporally Stable SHAP in Skeleton-Based Human Activity Recognition
  T-SHAP temporally stabilizes SHAP attributions for LSTM-based fall detection, achieving 94.3% accuracy and improved faithfulness on the NTU RGB+D dataset.
- A Scalable Game-Theoretic Approach for Selecting Security Controls from Standardized Catalogues
  A zero-sum game model with algebraic dependency filtering selects budget-constrained security control sets from standardized catalogues and is demonstrated on a fictional military system using ITSG-33.