Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
and Tsvetkov, Yulia
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.
Pretrained and fine-tuned Qwen3 embeddings exhibit measurable alignment with an expert symptom matrix via RSA on Reddit mental-health data, strengthened by fine-tuning at fine-grained levels and larger scale, with residual alignment after VAD/LIWC/topic controls.
citing papers explorer
-
Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior
Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
-
Rigorous Interpretation Is a Form of Evaluation
Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.
-
Do LLM Embedding Spaces Recover Expert Structure?
Pretrained and fine-tuned Qwen3 embeddings exhibit measurable alignment with an expert symptom matrix via RSA on Reddit mental-health data, strengthened by fine-tuning at fine-grained levels and larger scale, with residual alignment after VAD/LIWC/topic controls.