TRAK: Attributing Model Behavior at Scale
The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .
Forward citations
Cited by 14 Pith papers
- On the Accuracy of Newton Step and Influence Function Data Attributions
New analysis without global strong convexity yields tight scaling laws: NS error ~Θ(kd/n²) and NS-IF difference ~Θ((k+d)√(kd)/n²) for well-behaved logistic regressions.
- DRIFT: Refining Instruction Data via On-Policy Data Attribution
DRIFT applies on-policy influence functions with signed weighting and debiasing to attribute and refine SFT data, raising performance on 7B instruction and reasoning models over prior curation methods.
- Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing
Proposes memorization-guided two-stage scoring to select debiased training subsets, enabling ERM models to achieve better performance than SOTA debiasing techniques using only 10% of data.
- idSCD: Identifying Training Datasets through Semantic Correlation Descriptors
idSCD uses semantic correlation descriptors to perform dataset membership inference by comparing learned semantic structures, outperforming baselines in NLI, emotion, and medical text experiments.
- Refining Multidimensional Video Reward Models via Disentangled Influence Functions
Introduces dimension-disentangled influence estimation to prune or reweight training samples for MVRMs, outperforming global scalar filtering in alignment with ground truth.
- Small edits, large models: How Wikipedia advocacy shapes LLM values
Wikipedia edits by animal welfare advocates measurably influence LLM outputs on animal welfare topics, shown via retrieval and gradient attribution plus fine-tuning experiments.
- Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence ana...
- A Human-Centric Framework for Data Attribution in Large Language Models
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.
- SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
- Watermarking for Proprietary Dataset Protection
Watermark-based dataset inference achieves membership detection performance comparable to loss-based methods when subset exposure is high, under alternate assumptions.
- Rigorous Interpretation Is a Form of Evaluation
Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.
- High-Dimensional Statistics: Reflections on Progress and Open Problems
This review synthesizes representative advances in high-dimensional statistics, highlights common themes and open problems, and points to key entry works.
- High-Dimensional Statistics: Reflections on Progress and Open Problems
A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.
- There Will Be a Scientific Theory of Deep Learning
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...