TRAK: Attributing Model Behavior at Scale

Aleksander Madry; Andrew Ilyas; Guillaume Leclerc; Kristian Georgiev; Sung Min Park

arxiv: 2303.14186 · v2 · pith:LVVOJVQBnew · submitted 2023-03-24 · stat.ML · cs.LG

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry

keywords: models, trak, attribution, data, methods, model, training, work

The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak.

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On the Accuracy of Newton Step and Influence Function Data Attributions
cs.LG 2025-12 unverdicted novelty 7.0
New analysis without global strong convexity yields tight scaling laws: NS error ~Θ(kd/n²) and NS-IF difference ~Θ((k+d)√(kd)/n²) for well-behaved logistic regressions.

DRIFT: Refining Instruction Data via On-Policy Data Attribution
cs.LG 2026-06 unverdicted novelty 6.0
DRIFT applies on-policy influence functions with signed weighting and debiasing to attribute and refine SFT data, raising performance on 7B instruction and reasoning models over prior curation methods.

Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing
cs.LG 2026-06 unverdicted novelty 6.0
Proposes memorization-guided two-stage scoring to select debiased training subsets, enabling ERM models to achieve better performance than SOTA debiasing techniques using only 10% of data.

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors
cs.LG 2026-05 unverdicted novelty 6.0
idSCD uses semantic correlation descriptors to perform dataset membership inference by comparing learned semantic structures, outperforming baselines in NLI, emotion, and medical text experiments.

Refining Multidimensional Video Reward Models via Disentangled Influence Functions
cs.LG 2026-05 unverdicted novelty 6.0
Introduces dimension-disentangled influence estimation to prune or reweight training samples for MVRMs, outperforming global scalar filtering in alignment with ground truth.

Small edits, large models: How Wikipedia advocacy shapes LLM values
cs.CL 2026-04 unverdicted novelty 6.0
Wikipedia edits by animal welfare advocates measurably influence LLM outputs on animal welfare topics, shown via retrieval and gradient attribution plus fine-tuning experiments.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
cs.LG 2026-04 unverdicted novelty 6.0
RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence ana...

A Human-Centric Framework for Data Attribution in Large Language Models
cs.CY 2026-02 unverdicted novelty 6.0
Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
cs.LG 2023-10 conditional novelty 6.0
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

Watermarking for Proprietary Dataset Protection
cs.LG 2026-07 unverdicted novelty 5.0
Watermark-based dataset inference achieves membership detection performance comparable to loss-based methods when subset exposure is high, under alternate assumptions.

Rigorous Interpretation Is a Form of Evaluation
cs.CY 2026-05 unverdicted novelty 5.0
Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.

High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST 2026-05 unverdicted novelty 2.0
This review synthesizes representative advances in high-dimensional statistics, highlights common themes and open problems, and points to key entry works.

High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST 2026-05 unverdicted novelty 2.0
A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.

There Will Be a Scientific Theory of Deep Learning
stat.ML 2026-04 unverdicted novelty 2.0
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...