arxiv: 2303.14186 · v2 · pith:LVVOJVQBnew · submitted 2023-03-24 · 📊 stat.ML · cs.LG

TRAK: Attributing Model Behavior at Scale

classification 📊 stat.ML cs.LG
keywords models trak attribution data methods model training work

The goal of data attribution is to trace model predictions back to training data. Despite a long line of work towards this goal, existing approaches to data attribution tend to force users to choose between computational tractability and efficacy. That is, computationally tractable methods can struggle with accurately attributing model predictions in non-convex settings (e.g., in the context of deep neural networks), while methods that are effective in such regimes require training thousands of models, which makes them impractical for large models or datasets. In this work, we introduce TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models. In particular, by leveraging only a handful of trained models, TRAK can match the performance of attribution methods that require training thousands of models. We demonstrate the utility of TRAK across various modalities and scales: image classifiers trained on ImageNet, vision-language models (CLIP), and language models (BERT and mT5). We provide code for using TRAK (and reproducing our work) at https://github.com/MadryLab/trak .

This paper has not been read by Pith yet.

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. On the Accuracy of Newton Step and Influence Function Data Attributions

    cs.LG 2025-12 unverdicted novelty 7.0

    New analysis without global strong convexity yields tight scaling laws: NS error ~Θ(kd/n²) and NS-IF difference ~Θ((k+d)√(kd)/n²) for well-behaved logistic regressions.

  2. DRIFT: Refining Instruction Data via On-Policy Data Attribution

    cs.LG 2026-06 unverdicted novelty 6.0

    DRIFT applies on-policy influence functions with signed weighting and debiasing to attribute and refine SFT data, raising performance on 7B instruction and reasoning models over prior curation methods.

  3. Mitigating Spurious Correlations with Memorization-Guided Dataset De-Biasing

    cs.LG 2026-06 unverdicted novelty 6.0

    Proposes memorization-guided two-stage scoring to select debiased training subsets, enabling ERM models to achieve better performance than SOTA debiasing techniques using only 10% of data.

  4. idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

    cs.LG 2026-05 unverdicted novelty 6.0

    idSCD uses semantic correlation descriptors to perform dataset membership inference by comparing learned semantic structures, outperforming baselines in NLI, emotion, and medical text experiments.

  5. Refining Multidimensional Video Reward Models via Disentangled Influence Functions

    cs.LG 2026-05 unverdicted novelty 6.0

    Introduces dimension-disentangled influence estimation to prune or reweight training samples for MVRMs, outperforming global scalar filtering in alignment with ground truth.

  6. Small edits, large models: How Wikipedia advocacy shapes LLM values

    cs.CL 2026-04 unverdicted novelty 6.0

    Wikipedia edits by animal welfare advocates measurably influence LLM outputs on animal welfare topics, shown via retrieval and gradient attribution plus fine-tuning experiments.

  7. Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

    cs.LG 2026-04 unverdicted novelty 6.0

    RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence ana...

  8. A Human-Centric Framework for Data Attribution in Large Language Models

    cs.CY 2026-02 unverdicted novelty 6.0

    Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.

  9. SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

    cs.LG 2023-10 conditional novelty 6.0

    SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

  10. Watermarking for Proprietary Dataset Protection

    cs.LG 2026-07 unverdicted novelty 5.0

    Watermark-based dataset inference achieves membership detection performance comparable to loss-based methods when subset exposure is high, under alternate assumptions.

  11. Rigorous Interpretation Is a Form of Evaluation

    cs.CY 2026-05 unverdicted novelty 5.0

    Rigorous interpretability can function as a principled form of model evaluation if its claims are falsifiable, reproducible, and predictive.

  12. High-Dimensional Statistics: Reflections on Progress and Open Problems

    math.ST 2026-05 unverdicted novelty 2.0

    This review synthesizes representative advances in high-dimensional statistics, highlights common themes and open problems, and points to key entry works.

  13. High-Dimensional Statistics: Reflections on Progress and Open Problems

    math.ST 2026-05 unverdicted novelty 2.0

    A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.

  14. There Will Be a Scientific Theory of Deep Learning

    stat.ML 2026-04 unverdicted novelty 2.0

    A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universa...