Efficient Estimation of Kernel Surrogate Models for Task Attribution
Pith reviewed 2026-05-16 07:55 UTC · model grok-4.3
The pith
Kernel surrogate models capture nonlinear task interactions for more accurate attribution in large AI training
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Kernel surrogate models represent second-order task interactions more effectively than linear surrogates within a unified task-weighting framework. A gradient-based estimation procedure leverages first-order approximations of pretrained models to learn these surrogates without repeated retraining, achieving less than 2% relative error. Experiments in mathematical reasoning with transformers, in-context learning, and multi-objective reinforcement learning show 25% higher correlation with leave-one-out ground truth than linear surrogates or influence-function baselines, and 40% better performance in downstream data selection.
What carries the argument
Kernel surrogate models that capture nonlinear second-order interactions between training tasks, estimated via gradient-based optimization on first-order approximations of pretrained models.
If this is right
- Kernel surrogates achieve 25% higher correlation with leave-one-out ground truth than linear surrogates and influence-function baselines.
- They enable 40% improvement in downstream data selection across the tested settings.
- The first-order approximation suffices for accurate estimates with less than 2% relative error.
- The approach applies to transformers for mathematical reasoning, in-context learning, and multi-objective reinforcement learning.
Where Pith is reading between the lines
- Similar kernel surrogates could be tested on vision or multimodal models to attribute contributions from image or video data sources.
- The method might reduce reliance on influence-function approximations in other attribution settings such as feature selection.
- Dynamic task reweighting during continual learning becomes feasible if the approximation error remains low at larger scales.
- Combining the kernel estimation with quantization or pruning could further lower the cost of attribution in resource-constrained environments.
Load-bearing premise
The first-order approximation of pretrained models is accurate enough to produce kernel surrogate estimates with less than 2% relative error without repeated retraining.
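As a reminder of what a first-order approximation means here, the generic Taylor expansion of a model output around pretrained weights is written out below. The symbols $W_0$ (pretrained parameters) and $x$ (an input) are illustrative assumptions; $f_W$ and $W \in \mathbb{R}^d$ follow the notation of the appendix excerpt quoted in the reference list. This is the textbook linearization, not necessarily the exact form used in the paper.

```latex
\[
f_W(x) \;\approx\; f_{W_0}(x) + \nabla_W f_{W_0}(x)^{\top} (W - W_0),
\qquad \text{with remainder } O\!\left(\lVert W - W_0 \rVert^{2}\right).
\]
```

The premise above amounts to assuming that the discarded remainder stays small for the weight changes induced by reweighting tasks.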
What would settle it
Perform full leave-one-out retraining on a new task collection and measure whether the kernel surrogate correlation with ground truth falls below that of linear surrogates or whether the relative approximation error exceeds 2%.
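A minimal sketch of the two quantities such a check would compare, assuming Pearson correlation and a norm-based relative error. The abstract does not state which correlation measure or error definition the paper uses, so both function names and definitions here are hypothetical.

```python
import numpy as np

def pearson_correlation(pred, truth):
    """Pearson correlation between surrogate scores and leave-one-out scores."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    p = pred - pred.mean()
    t = truth - truth.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

def relative_error(approx, exact):
    """Norm-based relative error ||approx - exact|| / ||exact|| (assumed definition)."""
    approx = np.asarray(approx, dtype=float)
    exact = np.asarray(exact, dtype=float)
    return float(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

With these in hand, the test described above reduces to comparing `pearson_correlation` for the kernel and linear surrogates against the same retrained ground truth, and checking `relative_error` against the 2% threshold.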
Original abstract
Modern AI agents such as large language models are trained on diverse tasks -- translation, code generation, mathematical reasoning, and text prediction -- simultaneously. A key question is how to quantify the influence of each individual training task on performance on a target task, a problem we refer to as task attribution. The direct approach, leave-one-out retraining, measures the effect of removing each task, but is computationally infeasible at scale. An alternative approach that builds surrogate models to predict the performance on a target task for any subset of training tasks has emerged in the recent literature. Prior work focuses on linear surrogate models, which capture first-order relationships but miss nonlinear interactions such as XOR-type effects. In this paper, we first consider a unified task-weighting framework for analyzing task-attribution methods and establish a new connection between linear surrogate models and influence functions via a second-order analysis. Then, we introduce kernel surrogate models, which more effectively represent second-order task interactions. To efficiently learn the kernel surrogate, we develop a gradient-based estimation procedure that leverages a first-order approximation of pretrained models; empirically, this yields accurate surrogate estimates with less than $2\%$ relative error without repeated retraining. Experiments across multiple settings -- including mathematical reasoning in transformers, in-context learning, and multi-objective reinforcement learning -- demonstrate the effectiveness of kernel surrogate models. They achieve a $25\%$ higher correlation with the leave-one-out ground truth than linear surrogates and influence-function baselines, enabling more accurate and scalable task attribution. When used for downstream data selection, kernel surrogate models further yield a $40\%$ improvement in the aforementioned settings.
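The XOR-type effect mentioned in the abstract can be illustrated with a toy example. The sketch below assumes two hypothetical training tasks, a made-up target score that is high only when exactly one task is included, ridge regression as the linear surrogate, and RBF kernel ridge regression as the kernel surrogate. The kernel choice and the values of `lam` and `gamma` are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances between every row of A and every row of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

# All four subsets of two hypothetical tasks, encoded as 0/1 inclusion masks.
S = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
# Toy target performance with an XOR-type interaction (made-up values).
y = np.array([0.0, 1.0, 1.0, 0.0])

lam = 1e-3  # small ridge penalty (assumed)

# Linear surrogate: ridge regression on the masks plus an intercept column.
X = np.hstack([S, np.ones((4, 1))])
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
linear_pred = X @ w

# Kernel surrogate: kernel ridge regression with an RBF kernel.
K = rbf_kernel(S, S, gamma=1.0)
alpha = np.linalg.solve(K + lam * np.eye(4), y)
kernel_pred = K @ alpha

linear_err = np.abs(linear_pred - y).max()
kernel_err = np.abs(kernel_pred - y).max()
print(f"linear max error: {linear_err:.3f}, kernel max error: {kernel_err:.3f}")
```

In this toy setting the linear surrogate can do little better than predict roughly 0.5 for every subset, while the kernel surrogate fits all four points closely, which is the kind of gap the abstract attributes to nonlinear task interactions.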
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes kernel surrogate models for task attribution to capture nonlinear (e.g., XOR-type) interactions among training tasks, in contrast to prior linear surrogates. It first unifies task-weighting methods and links linear surrogates to influence functions via second-order analysis, then introduces kernels and a gradient-based estimation procedure relying on a first-order approximation of pretrained models to avoid repeated retraining. Experiments on mathematical reasoning in transformers, in-context learning, and multi-objective RL report 25% higher correlation with leave-one-out ground truth than linear or influence-function baselines, plus 40% gains in downstream data selection.
Significance. If the first-order approximation reliably preserves the second-order effects the kernels are intended to model, the work supplies a practical, scalable improvement over linear task-attribution methods with clear downstream utility for data selection. The empirical correlation lifts across three distinct settings are a concrete strength; however, the absence of explicit error bounds or ablations on the truncation error makes it difficult to attribute the reported gains specifically to the kernel rather than to uncontrolled approximation artifacts.
major comments (1)
- [Abstract] Abstract (gradient-based estimation procedure): the central efficiency claim rests on a first-order approximation of pretrained models when fitting the kernel surrogate, yet the kernel is explicitly motivated by its ability to represent second-order interactions. No derivation, remainder bound, or ablation isolating the contribution of discarded higher-order terms is supplied; if those terms are non-negligible, the 25% correlation advantage cannot be confidently ascribed to the kernel structure itself.
minor comments (2)
- The abstract reports aggregate correlation and improvement percentages without stating the number of independent runs, standard errors, or the precise definition of the leave-one-out ground truth used for comparison.
- Notation for the kernel surrogate (e.g., the precise form of the kernel and how task subsets are encoded) is introduced only at a high level; a short explicit definition would aid reproducibility.
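To make the second minor comment concrete, one standard form such a definition could take is kernel ridge regression over binary subset indicators. Everything in the block below is an assumption for illustration: $s \in \{0,1\}^n$ encodes which of the $n$ tasks are included, $s_1, \ldots, s_m$ are sampled subsets with observed target performances $y \in \mathbb{R}^m$, and the RBF kernel with bandwidth $\gamma$ and ridge penalty $\lambda$ is one possible choice. The paper's actual kernel and encoding are not given in the abstract.

```latex
\[
g(s) = \sum_{j=1}^{m} \alpha_j \, k(s, s_j), \qquad
k(s, s') = \exp\!\left(-\gamma \lVert s - s' \rVert^{2}\right), \qquad
\alpha = (K + \lambda I)^{-1} y, \quad K_{ij} = k(s_i, s_j).
\]
```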
Simulated Author's Rebuttal
We thank the referee for their detailed review and valuable comments on our manuscript. We address the major comment point by point below.
- Referee: [Abstract] Abstract (gradient-based estimation procedure): the central efficiency claim rests on a first-order approximation of pretrained models when fitting the kernel surrogate, yet the kernel is explicitly motivated by its ability to represent second-order interactions. No derivation, remainder bound, or ablation isolating the contribution of discarded higher-order terms is supplied; if those terms are non-negligible, the 25% correlation advantage cannot be confidently ascribed to the kernel structure itself.
Authors: We thank the referee for pointing out this potential inconsistency. The first-order approximation is used solely to efficiently compute the necessary gradients for fitting the surrogate model without requiring multiple retrainings of the large pretrained model. This approximation is applied to the performance function of the pretrained model. The kernel surrogate, however, operates on the space of task subsets and is capable of capturing nonlinear interactions through its kernel function, which can model second-order and higher effects in the attribution weights. The connection to influence functions is established via second-order analysis for the linear case, and the kernel extends this. While we do not provide a theoretical remainder bound in the current version, we report empirical evidence of the approximation's accuracy with less than 2% relative error across experiments. To strengthen the manuscript, we will include a derivation of the first-order approximation error and an ablation study that compares the kernel surrogate against a linear surrogate under identical approximation conditions to isolate the contribution of the kernel structure.
revision: yes
Circularity Check
No significant circularity: independent LOO validation and empirical error measurement keep derivation self-contained
The paper fits kernel surrogate models to predict target-task performance from training-task subsets and directly evaluates correlation against leave-one-out retraining ground truth, which is computed independently of the surrogate fit. The gradient-based procedure employs a first-order approximation solely for computational efficiency; its accuracy is asserted via an empirical <2% relative-error measurement rather than by definitional reduction or self-citation. No load-bearing step equates a claimed prediction to its own fitted inputs, invokes a self-citation uniqueness theorem, or renames a known result. The reported 25% correlation lift and 40% downstream improvement are therefore measured against external benchmarks and do not collapse to the model's own parameters by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: First-order approximation of pretrained models yields accurate surrogate estimates
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We introduce kernel surrogate models... develop a gradient-based estimation procedure that leverages a first-order approximation of pretrained models"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "linear surrogate models... approximately equal to the influence functions, up to third-order expansion errors (Proposition 3.1)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, and Eric Xing. What is your data worth to GPT? LLM-scale data valuation with influence functions. arXiv preprint arXiv:2405.13954.
- [2] Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, and Aleksander Madry. Optimizing ML training with metagradient descent. arXiv preprint arXiv:2503.13751.
- [3] Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, and Samuel R. Bowman. Studying large language model generalization with influence functions. arXiv preprint arXiv:2308.03296.
- [4] Pingbang Hu, Joseph Melkonian, Weijing Tang, Han Zhao, and Jiaqi W Ma. GraSS: Scalable influence function with sparse gradient compression. arXiv preprint arXiv:2505.18976.
- [5] Andrew Ilyas and Logan Engstrom. MAGIC: Near-optimal data attribution for deep learning. arXiv preprint arXiv:2504.16430.
- [6] William B Johnson. Extensions of Lipschitz mapping into Hilbert space. In Conference modern analysis and probability, 1984, pp. 189–206.
- [7] Dongyue Li, Aneesh Sharma, and Hongyang R Zhang. Scalable multitask learning using gradient-based estimation of task affinity. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1542–1553, 2024a.
Dongyue Li, Ziniu Zhang, Lu Wang, and Hongyang R. Zhang. Scalable fine-tuning from multiple data sources: A first-ord...
- [8] Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 11048–11064.
- [9] Siwei Wang, Yifei Shen, Haoran Sun, Shi Feng, Shang-Hua Teng, Li Dong, Yaru Hao, and Wei Chen. Benefits and pitfalls of reinforcement learning for language model planning: a theoretical perspective. arXiv preprint arXiv:2509.22613.
[10]
,(xn, yn)}be a dataset in- cludingnsamples drawn independently from an unknown data distribution
13 A COMPLETEPROOFS Derivation of influence functions.LetS={(x 1, y1),(x 2, y2), . . . ,(xn, yn)}be a dataset in- cludingnsamples drawn independently from an unknown data distribution. Letf W denote a model with parametersW∈R d. Let ˆL(fW )denote the empirical loss of the modelf W onS, averaged over thentraining data samples. The influence function (Koh &...
work page 2017
- [11] $\le \delta$. Provided that the random projection dimension $k$ satisfies $k = O\left(\frac{\log N}{\epsilon^2}\right)$, the training loss of $\widehat{W}(S)$ is bounded away from the minimum training loss for any $S \subseteq \{1, 2, \ldots, n\}$ as
$$\hat{L}\left(f_{\widehat{W}(S)}\right) \le \min_{W \in D} \hat{L}(f_W) + 2\delta + 4GD\epsilon. \tag{25}$$
The proof is based on the Johnson-Lindenstrauss lemma (Johnson, 1984), which asserts that when $k = O\left(\frac{\log N}{\epsilon^2}\right)$, for any $g_i$ with $\|g_i\| \le G$ and an...
- [12] offers an efficient algorithm for data attribution by linearizing the model and using random projections. We adapt it to attribute influence at the task level. The core idea is to represent each task by an average of its constituent samples' projected gradients. Specifically, for each sample $z$ in a task, a feature vector is computed from the gradient of a m...
- [13] Specifically, we vary $\lambda$ from $10^{-3}$ to $1$ and $\gamma$ from $10^{-5}$ to $10^{-1}$. We find that our approach is relatively robust to changes in both $\lambda$ and $\gamma$, exhibiting stable performance across the entire range. In our experiments, we set $\lambda = 10^{-1}$ and $\gamma = 1/n$ as the default configuration, where $n$ denotes the number of tasks. Comparison of kernels. We use the CIFAR-10 dataset and the ResN...
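The random-projection and task-averaging steps described in the excerpts for [11] and [12] can be sketched roughly as follows. This is a toy illustration under stated assumptions: the per-sample gradients are synthetic Gaussian stand-ins rather than real model gradients, the projection is a plain Gaussian matrix, and the dimensions `d`, `k`, `n_tasks`, and `per_task` are hypothetical values chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 2000, 256           # hypothetical gradient and projection dimensions
n_tasks, per_task = 3, 5   # hypothetical task count and samples per task

# Synthetic stand-ins for per-sample gradients; real ones would come from a model.
grads = rng.normal(size=(n_tasks, per_task, d))

# Gaussian random projection, scaled so squared norms are preserved in expectation.
P = rng.normal(size=(d, k)) / np.sqrt(k)
projected = grads @ P                    # shape (n_tasks, per_task, k)

# One feature vector per task: the mean of its samples' projected gradients.
task_features = projected.mean(axis=1)   # shape (n_tasks, k)

# Compare pairwise task inner products before and after projection.
full_features = grads.mean(axis=1)       # shape (n_tasks, d)
exact = full_features @ full_features.T
approx = task_features @ task_features.T
norm_ratio = np.diag(approx) / np.diag(exact)
print("squared-norm ratios after projection:", np.round(norm_ratio, 3))
```

Because the projection is linear, averaging projected gradients gives the same result as projecting the averaged gradient, so each task needs only a `k`-dimensional vector in storage. How close the ratios are to 1 depends on `k`, in line with the Johnson-Lindenstrauss scaling quoted in [11].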