Interaction-Aware Influence Functions for Group Attribution
Pith reviewed 2026-05-20 20:38 UTC · model grok-4.3
The pith
Adding a pairwise interaction term to influence functions improves estimates of how groups of training examples jointly affect model behavior.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By expanding the target function to second order around the trained parameters, we obtain an estimator that augments the standard sum with a pairwise interaction term that captures the alignment between two examples' effects on the target.
What carries the argument
The second-order Taylor expansion of the target function around the trained model parameters, which supplies the pairwise interaction term that augments the usual first-order sum.
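A sketch of how such a pairwise term can arise is given below. The notation is assumed for illustration and is not taken from the paper: $\hat\theta$ denotes the trained parameters, $f$ the target function, $\ell$ the per-example loss, $H$ the Hessian of the average training loss at $\hat\theta$, and $S$ the removed group. It also assumes that the parameter change for the group is approximated by the sum of first-order per-example changes; the paper's Eq. (7) may differ in detail.

```latex
\begin{align*}
\Delta\theta_i &\approx \tfrac{1}{n}\, H^{-1} \nabla_\theta \ell(z_i, \hat\theta),
\qquad
\Delta\theta_S \approx \sum_{i \in S} \Delta\theta_i, \\
f(\hat\theta + \Delta\theta_S) - f(\hat\theta)
&\approx \underbrace{\sum_{i \in S} \nabla f(\hat\theta)^\top \Delta\theta_i}_{\text{standard first-order sum}}
\;+\; \underbrace{\tfrac{1}{2} \sum_{i \in S} \sum_{j \in S}
\Delta\theta_i^\top \nabla^2 f(\hat\theta)\, \Delta\theta_j}_{\text{quadratic term; the } i \neq j \text{ entries are pairwise interactions}} .
\end{align*}
```

Under these assumptions, each cross term $\Delta\theta_i^\top \nabla^2 f(\hat\theta)\, \Delta\theta_j$ is an inner product of two per-example parameter shifts weighted by the curvature of the target, which is one way to read "alignment between two examples' effects".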
If this is right
- The estimator tracks leave-group-out retraining more closely than first-order influence on six dataset-model pairs spanning logistic regression, MLPs, and ResNet-9.
- Greedy selection guided by the interaction-aware scores beats prior influence-based and representation-similarity baselines on five of seven downstream tasks for instruction tuning of Llama-3.1-8B.
- The pairwise term distinguishes redundant examples from complementary ones, allowing group attributions that simple summation cannot provide.
- The same estimator remains useful even in regimes where standard influence-based selection performs worse than random selection.
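The difference between summing individual scores and greedy selection with a pairwise term can be illustrated with a toy sketch. Everything here is hypothetical: the function name `greedy_select`, the score vector `a`, the interaction matrix `B`, and the convention that higher scores are better are assumptions for illustration, not the paper's selection rule or its numbers.

```python
import numpy as np

def greedy_select(a, B, k):
    """Greedily pick k indices to maximize sum_i a[i] + sum_{i<j} B[i, j].

    a: (n,) individual scores (hypothetical; higher is better).
    B: (n, n) symmetric pairwise interaction scores with zero diagonal.
    """
    a = np.asarray(a, dtype=float)
    B = np.asarray(B, dtype=float)
    available = np.ones(a.shape[0], dtype=bool)
    gain = a.copy()          # marginal gain of each candidate given the current set
    selected = []
    for _ in range(k):
        best = int(np.argmax(np.where(available, gain, -np.inf)))
        selected.append(best)
        available[best] = False
        gain += B[best]      # adding `best` shifts candidate j's gain by B[best, j]
    return selected

# Toy scores: examples 0 and 1 are near-duplicates (negative interaction),
# while example 2 is weaker alone but complementary to example 0.
a = np.array([1.0, 0.9, 0.6])
B = np.array([[0.0, -0.8, 0.2],
              [-0.8, 0.0, 0.0],
              [0.2, 0.0, 0.0]])

top_k_by_sum = [int(i) for i in np.argsort(-a)[:2]]
greedy_pairwise = greedy_select(a, B, k=2)
print(top_k_by_sum, greedy_pairwise)  # [0, 1] [0, 2]
```

In this toy case, ranking by individual score keeps both redundant examples, whereas the pairwise term makes the second duplicate's marginal gain drop to 0.1 and the complementary example's gain rise to 0.8.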
Where Pith is reading between the lines
- Higher-order terms beyond pairwise interactions could be derived similarly if larger groups are the focus of attribution.
- The alignment captured by the interaction term may help explain why certain data subsets produce synergistic gains when used together for fine-tuning.
- The approach could be tested on other attribution problems such as feature attribution or neuron pruning where joint effects are also ignored by first-order methods.
Load-bearing premise
The second-order Taylor expansion around the trained parameters remains accurate enough for the group sizes and model scales considered.
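One standard way to make "accurate enough" concrete is the Taylor remainder bound below. It relies on an assumption stronger than twice differentiability, namely that the Hessian of the target $f$ is $L$-Lipschitz along the segment from $\hat\theta$ to $\hat\theta + \Delta\theta$; the symbols $L$ and $R_3$ are introduced here for illustration and do not come from the paper.

```latex
\begin{align*}
f(\hat\theta + \Delta\theta)
&= f(\hat\theta) + \nabla f(\hat\theta)^\top \Delta\theta
 + \tfrac{1}{2}\, \Delta\theta^\top \nabla^2 f(\hat\theta)\, \Delta\theta + R_3(\Delta\theta), \\
|R_3(\Delta\theta)| &\le \tfrac{L}{6}\, \lVert \Delta\theta \rVert^3 .
\end{align*}
```

The remainder is cubic in the parameter shift while the interaction term is quadratic, so the premise is most plausible when the group-induced shift is small. This bound covers only the expansion of the target; it says nothing about the error made in estimating $\Delta\theta$ itself.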
What would settle it
Direct leave-group-out retraining experiments on a new model scale or larger group sizes that show the interaction-aware estimates diverging from the true change in the target would falsify the estimator's accuracy claim.
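A small-scale version of such a check might look like the sketch below. It is a minimal illustration on synthetic data with L2-regularized logistic regression, not the paper's experimental setup. Assumptions are stated loudly: the data, the group size, the regularization strength `lam`, the choice to keep the 1/n normalization after removal, and the use of first-order per-example parameter changes inside the quadratic term are all choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_test, d, lam = 400, 200, 5, 1e-2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary classification data (labels in {0, 1}).
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
X_te = rng.normal(size=(n_test, d))
y_te = (rng.random(n_test) < sigmoid(X_te @ w_true)).astype(float)

def fit(weights, iters=50):
    """Newton's method on (1/n) * sum_i weights[i] * loss_i + (lam/2) * ||theta||^2."""
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (weights * (p - y)) / n + lam * theta
        hess = (X.T * (weights * p * (1 - p))) @ X / n + lam * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def target(theta):
    """Target function f: mean held-out logistic loss."""
    z = X_te @ theta
    return np.mean(np.logaddexp(0.0, z) - y_te * z)

theta_hat = fit(np.ones(n))
p = sigmoid(X @ theta_hat)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)   # training Hessian at theta_hat
G = X * (p - y)[:, None]                              # per-example loss gradients, (n, d)
delta = np.linalg.solve(H, G.T).T / n                 # first-order shift if example i is removed

# Gradient and Hessian of the target at theta_hat.
p_te = sigmoid(X_te @ theta_hat)
grad_f = X_te.T @ (p_te - y_te) / n_test
hess_f = (X_te.T * (p_te * (1 - p_te))) @ X_te / n_test

group = rng.choice(n, size=40, replace=False)
delta_S = delta[group].sum(axis=0)
first_order = grad_f @ delta_S                        # standard sum of individual influences
quad = 0.5 * delta_S @ hess_f @ delta_S               # includes all i != j cross terms
interaction_aware = first_order + quad

# Ground truth: retrain with the group's weights set to zero.
w = np.ones(n)
w[group] = 0.0
actual = target(fit(w)) - target(theta_hat)

print(f"actual={actual:.6f} first_order={first_order:.6f} "
      f"interaction_aware={interaction_aware:.6f}")
```

The script prints the retrained change alongside both estimates so they can be compared across group sizes; it does not assert which estimate is closer, since that is the empirical question the experiment is meant to answer.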
Original abstract
Influence functions approximate how removing a training example changes a quantity of interest, called the target function, such as a held-out loss. To estimate the influence of a group of examples, the standard practice is to sum the individual influences of its members. However, this sum does not capture how examples jointly affect the target: a pair of examples may be redundant or complementary, but the sum cannot distinguish these cases. We propose an interaction-aware influence function that characterizes how interactions between examples influence the target. By expanding the target to second order around the trained parameters, we obtain an estimator that augments the standard sum with a pairwise interaction term that captures the alignment between two examples' effects on the target. We empirically evaluate our estimator in two settings. First, on six dataset-model pairs spanning logistic regression, MLPs, and ResNet-9, our estimator tracks leave-group-out retraining substantially better than first-order influence across all settings. Second, when used as a greedy selection rule for instruction-tuning data on Llama-3.1-8B, it beats prior influence-based and representation-similarity baselines on five of seven downstream tasks, in a regime where standard influence-based selection underperforms random selection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an interaction-aware influence function for group attribution by augmenting standard first-order influence functions with a pairwise interaction term obtained via second-order Taylor expansion of the target function around trained parameters. This term captures alignment between examples' effects. Empirically, the estimator tracks leave-group-out retraining better than first-order baselines on six dataset-model pairs (logistic regression, MLPs, ResNet-9). When used for greedy instruction-tuning data selection on Llama-3.1-8B, it outperforms prior influence-based and representation-similarity baselines on five of seven downstream tasks.
Significance. If the second-order approximation remains accurate at the scales considered, the method offers a computationally tractable way to account for example interactions in influence estimation, which could improve data selection and attribution tasks where groups exhibit redundancy or complementarity. The small-model validation against leave-group-out retraining provides direct evidence of improved fidelity; the large-model results suggest practical utility but depend on untested transfer of the approximation.
major comments (3)
- [§3.2, Eq. (7)] The second-order interaction term is derived from the Taylor expansion, but no remainder-term bound or analysis of approximation error is provided for the group sizes and model scales in the Llama-3.1-8B experiments; this is load-bearing because the skeptic correctly notes that if higher-order terms or Hessian-vector approximation errors become comparable to the interaction term, the downstream gains cannot be confidently attributed to interaction awareness.
- [§5.3] Direct validation against leave-group-out retraining is performed only on logistic regression, MLPs, and ResNet-9; the Llama-3.1-8B greedy selection results lack any analogous direct check (as retraining is infeasible) and instead rely on downstream task performance, which could be driven by factors other than the claimed interaction term.
- [Table 1 and §5.1] While consistent improvement over first-order baselines is reported across six dataset-model pairs, no error bars, statistical significance tests, or sensitivity analysis to the Hessian approximation method are included, weakening the claim that the interaction term is the source of the improvement.
minor comments (2)
- The notation for the target function and influence quantities is introduced without a consolidated table of symbols, making it harder to follow the transition from first-order to interaction-aware estimators.
- Figure 2 caption does not specify the exact group sizes used in the leave-group-out experiments, which is relevant for assessing the regime where the second-order term is expected to matter.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment point by point below, indicating whether revisions have been made.
- Referee: [§3.2, Eq. (7)] The second-order interaction term is derived from the Taylor expansion, but no remainder-term bound or analysis of approximation error is provided for the group sizes and model scales in the Llama-3.1-8B experiments; this is load-bearing because the skeptic correctly notes that if higher-order terms or Hessian-vector approximation errors become comparable to the interaction term, the downstream gains cannot be confidently attributed to interaction awareness.
  Authors: We agree that a formal remainder-term bound would strengthen the claims. However, obtaining a tight, non-vacuous bound on higher-order terms for high-dimensional models and non-trivial group sizes without strong additional assumptions is technically challenging and beyond the scope of the current work. In the revision we add an expanded discussion of approximation quality, drawing on the small-model leave-group-out results to empirically characterize when the second-order term remains dominant, and we explicitly flag the lack of a general bound as a limitation for the Llama-scale experiments.
  Revision: partial
- Referee: [§5.3] Direct validation against leave-group-out retraining is performed only on logistic regression, MLPs, and ResNet-9; the Llama-3.1-8B greedy selection results lack any analogous direct check (as retraining is infeasible) and instead rely on downstream task performance, which could be driven by factors other than the claimed interaction term.
  Authors: We acknowledge the limitation. Direct leave-group-out validation is computationally infeasible at the Llama-3.1-8B scale. In the revised manuscript we have expanded §5.3 to state this caveat explicitly, to clarify that downstream gains constitute indirect evidence, and to note that the pattern of improvement is consistent with the small-model regime where direct validation against retraining was possible.
  Revision: yes
- Referee: [Table 1 and §5.1] While consistent improvement over first-order baselines is reported across six dataset-model pairs, no error bars, statistical significance tests, or sensitivity analysis to the Hessian approximation method are included, weakening the claim that the interaction term is the source of the improvement.
  Authors: We thank the referee for this observation. The revised manuscript updates Table 1 with error bars computed over multiple random seeds, adds paired statistical significance tests between our estimator and the first-order baseline, and includes a new sensitivity analysis in §5.1 that compares results under exact versus approximate Hessian-vector products.
  Revision: yes
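A sensitivity check of the kind described in the last response can be sketched as follows. The review does not say which Hessian approximation the paper uses, so conjugate gradient is used here purely as a familiar stand-in for an approximate inverse-Hessian-vector product, and the matrix `H` is a synthetic positive-definite placeholder rather than a real model Hessian. The helper names `cg_solve` and `rel_err` are hypothetical.

```python
import numpy as np

def cg_solve(hvp, g, iters=100, tol=1e-10):
    """Conjugate gradient for H x = g, using H only through hvp(v) = H @ v."""
    x = np.zeros_like(g)
    r = g - hvp(x)           # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
d = 20
A = rng.normal(size=(d, d))
H = A @ A.T / d + 0.1 * np.eye(d)    # synthetic SPD stand-in for a damped Hessian
g = rng.normal(size=d)               # stand-in for a per-example gradient

x_exact = np.linalg.solve(H, g)                      # exact inverse-Hessian-vector product
x_cg_few = cg_solve(lambda v: H @ v, g, iters=5)     # truncated, approximate solve
x_cg_full = cg_solve(lambda v: H @ v, g, iters=200)  # run to convergence

def rel_err(x):
    return np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)

print(f"5 iterations: {rel_err(x_cg_few):.2e}, converged: {rel_err(x_cg_full):.2e}")
```

Feeding both the truncated and the converged solutions into the same downstream estimator would show how much of a reported improvement survives a cruder Hessian approximation.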
Circularity Check
Derivation via second-order Taylor expansion is self-contained and does not reduce to inputs by construction
full rationale
The paper derives the interaction-aware estimator by performing a direct second-order Taylor expansion of the target function around the trained parameters, augmenting the standard first-order sum with an explicit pairwise interaction term whose coefficients are the mixed second derivatives of the loss. This is a standard calculus construction using the same loss, gradient, and Hessian primitives as classical influence functions, but extended rather than redefined. No equation reduces a prediction to a fitted quantity, no uniqueness theorem is imported via self-citation, and no ansatz is smuggled in. The empirical comparisons to leave-group-out retraining on small models serve as an external benchmark, keeping the central claim independent of its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The loss is twice differentiable in a neighborhood of the trained parameters.