Studying large language model generalization with influence functions. arXiv preprint arXiv:2308.03296

Roger B · 2023 · arXiv 2308.03296

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · method 1

citation-polarity summary

background 2 · use method 1

representative citing papers

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

cs.LG · 2026-05-12 · conditional · novelty 7.0

The paper decomposes errors in trajectory-based data attribution into config, algorithm, and system levels, proposes AdamW-influence to fix optimizer mismatch, derives an error proxy for Taylor approximation, and unifies data selection under a K-step look-ahead framework.

Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning

cs.LG · 2026-03-08 · unverdicted · novelty 7.0

Filter-then-Weight is a two-stage optimizer-aware method that filters geometrically useful data candidates and optimizes their coefficients to shape target updates in online LLM fine-tuning.

On the Accuracy of Newton Step and Influence Function Data Attributions

cs.LG · 2025-12-14 · unverdicted · novelty 7.0

New analysis without global strong convexity yields tight scaling laws: NS error ~Θ(kd/n²) and NS-IF difference ~Θ((k+d)√(kd)/n²) for well-behaved logistic regressions.

MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization

cs.CV · 2025-08-11 · unverdicted · novelty 7.0

MIMIC is a new inversion framework that recovers visual concepts from VLM internal states using joint inversion, feature alignment, and three regularizers.

Interaction-Aware Influence Functions for Group Attribution

cs.LG · 2026-05-15 · conditional · novelty 6.0

Extends influence functions with a second-order pairwise interaction term that improves group attribution accuracy over simple summation on multiple model-dataset pairs and instruction-tuning selection tasks.

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

Convergent Evolution: How Different Language Models Learn Similar Number Representations

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.

Representation-Guided Parameter-Efficient LLM Unlearning

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.

A Human-Centric Framework for Data Attribution in Large Language Models

cs.CY · 2026-02-11 · unverdicted · novelty 6.0

Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.

Efficient Estimation of Kernel Surrogate Models for Task Attribution

cs.LG · 2026-02-03 · unverdicted · novelty 6.0

Kernel surrogate models with first-order gradient approximation achieve 25% higher correlation to leave-one-out ground truth for task attribution and 40% better downstream data selection than linear surrogates.

Feature Identification via the Empirical NTK

cs.LG · 2025-10-01 · unverdicted · novelty 6.0

Eigenanalysis of the empirical NTK surfaces feature directions that align with Fourier features in modular addition networks and grammatical features in Gemma-3-270M, outperforming PCA baselines on activations.

DMin: Scalable Training Data Influence Estimation for Diffusion Models

cs.CV · 2024-12-11 · unverdicted · novelty 6.0

DMin uses gradient compression to scalably estimate training data influence in billion-parameter diffusion models.

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

cs.LG · 2023-10-19 · conditional · novelty 6.0

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

Mechanistic Anomaly Detection via Functional Attribution

cs.LG · 2026-04-21

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

cs.CL · 2026-01-20

citing papers explorer

Showing 16 of 16 citing papers.

How Faithful Is Trajectory-Based Data Attribution? Error Sources, Remedies, and Practical Guidelines

cs.LG · 2026-05-12 · conditional · none · ref 3

The paper decomposes errors in trajectory-based data attribution into config, algorithm, and system levels, proposes AdamW-influence to fix optimizer mismatch, derives an error proxy for Taylor approximation, and unifies data selection under a K-step look-ahead framework.

Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning

cs.LG · 2026-03-08 · unverdicted · none · ref 1

Filter-then-Weight is a two-stage optimizer-aware method that filters geometrically useful data candidates and optimizes their coefficients to shape target updates in online LLM fine-tuning.

On the Accuracy of Newton Step and Influence Function Data Attributions

cs.LG · 2025-12-14 · unverdicted · none · ref 8

New analysis without global strong convexity yields tight scaling laws: NS error ~Θ(kd/n²) and NS-IF difference ~Θ((k+d)√(kd)/n²) for well-behaved logistic regressions.

MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization

cs.CV · 2025-08-11 · unverdicted · none · ref 4

MIMIC is a new inversion framework that recovers visual concepts from VLM internal states using joint inversion, feature alignment, and three regularizers.

Interaction-Aware Influence Functions for Group Attribution

cs.LG · 2026-05-15 · conditional · none · ref 22

Extends influence functions with a second-order pairwise interaction term that improves group attribution accuracy over simple summation on multiple model-dataset pairs and instruction-tuning selection tasks.

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

cs.LG · 2026-05-12 · unverdicted · none · ref 24

A latent mediation framework with sparse autoencoders enables non-additive token-level influence attribution in LLMs by learning orthogonal features and back-propagating attributions.

Convergent Evolution: How Different Language Models Learn Similar Number Representations

cs.CL · 2026-04-22 · unverdicted · none · ref 44

Diverse language models converge on similar periodic number features with a two-tier hierarchy of Fourier sparsity and geometric separability, acquired via language co-occurrences or multi-token arithmetic.

Representation-Guided Parameter-Efficient LLM Unlearning

cs.CL · 2026-04-19 · unverdicted · none · ref 95

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation

cs.LG · 2026-04-17 · unverdicted · none · ref 16

RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence analysis on LLMs up to 32B parameters.

A Human-Centric Framework for Data Attribution in Large Language Models

cs.CY · 2026-02-11 · unverdicted · none · ref 80

Introduces a parameter-driven framework for data attribution in LLMs that enables negotiation among creators, users, and intermediaries to meet stakeholder goals within the data economy.

Efficient Estimation of Kernel Surrogate Models for Task Attribution

cs.LG · 2026-02-03 · unverdicted · none · ref 3

Kernel surrogate models with first-order gradient approximation achieve 25% higher correlation to leave-one-out ground truth for task attribution and 40% better downstream data selection than linear surrogates.

Feature Identification via the Empirical NTK

cs.LG · 2025-10-01 · unverdicted · none · ref 7

Eigenanalysis of the empirical NTK surfaces feature directions that align with Fourier features in modular addition networks and grammatical features in Gemma-3-270M, outperforming PCA baselines on activations.

DMin: Scalable Training Data Influence Estimation for Diffusion Models

cs.CV · 2024-12-11 · unverdicted · none · ref 13

DMin uses gradient compression to scalably estimate training data influence in billion-parameter diffusion models.

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

cs.LG · 2023-10-19 · conditional · none · ref 91

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

Mechanistic Anomaly Detection via Functional Attribution

cs.LG · 2026-04-21 · unreviewed · ref 45

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

cs.CL · 2026-01-20 · unreviewed · ref 11
