On the Fragility of Data Attribution When Learning Is Distributed

Bo Hui; Min-Te Sun; Wei-shinn Ku; Xian Gao

arXiv: 2605.15520 · v1 · pith:2WYRQ72Cnew · submitted 2026-05-15 · 💻 cs.LG · cs.AI · cs.DC


Pith reviewed 2026-05-19 15:11 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.DC

keywords data attribution · distributed learning · adversarial attack · machine learning security · contribution evaluation · non-IID data · federated learning


The pith

A single participant can inflate its measured attribution value in distributed training while preserving global utility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that data attribution methods used for pricing, auditing, and governance in machine learning pipelines can be manipulated in distributed training. A single client employs latent optimization to add small synthetic batches that exploit non-IID label coverage and evaluator sensitivities. This raises the attacker's attribution score and alters relative scores among other clients without lowering model accuracy or activating geometry-based defenses. The finding indicates that attribution itself creates a new attack surface in collaborative learning.

Core claim

The central claim is that attribution values do not faithfully reflect participants' contributions because a single client in a standard distributed training workflow can use latent optimization to inject small synthetic batches. These batches preserve global utility while exploiting non-IID label coverage and the sensitivities of marginal-utility evaluators, consistently increasing the adversary's attribution value and reshaping the relative attribution structure among benign clients across datasets, models, and evaluators without degrading accuracy.

What carries the argument

The attribution-first attack that uses latent optimization to inject small synthetic batches exploiting non-IID label coverage and marginal-utility evaluator sensitivities.

If this is right

Attribution methods fail to accurately measure individual contributions when training data is distributed and non-IID.
A single malicious client can increase its own attribution score and change the ranking of benign participants.
Global model accuracy is not degraded by the attack.
Existing geometry-based detection methods do not flag the injected batches.
The vulnerability holds for multiple marginal-utility attribution evaluators.
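
To make the point about geometry-based detection concrete, here is a minimal sketch of one generic filter of that kind: drop the client updates that lie farthest, in L2 distance, from the coordinate-wise median. This page does not specify the trimming rule the paper evaluates, so the function `trim_by_distance`, the client names, and the update vectors are all illustrative assumptions. The sketch only shows why an update that stays close to the benign cluster would pass such a filter.

```python
import math
from statistics import median


def trim_by_distance(updates, num_trim):
    """Drop the num_trim updates farthest (L2) from the coordinate-wise median.

    `updates` maps a client id to its update vector (a list of floats).
    This is a generic rule assumed for illustration, not the paper's defense.
    """
    dim = len(next(iter(updates.values())))
    center = [median(u[d] for u in updates.values()) for d in range(dim)]
    dist = {cid: math.dist(u, center) for cid, u in updates.items()}
    kept = sorted(updates, key=lambda cid: dist[cid])[: len(updates) - num_trim]
    return set(kept), dist


# Hypothetical two-dimensional updates: three benign clients, one update that
# stays inside the benign cluster, and one crude large-norm update.
updates = {
    "c0": [1.0, 0.0],
    "c1": [0.9, 0.1],
    "c2": [1.1, -0.1],
    "subtle": [1.0, 0.05],
    "noisy": [5.0, 4.0],
}
kept, dist = trim_by_distance(updates, num_trim=1)
```

Under these made-up numbers the large `noisy` update is trimmed while `subtle` is kept, which is the kind of behaviour a utility-preserving manipulation would rely on.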

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Incentive systems that pay participants based on attribution scores may need new safeguards against synthetic data injection.
The same manipulation tactic could appear in federated learning deployments where clients control their local data.
Future evaluators might add checks for label-distribution anomalies introduced by small batches.
Testing the attack with real-world non-IID partitions from production systems would reveal practical impact.

Load-bearing premise

Standard marginal-utility attribution evaluators remain sensitive to small synthetic batches that exploit non-IID label coverage in distributed settings.
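
A toy calculation can illustrate why this premise matters. The sketch below assumes a stand-in utility (the fraction of labels covered by a coalition of clients) and computes exact Shapley values over three hypothetical clients. The utility, the client label sets, and the function names are assumptions made for illustration; they are not the paper's evaluators or data. The example is not the paper's attack: here the added label also raises the toy utility, whereas the paper's batches are described as utility-preserving.

```python
from itertools import permutations


def coverage_utility(label_sets, num_labels):
    """Toy utility: fraction of labels covered by the coalition."""
    covered = set().union(*label_sets) if label_sets else set()
    return len(covered) / num_labels


def shapley_values(clients, num_labels):
    """Exact Shapley values by enumerating every client ordering."""
    names = list(clients)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        coalition, prev = [], 0.0
        for name in order:
            coalition.append(clients[name])
            cur = coverage_utility(coalition, num_labels)
            totals[name] += cur - prev  # marginal utility of joining
            prev = cur
    return {name: totals[name] / len(orderings) for name in names}


# Hypothetical non-IID label coverage over 4 labels; label 3 is held by nobody.
honest = {"A": {0, 1}, "B": {1, 2}, "adv": {0}}
# The adversary adds a single sample of the uncovered label 3.
boosted = {"A": {0, 1}, "B": {1, 2}, "adv": {0, 3}}

before = shapley_values(honest, num_labels=4)
after = shapley_values(boosted, num_labels=4)
```

In this toy setting the adversary's value rises from 0.125 to 0.375 after adding one sample of a label nobody else holds, and its share relative to client B changes even though B's data is untouched.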

What would settle it

The claim would be falsified if the attack, run on a new dataset and model combination, produced either no increase in the adversary's attribution value or a drop in final accuracy.
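
The test described above can be written down as a small check over one run. This is a sketch under stated assumptions: the function `check_claim`, the tolerance `acc_tol`, and all numbers are hypothetical and not taken from the paper.

```python
def check_claim(base_share, attacked_share, base_acc, attacked_acc, acc_tol=0.5):
    """Report which of the two conditions in the claim hold for one run.

    Shares and accuracies are percentages. acc_tol is an assumed tolerance,
    in percentage points, for what counts as "no drop" in accuracy.
    """
    inflated = attacked_share > base_share
    accuracy_held = (base_acc - attacked_acc) <= acc_tol
    return {
        "inflated": inflated,
        "accuracy_held": accuracy_held,
        "supports_claim": inflated and accuracy_held,
    }


# Hypothetical runs: one consistent with the claim, one counterexample.
supporting = check_claim(10.0, 15.0, 85.0, 84.8)
counterexample = check_claim(10.0, 9.5, 85.0, 85.1)
```

A single counterexample run would need to be repeated across seeds before it could be read as evidence against a claim of consistent inflation.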

Figures

Figures reproduced from arXiv: 2605.15520 by Bo Hui, Min-Te Sun, Wei-shinn Ku, Xian Gao.

**Figure 1.** Client-level attribution can shift under utility-preserving local changes. Attribution shares for an attack-free run (left) versus a latent optimization attack (right).

**Figure 3.** Effect of malicious-client selection. Grouped bar plot (with colored dots marking each value, including near-zero cases) of client-level data attribution values across different selection strategies.

**Figure 2.** Grouped bar plot of data attribution values across attack methods under different client counts.

**Figure 4.** Effect of attack intensity on data attribution. Normalized attribution share and test accuracy under increasing attack intensity.

**Figure 5.** Client-level data attribution under CIFAR-10 with ResNet-18. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 6.** Client-level data attribution under CIFAR-10 with WRN-28×10. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 7.** Client-level data attribution under CIFAR-10 with VGG16-BN. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 8.** Client-level data attribution under FashionMNIST with ResNet-18. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 9.** Client-level data attribution under FashionMNIST with WRN-28×10. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 10.** Client-level data attribution under FashionMNIST with VGG16-BN. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 11.** Client-level data attribution under SVHN with ResNet-18. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 12.** Client-level data attribution under SVHN with WRN-28×10. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 13.** Client-level data attribution under SVHN with VGG16-BN. Per-client attribution shares under different attack settings. Colors indicate client identities and are consistent across panels.

**Figure 14.** Global model accuracy across datasets and architectures under different attack methods (FedSV). We report test accuracy (%) for three model architectures on CIFAR-10, FashionMNIST, and SVHN. Each dataset–model pair is shown as a group of bars, with different colors indicating attack variants.

**Figure 15.** Leave-one-out (LOO) client-level data attribution under CIFAR-10 with ResNet-18. Per-client attribution shifts under different attack settings. Colors indicate client identities and remain consistent across panels.

**Figure 16.** Global model accuracy under CIFAR-10 with ResNet-18 under different attack methods (LOO).

Abstract

Data attribution has become an important component of pricing, auditing, and governance in machine learning pipelines, yet most attribution methods implicitly assume that attribution values faithfully reflect participants' contributions. We show that this assumption can fail: a single participant in a standard distributed training workflow can substantially inflate its measured attribution value while preserving global utility. Our attribution-first attack uses latent optimization to inject small synthetic batches that preserve utility while exploiting non-IID label coverage and evaluator sensitivities. Across datasets, models, and multiple marginal-utility evaluators, the attack consistently increases the adversary's attribution value and reshapes the relative attribution structure among benign clients without degrading accuracy or triggering geometry-based defenses. These results show that attribution itself forms a new attack surface and motivate the development of attribution-robust and incentive-compatible scoring mechanisms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

One client can inflate its attribution score in distributed training by injecting latent-optimized synthetic batches, but the attack likely needs generative-model access that standard participants lack.


The main takeaway is that attribution methods for collaborative ML can be gamed by a single participant who adds a handful of synthetic examples to raise its own score while leaving global accuracy intact. The paper shows this through an attribution-first attack that uses latent optimization to target non-IID label gaps and the quirks of marginal-utility evaluators. Across several datasets and models the attack raises the adversary's value and reshuffles rankings among honest clients without triggering basic geometry checks. That demonstration is the concrete new piece: it treats attribution itself as the target rather than just model performance. The work is useful because it flags a practical risk for any system that ties payments, credits, or audits to these scores. Readers who build incentive layers or auditing tools will see why robustness matters here.

The soft spot is feasibility. Latent optimization normally requires a pre-trained generator whose latent space can be searched. A normal participant only receives the current global model and its own local data; nothing in the standard protocol supplies the auxiliary model or extra data needed to run the attack as described. If the experiments assume that extra capability, the claim that this works inside a standard workflow does not fully hold. The abstract also gives no error bars or hyperparameter details, so the size and stability of the gains are hard to judge from the summary alone.

This paper is for people working on data valuation, federated incentives, or ML security. It is worth a reading group if the group cares about how attribution can be turned into an attack surface. I would send it to peer review so the authors can clarify the exact resources required and tighten the experimental reporting.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that data attribution methods are fragile in distributed machine learning settings. A single participant in a standard distributed training workflow can substantially inflate its measured attribution value by using an attribution-first attack that employs latent optimization to inject small synthetic batches. These batches exploit non-IID label coverage and evaluator sensitivities while preserving global model utility and accuracy and evading geometry-based defenses. The attack is shown to consistently increase the adversary's attribution and reshape relative attributions among benign clients across multiple datasets, models, and marginal-utility evaluators.

Significance. If the attack is feasible within the constraints of a standard participant (local data and global model only), the result would be significant for highlighting a new attack surface in attribution mechanisms used for pricing, auditing, and governance in ML pipelines. The empirical consistency across settings provides concrete evidence that attribution does not always faithfully reflect contributions, motivating development of attribution-robust and incentive-compatible scoring. The work's focus on marginal-utility evaluators and utility preservation is a strength, as is the demonstration that the attack evades existing defenses.

major comments (2)

[Abstract] The central claim that 'a single participant in a standard distributed training workflow' can execute the attack is load-bearing but potentially undermined by reliance on 'latent optimization to inject small synthetic batches'. Latent optimization typically presupposes access to a pre-trained generative model (VAE/GAN) and auxiliary training resources, which are not part of standard distributed protocols where clients receive only local data and the current global model. The manuscript should either demonstrate a purely local-data variant or explicitly bound the attack's requirements.
[Experimental Results] The abstract reports consistent success across datasets, models, and evaluators, but provides no error bars, exact attack hyperparameters, ablation details on synthetic batch size, or sensitivity analysis to non-IID label coverage. These omissions make it difficult to assess whether the reported attribution inflation is robust or sensitive to implementation choices.
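
The resource question in the first major comment is easier to see with a sketch of what latent optimization involves in general: a decoder maps a latent vector to a sample, and the search runs over the latent vector rather than over the sample. Everything below is an assumption for illustration. The linear `decode` stands in for a pre-trained generator, the quadratic `score` stands in for whatever objective the attacker targets, and finite differences replace backpropagation. None of it is the paper's implementation.

```python
def decode(z, weights):
    """Hypothetical linear decoder: maps latent z to a sample x = W z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def optimize_latent(objective, weights, z0, steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on z using central finite differences."""
    z = list(z0)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            hi = z[:i] + [z[i] + eps] + z[i + 1:]
            lo = z[:i] + [z[i] - eps] + z[i + 1:]
            diff = objective(decode(hi, weights)) - objective(decode(lo, weights))
            grad.append(diff / (2 * eps))
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


# Stand-in decoder weights and objective (both hypothetical).
W = [[1.0, 0.0], [0.0, 2.0]]
target = [1.0, -2.0]


def score(x):
    """Toy objective: higher when the decoded sample is closer to target."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


z_start = [0.0, 0.0]
z_star = optimize_latent(score, W, z_start)
```

The sketch makes the dependency explicit: without `weights`, that is, without some decoder the client already holds, there is no latent space to search.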

minor comments (2)

[Abstract] The phrase 'attribution-first attack' is used without a brief inline definition or contrast to utility-focused attacks, which could improve immediate clarity for readers unfamiliar with the framing.
[Notation] Attribution value symbols and evaluator definitions could be introduced with a short table or equation reference in the early sections to reduce reliance on later definitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. We provide detailed responses to each major comment below and outline the revisions we plan to make.


Referee: [Abstract] The central claim that 'a single participant in a standard distributed training workflow' can execute the attack is load-bearing but potentially undermined by reliance on 'latent optimization to inject small synthetic batches'. Latent optimization typically presupposes access to a pre-trained generative model (VAE/GAN) and auxiliary training resources, which are not part of standard distributed protocols where clients receive only local data and the current global model. The manuscript should either demonstrate a purely local-data variant or explicitly bound the attack's requirements.

Authors: We agree that the attack's reliance on latent optimization requires clarification regarding the resources needed. Our implementation uses a pre-trained VAE for optimizing in the latent space to create synthetic batches that exploit the attribution mechanism. This is not strictly limited to local data and the global model alone. In the revised manuscript, we will explicitly bound the attack requirements by stating that the adversary has access to a pre-trained generative model (e.g., trained on public datasets similar to the task), which is a reasonable assumption in many practical settings but not universal. We will not demonstrate a purely local-data variant in this work as it would require a fundamentally different attack strategy, but we will discuss this limitation and its implications for the attack's applicability.
Revision: partial

Referee: [Experimental Results] The abstract reports consistent success across datasets, models, and evaluators, but provides no error bars, exact attack hyperparameters, ablation details on synthetic batch size, or sensitivity analysis to non-IID label coverage. These omissions make it difficult to assess whether the reported attribution inflation is robust or sensitive to implementation choices.

Authors: We appreciate this observation and agree that these details are important for evaluating the robustness of our findings. We will revise the experimental section to include error bars based on multiple runs with different random seeds, provide the precise hyperparameters used in the latent optimization process, add ablation experiments on the synthetic batch size, and include a sensitivity analysis varying the level of non-IID label coverage across clients.
Revision: yes
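
As a sketch of what seed-level error bars could look like, the snippet below pairs attack-free and attacked attribution shares by seed and reports the mean gain with its sample standard deviation. The function name and all numbers are hypothetical; no per-seed results are available on this page.

```python
from statistics import mean, stdev


def summarize_gain(base_shares, attacked_shares):
    """Mean and sample standard deviation of per-seed attribution gain.

    Both lists hold the adversary's attribution share (in percent), one
    entry per random seed, in the same seed order.
    """
    gains = [a - b for a, b in zip(attacked_shares, base_shares)]
    return mean(gains), stdev(gains)


# Hypothetical shares for three seeds, not taken from the paper.
base = [10.0, 12.0, 11.0]
attacked = [14.0, 15.0, 16.0]
gain_mean, gain_std = summarize_gain(base, attacked)
```

Pairing by seed keeps the comparison within the same data partition and initialization, so the spread reflects variation in the gain itself rather than variation between partitions.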

Circularity Check

0 steps flagged

No significant circularity in empirical attack demonstration


The paper presents an empirical attack showing that a single participant can inflate attribution values via latent optimization on small synthetic batches while preserving utility. This is demonstrated experimentally across datasets, models, and marginal-utility evaluators rather than derived from any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations. No derivation chain reduces to its own inputs by construction; the results rely on external experimental validation and do not invoke uniqueness theorems or ansatzes from prior author work in a circular manner.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical attack demonstration with no mathematical derivation; no free parameters, axioms, or invented entities are introduced or fitted in the abstract.

pith-pipeline@v0.9.0 · 5663 in / 937 out tokens · 52108 ms · 2026-05-19T15:11:22.551854+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Our attribution-first attack uses latent optimization to inject small synthetic batches that preserve utility while exploiting non-IID label coverage and evaluator sensitivities."

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: "the per-round marginal utility is defined as $\Delta_t(g) = U(w_t + g) - U(w_t)$"
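
The quoted definition of per-round marginal utility, U(w_t + g) minus U(w_t), can be evaluated directly once a utility function is fixed. The sketch below is a minimal illustration: `toy_utility`, its target vector, and the example updates are assumptions, since the paper's utility U is not specified on this page.

```python
def marginal_utility(utility, w_t, g):
    """Delta_t(g) = U(w_t + g) - U(w_t) for a candidate update g."""
    w_next = [w + d for w, d in zip(w_t, g)]
    return utility(w_next) - utility(w_t)


def toy_utility(w, target=(1.0, 2.0)):
    """Hypothetical utility: negative squared distance to a target model."""
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


w_t = [0.0, 0.0]
helpful = marginal_utility(toy_utility, w_t, [0.5, 1.0])    # moves toward target
harmful = marginal_utility(toy_utility, w_t, [-0.5, -1.0])  # moves away from it
```

An update that moves the model toward the target receives a positive marginal utility (3.75 here) and one that moves it away receives a negative one (-6.25), which is the signal a marginal-utility evaluator aggregates into attribution values.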

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 4 internal anchors

[1]

I., Cevher, V ., and Muehlebach, M

Bal, M. I., Cevher, V ., and Muehlebach, M. Adversarial training for defense against label poisoning attacks.arXiv preprint arXiv:2502.17121,

work page arXiv
[2]

Shapley estimated explanation (shep): A fast post-hoc attribution method for interpreting intelligent fault diagnosis.arXiv preprint arXiv:2504.03773,

Chen, Q., Dong, X., Peng, Z., and Meng, G. Shapley estimated explanation (shep): A fast post-hoc attribution method for interpreting intelligent fault diagnosis.arXiv preprint arXiv:2504.03773,

work page arXiv
[3]

Scaling laws for the value of individual data points in machine learning

Covert, I., Ji, W., Hashimoto, T., and Zou, J. Scaling laws for the value of individual data points in machine learning. arXiv preprint arXiv:2405.20456, 2024a. Covert, I., Kim, C., Lee, S.-I., Zou, J. Y ., and Hashimoto, T. B. Stochastic amortization: A unified approach to ac- celerate feature and data attribution.Advances in Neural Information Processin...

work page arXiv
[4]

Fair and efficient contribution val- uation for vertical federated learning

Fan, Z., Fang, H., Wang, X., Zhou, Z., Pei, J., Friedlander, M., and Zhang, Y . Fair and efficient contribution val- uation for vertical federated learning. InInternational Conference on Learning Representations, volume 2024, pp. 14553–14572,

work page 2024
[5]

How to probe: Simple yet effective techniques for improving post-hoc explanations.arXiv preprint arXiv:2503.00641,

Gairola, S., B¨ohle, M., Locatello, F., and Schiele, B. How to probe: Simple yet effective techniques for improving post-hoc explanations.arXiv preprint arXiv:2503.00641,

work page arXiv
[6]

Du-shapley: A shapley value proxy for ef- ficient dataset valuation.Advances in Neural Information Processing Systems, 37:1973–2000,

Garrido Lucero, F., Heymann, B., V ono, M., Loiseau, P., and Perchet, V . Du-shapley: A shapley value proxy for ef- ficient dataset valuation.Advances in Neural Information Processing Systems, 37:1973–2000,

work page 1973
[7]

W., and Zhao, H

Hu, Y ., Wu, F., Ye, H., Forsyth, D., Zou, J., Jiang, N., Ma, J. W., and Zhao, H. A snapshot of influence: A local data attribution framework for online reinforcement learning. arXiv preprint arXiv:2505.19281,

work page arXiv
[8]

Jia, Y ., Fang, M., Liu, H., Zhang, J., and Gong, N. Z. Trac- ing back the malicious clients in poisoning attacks to federated learning.arXiv preprint arXiv:2407.07221,

work page arXiv
[9]

W., and Xiong, C

10 On the Fragility of Data Attribution When Learning Is Distributed Jiao, C., Pan, Y ., Xiao, E., Sheng, D., Jain, N., Zhao, H., Dasgupta, I., Ma, J. W., and Xiong, C. Date-lm: Bench- marking data attribution evaluation for large language models.arXiv preprint arXiv:2507.09424,

work page arXiv
[10]

and Raffel, C

Kandpal, N. and Raffel, C. Position: The most expensive part of an llm should be its training data.arXiv preprint arXiv:2504.12427,

work page arXiv
[11]

DataInf: Efficiently estimating data in- fluence in LoRA-tuned LLMs and diffusion models

Kwon, Y ., Wu, E., Wu, K., and Zou, J. Datainf: Efficiently estimating data influence in lora-tuned llms and diffusion models.arXiv preprint arXiv:2310.00902,

work page arXiv
[12]

An efficient frame- work for crediting data contributors of diffusion models

Lin, C., Lu, M., Kim, C., and Lee, S.-I. An efficient frame- work for crediting data contributors of diffusion models. arXiv preprint arXiv:2407.03153, 2024a. Lin, J., Tao, L., Dong, M., and Xu, C. Diffusion attribution score: Evaluating training data influence in diffusion models.arXiv preprint arXiv:2410.18639, 2024b. Lin, X., Xu, X., Wu, Z., Ng, S.-K.,...

work page arXiv
[13]

DBA-DFL: towards distributed backdoor attacks with network detection in decentralized federated learn- ing

Liu, B., Xiao, Y ., Ye, R., Ling, Z., Ma, X., and Hui, B. DBA-DFL: towards distributed backdoor attacks with network detection in decentralized federated learn- ing. In Lynce, I., Murano, N., Vallati, M., Villata, S., Chesani, F., Milano, M., Omicini, A., and Dastani, M. (eds.),ECAI 2025 - 28th European Conference on Artificial Intelligence, 25-30 October...

work page 2025
[14]

URL https://doi.org/10.3233/FAIA250961

doi: 10.3233/FAIA250961. URL https://doi.org/10.3233/FAIA250961. Liu, P., Xu, X., and Wang, W. Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives. Cybersecurity, 5(1):4,

work page doi:10.3233/faia250961
[15]

Threats to federated learning: A survey,

Lyu, L., Yu, H., and Yang, Q. Threats to federated learning: A survey.arXiv preprint arXiv:2003.02133,

work page arXiv 2003
[16]

Influence functions for scal- able data attribution in diffusion models.arXiv preprint arXiv:2410.13850,

Mlodozeniec, B., Eschenhagen, R., Bae, J., Immer, A., Krueger, D., and Turner, R. Influence functions for scal- able data attribution in diffusion models.arXiv preprint arXiv:2410.13850,

work page arXiv
[17]

Murhekar, A., Yuan, Z., Ray Chaudhury, B., Li, B., and Mehta, R

URLhttps://arxiv.org/abs/2506.06337. Murhekar, A., Yuan, Z., Ray Chaudhury, B., Li, B., and Mehta, R. Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization. Advances in Neural Information Processing Systems, 36: 17811–17831,

work page arXiv
[18]

Di, Yiwei Lu, Ayush Sekhari, Gautam Kamath, and Seth Neel

Pawelczyk, M., Di, J. Z., Lu, Y ., Sekhari, A., Kamath, G., and Neel, S. Machine unlearning fails to remove data poisoning attacks.arXiv preprint arXiv:2406.17216,

work page arXiv
[19]

Ramu, P., Goswami, K., Saxena, A., and Srinivasan, B. V . Enhancing post-hoc attributions in long document com- prehension via coarse grained answer decomposition. arXiv preprint arXiv:2409.17073,

work page arXiv
[20]

Rescaled influence functions: Accurate data attribution in high dimension.arXiv preprint arXiv:2506.06656,

Rubinstein, I. and Hopkins, S. B. Rescaled influence func- tions: Accurate data attribution in high dimension.arXiv preprint arXiv:2506.06656,

work page arXiv
[21]

Sardana, S., Gupta, S., Donode, A., Prasad, A., and Karthik, G. M. Defending machine learning and deep learning models: Detecting and preventing data poisoning attacks. 2024 Global Conference on Communications and Infor- mation Technologies (GCCIT), pp. 1–6,

work page 2024
[22]

Shahani, P. S. and Scheutz, M. Noise injection systemically degrades large language model safety guardrails.arXiv preprint arXiv:2505.13500,

work page arXiv
[23]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Simonyan, K. and Zisserman, A. Very deep convolu- tional networks for large-scale image recognition.arXiv preprint arXiv:1409.1556,

work page internal anchor Pith review Pith/arXiv arXiv
[24]

Data-faithful feature attribution: Mitigating unobservable confounders via instrumental variables.Advances in Neural Information Processing Systems, 37:44935–44964, 2024a

Sun, Q., Xia, H., and Liu, J. Data-faithful feature attribution: Mitigating unobservable confounders via instrumental variables.Advances in Neural Information Processing Systems, 37:44935–44964, 2024a. Sun, W., Liu, H., Kandpal, N., Raffel, C., and Yang, Y . Enhancing training data attribution with representational optimization.arXiv preprint arXiv:2505.18513,

work page arXiv
[25]

2d-oob: Attributing data con- tribution through joint valuation framework.Advances in Neural Information Processing Systems, 37:46764– 46790, 2024b

Sun, Y ., Shen, J., and Kwon, Y . 2d-oob: Attributing data con- tribution through joint valuation framework.Advances in Neural Information Processing Systems, 37:46764– 46790, 2024b. Tastan, N., Fares, S., Aremu, T., Horvath, S., and Nan- dakumar, K. Redefining contributions: Shapley-driven federated learning.arXiv preprint arXiv:2406.00569,

work page arXiv
[26]

A., and Grosse, R

Wang, A., Nguyen, E., Yang, R., Bae, J., McIlraith, S. A., and Grosse, R. Better training data attribution via better inverse hessian-vector products.arXiv preprint arXiv:2507.14740, 2025a. Wang, J., Lin, X., Qiao, R., Foo, C.-S., and Low, B. K. H. Helpful or harmful data? fine-tuning-free shapley attri- bution for explaining language model predictions.ar...

work page arXiv
[27]

T., Mittal, P., Song, D., and Jia, R

Wang, J. T., Mittal, P., Song, D., and Jia, R. Data shapley in one training run.arXiv preprint arXiv:2406.11011, 2024b. Wang, J. T., Yang, T., Zou, J., Kwon, Y ., and Jia, R. Re- thinking data shapley for data selection tasks: Misleads and merits.arXiv preprint arXiv:2405.03875, 2024c. Wang, L., Xu, S., Wang, X., and Zhu, Q. Addressing class imbalance in ...

work page arXiv
[28]

Data attribution for text-to-image models by unlearning synthesized images.Advances in Neural In- formation Processing Systems, 37:4235–4266, 2024d

Wang, S.-Y ., Hertzmann, A., Efros, A., Zhu, J.-Y ., and Zhang, R. Data attribution for text-to-image models by unlearning synthesized images.Advances in Neural In- formation Processing Systems, 37:4235–4266, 2024d. 12 On the Fragility of Data Attribution When Learning Is Distributed Wang, S.-Y ., Hertzmann, A., Efros, A. A., Zhang, R., and Zhu, J.-Y . Fa...

work page arXiv
[29]

Wang, W., Deng, J., Hu, Y ., Zhang, S., Jiang, X., Zhang, R., Zhao, H., and Ma, J. W. Taming hyperparameter sensitiv- ity in data attribution: Practical selection without costly retraining.arXiv preprint arXiv:2505.24261, 2025c. Wang, X., Hu, P., Deng, J., and Ma, J. W. Adversarial attacks on data attribution.arXiv preprint arXiv:2409.05657, 2024e. Wei, D...

work page arXiv
[30]

Adversarial label flips attack on support vector machines

Xiao, H., Xiao, H., and Eckert, C. Adversarial label flips attack on support vector machines. InECAI 2012, pp. 870–875. IOS Press,

work page 2012
[31]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Xiao, H., Rasul, K., and V ollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms.arXiv preprint arXiv:1708.07747,

work page internal anchor Pith review Pith/arXiv arXiv
[32]

ISBN 9798400704369

Associa- tion for Computing Machinery. ISBN 9798400704369. doi: 10.1145/3627673.3679650. URL https://doi. org/10.1145/3627673.3679650. Xu, X., Wang, S., Foo, C.-S., Low, B. K., and Fanti, G. Data distribution valuation.Advances in Neural Information Processing Systems, 37:2407–2448,

work page doi:10.1145/3627673.3679650
[33]

Wide Residual Networks

Zagoruyko, S. and Komodakis, N. Wide residual networks. arXiv preprint arXiv:1605.07146,

work page internal anchor Pith review Pith/arXiv arXiv
[34]

Zhang, L., Jiao, C., Li, B., and Xiong, C. Fairshare data pricing via data valuation for large language models. arXiv preprint arXiv:2502.00198, 2025a.

Zhang, L., Wu, H., Zhang, L., Xu, F., Cao, J., Li, F., and Niu, B. Training data attribution: Was your model secretly trained on data created by mine? In Proceedings of the 31st ACM SIGKDD Conference on Kn...

Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., and Chandra, V. Federated learning with non-iid data. arXiv preprint arXiv:1806.00582,

Zheng, X., Pang, T., Du, C., Jiang, J., and Lin, M. Intriguing properties of data attribution on diffusion models. arXiv preprint arXiv:2311.00500,

A. Background and Related Work

A.1. Federated Learning

Federated learning (FL) enables collaborative training over decentralized data without sharing raw samples. Classical FL follows the broadcast–local-train–aggregate loop, where a central server coordinates many clients under priv...

Bal, M. I., Cevher, V., and Muehlebach, M. Adversarial training for defense against label poisoning attacks. arXiv preprint arXiv:2502.17121,

Chen, Q., Dong, X., Peng, Z., and Meng, G. Shapley estimated explanation (shep): A fast post-hoc attribution method for interpreting intelligent fault diagnosis. arXiv preprint arXiv:2504.03773,

Covert, I., Ji, W., Hashimoto, T., and Zou, J. Scaling laws for the value of individual data points in machine learning. arXiv preprint arXiv:2405.20456, 2024a.

Covert, I., Kim, C., Lee, S.-I., Zou, J. Y., and Hashimoto, T. B. Stochastic amortization: A unified approach to accelerate feature and data attribution. Advances in Neural Information Processin...

Fan, Z., Fang, H., Wang, X., Zhou, Z., Pei, J., Friedlander, M., and Zhang, Y. Fair and efficient contribution valuation for vertical federated learning. In International Conference on Learning Representations, volume 2024, pp. 14553–14572,

Gairola, S., Böhle, M., Locatello, F., and Schiele, B. How to probe: Simple yet effective techniques for improving post-hoc explanations. arXiv preprint arXiv:2503.00641,

Garrido Lucero, F., Heymann, B., Vono, M., Loiseau, P., and Perchet, V. Du-shapley: A shapley value proxy for efficient dataset valuation. Advances in Neural Information Processing Systems, 37:1973–2000,

Hu, Y., Wu, F., Ye, H., Forsyth, D., Zou, J., Jiang, N., Ma, J. W., and Zhao, H. A snapshot of influence: A local data attribution framework for online reinforcement learning. arXiv preprint arXiv:2505.19281,

Jia, Y., Fang, M., Liu, H., Zhang, J., and Gong, N. Z. Tracing back the malicious clients in poisoning attacks to federated learning. arXiv preprint arXiv:2407.07221,

Jiao, C., Pan, Y., Xiao, E., Sheng, D., Jain, N., Zhao, H., Dasgupta, I., Ma, J. W., and Xiong, C. Date-lm: Benchmarking data attribution evaluation for large language models. arXiv preprint arXiv:2507.09424,

Kandpal, N. and Raffel, C. Position: The most expensive part of an llm should be its training data. arXiv preprint arXiv:2504.12427,

Kwon, Y., Wu, E., Wu, K., and Zou, J. DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models. arXiv preprint arXiv:2310.00902,

Lin, C., Lu, M., Kim, C., and Lee, S.-I. An efficient framework for crediting data contributors of diffusion models. arXiv preprint arXiv:2407.03153, 2024a.

Lin, J., Tao, L., Dong, M., and Xu, C. Diffusion attribution score: Evaluating training data influence in diffusion models. arXiv preprint arXiv:2410.18639, 2024b.

Lin, X., Xu, X., Wu, Z., Ng, S.-K.,...

Liu, B., Xiao, Y., Ye, R., Ling, Z., Ma, X., and Hui, B. DBA-DFL: towards distributed backdoor attacks with network detection in decentralized federated learning. In Lynce, I., Murano, N., Vallati, M., Villata, S., Chesani, F., Milano, M., Omicini, A., and Dastani, M. (eds.), ECAI 2025 - 28th European Conference on Artificial Intelligence, 25-30 October...

doi: 10.3233/FAIA250961. URL https://doi.org/10.3233/FAIA250961.

Liu, P., Xu, X., and Wang, W. Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives. Cybersecurity, 5(1):4,

Lyu, L., Yu, H., and Yang, Q. Threats to federated learning: A survey. arXiv preprint arXiv:2003.02133,

Mlodozeniec, B., Eschenhagen, R., Bae, J., Immer, A., Krueger, D., and Turner, R. Influence functions for scalable data attribution in diffusion models. arXiv preprint arXiv:2410.13850,

URL https://arxiv.org/abs/2506.06337.

Murhekar, A., Yuan, Z., Ray Chaudhury, B., Li, B., and Mehta, R. Incentives in federated learning: Equilibria, dynamics, and mechanisms for welfare maximization. Advances in Neural Information Processing Systems, 36:17811–17831,

Pawelczyk, M., Di, J. Z., Lu, Y., Sekhari, A., Kamath, G., and Neel, S. Machine unlearning fails to remove data poisoning attacks. arXiv preprint arXiv:2406.17216,

Ramu, P., Goswami, K., Saxena, A., and Srinivasan, B. V. Enhancing post-hoc attributions in long document comprehension via coarse grained answer decomposition. arXiv preprint arXiv:2409.17073,

Rubinstein, I. and Hopkins, S. B. Rescaled influence functions: Accurate data attribution in high dimension. arXiv preprint arXiv:2506.06656,

Sardana, S., Gupta, S., Donode, A., Prasad, A., and Karthik, G. M. Defending machine learning and deep learning models: Detecting and preventing data poisoning attacks. 2024 Global Conference on Communications and Information Technologies (GCCIT), pp. 1–6,

Shahani, P. S. and Scheutz, M. Noise injection systemically degrades large language model safety guardrails. arXiv preprint arXiv:2505.13500,

Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556,

Sun, Q., Xia, H., and Liu, J. Data-faithful feature attribution: Mitigating unobservable confounders via instrumental variables. Advances in Neural Information Processing Systems, 37:44935–44964, 2024a.

Sun, W., Liu, H., Kandpal, N., Raffel, C., and Yang, Y. Enhancing training data attribution with representational optimization. arXiv preprint arXiv:2505.18513,

Sun, Y., Shen, J., and Kwon, Y. 2d-oob: Attributing data contribution through joint valuation framework. Advances in Neural Information Processing Systems, 37:46764–46790, 2024b.

Tastan, N., Fares, S., Aremu, T., Horvath, S., and Nandakumar, K. Redefining contributions: Shapley-driven federated learning. arXiv preprint arXiv:2406.00569,

Wang, A., Nguyen, E., Yang, R., Bae, J., McIlraith, S. A., and Grosse, R. Better training data attribution via better inverse hessian-vector products. arXiv preprint arXiv:2507.14740, 2025a.

Wang, J., Lin, X., Qiao, R., Foo, C.-S., and Low, B. K. H. Helpful or harmful data? fine-tuning-free shapley attribution for explaining language model predictions. ar...

Wang, J. T., Mittal, P., Song, D., and Jia, R. Data shapley in one training run. arXiv preprint arXiv:2406.11011, 2024b.

Wang, J. T., Yang, T., Zou, J., Kwon, Y., and Jia, R. Rethinking data shapley for data selection tasks: Misleads and merits. arXiv preprint arXiv:2405.03875, 2024c.

Wang, L., Xu, S., Wang, X., and Zhu, Q. Addressing class imbalance in ...
