FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation

Anna Monreale; Luca Corbucci; Mattia Cerrato; Xenia Heilmann

arXiv: 2506.21095 · v5 · submitted 2025-06-26 · cs.LG · cs.AI

Xenia Heilmann, Luca Corbucci, Mattia Cerrato, Anna Monreale

Pith reviewed 2026-05-19 08:04 UTC · model grok-4.3

keywords federated learning · fairness · benchmark datasets · heterogeneous bias · client-level evaluation · attribute bias · value bias

The pith

FeDa4Fair supplies client-level datasets that expose how federated models can appear fair on average while discriminating at individual clients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to show that federated learning creates an illusion of fairness because global models are judged on server-wide averages that mask ongoing bias at the client level. Current fairness techniques usually address only one binary sensitive attribute and therefore break down when clients differ in which attribute they bias against or when they hold opposing biases toward values of the same attribute. To make these problems measurable and reproducible, the authors built FeDa4Fair, a library that produces tailored federated datasets containing these two heterogeneous bias patterns together with a ready benchmark suite and evaluation tools. A sympathetic reader would care because federated learning is already deployed in privacy-sensitive settings such as mobile apps and medical data, where unnoticed client-level discrimination can persist even when average metrics look acceptable.
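
A toy sketch can make the masking effect concrete. The snippet below is not taken from the paper or the FeDa4Fair library: the helper names (`positive_rate`, `demographic_disparity`, `make_client`) and all numbers are invented for illustration, and demographic disparity is simplified here to the absolute gap in positive-prediction rate between two groups.

```python
# Toy illustration (all numbers are made up): two clients whose biases point
# in opposite directions cancel out in the pooled, server-level view.

def positive_rate(records, group):
    """Share of positive predictions among records belonging to `group`."""
    preds = [pred for g, pred in records if g == group]
    return sum(preds) / len(preds)

def demographic_disparity(records):
    """Absolute gap in positive-prediction rate between groups "a" and "b"."""
    return abs(positive_rate(records, "a") - positive_rate(records, "b"))

def make_client(pos_a, pos_b, n=10):
    """Build n records per group with the given number of positive predictions."""
    recs = [("a", 1)] * pos_a + [("a", 0)] * (n - pos_a)
    recs += [("b", 1)] * pos_b + [("b", 0)] * (n - pos_b)
    return recs

clients = {
    "client_1": make_client(pos_a=8, pos_b=2),  # favours group a
    "client_2": make_client(pos_a=2, pos_b=8),  # favours group b
}

# Disparity measured separately on each client's data.
per_client = {name: demographic_disparity(r) for name, r in clients.items()}

# Disparity measured once on the pooled data, as a server-side average would.
pooled = demographic_disparity([rec for r in clients.values() for rec in r])

print(per_client, pooled)
```

In this constructed case each client shows a gap of about 0.6 while the pooled gap is 0.0, which is the kind of cancellation a server-wide average cannot reveal.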

Core claim

We introduce FeDa4Fair, the first benchmarking framework designed to stress-test fairness methods under these heterogeneous conditions of attribute-bias and value-bias. The library creates datasets tailored to evaluating fair FL methods under heterogeneous client bias; we also release a benchmark suite generated by the FeDa4Fair library and provide ready-to-use functions for evaluating fairness outcomes for these datasets.

What carries the argument

The FeDa4Fair library, which generates federated datasets that embed attribute-bias across clients and value-bias within a single attribute, creating controlled heterogeneous bias conditions for fairness testing.
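
The two bias patterns can be told apart with a simple per-client comparison. The following is a hedged sketch, not the library's own logic: the function names, the placeholder values `v1` and `v2`, and every number are hypothetical. It treats attribute-bias as clients disagreeing on which attribute carries the largest disparity, and value-bias as clients disagreeing on which value of one attribute is worst off.

```python
# Hypothetical per-client disparity per sensitive attribute (made-up values).
attribute_gaps = {
    "client_1": {"RACE": 0.30, "SEX": 0.05},
    "client_2": {"RACE": 0.04, "SEX": 0.25},
}

# Hypothetical positive-outcome rate per value of ONE attribute (made-up values).
value_rates = {
    "client_1": {"v1": 0.70, "v2": 0.40},
    "client_2": {"v1": 0.35, "v2": 0.65},
}

def dominant_attribute(gaps):
    """Attribute with the largest disparity on a single client."""
    return max(gaps, key=gaps.get)

def disadvantaged_value(rates):
    """Attribute value with the lowest positive-outcome rate on a single client."""
    return min(rates, key=rates.get)

def clients_disagree(per_client, pick):
    """True if `pick` selects different keys on at least two clients."""
    return len({pick(stats) for stats in per_client.values()}) > 1

# Attribute-bias: clients are unfair toward different attributes.
attribute_bias = clients_disagree(attribute_gaps, dominant_attribute)

# Value-bias: clients disadvantage different values of the same attribute.
value_bias = clients_disagree(value_rates, disadvantaged_value)

print(attribute_bias, value_bias)
```

A method that corrects only one attribute, or that assumes a single disadvantaged value, has no consistent target when either flag is true.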

If this is right

Researchers can now run controlled, reproducible tests of fairness methods on datasets that contain conflicting client biases rather than uniform bias.
Fairness evaluation in federated learning will move from server-average metrics to explicit client-level checks.
New fairness algorithms can be compared directly on a shared benchmark that includes both attribute-bias and value-bias cases.
Standardized datasets make it possible to quantify how much existing single-attribute methods degrade under realistic client heterogeneity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be extended to include dynamic client arrival or changing bias patterns over time.
Practitioners deploying federated models in regulated domains might adopt client-level fairness audits as a routine step.
Methods that learn to reconcile differing client biases on the fly could become a natural next research target.

Load-bearing premise

The two identified bias scenarios are the primary realistic cases that single-attribute fairness methods fail to handle, and the datasets produced by the library faithfully reproduce those conditions.

What would settle it

An experiment in which standard fairness methods achieve consistent fairness at both the global and every client level when trained and tested on FeDa4Fair datasets would show that the new framework does not address a genuine gap.
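
The check described here amounts to requiring that the global gap and every per-client gap stay within some tolerance. A minimal sketch, assuming the gaps have already been computed; the function name `audit`, the tolerance of 0.10, and the example numbers are all hypothetical rather than values from the paper:

```python
def audit(global_gap, client_gaps, tolerance):
    """Return (passed, violators).

    `passed` is True only if the global gap and every client gap are
    within `tolerance`; `violators` lists the clients that exceed it.
    """
    violators = sorted(c for c, gap in client_gaps.items() if gap > tolerance)
    passed = global_gap <= tolerance and not violators
    return passed, violators

# Made-up example: the global metric looks fine, one client does not.
passed, violators = audit(
    global_gap=0.03,
    client_gaps={"client_1": 0.04, "client_2": 0.27, "client_3": 0.02},
    tolerance=0.10,
)
print(passed, violators)
```

Returning the list of violating clients, rather than a single boolean, keeps the audit informative when it fails.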

Figures

Figures reproduced from arXiv: 2506.21095 by Anna Monreale, Luca Corbucci, Mattia Cerrato, Xenia Heilmann.

**Figure 2.** Attribute bias measured with DD on the XGBoost model for attribute benchmark datasets.

Accompanying text (truncated): At its core, FeDa4Fair analyzes bias distribution at the client level and allows for data modification to amplify biases, facilitating exploration of different client settings while still maintaining a natural non-i.i.d. setting. A key component is the specification of the sensitive attributes which will be used for …

**Figure 3.** Attribute value bias measured with DD on XGBoost for value benchmark datasets.

Accompanying text (truncated): For a more detailed view, the attribute-value level returns fairness metrics for all possible combinations of attributes and their values. These fairness metrics are automatically generated alongside any dataset created via FeDa4Fair, reported in tabular formats, and visualized using plots. Figures 2 illustrate examples of attri…

**Figure 4.** Attribute and attribute value bias measured with DD on the true labels and partitioning data from “LA” and “WY”. These plots are generated for any dataset created with FeDa4Fair.

Accompanying text (truncated): … negative labels within selected data splits. This focus ensures these interventions amplify potential imbalances that usually impact model performance on underrepresented negative classes. To use this feature, practitioners indica…

**Figure 5.** Attribute bias toward RACE and SEX measured with Demographic Disparity on the XGBoost model vs. the FedAvg model and vs. PUFFLE for the attribute-silo dataset. Panel (a): Attribute-silo dataset, FedAvg; x-axis: State; y-axis: Dem. Disparity, XGBoost - FedAvg. Remaining panels truncated …

**Figure 6.** Individual, per attribute, bias differences in Demographic Disparity between the local …

**Figure 7.** Attribute value bias toward RACE as well as value changes measured with Demographic Disparity on the XGBoost model vs. the FedAvg model and vs. PUFFLE for the value-silo dataset.

Accompanying text (truncated): … to the local XGBoost model. In this case, states biased towards the RACE attribute profit less from participating in this PUFFLE setting in terms of RACE bias reduction

**Figure 8.** Attribute bias measured with Demographic Disparity on the Logistic Regression model for …

**Figure 9.** Attribute value bias measured with Demographic Disparity on the Logistic Regression …

**Figure 10.** Attribute bias toward RACE and SEX measured with Demographic Disparity on the Logistic Regression model vs. the FedAvg model and vs. PUFFLE for the attribute-silo dataset. Panel (a): Attribute-device dataset, FedAvg; subplots: RACE, SEX; axes: FedAVG Dem. Disparity and XGBoost Dem. Disparity. Remaining panels truncated …

**Figure 11.** Attribute bias toward RACE and SEX measured with Demographic Disparity on the XGBoost model vs. the FedAvg model and vs. the PUFFLE model for the attribute-silo dataset.

**Figure 12.** Attribute bias toward RACE and SEX measured with Demographic Disparity on the Logistic Regression model vs. the FedAvg model and vs. the PUFFLE model for the attribute-silo dataset.

**Figure 13.** Attribute value bias toward RACE as well as value changes measured with Demographic Disparity on the XGBoost model vs. the FedAvg model and vs. PUFFLE for the value-device datasets. Panel (a): Value-device dataset, FedAvg; x-axis: Sensitive Group Value (White, Black, Asian, Ala./Ind., Others); y-axis: Dem. Disparity; subplot title: Change in Max. Value Disparity. Remaining panels truncated …

**Figure 14.** Attribute value bias toward RACE as well as value changes measured with Demographic Disparity on the Logistic Regression model vs. the FedAvg model and vs. PUFFLE for the value-device datasets. Panel (a): Value-silo dataset, FedAvg; x-axis: Sensitive Group Value (White, Black, Asian, Ala./Ind., Others); y-axis: Dem. Disparity; subplot title: Change in Max. Value Disparity. Remaining panels truncated …

**Figure 15.** Attribute value bias toward RACE as well as value changes measured with Demographic Disparity on the Logistic Regression model vs. the FedAvg model and vs. PUFFLE for the value-silo dataset.

**Figure 16.** Individual, per attribute, bias differences in Demographic Disparity between the local …

**Figure 17.** Individual differences in Demographic Disparity between the local XGBoost models vs. …

**Figure 18.** Individual differences in Demographic Disparity between the local Logistic Regression …

**Figure 19.** Individual, per attribute, bias differences in Demographic Disparity between the local …

**Figure 20.** Individual, per attribute, bias differences in Demographic Disparity between the local …

**Figure 21.** Individual differences in Demographic Disparity between the local XGBoost models vs. …

**Figure 22.** Individual differences in Demographic Disparity between the local Logistic Regression …

Original abstract

Federated Learning (FL) enables collaborative training while preserving privacy, yet it introduces a critical challenge: the "illusion of fairness". A global model, usually evaluated on the server, appears fair on average while keeping persistent discrimination at the client level. Current fairness-enhancing FL solutions often fall short, as they typically mitigate biases for a single, usually binary, sensitive attribute, while ignoring two realistic and conflicting scenarios: attribute-bias (where clients are unfair toward different sensitive attributes) and value-bias (where clients exhibit conflicting biases toward different values of the same attribute). To support more robust and reproducible fairness research in FL, we introduce FeDa4Fair, the first benchmarking framework designed to stress-test fairness methods under these heterogeneous conditions. Our contributions are three-fold: (1) We introduce FeDa4Fair, a library designed to create datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release a benchmark suite generated by the FeDa4Fair library to standardize the evaluation of fair FL methods; (3) we provide ready-to-use functions for evaluating fairness outcomes for these datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

FeDa4Fair supplies a library and benchmark suite for client-heterogeneous fairness testing in federated learning, but the release lacks any reported checks that the generated partitions actually create the claimed attribute and value biases.

The paper's core offering is a library called FeDa4Fair that generates federated datasets with two specific forms of client-level bias: attribute-bias, where different clients are unfair on different sensitive attributes, and value-bias, where clients disagree on the preferred values of the same attribute. It also ships a benchmark suite and some evaluation helpers. That is the main new piece.

Prior fairness work in FL has mostly stayed with single binary attributes and server-side averages, so targeting these heterogeneous cases is a reasonable step toward more realistic testing conditions. Releasing the generation code and ready-to-use evaluation functions is the practical part that could save other groups time when they want to run controlled experiments on client heterogeneity. The motivation section correctly flags the gap between global fairness scores and persistent local discrimination, which matches what people see in real privacy-sensitive deployments. The contribution stays grounded in tooling rather than new theory or large-scale empirical claims.

The soft spot is the missing verification step. The abstract and contribution list describe the library's purpose and the released suite, but they do not show client-level fairness gaps, ablation results on the bias-injection parameters, or side-by-side comparisons demonstrating that standard single-attribute methods degrade exactly on these partitions. Without that evidence it remains possible that the generated data still looks statistically homogeneous to existing fairness metrics, which would undercut the stress-test claim. Minor issues include the usual need for clearer documentation on how the partitioning and label rules are implemented, but those are fixable.

This paper is for researchers who work on fair federated learning and need standardized client-heterogeneous benchmarks, rather than for theorists or large-scale systems researchers. A reader who wants to run reproducible experiments on attribute and value bias will get immediate value from the released code. It deserves a serious referee because the tooling is concrete and the targeted problem is live in the literature. I would send it to review with the expectation that the main revision request will be for quantitative validation of the bias patterns.

Referee Report

2 major / 2 minor

Summary. The paper introduces FeDa4Fair, a library for generating client-level federated datasets that incorporate heterogeneous fairness biases, specifically attribute-bias (clients unfair toward different sensitive attributes) and value-bias (conflicting biases toward different values of the same attribute). It releases a benchmark suite generated by the library and provides ready-to-use evaluation functions to support testing of fairness methods in federated learning beyond single-attribute settings.

Significance. If the library's data-generation procedure is shown to produce client distributions with the claimed per-client bias heterogeneity, the work would offer a useful standardized benchmark for FL fairness research, helping address the 'illusion of fairness' in global models. The release of the library, benchmark suite, and evaluation functions supports reproducibility and is a concrete strength.

major comments (2)

[Abstract] Abstract and contributions list: the central claim that FeDa4Fair supplies the first benchmark for stress-testing under attribute-bias and value-bias is load-bearing on the unverified premise that the generated datasets actually yield measurable client-level differences (e.g., distinct per-client DP or EO gaps). No such client-level fairness statistics, ablation on bias-injection parameters, or baseline degradation results are reported.
[Contributions (1) and (2)] Library and benchmark description: without reported verification that the partitioning/sampling rules produce statistically heterogeneous client fairness outcomes (as opposed to remaining homogeneous from the perspective of existing single-attribute methods), the utility of the released suite for the stated purpose cannot be assessed.

minor comments (2)

[Library design] The description of the two bias scenarios could include a brief formal definition or pseudocode for how attribute-bias versus value-bias is injected at the client level to improve clarity.
[Benchmark suite] Consider adding a table summarizing the released benchmark datasets (number of clients, sensitive attributes, bias parameters used) for quick reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for empirical verification of client-level heterogeneity in the generated datasets. We address each major comment below and commit to strengthening the manuscript with additional analyses in the revision.

Referee: [Abstract] Abstract and contributions list: the central claim that FeDa4Fair supplies the first benchmark for stress-testing under attribute-bias and value-bias is load-bearing on the unverified premise that the generated datasets actually yield measurable client-level differences (e.g., distinct per-client DP or EO gaps). No such client-level fairness statistics, ablation on bias-injection parameters, or baseline degradation results are reported.

Authors: We acknowledge that the current version of the manuscript does not report explicit client-level fairness statistics, parameter ablations, or baseline degradation results. In the revised manuscript we will add a dedicated experimental subsection that computes and reports per-client DP and EO gaps across the benchmark suite, includes ablations varying bias-injection strength, and shows how global models exhibit client-level fairness degradation even when server-level metrics appear acceptable. These additions will directly support the central claim.

Revision: yes

Referee: [Contributions (1) and (2)] Library and benchmark description: without reported verification that the partitioning/sampling rules produce statistically heterogeneous client fairness outcomes (as opposed to remaining homogeneous from the perspective of existing single-attribute methods), the utility of the released suite for the stated purpose cannot be assessed.

Authors: The library's partitioning and sampling procedures are explicitly constructed to assign distinct sensitive attributes to different clients (attribute-bias) and conflicting value preferences for the same attribute (value-bias). We agree that the manuscript would benefit from explicit verification. In the revision we will include statistical summaries (e.g., variance and range of per-client fairness gaps) and direct comparisons against standard single-attribute federated partitions to demonstrate that the resulting client distributions are measurably heterogeneous under both attribute-bias and value-bias regimes.

Revision: yes
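
The statistical summaries mentioned in this response could look roughly like the sketch below. It is an illustration only: the function name `summarize`, the client names, and the gap values are hypothetical, and the population variance from Python's standard `statistics` module is one choice among several.

```python
import statistics

def summarize(gaps):
    """Mean, population variance, and range of per-client fairness gaps."""
    values = list(gaps.values())
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "range": max(values) - min(values),
    }

# Made-up per-client gaps for a heterogeneous and a near-homogeneous partition.
heterogeneous = {"client_1": 0.05, "client_2": 0.30, "client_3": 0.55}
homogeneous = {"client_1": 0.10, "client_2": 0.12, "client_3": 0.11}

het = summarize(heterogeneous)
hom = summarize(homogeneous)

print(het)
print(hom)
```

A partition that is heterogeneous in the intended sense should show a clearly larger variance and range than a standard partition of the same data; how large a difference counts as meaningful is left open here.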

Circularity Check

0 steps flagged

No circularity: library and benchmark release with no derived quantities or self-referential reductions

The paper introduces FeDa4Fair as a library for generating client-level federated datasets and releases a benchmark suite, with contributions limited to the tooling, the generated datasets, and evaluation functions. No equations, fitted parameters, predictions, or first-principles derivations are present that could reduce to inputs by construction. The central claim rests on the release of the framework itself rather than any self-citation load-bearing step, ansatz, or renaming of known results. This is a standard software contribution paper whose validity is independent of internal fitting loops or author-overlapping uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on domain assumptions about how attribute-bias and value-bias manifest across clients; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: Heterogeneous client biases (attribute-bias and value-bias) are realistic and conflicting scenarios that single-attribute fairness methods fail to address.
Invoked in the abstract when describing why current solutions fall short and why the new benchmark is needed.


[1] [1]

Abay et al

A. Abay et al. Mitigating Bias in Federated Learning. 2020. URL: https://arxiv.org/ abs/2012.02447

work page arXiv 2020

[2] [2]

Fairness in machine learning

Barocas et al. “Fairness in machine learning”. In: NeurIPS tutorial 1.2 (2017)

work page 2017

[3] [3]

D. J. Beutel et al. Flower: A Friendly Federated Learning Research Framework. 2020. URL: https://arxiv.org/abs/2007.14390

work page arXiv 2020

[4] [4]

J. R. Biden. Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. 2023

work page 2023

[5] [5]

Benchmarking and survey of explanation methods for black box models

F. Bodria et al. “Benchmarking and survey of explanation methods for black box models”. In: Data Mining and Knowledge Discovery 37.5 (2023), pp. 1719–1778

work page 2023

[6] [6]

Caldas et al

S. Caldas et al. LEAF: A Benchmark for Federated Settings . 2019. arXiv: 1812.01097 [cs.LG]. URL:https://arxiv.org/abs/1812.01097

work page arXiv 2019

[7] [7]

Fairness in Machine Learning: A Survey

S. Caton et al. “Fairness in Machine Learning: A Survey”. In:ACM Comput. Surv. 56.7 (Apr. 2024). ISSN : 0360-0300. DOI: 10.1145/3616865 . URL: https://doi.org/10.1145/ 3616865

work page doi:10.1145/3616865 2024

[8] [8]

Bias propagation in federated learning

H. Chang et al. “Bias propagation in federated learning”. In:ArXiv preprint abs/2309.02160 (2023). URL:https://arxiv.org/abs/2309.02160

work page arXiv 2023

[9] [9]

& Guestrin, C

T. Chen et al. “XGBoost: A Scalable Tree Boosting System”. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16. San Francisco, California, USA: Association for Computing Machinery, 2016, 785–794. ISBN : 9781450342322. DOI: 10.1145/2939672.2939785 . URL: https://doi.org/10. 1145/2939672.2939785

work page doi:10.1145/2939672.2939785 2016

[10] [10]

Commission

E. Commission. Ethics guidelines for trustworthy AI. Publications Office, 2019. DOI:doi/10. 2759/346720

work page 2019

[11] [11]

Benefits of the Federation? Analyzing the Impact of Fair Federated Learning at the Client Level

L. Corbucci et al. “Benefits of the Federation? Analyzing the Impact of Fair Federated Learning at the Client Level”. In: Proceedings of the ACM Conference on Fairness, Accountability, and Transparency. FAccT ’25. 2025, 2232–2248.DOI:10.1145/3715275.3732152

[12] L. Corbucci et al. “PUFFLE: Balancing Privacy, Utility, and Fairness in Federated Learning”. In: ECAI 2024 - 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024). Ed. by U. Endriss et al. Vol. 392. Frontiers in Art...

[13] D. R. Cox. “The regression analysis of binary sequences”. In: Journal of the Royal Statistical Society: Series B (Methodological) 20.2 (1958), pp. 215–232

[14] K. Crenshaw. “Demarginalizing the intersection of race and sex: A black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics”. In: Feminist legal theories. Routledge, 2013, pp. 23–51

[15] F. Ding et al. “Retiring Adult: New Datasets for Fair Machine Learning”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. Ed. by M. Ranzato et al. 2021, pp. 6478–6490. URL: https://proceedings.neurips.cc/paper/2021/hash/32e54441e6382a7fba...

[16] C. Düsing et al. “Towards predicting client benefit and contribution in federated learning from data imbalance”. In: Proceedings of the 3rd International Workshop on Distributed Machine Learning. 2022, pp. 23–29

[17] M. Fontana et al. “Monitoring fairness in HOLDA”. In: HHAI2022: Augmenting Human Intellect. IOS Press, 2022, pp. 246–248

[18] D. M. J. G. et al. Non-IID data in Federated Learning: A Survey with Taxonomy, Metrics, Methods, Frameworks and Future Directions. 2024. arXiv: 2411.12377 [cs.LG]. URL: https://arxiv.org/abs/2411.12377

[19] M. Hardt et al. “Equality of Opportunity in Supervised Learning”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. Ed. by D. D. Lee et al. 2016, pp. 3315–3323. URL: https://proceedings.neurips.cc/paper/2016/hash/9d2682367c3935defcb1f9e247...

[20] Expert Group on How AI Principles Should be Implemented. AI Governance in Japan. 2023

[21] G. D. M. Jimenez et al. “FedArtML: A Tool to Facilitate the Generation of Non-IID Datasets in a Controlled Way to Support Federated Learning Research”. In: IEEE Access (2024)

[22] Q. Li et al. “Federated learning on non-iid data silos: An experimental study”. In: 2022 IEEE 38th international conference on data engineering (ICDE). IEEE. 2022, pp. 965–978

[23] B. Liu et al. “When Machine Learning Meets Privacy: A Survey and Outlook”. In: ACM Comput. Surv. 54.2 (Mar. 2021). ISSN: 0360-0300. DOI: 10.1145/3436755. URL: https://doi.org/10.1145/3436755

[24] Z. Lu et al. “Federated Learning With Non-IID Data: A Survey”. In: IEEE Internet of Things Journal 11.11 (2024), pp. 19188–19209. DOI: 10.1109/JIOT.2024.3376548

[25] T. Madiega. “Artificial intelligence act”. In: European Parliament: European Parliamentary Research Service (2021)

[26] B. McMahan et al. “Communication-Efficient Learning of Deep Networks from Decentralized Data”. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA. Ed. by A. Singh et al. Vol. 54. Proceedings of Machine Learning Research. PMLR, 2017, pp. 1273–1282. URL: h...

[27] N. Mehrabi et al. “A Survey on Bias and Fairness in Machine Learning”. In: ACM Comput. Surv. 54.6 (2021). ISSN: 0360-0300. DOI: 10.1145/3457607. URL: https://doi.org/10.1145/3457607

[28] A. Papadaki et al. “Minimax Demographic Group Fairness in Federated Learning”. In: FAccT ’22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21 - 24, 2022. ACM, 2022, pp. 142–159. DOI: 10.1145/3531146.3533081. URL: https://doi.org/10.1145/3531146.3533081

[29] W. Riviera et al. “FeLebrities: A User-Centric Assessment of Federated Learning Frameworks”. In: IEEE Access 11 (2023), pp. 96865–96878. DOI: 10.1109/ACCESS.2023.3312579

[30] H. Roberts et al. “The Chinese approach to artificial intelligence: an analysis of policy, ethics, and regulation”. In: AI & society (2021)

[31] T. Salazar et al. A Survey on Group Fairness in Federated Learning: Challenges, Taxonomy of Solutions and Directions for Future Research. 2024. URL: https://arxiv.org/abs/2410.03855

[32] S. Vucinich et al. “The Current State and Challenges of Fairness in Federated Learning”. In: IEEE Access 11 (2023), pp. 80903–80914. DOI: 10.1109/ACCESS.2023.3295412

[33] T. Yu et al. “Salvaging federated learning by local adaptation”. In: ArXiv preprint abs/2002.04758 (2020). URL: https://arxiv.org/abs/2002.04758

A FeDa4Fair general setup

We rely on several parameters as a general setup, which are independent of fairness specifications. Specifically, this implies that FeDa4Fair can also generate data for analyzing stan...
