pith.

arxiv: 2508.13813 · v2 · submitted 2025-08-19 · 💻 cs.LG · cs.AI

Assessing Trustworthiness of AI Training Dataset using Subjective Logic -- A Use Case on Bias

Pith reviewed 2026-05-18 22:14 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords AI training datasets · trustworthiness · Subjective Logic · bias assessment · uncertainty quantification · federated learning · class imbalance

The pith

Subjective Logic provides a formal way to assess trustworthiness of entire AI training datasets for emergent properties like bias.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a formal framework based on Subjective Logic for assessing the trustworthiness of AI training datasets with respect to properties, such as bias, that emerge only at the whole-dataset level. This matters because prior Subjective Logic work assessed only individual data items, leaving dataset-level issues unaddressed, particularly when evidence is uncertain or distributed. The framework quantifies the uncertainty that arises from incomplete or conflicting evidence and supports both centralized and federated data scenarios. Experiments on a traffic sign recognition dataset indicate that it captures class imbalance while remaining interpretable.

Core claim

By building on Subjective Logic, the framework supports trust propositions for dataset-wide properties and enables uncertainty-aware evaluations of bias even with partial or conflicting evidence from multiple sources.

What carries the argument

Composition of Subjective Logic opinions to derive trust assessments for dataset-level properties such as bias.
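For readers new to the formalism: a binomial Subjective Logic opinion is a tuple (b, d, u, a) of belief, disbelief, uncertainty, and base rate, with b + d + u = 1. The sketch below uses the standard evidence-to-opinion mapping from Jøsang's Subjective Logic (2016), where r positive and s negative observations and a non-informative prior weight W = 2 give b = r/(r+s+W), d = s/(r+s+W), u = W/(r+s+W). It illustrates the base formalism under that assumption; it is not the paper's bias-specific mapping, which the referee report notes is not given explicitly, and the evidence counts are made up.

```python
from dataclasses import dataclass

# Non-informative prior weight; W = 2 is the usual choice for binomial opinions.
PRIOR_WEIGHT = 2.0


@dataclass(frozen=True)
class Opinion:
    """Binomial Subjective Logic opinion: belief, disbelief, uncertainty, base rate."""
    b: float
    d: float
    u: float
    a: float = 0.5

    def __post_init__(self) -> None:
        # The three masses must be additive.
        if abs(self.b + self.d + self.u - 1.0) > 1e-9:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def projected_probability(self) -> float:
        # P = b + a * u: the uncertainty mass is distributed by the base rate.
        return self.b + self.a * self.u


def opinion_from_evidence(r: float, s: float, a: float = 0.5,
                          w: float = PRIOR_WEIGHT) -> Opinion:
    """Map r positive and s negative observations to a binomial opinion."""
    if r < 0 or s < 0:
        raise ValueError("evidence counts must be non-negative")
    total = r + s + w
    return Opinion(b=r / total, d=s / total, u=w / total, a=a)


# Illustrative only: 8 supporting and 2 contradicting observations.
op = opinion_from_evidence(8, 2)
print(op)
print(op.projected_probability())
```

Because u = W/(r + s + W) depends only on the amount of evidence and not on how it splits between r and s, the choice of mapping directly shapes the uncertainty that any fused, dataset-level assessment reports. That is the crux of the referee's concern about artifactual uncertainty.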

If this is right

  • The method evaluates trustworthiness in federated learning where data remains distributed across sources.
  • It captures class imbalance through uncertainty-aware trust opinions.
  • Results stay interpretable for bias assessments under incomplete evidence.
  • The approach applies in both centralized and federated contexts without requiring full data access.
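For the federated case, each data holder (OEMs, in the paper's figures) can share an opinion instead of raw data. One standard way to combine opinions from independent sources is Subjective Logic's cumulative belief fusion. The sketch below implements its binomial form and checks that fusing opinions built from local evidence gives the same result as building one opinion from the pooled evidence. It assumes all sources share one base rate, and that the paper uses this particular operator is an assumption, not something the review confirms; the evidence counts are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

W = 2.0  # non-informative prior weight (assumed, as in standard Subjective Logic)


@dataclass(frozen=True)
class Opinion:
    b: float
    d: float
    u: float
    a: float = 0.5


def from_evidence(r: float, s: float, a: float = 0.5) -> Opinion:
    """Standard binomial mapping from evidence counts to an opinion."""
    total = r + s + W
    return Opinion(r / total, s / total, W / total, a)


def cumulative_fuse(x: Opinion, y: Opinion) -> Opinion:
    """Binomial cumulative belief fusion, assuming both sources share a base rate."""
    if x.a != y.a:
        raise ValueError("this sketch assumes equal base rates")
    if x.u == 0.0 and y.u == 0.0:
        # Limit case of two dogmatic opinions: equal-weight average.
        return Opinion((x.b + y.b) / 2, (x.d + y.d) / 2, 0.0, x.a)
    k = x.u + y.u - x.u * y.u
    return Opinion(
        b=(x.b * y.u + y.b * x.u) / k,
        d=(x.d * y.u + y.d * x.u) / k,
        u=(x.u * y.u) / k,
        a=x.a,
    )


# Hypothetical local evidence (r, s) held by three data owners.
local_evidence = [(40, 10), (5, 15), (30, 0)]
fused = reduce(cumulative_fuse, (from_evidence(r, s) for r, s in local_evidence))
pooled = from_evidence(sum(r for r, _ in local_evidence),
                       sum(s for _, s in local_evidence))
print(fused)
print(pooled)
```

Cumulative fusion assumes the sources' evidence is independent. If several data holders observed overlapping data, averaging fusion, which does not accumulate evidence, would avoid counting the same samples twice.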

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same composition approach could extend to other dataset-level properties such as overall quality or representativeness.
  • It could integrate with existing bias mitigation steps to support end-to-end checks during model training.
  • Testing on additional datasets would show whether the uncertainty measures scale to larger or more varied collections.

Load-bearing premise

Subjective Logic operators can be combined to measure properties like bias across an entire dataset even if the available evidence is incomplete or comes from conflicting sources.
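The premise explicitly covers evidence from conflicting sources. Subjective Logic has a standard way to quantify such conflict: as we understand the definition in Jøsang's Subjective Logic (2016), the degree of conflict between two opinions on the same variable is their projected distance times their conjunctive certainty. A minimal binomial sketch with illustrative opinions; the review does not say whether the paper uses this measure.

```python
def projected_probability(b: float, u: float, a: float) -> float:
    return b + a * u


def degree_of_conflict(op_a: tuple[float, float, float, float],
                       op_b: tuple[float, float, float, float]) -> float:
    """Degree of conflict between two binomial opinions given as (b, d, u, a).

    Projected distance |P_A - P_B| times conjunctive certainty (1 - u_A)(1 - u_B),
    so conflict is high only when both sources are confident and disagree.
    """
    b_a, _, u_a, a_a = op_a
    b_b, _, u_b, a_b = op_b
    projected_distance = abs(projected_probability(b_a, u_a, a_a)
                             - projected_probability(b_b, u_b, a_b))
    conjunctive_certainty = (1.0 - u_a) * (1.0 - u_b)
    return projected_distance * conjunctive_certainty


# Illustrative: two confident, opposing sources vs. one confident and one vacuous.
confident_yes = (0.9, 0.05, 0.05, 0.5)
confident_no = (0.05, 0.9, 0.05, 0.5)
vacuous = (0.0, 0.0, 1.0, 0.5)
print(degree_of_conflict(confident_yes, confident_no))
print(degree_of_conflict(confident_yes, vacuous))  # 0.0
```

A vacuous source yields zero conflict however far its projected probability lies from the other source's, which is the behaviour one would want when some data holders contribute little evidence.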

What would settle it

A controlled test on a dataset with known injected bias where the framework fails to detect the imbalance or to reflect appropriate uncertainty levels would challenge the central claim.
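Such a test needs a dataset whose imbalance is known by construction. A minimal sketch of that setup, working on per-class sample counts: start from a uniform 43-class distribution (the GTSRB label count, assumed here for illustration), downsample a chosen set of classes, and derive the ground-truth list of under-represented classes that the framework's opinions should flag. The helper names, class counts, and 10% keep fraction are hypothetical, not the paper's protocol.

```python
def inject_imbalance(counts: dict[int, int], minority: set[int],
                     keep_fraction: float) -> dict[int, int]:
    """Return new per-class counts with the given classes downsampled."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    return {c: (int(n * keep_fraction) if c in minority else n)
            for c, n in counts.items()}


def class_proportions(counts: dict[int, int]) -> dict[int, float]:
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


NUM_CLASSES = 43                       # assumption: GTSRB-style label set
uniform = {c: 500 for c in range(NUM_CLASSES)}
minority = {0, 1, 2, 3, 4}             # hypothetical classes to under-sample
biased = inject_imbalance(uniform, minority, keep_fraction=0.1)

# Ground truth: classes whose share falls below the uniform share 1/K.
props = class_proportions(biased)
expected = 1 / NUM_CLASSES
under_represented = sorted(c for c, p in props.items() if p < expected)
print(under_represented == sorted(minority))  # True
```

A detector passes this check if its bias opinions single out exactly `under_represented`, and if its uncertainty shrinks as more samples per class are observed.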

Figures

Figures reproduced from arXiv: 2508.13813 by Frank Kargl, Ioannis Krontiris, Koffi Ismael Ouattara, Theo Dimitrakos.

Figure 1: Density plot of the class probabilities distribution.
Figure 2: Distribution in scenario 1. Panels: (a) accuracy of the model on the whole test set, warning signs, and other signs; (b) accuracy difference.
Figure 3: Plot of accuracy on warning signs and others vs. number of imbalanced sub…
Figure 4: Method 1 and method 2 for 10 and 100 OEMs.
Figure 5: Effect of η on belief, disbelief, and uncertainty. The red vertical line represents the expected probability, located at η = 0.02326. The blue and orange curves depict the evolution of belief and disbelief, respectively, while the green curve represents uncertainty. The shaded region (0.0185 ≤ η ≤ 0.0235) indicates a sta…
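The η value in the Figure 5 caption can be cross-checked with simple arithmetic. Assuming the traffic-sign dataset is the 43-class German Traffic Sign Recognition Benchmark (GTSRB), which the caption does not state, the expected probability matches the per-class share 1/K under a uniform class distribution over K classes, and it lies inside the shaded region:

```latex
\eta^{*} = \frac{1}{K} = \frac{1}{43} \approx 0.023256 \approx 0.02326,
\qquad 0.0185 \le \eta^{*} \le 0.0235
```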
Original abstract

As AI systems increasingly rely on training data, assessing dataset trustworthiness has become critical, particularly for properties like fairness or bias that emerge at the dataset level. Prior work has used Subjective Logic to assess trustworthiness of individual data, but not to evaluate trustworthiness properties that emerge only at the level of the dataset as a whole. This paper introduces the first formal framework for assessing the trustworthiness of AI training datasets, enabling uncertainty-aware evaluations of global properties such as bias. Built on Subjective Logic, our approach supports trust propositions and quantifies uncertainty in scenarios where evidence is incomplete, distributed, and/or conflicting. We instantiate this framework on the trustworthiness property of bias, and we experimentally evaluate it based on a traffic sign recognition dataset. The results demonstrate that our method captures class imbalance and remains interpretable and robust in both centralized and federated contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the first formal framework for assessing the trustworthiness of AI training datasets using Subjective Logic, focusing on emergent global properties such as bias. It supports uncertainty-aware evaluation under incomplete, distributed, or conflicting evidence and instantiates the approach for bias. An experimental evaluation on a traffic sign recognition dataset is claimed to show that the method captures class imbalance and is robust in both centralized and federated settings.

Significance. If the framework's operator compositions and mappings from dataset statistics to opinions are shown to faithfully reflect emergent distributional properties like bias (rather than producing artifactual uncertainty), the work could provide a useful uncertainty-aware tool for dataset auditing in machine learning, with added relevance to federated learning scenarios.

major comments (2)
  1. §4 (Bias instantiation): The mapping from per-class statistics or sample features to Subjective Logic opinion components (belief, disbelief, uncertainty, base rates) is not formalized with explicit equations or rules. Without a canonical, property-preserving definition, it is unclear whether fusion of local opinions yields a global bias assessment whose uncertainty correctly encodes dataset-level imbalance rather than an artifact of the chosen mapping.
  2. §5 (Experiments): The abstract asserts that results on the traffic sign dataset demonstrate capture of class imbalance and robustness, but no equations, specific metrics (e.g., correlation between computed bias opinions and known imbalance ratios), tables, or figures detailing the Subjective Logic outputs are referenced. This detail is load-bearing for verifying the central claim of effectiveness in centralized and federated contexts.
minor comments (2)
  1. Consider expanding the related work section to explicitly contrast the proposed dataset-level framework with prior Subjective Logic applications to individual data points.
  2. Notation for opinion fusion operators should be introduced with a brief reminder of their definitions from Subjective Logic literature to aid readers unfamiliar with the base formalism.
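A reminder of the kind this comment asks for: the two standard binomial fusion operators as defined in Jøsang's Subjective Logic (2016), for opinions ω_A = (b_A, d_A, u_A, a_A) and ω_B = (b_B, d_B, u_B, a_B). The formulas assume equal base rates (the fused opinion then keeps that base rate) and omit the limit rules for dogmatic opinions with u = 0; which operator the paper uses is not stated in this review. Cumulative fusion accumulates evidence from independent sources, while averaging fusion treats the sources as dependent.

```latex
\text{Cumulative fusion, } \kappa = u_A + u_B - u_A u_B \neq 0:\quad
b = \frac{b_A u_B + b_B u_A}{\kappa},\quad
d = \frac{d_A u_B + d_B u_A}{\kappa},\quad
u = \frac{u_A u_B}{\kappa}

\text{Averaging fusion, } u_A + u_B \neq 0:\quad
b = \frac{b_A u_B + b_B u_A}{u_A + u_B},\quad
d = \frac{d_A u_B + d_B u_A}{u_A + u_B},\quad
u = \frac{2\, u_A u_B}{u_A + u_B}
```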

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the clarity and rigor of our presentation of the Subjective Logic framework for dataset trustworthiness assessment. We address each major comment below and will revise the manuscript accordingly to incorporate explicit formalizations and additional experimental details.

Point-by-point responses
  1. Referee: §4 (Bias instantiation): The mapping from per-class statistics or sample features to Subjective Logic opinion components (belief, disbelief, uncertainty, base rates) is not formalized with explicit equations or rules. Without a canonical, property-preserving definition, it is unclear whether fusion of local opinions yields a global bias assessment whose uncertainty correctly encodes dataset-level imbalance rather than an artifact of the chosen mapping.

    Authors: We agree that explicit equations are needed to formalize the mapping and demonstrate that it preserves the intended properties under fusion. Section 4 describes the bias instantiation conceptually, mapping per-class sample statistics to opinion components, but we will revise the manuscript to include precise mathematical definitions. These will specify how class frequencies and imbalance measures determine the belief mass, disbelief, uncertainty, and base rate, along with a brief analysis showing that the fusion operators yield uncertainty values that scale with dataset-level imbalance rather than mapping artifacts. revision: yes

  2. Referee: §5 (Experiments): The abstract asserts that results on the traffic sign dataset demonstrate capture of class imbalance and robustness, but no equations, specific metrics (e.g., correlation between computed bias opinions and known imbalance ratios), tables, or figures detailing the Subjective Logic outputs are referenced. This detail is load-bearing for verifying the central claim of effectiveness in centralized and federated contexts.

    Authors: We acknowledge that the experimental section would benefit from more quantitative support for the claims. The current evaluation on the traffic sign recognition dataset illustrates the framework's behavior under varying imbalance in both centralized and federated settings, but we will expand Section 5 to include explicit metrics such as Pearson correlation between the computed bias opinions and known imbalance ratios. We will also add tables reporting the opinion components (belief, disbelief, uncertainty) across imbalance levels and figures visualizing the outputs, enabling direct verification of imbalance capture and robustness. revision: yes

Circularity Check

0 steps flagged

No circularity: framework extends existing Subjective Logic to dataset-level properties via new composition rules without reducing to fitted inputs or self-citations.

full rationale

The paper presents a new formal framework that applies Subjective Logic operators to aggregate per-sample or per-class opinions into global dataset properties such as bias. The abstract and described approach treat Subjective Logic as an external foundation, with the novelty lying in the instantiation for emergent properties and experimental validation on a traffic-sign dataset. No equations or steps are shown to define bias in terms of the output opinion itself, nor does the central claim rely on a self-citation chain that would make the result tautological. The derivation remains self-contained because the mapping from evidence to opinions and the fusion rules are presented as constructed on top of the established Subjective Logic formalism rather than derived from the target bias measure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework is described at a high level without detailing any fitted quantities or new postulates.

pith-pipeline@v0.9.0 · 5683 in / 1054 out tokens · 26153 ms · 2026-05-18T22:14:09.545200+00:00 · methodology

