Quantifying Transparency of Machine Learning Systems through Analysis of Contributions

Alun Preece; Dinesh Verma; Iain Barclay; Ian Taylor

arxiv: 1907.03483 · v1 · pith:PUDVMJCDnew · submitted 2019-07-08 · 💻 cs.LG · cs.SE

Pith reviewed 2026-05-25 01:13 UTC · model grok-4.3

keywords machine learning transparency · model pipelines · contribution visibility · trust and validation · auditing ML systems · data asset provenance · transparency metric

The pith

A numeric score ranks the transparency of machine learning pipelines by measuring how visible each contribution from data and people is.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to calculate a transparency metric for the pipelines that create machine learning models and data assets. The metric rests on criteria that judge the visibility of contributions from human developers and data sources. Users, auditors, and regulators can apply the resulting scores to decide how readily they can check the origins and suitability of a model when rules shift or data sources come into question. This targets the growing separation between the original creators of a model and the later parties who depend on it over time or through third-party sharing.

Core claim

The paper claims that transparency can be turned into a quantifiable ranking by scoring the visibility of contributions along the process pipeline that produces an ML model or data asset, so that stakeholders can assess their ability to validate and trust the sources and contributors involved.

What carries the argument

The transparency metric, derived from criteria that evaluate the visibility of each contribution to the ML pipeline.

If this is right

Auditors gain a direct way to compare the transparency of different ML models and data assets.
Stakeholders can adjust their reliance on a system when its transparency score falls after a regulatory change or a data-source challenge.
Model marketplaces and shared repositories can display the metric as a standard attribute for each asset.
The same scoring approach extends to ranking the pipelines behind other data-driven assets beyond ML models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be tested by scoring real models released on public repositories and checking whether higher scores predict fewer validation problems in practice.
Regulators might adopt the metric as one input when setting requirements for high-stakes ML deployments.
The criteria could be refined over time as new types of contributions, such as automated tools or synthetic data, become common.
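
The first extension above can be sketched as a rank-correlation check. The snippet below is a minimal illustration, not anything taken from the paper: the function names, the transparency scores, and the validation-problem counts are all hypothetical, and Spearman's rank correlation is our own choice of statistic. A strongly negative value would be consistent with higher scores predicting fewer validation problems.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one transparency score and one count of
# validation problems per published model (invented for illustration).
transparency_scores = [0.9, 0.7, 0.4, 0.2]
validation_problems = [0, 1, 3, 5]
rho = spearman(transparency_scores, validation_problems)
print(f"Spearman rho = {rho:.2f}")
```

With only a handful of models such a coefficient would say little; the point is the shape of the test, not the numbers.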

Load-bearing premise

That judgments on the visibility of contributions can be made consistently and objectively using the proposed criteria, and that higher visibility scores accurately indicate greater ability to validate and trust the contributors and data sources.

What would settle it

Multiple independent auditors apply the criteria to the same collection of pipelines and produce materially different scores, or a pipeline that receives a high score is later found to contain unverifiable or discredited contributors when an actual audit occurs.
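
One way to make "materially different scores" concrete is an agreement statistic. The sketch below computes Cohen's kappa for two auditors who each assign a categorical visibility rating to the same contributions. The paper reports no such study; the rating labels and the data are hypothetical, and kappa is our own choice of measure.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical visibility ratings from two auditors for four contributions.
auditor_1 = ["high", "high", "low", "low"]
auditor_2 = ["high", "low", "low", "low"]
kappa = cohens_kappa(auditor_1, auditor_2)
print(f"kappa = {kappa:.2f}")
```

A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance. Where auditors assign numeric rather than categorical scores, a different statistic would be needed.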

Figures

Figures reproduced from arXiv: 1907.03483 by Alun Preece, Dinesh Verma, Iain Barclay, Ian Taylor.

Original abstract

Increased adoption and deployment of machine learning (ML) models into business, healthcare and other organisational processes, will result in a growing disconnect between the engineers and researchers who developed the models and the model's users and other stakeholders, such as regulators or auditors. This disconnect is inevitable, as models begin to be used over a number of years or are shared among third parties through user communities or via commercial marketplaces, and it will become increasingly difficult for users to maintain ongoing insight into the suitability of the parties who created the model, or the data that was used to train it. This could become problematic, particularly where regulations change and once-acceptable standards become outdated, or where data sources are discredited, perhaps judged to be biased or corrupted, either deliberately or unwittingly. In this paper we present a method for arriving at a quantifiable metric capable of ranking the transparency of the process pipelines used to generate ML models and other data assets, such that users, auditors and other stakeholders can gain confidence that they will be able to validate and trust the data sources and human contributors in the systems that they rely on for their business operations. The methodology for calculating the transparency metric, and the type of criteria that could be used to make judgements on the visibility of contributions to systems are explained and illustrated through an example scenario.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper sketches a visibility-based transparency score for ML pipelines but supplies no formulas, consistency checks, or evidence that the scores track actual validation or trust.

The main takeaway is that the authors want a numeric way to rank how transparent an ML pipeline is by judging how visible each contribution is, yet they give no working definition or test of whether those judgments hold up or matter downstream. They rightly flag the practical problem that models and data sources drift out of sight over time or across organizations, which can bite when rules change or sources turn out to be biased. That disconnect is worth attention for anyone doing long-term deployment or oversight.

What they actually do is outline a high-level method and list some possible visibility criteria, then walk through one example scenario. That is a modest step toward making accountability concrete rather than purely rhetorical. The soft spot is exactly what the stress-test note flags: the metric is built directly on the chosen visibility criteria, with no inter-rater agreement data, no aggregation rule shown, and no check that higher scores let stakeholders actually validate or trust the sources better. The abstract stops at the intent and the illustration; without those missing pieces the score stays a constructed checklist.

This is aimed at people working on ML governance, auditing standards, or regulatory compliance rather than core algorithm researchers. A reader already thinking about accountability frameworks could pick up the idea as a prompt, but it does not yet deliver a usable or validated instrument.

I would send it for peer review. The topic is relevant and the authors are engaging an honest gap, but the referees will need to see concrete criteria, a clear calculation method, and at least pilot evidence on consistency and predictive value before the claim can be taken as a method rather than a suggestion.

Referee Report

3 major / 2 minor

Summary. The paper claims to introduce a method for computing a quantifiable transparency metric that ranks ML model pipelines (and other data assets) according to the visibility of contributions from data sources and human contributors; the metric is intended to help users, auditors, and regulators assess their ability to validate and trust those contributors, with the approach and example visibility criteria illustrated via a single scenario.

Significance. If the metric were shown to support consistent, objective ratings that correlate with downstream validation capability, it would address a genuine gap in ML governance tooling by supplying an auditable ranking instrument rather than ad-hoc checklists. The manuscript supplies no such evidence, however, leaving the contribution at the level of a descriptive framework.

major comments (3)

[methodology and example scenario] The central claim requires that visibility judgments be made consistently and that the resulting scores correlate with validation/trust outcomes, yet the manuscript provides neither an inter-rater reliability study nor any empirical mapping from metric values to those outcomes (methodology section and example scenario).
[methodology] No formal aggregation function, weighting scheme, or normalization procedure is stated for combining per-contribution visibility scores into the overall transparency metric; the description remains at the level of qualitative criteria.
[example scenario] The single illustrative scenario supplies no quantitative results, baseline comparisons, or sensitivity analysis, so it cannot demonstrate that the metric produces stable rankings or distinguishes pipelines in a way that supports the trust-assessment use case.

minor comments (2)

[methodology] Notation for the visibility criteria and any derived quantities is introduced informally; a compact table or pseudocode definition would improve clarity.
[introduction] The abstract and introduction repeat the high-level motivation without distinguishing the proposed metric from existing transparency or provenance frameworks (e.g., data cards, model cards).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The manuscript proposes a conceptual framework for a transparency metric rather than an empirically validated instrument. We respond to each major comment below and indicate planned revisions.

Referee: [methodology and example scenario] The central claim requires that visibility judgments be made consistently and that the resulting scores correlate with validation/trust outcomes, yet the manuscript provides neither an inter-rater reliability study nor any empirical mapping from metric values to those outcomes (methodology section and example scenario).

Authors: The paper introduces a framework for quantifying transparency via contribution visibility and provides illustrative criteria. We agree that consistency of judgments and correlation with downstream trust outcomes are important but were not empirically tested here; the work is positioned as a definitional proposal rather than a validated measurement tool. We will add an explicit limitations section stating that inter-rater reliability and outcome mapping remain open questions for future empirical studies. revision: yes
Referee: [methodology] No formal aggregation function, weighting scheme, or normalization procedure is stated for combining per-contribution visibility scores into the overall transparency metric; the description remains at the level of qualitative criteria.

Authors: The current description intentionally remains high-level to accommodate different organizational contexts. We accept that a concrete example of aggregation would strengthen the presentation and will add a formal illustrative aggregation function (e.g., a weighted sum over contribution categories with example weights) together with a note on possible normalization approaches in the revised methodology section. revision: yes
Referee: [example scenario] The single illustrative scenario supplies no quantitative results, baseline comparisons, or sensitivity analysis, so it cannot demonstrate that the metric produces stable rankings or distinguishes pipelines in a way that supports the trust-assessment use case.

Authors: The scenario serves only to walk through application of the visibility criteria. We agree it does not constitute empirical validation. We will augment the scenario with hypothetical numerical scores to show how an overall metric value could be derived and how rankings might arise, while clarifying that stability and discriminative power require separate evaluation studies beyond the scope of this paper. revision: partial
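
The "weighted sum over contribution categories" promised in the second response could look like the sketch below. The manuscript states no aggregation function, so everything here is an assumption made for illustration: the `Contribution` structure, the 0 to 1 visibility scale, the category names, the example weights, and the choice to normalize by the total weight so the result stays between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """One contribution to a pipeline (hypothetical structure, not from the paper)."""
    name: str
    category: str      # e.g. "data_source" or "human"
    visibility: float  # assumed visibility judgment on a 0..1 scale


# Example weights per category; the values are illustrative assumptions only.
EXAMPLE_WEIGHTS = {"data_source": 3.0, "human": 2.0}


def transparency_score(contributions, weights=EXAMPLE_WEIGHTS):
    """Weighted mean of visibility scores, normalized to the range 0..1."""
    if not contributions:
        raise ValueError("at least one contribution is required")
    weighted_sum = 0.0
    total_weight = 0.0
    for c in contributions:
        if not 0.0 <= c.visibility <= 1.0:
            raise ValueError(f"visibility out of range for {c.name}")
        w = weights[c.category]
        weighted_sum += w * c.visibility
        total_weight += w
    return weighted_sum / total_weight


# Hypothetical pipeline with three contributions.
pipeline = [
    Contribution("training images", "data_source", 1.0),
    Contribution("label vendor", "data_source", 0.5),
    Contribution("model engineer", "human", 0.0),
]
score = transparency_score(pipeline)
print(f"transparency score = {score:.4f}")
```

Dividing by the total weight keeps scores comparable across pipelines with different numbers of contributions, which a ranking requires. An unnormalized sum would instead reward pipelines simply for having more contributions.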

Circularity Check

1 step flagged

Transparency metric is constructed directly from author-chosen visibility criteria by definition

specific steps

self-definitional [Abstract]
"we present a method for arriving at a quantifiable metric capable of ranking the transparency of the process pipelines used to generate ML models and other data assets, such that users, auditors and other stakeholders can gain confidence that they will be able to validate and trust the data sources and human contributors in the systems that they rely on for their business operations. The methodology for calculating the transparency metric, and the type of criteria that could be used to make judgements on the visibility of contributions to systems are explained and illustrated through an example"

The metric is arrived at by applying the visibility criteria; the transparency ranking is therefore equivalent to the visibility score by construction, and the asserted link to validation/trust capability is an untested assumption rather than an independent derivation.

full rationale

The paper defines a quantifiable transparency metric via judgments on visibility of contributions, then claims this ranks pipelines for validation/trust capability. The abstract presents the metric as calculated from those criteria and illustrated by example, with no external benchmarks, inter-rater data, or correlation to downstream outcomes shown. This reduces the central result to a self-defined function of the inputs. No equations, self-citations, or other patterns are load-bearing in the provided text; the circularity is limited to the definitional construction of the metric itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the ability to define and apply visibility criteria consistently. The metric itself is a new constructed quantity without external validation.

free parameters (1)

visibility criteria and weights
The specific factors and any weighting used to compute the transparency score from contributions are not detailed but are required for the metric.

axioms (1)

domain assumption · Visibility of contributions to ML pipelines can be assessed objectively and consistently
Invoked to enable a quantifiable and comparable metric across different pipelines.

invented entities (1)

transparency metric · no independent evidence
purpose: To rank ML process pipelines according to contribution visibility for trust assessment
Newly introduced quantity defined by the paper's criteria with no independent evidence provided.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 4 internal anchors

[1] I. Barclay, A. Preece, and I. Taylor, “Defining the collective intelligence supply chain,” arXiv preprint arXiv:1809.09444, 2018.
[2] I. Barclay, A. Preece, I. Taylor, and D. Verma, “A conceptual architecture for contractual data sharing in a decentralised environment,” in Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, vol. 11006. International Society for Optics and Photonics, 2019, p. 110060G.
[3] N. Diakopoulos, “Accountability in algorithmic decision making,” Communications of the ACM, vol. 59, no. 2, pp. 56–62, 2016.
[4] F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S. Gershman, D. O’Brien, S. Schieber, J. Waldo, D. Weinberger, and A. Wood, “Accountability of AI under the law: The role of explanation,” arXiv preprint arXiv:1711.01134, 2017.
[5] D. J. Weitzner, H. Abelson, T. Berners-Lee, J. Feigenbaum, J. Hendler, and G. J. Sussman, “Information accountability,” Communications of the ACM, vol. 51, no. 6, pp. 82–87, 2008.
[6] G. Wang, B. Wang, T. Wang, A. Nika, H. Zheng, and B. Y. Zhao, “Defending against sybil devices in crowdsourced mapping services,” in Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2016, pp. 179–191.
[7] C. Miao, Q. Li, H. Xiao, W. Jiang, M. Huai, and L. Su, “Towards data poisoning attacks in crowd sensing systems,” in Proceedings of the Eighteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 2018, pp. 111–120.
[8] T. Gu, B. Dolan-Gavitt, and S. Garg, “BadNets: Identifying vulnerabilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017.
[9] S. Zhao, M. Talasila, G. Jacobson, C. Borcea, S. A. Aftab, and J. F. Murray, “Packaging and sharing machine learning models via the Acumos AI open platform,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 841–846.
[10] Y. Jia and E. Shelhamer, “Caffe model zoo,” UC Berkeley, [Online]. Available: http://caffe.berkeleyvision.org/model_zoo.html. [Accessed 23 05 2019], 2015.
[11] E. Bertino, S. Merrill, A. Nesen, and C. Utz, “Redefining data transparency: A multidimensional approach,” Computer, vol. 52, no. 1, pp. 16–26, 2019.
[12] A. Preece, D. Harborne, D. Braines, R. Tomsett, and S. Chakraborty, “Stakeholders in explainable AI,” arXiv preprint arXiv:1810.00184, 2018.
[13] M. Caridi, L. Crippa, A. Perego, A. Sianesi, and A. Tumino, “Measuring visibility to improve supply chain performance: a quantitative approach,” Benchmarking: An International Journal, vol. 17, no. 4, pp. 593–615, 2010.
[14] M. Caridi, A. Perego, and A. Tumino, “Measuring supply chain visibility in the apparel industry,” Benchmarking: An International Journal, vol. 20, no. 1, pp. 25–44, 2013.
[15] G. Parry, S. A. Brax, R. Maull, and I. Ng, “Visibility of consumer context: improving reverse supply with internet of things data,” Supply Chain Management: An International Journal, vol. 21, no. 2, pp. 228–244, 2016.
[16] M. McConaghy, G. McMullen, G. Parry, T. McConaghy, and D. Holtzman, “Visibility and digital art: Blockchain as an ownership layer on the internet,” Strategic Change, vol. 26, no. 5, pp. 461–470, 2017.
[17] J. Vlietland and H. van Vliet, “Improving IT incident handling performance with information visibility,” Journal of Software: Evolution and Process, vol. 26, no. 12, pp. 1106–1127, 2014.
[18] C. Cappiello, F. Daniel, M. Matera, and C. Pautasso, “Information quality in mashups,” IEEE Internet Computing, vol. 14, no. 4, pp. 14–22, 2010.
[19] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru, “Model cards for model reporting,” in Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 2019, pp. 220–229.
[21] “FactSheets: Increasing Trust in AI Services through Supplier's Declarations of Conformity.” [Online]. Available: https://arxiv.org/pdf/1808.07261.pdf
[22] M. S. Drummond Reed and M. Sabadello, “Verifiable credentials data model 1.0,” W3C, Draft Community Group Report, Jun. 2019, W3C Decentralized Identifiers (DIDs) v0.13.
[23] A. Mühle, A. Grüner, T. Gayvoronskaya, and C. Meinel, “A survey on essential components of a self-sovereign identity,” Computer Science Review, vol. 30, pp. 80–86, 2018.
