
arxiv: 2401.13850 · v2 · submitted 2024-01-24 · 💻 cs.CY · cs.HC

PADTHAI-MM: Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology

Pith reviewed 2026-05-24 05:04 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords trustworthy AI · human-centered design · MAST methodology · intelligence reporting · design framework · AI explanations · empirical assessment

The pith

PADTHAI-MM uses iterative MAST evaluations to design context-specific trustworthy AI systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PADTHAI-MM as an iterative design framework built on the Multisource AI Scorecard Table to create human-centered AI. It applies the framework to develop the READIT platform for intelligence reporting tasks, producing one version with contextual information and explanations and another resembling a black-box system. Stakeholder ratings show the high-MAST version aligns better with trust factors of process, purpose, and performance, supporting the framework as a practical method for high-stakes domains.

Core claim

Expanding on MAST, the PADTHAI-MM framework guides iterative development of AI systems through stakeholder feedback and contemporary architectures, as shown when the High-MAST READIT version incorporating AI contextual information and explanations receives superior evaluations compared to the Low-MAST version in an intelligence reporting task.

What carries the argument

PADTHAI-MM, the iterative design framework that applies MAST scorecard evaluations and stakeholder input to incorporate explanations and context into AI-enabled decision support systems.

Load-bearing premise

Differences in stakeholder MAST ratings between the High-MAST and Low-MAST READIT versions can be attributed to the PADTHAI-MM design process rather than other unmeasured factors in the task or participant pool.

What would settle it

A study that finds no difference in trust ratings or MAST scores between systems designed with and without the PADTHAI-MM process in the same intelligence reporting task would falsify the central claim.
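One way such a comparison could be checked is with a permutation test on the difference in mean ratings between the two conditions. The sketch below is illustrative only: the function name `permutation_test`, the rating values, and the group sizes are assumptions made for this example and are not taken from the paper, which may have used a different analysis.

```python
import random
from statistics import mean


def permutation_test(high, low, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean ratings."""
    rng = random.Random(seed)
    observed = mean(high) - mean(low)
    pooled = list(high) + list(low)
    n_high = len(high)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign ratings to the two conditions.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_high]) - mean(pooled[n_high:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical placeholder ratings, NOT data from the paper.
high_mast = [2.6, 2.8, 2.4, 2.9, 2.7, 2.5]
low_mast = [1.9, 2.1, 1.7, 2.2, 2.0, 1.8]

observed, p_value = permutation_test(high_mast, low_mast)
print(f"difference in means = {observed:.2f}, p = {p_value:.4f}")
```

A large p-value under this kind of test would be consistent with the "no difference" outcome described above. A small one would show only that the two groups' ratings differ, not why they differ.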

Figures

Figures reproduced from arXiv: 2401.13850 by Anna Pan, Erik Blasch, Erin K. Chiou, James Sung, Michelle V. Mancenido, Myke C. Cohen, Nayoung Kim, Pouria Salehi, Shawaiz Bhatti, Yang Ba.

Figure 1: PADTHAI-MM Design Framework.
Excerpt accompanying Figure 1 (Section 3, PADTHAI-MM Framework): The Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology (PADTHAI-MM) is a design framework that integrates system developers’ AI and data knowledge with trust scholarship to develop trustworthy AI-DSSs for intelligence use cases, following AI trustworthiness principles outlined in the MAST criteria (Blasch et al.,…
Excerpt accompanying Figure 1: Features may address multiple candidate processes or multiple MAST criteria; thus, design teams should map how feature concepts address each of its target candidate processes and assign MAST rating estimates. Note that feature concepts generated at this step, including their interoperability, will be further refined in subsequent iterations before their inclusion in system-level configurations. Step 4: Gen…
Figure 2: Main page of High-MAST READIT 1.0.
Figure 3: Tools window of High-MAST READIT 1.0.
Figure 4: Options windows of High-MAST READIT 1.0.
Figure 5: Alternative Summary needed alert in High-MAST READIT 1.0. Highlighted…
Figure 6: Example search result of Low-MAST READIT 1.0, demonstrating the same…
Figure 7: Evaluation Study Procedure for READIT 1.0.
Figure 10: PCA loadings for three dimensions in trust theory.
Figure 11: Principal Component Regression results.
Original abstract

Despite an extensive body of literature on trust in technology, designing trustworthy AI systems for high-stakes decision domains remains a significant challenge, further compounded by the lack of actionable design and evaluation tools. The Multisource AI Scorecard Table (MAST) was designed to bridge this gap by offering a systematic, tradecraft-centered approach to evaluating AI-enabled decision support systems. Expanding on MAST, we introduce an iterative design framework called Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology (PADTHAI-MM). We demonstrate this framework in our development of the Reporting Assistant for Defense and Intelligence Tasks (READIT), a research platform that leverages data visualizations and natural language processing-based text analysis, emulating an AI-enabled system supporting intelligence reporting work. To empirically assess the efficacy of MAST on trust in AI, we developed two distinct iterations of READIT for comparison: a High-MAST version, which incorporates AI contextual information and explanations, and a Low-MAST version, akin to a "black box" system. This iterative design process, guided by stakeholder feedback and contemporary AI architectures, culminated in a prototype that was evaluated through its use in an intelligence reporting task. We further discuss the potential benefits of employing the MAST-inspired design framework to address context-specific needs. We also explore the relationship between stakeholder evaluators' MAST ratings and three categories of information known to impact trust: process, purpose, and performance. Overall, our study supports the practical benefits and theoretical validity for PADTHAI-MM as a viable method for designing trustable, context-specific AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PADTHAI-MM, an iterative design framework that extends the Multisource AI Scorecard Table (MAST) to guide the creation of trustworthy, human-centered AI systems in high-stakes domains. It demonstrates the framework via the READIT prototype for intelligence reporting, which incorporates data visualizations and NLP-based analysis. Two versions are developed and compared: a High-MAST iteration with contextual information and explanations versus a Low-MAST black-box version. The prototype is evaluated in an intelligence reporting task, with discussion of how stakeholder MAST ratings relate to process, purpose, and performance information. The central claim is that the study supports the practical benefits and theoretical validity of PADTHAI-MM for designing context-specific trustworthy AI.

Significance. If the empirical comparison were supported by adequate controls, sample details, and statistical evidence, PADTHAI-MM could supply a much-needed actionable methodology for translating trust research into concrete AI design choices in domains such as defense intelligence. The framework's grounding in MAST and its iterative, stakeholder-informed process would represent a concrete contribution to bridging abstract trust principles with deployable systems.

major comments (2)
  1. [Empirical assessment section] Empirical assessment section (abstract paragraph on evaluation and the corresponding methods/results section): the claim that observed differences in stakeholder MAST ratings between High-MAST and Low-MAST READIT versions demonstrate the efficacy of the PADTHAI-MM process requires evidence that the versions differed only in the intended design factors. No information is supplied on participant assignment, sample size, task standardization, blinding, or statistical controls, so the attribution of rating differences to the framework cannot be isolated from confounds in the participant pool or intelligence task.
  2. [Discussion section] Discussion of MAST ratings and process/purpose/performance categories: the manuscript states that it explores the relationship between evaluators' MAST ratings and these three information categories, yet reports neither raw rating data, correlation coefficients, nor any quantitative analysis. This leaves the asserted link to theoretical validity unsupported and prevents assessment of whether the ratings actually track the claimed trust factors.
minor comments (1)
  1. [Introduction] The abstract and introduction use the term 'MAST ratings' without first defining the precise scoring procedure or scale employed in the stakeholder evaluation; a brief operational definition would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting areas where the empirical claims require stronger support. We address each major comment below and commit to revisions that add necessary methodological details and quantitative elements without overstating the original study design.

Point-by-point responses
  1. Referee: [Empirical assessment section] Empirical assessment section (abstract paragraph on evaluation and the corresponding methods/results section): the claim that observed differences in stakeholder MAST ratings between High-MAST and Low-MAST READIT versions demonstrate the efficacy of the PADTHAI-MM process requires evidence that the versions differed only in the intended design factors. No information is supplied on participant assignment, sample size, task standardization, blinding, or statistical controls, so the attribution of rating differences to the framework cannot be isolated from confounds in the participant pool or intelligence task.

    Authors: We agree that the current manuscript lacks sufficient detail on the evaluation protocol to isolate the effects of the PADTHAI-MM design choices. The study was conducted as an initial prototype demonstration with stakeholder evaluators rather than a fully randomized controlled trial. In the revised manuscript we will add a dedicated methods subsection that reports all available information on participant recruitment and assignment, exact sample size, task standardization procedures, any blinding or randomization employed, and the statistical approach (or its absence). Where controls were not implemented we will explicitly note this as a limitation and temper the language around causal attribution of rating differences to the framework. revision: yes

  2. Referee: [Discussion section] Discussion of MAST ratings and process/purpose/performance categories: the manuscript states that it explores the relationship between evaluators' MAST ratings and these three information categories, yet reports neither raw rating data, correlation coefficients, nor any quantitative analysis. This leaves the asserted link to theoretical validity unsupported and prevents assessment of whether the ratings actually track the claimed trust factors.

    Authors: The original discussion of the relationship was primarily qualitative, drawing on evaluator comments and observed rating patterns. To strengthen the claim of theoretical validity we will include the raw MAST rating data (anonymized) in an appendix, compute and report Pearson or Spearman correlations between overall MAST scores and the process/purpose/performance sub-ratings where the data permit, and add a short quantitative subsection. If the original data collection did not capture the three categories at a granularity allowing correlation analysis, we will state this limitation and present only the descriptive patterns that were observed. revision: partial
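The correlation analysis proposed in the second response could be sketched as follows. This is a minimal, dependency-free illustration of Spearman's rank correlation (Pearson correlation applied to average ranks). The helper names `average_ranks`, `pearson`, and `spearman`, and the two rating lists, are assumptions made for this example; none of the numbers come from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder values, NOT data from the paper:
# one overall MAST score and one "process" sub-rating per evaluator.
overall_mast = [2.1, 2.8, 1.9, 2.5, 3.0, 2.2]
process_rating = [2.0, 2.9, 1.8, 2.2, 2.7, 2.4]

rho = spearman(overall_mast, process_rating)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient is a reasonable default for ordinal ratings because it does not assume equal spacing between scale points. With the small samples typical of stakeholder studies, the coefficient should be reported together with the sample size.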

Circularity Check

0 steps flagged

No significant circularity; framework expands on external MAST literature with independent empirical demonstration

full rationale

The paper presents PADTHAI-MM as an iterative design framework expanding on the prior MAST methodology from the literature. The central claim of practical benefits and theoretical validity rests on the development of READIT prototypes (High-MAST vs Low-MAST) and their evaluation in an intelligence reporting task, including stakeholder ratings on process/purpose/performance. No equations, parameter fits, or self-citation chains are present that reduce any prediction or result to the inputs by construction. The derivation chain is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from the authors' own prior work in a load-bearing way.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the domain assumption that MAST scores validly capture trust-relevant information and that iterative stakeholder feedback produces measurable improvements; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption: Trust in AI systems is influenced by information about process, purpose, and performance
    The abstract states the authors explore the relationship between MAST ratings and these three categories.
invented entities (2)
  • PADTHAI-MM framework (no independent evidence)
    purpose: Iterative design process for trustworthy AI
    Newly named and described in the paper; no independent evidence outside the described study.
  • READIT prototype (no independent evidence)
    purpose: AI-enabled intelligence reporting assistant
    Developed as demonstration vehicle; no external validation data provided.

pith-pipeline@v0.9.0 · 5873 in / 1329 out tokens · 26095 ms · 2026-05-24T05:04:30.570714+00:00 · methodology


Reference graph

Works this paper leans on

103 extracted references · 103 canonical work pages · 1 internal anchor

  2. [2]

    Abdi, H. & Williams, L. J. (2010). Principal component analysis. WIREs Computational Statistics, 2(4), 433–459

  3. [3]

    Alufaisan, Y., Marusich, L. R., Bakdash, J. Z., Zhou, Y., & Kantarcioglu, M. (2021). Does explainable artificial intelligence improve human decision-making? Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 6618–6626

  4. [4]

    Ba, Y., Mancenido, M. V., Chiou, E. K., & Pan, R. (2024). Data quality in crowdsourcing and spamming behavior detection. arXiv preprint arXiv:2404.17582

  5. [5]

    Beyer, H. & Holtzblatt, K. (1999). Contextual design. Interactions, (January + February), 32–43

  6. [6]

    Blasch, E., Bastian, N. D., Aved, A., & Ardiles-Cruz, E. (2023). Human-machine cooperative AI decision-making with heterogeneous data. In Signal Processing, Sensor/Information Fusion, and Target Recognition XXXII, volume 12547 (pp. 162–171). SPIE

  7. [7]

    Blasch, E., Shen, D., Chen, G., & Sung, J. (2021a). Multisource AI scorecard table analysis of amigo. In Sensors and Systems for Space Applications XIV, volume 11755 (pp. 13–23). SPIE

  8. [8]

    Blasch, E., Sung, J., & Nguyen, T. (2021b). Multisource AI scorecard table for system evaluation. eprint arXiv:2102.03985. Presented at AAAI FSS-20: Artificial Intelligence in Government and Public Sector, Washington, DC, USA

  9. [9]

    Bolton, M. L. (2024). Trust is Not a Virtue: Why We Should Not Trust Trust. Ergonomics in Design: The Quarterly of Human Factors Applications, 32(4), 4–11

  10. [10]

    Booher, H. R., Ed. (2003). Handbook of Human Systems Integration. Wiley Series in Systems Engineering and Management. Hoboken, N.J: Wiley-Interscience

  11. [11]

    Brooke, J. (2020). SUS: A "Quick and Dirty" Usability Scale. In Usability Evaluation In Industry (pp. 207–212). London, UK: CRC Press

  12. [12]

    Castelvecchi, D. (2016). Can we open the black box of AI? Nature News, 538(7623), 20

  13. [13]

    Cavorsi, M., Akgün, O. E., Yemini, M., Goldsmith, A. J., & Gil, S. (2023). Exploiting Trust for Resilient Hypothesis Testing with Malicious Robots. In 2023 IEEE International Conference on Robotics and Automation (ICRA) (pp. 7663–7669)

  14. [14]

    Cech, F. (2021). The agency of the forum: Mechanisms for algorithmic accountability through the lens of agency. Journal of Responsible Technology, 7, 100015

  15. [15]

    Chancey, E. T., Bliss, J. P., Yamani, Y., & Handley, H. A. (2017). Trust and the compliance–reliance paradigm: The effects of risk, error bias, and reliability on trust and dependence. Human Factors, 59(3), 333–345

  16. [16]

    Cheng, M., Nazarian, S., & Bogdan, P. (2020). There Is Hope After All: Quantifying Opinion and Trustworthiness in Neural Networks. Frontiers in Artificial Intelligence, 3

  17. [17]

    Cheng, M., Sun, T., Nazarian, S., & Bogdan, P. (2022). Trustworthiness evaluation and trust-aware design of CNN architectures. In Conference on Lifelong Learning Agents (pp. 1086–1102). PMLR

  18. [18]

    Cheng, M., Yin, C., Zhang, J., Nazarian, S., Deshmukh, J., & Bogdan, P. (2021). A General Trust Framework for Multi-Agent Systems. In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS '21 (pp. 332–340). Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems

  19. [19]

    Chiou, E. K. & Lee, J. D. (2023). Trusting automation: Designing for responsivity and resilience. Human Factors, 65(1), 137–165

  20. [20]

    Chiou, E. K., Salehi, P., Blasch, E., Sung, J., Cohen, M. C., Pan, A., Mancenido, M., Mosallanezhad, A., Ba, Y., & Bhatti, S. (2022). Trust in AI-enabled decision support systems: Preliminary validation of MAST criteria. 2022 IEEE 3rd International Conference on Human-Machine Systems (ICHMS), (pp. 1–1)

  21. [21]

    Cohen, M. C., Mancenido, M. V., Grimm, K. J., & Chiou, E. K. (2024). Multi-Measure Trust Calibration in Expert Interactions with AI-Enabled Decision Support Systems: A Multiple Cause, Multiple (Behavioral) Indicator Model. In ASPIRE 2024: 68th International Annual Meeting of the Human Factors and Ergonomics Society Phoenix, AZ, USA

  22. [22]

    Coiera, E. (2015). Technology, cognition and error. BMJ Quality & Safety, 24(7), 417–422

  23. [23]

    Cummings, M. L. (2015). Automation Bias in Intelligent Time Critical Decision Support Systems. In Decision Making in Aviation. Routledge

  24. [24]

    de Visser, E. J., Momen, A., Walliser, J., Kohn, S., Shaw, T., & Tossell, C. (2023). Mutually Adaptive Trust Calibration in Human-AI Teams (Short Paper). In P. K. Murukannaiah & T. Hirzle (Eds.), Proceedings of the Workshops at the Second International Conference on Hybrid Human-Artificial Intelligence, volume 3456 of CEUR Workshop Proceedings (pp. 1...

  25. [25]

    de Visser, E. J., Pak, R., & Shaw, T. H. (2018). From 'automation' to 'autonomy': The importance of trust repair in human–machine interaction. Ergonomics, 61(10), 1409–1427

  26. [26]

    de Visser, E. J., Peeters, M. M., Jung, M. F., Kohn, S., Shaw, T. H., Pak, R., & Neerincx, M. A. (2020). Towards a theory of longitudinal trust calibration in human–robot teams. International Journal of Social Robotics, 12(2), 459–478

  27. [27]

    Deng, W. H., Yildirim, N., Chang, M., Eslami, M., Holstein, K., & Madaio, M. (2023). Investigating practices and opportunities for cross-functional collaboration around AI fairness in industry practice. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 705–716)

  28. [28]

    Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144, 114–126

  29. [29]

    Durán, J. M. & Jongsma, K. R. (2021). Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. Journal of Medical Ethics, 47(5), 329–335

  30. [30]

    Ferreira, F. (2013). Measuring trade-offs among criteria in a balanced scorecard framework: Possible contributions from the multiple criteria decision analysis research field. Journal of Business Economics and Management, 14

  31. [31]

    Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47(6), 381–391

  32. [32]

    Google PAIR (2021). People + AI Guidebook

  33. [33]

    Göttgens, I. & Oertelt-Prigione, S. (2021). The application of human-centered design approaches in health research and innovation: A narrative review of current practices. JMIR mHealth and uHealth, 9(12), e28102

  34. [34]

    Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Comput. Surv., 51(5)

  35. [35]

    Gunning, D. & Aha, D. (2019). DARPA's explainable artificial intelligence (XAI) program. AI Magazine, 40(2), 44–58

  36. [36]

    Gupta, S., Bagga, S., & Sharma, D. K. (2020). Intelligent Data Analysis: Black Box Versus White Box Modeling. In Intelligent Data Analysis chapter 1, (pp. 1–15). John Wiley & Sons, Ltd

  37. [37]

    Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds & Machines, 30, 99–120

  38. [38]

    Hancock, P. A., Kessler, T. T., Kaplan, A. D., Stowers, K., Brill, J. C., Billings, D. R., Schaefer, K. E., & Szalma, J. L. (2023). How and why humans trust: A meta-analysis and elaborated model. Frontiers in Psychology, 14

  39. [39]

    Hassija, V., Chamola, V., Mahapatra, A., Singal, A., Goel, D., Huang, K., Scardapane, S., Spinelli, I., Mahmud, M., & Hussain, A. (2024). Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cognitive Computation, 16(1), 45–74

  40. [40]

    Hauser, J. R. & Clausing, D. (1988). The house of quality. Harvard Business Review, 66(3)

  41. [41]

    Henderson, S., Hyde, G., Grover, S., & Furnham, A. (2021). Risk-Taking in Professional Groups. Psychology, 12(7), 1127–1140

  42. [42]

    Iandolo, F., La Sala, A., Turriziani, L., & Caputo, F. (2024). Stakeholder engagement in managing systemic risk management. Business Ethics, the Environment & Responsibility, (pp. beer.12694)

  43. [43]

    IEEE SEMVAST Project (2011). IEEE VAST Challenge 2011, Mini Challenge 3 (MC3). Retrieved May 15, 2023, from https://www.vgtc.org/activities/vastcontest2011/

  44. [44]

    Insaurralde, C. C. & Blasch, E. (2021). Trust Evaluation of Ontological Decision Support Systems for Avionics Analytics. In 2021 Integrated Communications Navigation and Surveillance Conference (ICNS) (pp. 1–10)

  45. [45]

    Jian, J.-Y., Bisantz, A. M., & Drury, C. G. (2000). Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53–71

  46. [46]

    Kahr, P. K., Rooks, G., Snijders, C., & Willemsen, M. C. (2024). The Trust Recovery Journey. The Effect of Timing of Errors on the Willingness to Follow AI Advice. In Proceedings of the 29th International Conference on Intelligent User Interfaces (pp. 609–622). Greenville SC USA: ACM

  47. [47]

    Kim, N. (2023). Nayoungkim94/PADTHAI-MM

  48. [48]

    Kowald, D., Scher, S., Pammer-Schindler, V., Müllner, P., Waxnegger, K., Demelius, L., Fessl, A., Toller, M., Mendoza Estrada, I. G., Šimić, I., Sabol, V., Trügler, A., Veas, E., Kern, R., Nad, T., & Kopeinik, S. (2024). Establishing and evaluating trustworthy AI: overview and research challenges. Frontiers in Big Data, 7

  49. [49]

    Kurke, M. I. (1961). Operational Sequence Diagrams in System Design. Human Factors: The Journal of the Human Factors and Ergonomics Society, 3(1), 66–73

  50. [50]

    Lai, V. & Tan, C. (2019). On human predictions with explanations and predictions of machine learning models: A case study on deception detection. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* '19 (pp. 29–38). New York, NY, USA: Association for Computing Machinery

  51. [51]

    Lee, J. D. & Moray, N. (1994). Trust, self-confidence, and operators' adaptation to automation. International Journal of Human-Computer Studies, 40(1), 153–184

  52. [52]

    Lee, J. D. & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80

  53. [53]

    London, A. J. (2019). Artificial intelligence and black-box medical decisions: Accuracy versus explainability. Hastings Center Report, 49(1), 15–21

  54. [54]

    Loyola-González, O. (2019). Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view. IEEE Access, 7, 154096–154113

  55. [55]

    Lundberg, S. M. & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4668–4777

  56. [56]

    Lyell, D. & Coiera, E. (2017). Automation bias and verification complexity: A systematic review. Journal of the American Medical Informatics Association, 24(2), 423–431

  57. [57]

    Madras, D., Pitassi, T., & Zemel, R. (2018). Predict responsibly: Improving fairness and accuracy by learning to defer. arXiv:1711.06664 [cs, stat]

  58. [58]

    Malle, B. F. & Ullman, D. (2021). Chapter 1 - A multidimensional conception and measure of human-robot trust. In C. S. Nam & J. B. Lyons (Eds.), Trust in Human-Robot Interaction (pp. 3–25). Academic Press

  59. [59]

    Matias, A. C. (2001). Work Measurement: Principles and Techniques. In Handbook of Industrial Engineering chapter 54, (pp. 1409–1462). John Wiley & Sons, Ltd

  60. [60]

    Mayer, R. C., Davis, J. H., & Schoorman, F. D. (1995). An integrative model of organizational trust. Academy of Management Review, 20(3), 709–734

  61. [61]

    McGuirl, J. M. & Sarter, N. B. (2006). Supporting Trust Calibration and the Effective Use of Decision Aids by Presenting Dynamic System Confidence Information. Human Factors: The Journal of the Human Factors and Ergonomics Society, 48(4), 656–665

  62. [62]

    Meyer, J. & Lee, J. D. (2013). Trust, reliance, and compliance. In J. D. Lee & A. Kirlik (Eds.), The Oxford Handbook of Cognitive Engineering, Oxford Library of Psychology (pp. 109–124). New York, NY, US: Oxford University Press

  63. [63]

    Miller, C. A. (2021). Trust, transparency, explanation, and planning: Why we need a lifecycle perspective on human-automation interaction. In Trust in Human-Robot Interaction (pp. 233–257). Elsevier

  64. [64]

    Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38

  65. [65]

    Mirzaei, S., Mao, H., Al-Nima, R. R. O., & Woo, W. L. (2024). Explainable AI Evaluation: A Top-Down Approach for Selecting Optimal Explanations for Black Box Models. Information, 15(1), 4

  66. [66]

    Munn, L. (2023). The uselessness of AI ethics. AI and Ethics, 3(3), 869–877

  67. [67]

    Naber, A. M., Cassani, L., Cook, J., Bautista, P., & Fortier, L. (2024). Comparing Human to Analytic Performance on Detecting, Attributing, and Characterizing Manipulated Media. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, (pp. 10711813241262035)

  68. [68]

    ODNI (2015). Intelligence Community Directive 203. Retrieved from https://www.dni.gov/files/documents/ICD/ICD-203_TA_Analytic_Standards_21_Dec_2022.pdf

  69. [69]

    Parasuraman, R. & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39(2), 230–253

  70. [70]

    Prem, E. (2023). From ethical AI frameworks to tools: A review of approaches. AI Ethics, 3, 699–716

  71. [71]

    Qualtrics (2020). Qualtrics. Provo, UT. https://www.qualtrics.com

  72. [72]

    Revelle, W. R. (2018). psych: Procedures for Personality and Psychological Research. R package version 1.18.10

  73. [73]

    Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (pp. 1135–1144)

  74. [74]

    Ribeiro, M. T., Singh, S., & Guestrin, C. (2018). Anchors: High-precision model-agnostic explanations. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1)

  75. [75]

    Riegelsberger, J., Sasse, M. A., & McCarthy, J. D. (2005). The mechanics of trust: A framework for research and design. International Journal of Human-Computer Studies, 62(3), 381–422

  76. [76]

    Roth, E. M., Bisantz, A. M., Wang, X., Kim, T., & Hettinger, A. Z. (2021). A work-centered approach to system user-evaluation. Journal of Cognitive Engineering and Decision Making

  77. [77]

    Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215

  78. [78]

    Salehi, P., Ba, Y., Kim, N., Mosallanezhad, A., Pan, A., Cohen, M. C., Wang, Y., Zhao, J., Bhatti, S., Sung, J., Blasch, E., Mancenido, M. V., & Chiou, E. K. (2024). Towards Trustworthy AI-Enabled Decision Support Systems: Validation of the Multisource AI Scorecard Table (MAST). Journal of Artificial Intelligence Research, 80, 1311–1341

  79. [79]

    Saranya, A. & Subhashini, R. (2023). A systematic review of Explainable Artificial Intelligence models and applications: Recent developments and future trends. Decision Analytics Journal, 7, 100230

  80. [80]

    Schaufeli, W. B., Salanova, M., González-Romá, V., & Bakker, A. B. (2002). The measurement of engagement and burnout: A two sample confirmatory factor analytic approach. Journal of Happiness Studies, 3, 71–92

Showing first 80 references.