pith.

arxiv: 2605.17746 · v1 · pith:LT3FSHOTnew · submitted 2026-05-18 · 💻 cs.AI · cs.HC

Agents for Experiments, Experiments for Agents: A Design Grammar for AI-Enabled Experimental Science

Pith reviewed 2026-05-20 11:18 UTC · model grok-4.3

classification 💻 cs.AI cs.HC
keywords SEED framework · actor-flow graphs · AI agents · experimental design · workflow representation · governance in AI · human-AI interaction · design grammar

The pith

SEED encodes experimental conditions as typed actor-flow graphs to describe, evaluate, and generate AI-human workflow designs under governance constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SEED as a way to represent experimental setups involving AI agents and humans as structured graphs of actors and information flows. This addresses the difficulty of specifying such experiments in plain prose, which hinders comparison, reuse, and checks on how decisions are delegated or controlled. SEED enables three operations: mapping out the interaction structure, measuring how new designs differ from previous ones, and creating candidate setups that satisfy feasibility and oversight rules. A small test on medical triage workflows found that designs produced with SEED made the actor changes, assumptions, and governance elements more explicit than designs made without the structure. Readers should care because AI systems are increasingly embedded in knowledge work, and better tools for testing those arrangements could improve accountability without requiring entirely new methods.

Core claim

Experimental conditions for AI-enabled studies can be represented as typed actor-flow graphs in the SEED framework. This representation supports describing the structure of interactions among actors, evaluating the structural novelty of a candidate design against a library of prior encodings, and generating new candidate designs subject to explicit feasibility and governance constraints. In a diagnostic test contrasting graph-blind and SEED-guided generation for a medical-triage task, the SEED-guided outputs displayed clearer documentation of actor-flow modifications, stated assumptions, and governance validations.
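This review does not say how SEED scores structural novelty. The sketch below shows one way to do it. It reduces each design to a set of typed edges (source actor type, flow type, target actor type) and takes novelty as the Jaccard distance to the nearest design in the library. Both the edge encoding and the metric are illustrative assumptions, not the paper's method. Graph edit distance is a common alternative for comparing graphs.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical typed edge: (source actor type, flow type, target actor type).
Edge = Tuple[str, str, str]
Design = FrozenSet[Edge]

def jaccard_distance(a: Design, b: Design) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 for identical designs."""
    union = a | b
    if not union:
        return 0.0  # two empty designs are treated as identical
    return 1.0 - len(a & b) / len(union)

def structural_novelty(candidate: Design, library: Iterable[Design]) -> float:
    """Distance from the candidate to its nearest prior design (1.0 if the library is empty)."""
    return min((jaccard_distance(candidate, prior) for prior in library), default=1.0)

# Toy library of prior triage designs (illustrative, not from the paper).
prior_1 = frozenset({("ai_agent", "recommendation", "human"), ("human", "decision", "patient")})
prior_2 = frozenset({("ai_agent", "decision", "patient")})
candidate = prior_1 | {("human", "override", "ai_agent")}
novelty = structural_novelty(candidate, [prior_1, prior_2])  # about 0.333
```

The score is a minimum over the library, so a candidate counts as novel only if it differs from every prior design, not just from some of them.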

What carries the argument

SEED (Structural Encoding for Experimental Discovery) as typed actor-flow graphs that encode actors, directed flows, and constraint annotations to enable description, novelty evaluation, and constrained generation of experimental designs.
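The sketch below shows how such an encoding might look in Python. It assumes three actor types and free-text constraint annotations attached to individual flows. The class and field names (`ActorFlowGraph`, `Flow`, `typed_edges`) are hypothetical and are not taken from the paper's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple

class ActorType(Enum):
    HUMAN = "human"
    AI_AGENT = "ai_agent"
    SYSTEM = "system"

@dataclass(frozen=True)
class Actor:
    name: str
    kind: ActorType

@dataclass(frozen=True)
class Flow:
    source: str     # name of the sending actor
    target: str     # name of the receiving actor
    flow_type: str  # e.g. "recommendation", "decision", "override"

@dataclass
class ActorFlowGraph:
    actors: Dict[str, Actor] = field(default_factory=dict)
    flows: List[Flow] = field(default_factory=list)
    # Constraint annotations attached to individual flows.
    annotations: Dict[Flow, List[str]] = field(default_factory=dict)

    def add_actor(self, name: str, kind: ActorType) -> None:
        self.actors[name] = Actor(name, kind)

    def add_flow(self, source: str, target: str, flow_type: str, *notes: str) -> Flow:
        # Only allow flows between declared actors so every edge has typed endpoints.
        for endpoint in (source, target):
            if endpoint not in self.actors:
                raise KeyError(f"unknown actor: {endpoint}")
        flow = Flow(source, target, flow_type)
        self.flows.append(flow)
        if notes:
            self.annotations.setdefault(flow, []).extend(notes)
        return flow

    def typed_edges(self) -> Set[Tuple[str, str, str]]:
        """(source type, flow type, target type) triples, ignoring actor names."""
        return {
            (self.actors[f.source].kind.value, f.flow_type, self.actors[f.target].kind.value)
            for f in self.flows
        }

# Toy medical-triage condition (illustrative only).
g = ActorFlowGraph()
g.add_actor("triage_bot", ActorType.AI_AGENT)
g.add_actor("nurse", ActorType.HUMAN)
g.add_flow("triage_bot", "nurse", "recommendation", "nurse may override")
g.add_flow("nurse", "triage_bot", "override")
```

Actor names are kept separate from actor types. That way, two studies with different staffing can still be compared on their typed edges alone.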

If this is right

  • Experimental conditions become comparable and reusable across different studies through shared graph encodings.
  • Generation of new designs can systematically incorporate explicit checks for governance and feasibility.
  • Structural novelty becomes a measurable property relative to an encoded set of prior designs.
  • Accountability improves because assumptions and control points are surfaced in the graph representation.
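Generating designs under explicit governance and feasibility checks can be pictured as a generate-and-filter loop. The sketch below assumes that rule. Both constraints are hypothetical examples, not SEED's published constraint set, and the convention that actor names starting with `ai_` denote AI agents is also an assumption.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical typed edge: (source actor, flow type, target actor).
# For this sketch, actor names starting with "ai_" denote AI agents.
Edge = Tuple[str, str, str]
Design = FrozenSet[Edge]
Constraint = Callable[[Design], bool]

def human_keeps_decision(design: Design) -> bool:
    """Assumed governance rule: no AI actor emits a 'decision' flow."""
    return not any(src.startswith("ai_") and kind == "decision" for src, kind, _ in design)

def within_budget(max_flows: int) -> Constraint:
    """Assumed feasibility rule: cap how many flows a study can staff and log."""
    return lambda design: len(design) <= max_flows

def generate_candidates(base: Design, actors: List[str], flow_types: List[str],
                        constraints: List[Constraint]) -> List[Design]:
    """Propose every one-flow extension of `base` and keep those passing all constraints."""
    kept: List[Design] = []
    for src, kind, dst in product(actors, flow_types, actors):
        edge = (src, kind, dst)
        if src == dst or edge in base:
            continue  # skip self-flows and flows the base design already has
        design = base | {edge}
        if all(check(design) for check in constraints):
            kept.append(design)
    return kept

base = frozenset({("ai_triage", "recommendation", "nurse")})
candidates = generate_candidates(
    base, ["ai_triage", "nurse"], ["decision", "override"],
    [human_keeps_decision, within_budget(2)],
)
```

A real generator would propose richer edits than single added flows. The sketch only shows constraints acting as explicit, inspectable filters on each candidate.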

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph approach could be applied to generate and audit experiments in domains such as education or organizational decision-making beyond the medical example.
  • Libraries of reusable graph templates might emerge for common experiment patterns, reducing the cost of designing new tests.
  • Tensions around replication and validity identified in the commentary could be addressed by versioning the graph encodings themselves.

Load-bearing premise

Representing experimental conditions as typed actor-flow graphs captures the key mechanisms of delegation, feedback, and control in human-AI arrangements without significant loss of relevant detail.
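To make the premise concrete, the sketch below reads two of these mechanisms off an encoded graph. It assumes that feedback appears as a directed cycle among actors. It also assumes that delegation appears as a flow labelled `delegate` that targets an AI actor. Both readings are illustrative assumptions. The edge list cannot express when each flow happens or in what order, which is exactly the kind of detail the premise risks losing.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source actor, flow type, target actor)

def has_feedback_loop(edges: List[Edge]) -> bool:
    """True if the directed flow graph contains a cycle, read here as feedback."""
    adjacency: Dict[str, List[str]] = {}
    for src, _, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in adjacency}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in adjacency[node]:
            if color[nxt] == GREY:  # back edge: the path returns to itself
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(adjacency))

def delegations_to_ai(edges: List[Edge], ai_actors: Set[str]) -> List[Edge]:
    """Flows labelled 'delegate' whose target is an AI actor."""
    return [e for e in edges if e[1] == "delegate" and e[2] in ai_actors]

open_loop = [("ai_triage", "recommendation", "nurse"), ("nurse", "decision", "patient")]
closed_loop = open_loop + [("nurse", "override", "ai_triage")]
delegating = [("clinician", "delegate", "ai_scheduler"), ("clinician", "decision", "patient")]
```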

What would settle it

An independent replication of the medical-triage design task in which blinded evaluators find no difference in clarity of actor-flow changes, assumptions, or governance checks between SEED-guided and unstructured candidate designs.
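One way such a replication could test for a difference is an exact two-sided permutation test on the difference in mean ratings between SEED-guided and unstructured designs. The sketch below uses made-up 1 to 5 clarity ratings, not data from the paper. Because it enumerates every way of splitting the pooled ratings into groups of the original sizes, it is only practical for small samples.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence

def exact_permutation_pvalue(seed_scores: Sequence[float], blind_scores: Sequence[float]) -> float:
    """Two-sided exact permutation test on the difference in group means."""
    if not seed_scores or not blind_scores:
        raise ValueError("both groups need at least one rating")
    pooled = list(seed_scores) + list(blind_scores)
    n_seed = len(seed_scores)
    observed = abs(mean(seed_scores) - mean(blind_scores))
    extreme = total = 0
    # Under the null, every relabelling that keeps the group sizes fixed is equally likely.
    for chosen in combinations(range(len(pooled)), n_seed):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so float rounding does not drop ties with the observed split.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Toy ratings for illustration only (not from the paper).
seed_guided = [4, 5, 5]
graph_blind = [1, 2, 2]
p = exact_permutation_pvalue(seed_guided, graph_blind)  # 0.1
```

With three designs per arm there are only 20 possible splits, so even perfectly separated ratings give p = 0.1. A replication would need considerably more blinded ratings per arm before such a test could reject the null at conventional thresholds.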

Original abstract

AI systems are becoming active participants in organizational and knowledge work. They increasingly interact with humans, coordinate workflows, and operate in multi-agent arrangements. Understanding their effects therefore requires more than measuring output accuracy; it requires evidence about mechanisms, delegation, feedback, and control. Experiments remain central to this task, but they also face a recursive challenge: we need experiments for agents to study these arrangements, and we may need agents for experiments to help search the expanding space of possible designs. Yet experimental conditions for human-AI and agentic workflows are still largely specified in prose, making them difficult to compare, reuse, or audit. We frame this as a problem of workflow representation, traceability, and governance in AI-enabled knowledge production. We introduce SEED (Structural Encoding for Experimental Discovery), a framework that represents experimental conditions as typed actor-flow graphs. SEED supports three design functions: describing conditions as interaction structures, evaluating structural novelty relative to encoded prior designs, and generating candidate designs under feasibility and governance constraints. We report a lightweight empirical feasibility test that compares graph-blind and SEED-guided generation in a medical-triage design task. In this diagnostic contrast, SEED-guided candidate designs show clearer actor-flow changes, assumptions, and governance checks, supporting the feasibility of the grammar as a design aid. The commentary closes by identifying governance tensions around novelty, replication, validity, diversity of inquiry, and accountability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SEED (Structural Encoding for Experimental Discovery), a framework representing experimental conditions as typed actor-flow graphs to enable description of interaction structures, evaluation of structural novelty relative to prior designs, and generation of candidate designs under feasibility and governance constraints. It reports a lightweight empirical feasibility test contrasting graph-blind and SEED-guided generation in a single medical-triage design task, claiming that SEED outputs exhibit clearer actor-flow changes, assumptions, and governance checks.

Significance. If the representation and generation functions prove robust, SEED could advance traceability, comparability, and auditability of complex human-AI workflow experiments, addressing a timely need as AI agents increasingly participate in organizational and knowledge-production settings. The framing of experiments as design problems with explicit governance checks is a constructive contribution to AI-enabled science methodology.

major comments (2)
  1. [empirical feasibility test] Empirical feasibility test section: The diagnostic contrast relies on an informal qualitative judgment that SEED-guided designs show 'clearer actor-flow changes, assumptions, and governance checks' without reporting quantitative metrics, pre-specified scoring criteria, blinding procedures, inter-rater reliability, or statistical comparison to the graph-blind baseline. This leaves the central feasibility claim dependent on unverified author assessment rather than reproducible evidence.
  2. [SEED framework] Framework definition: The claim that typed actor-flow graphs adequately capture mechanisms such as delegation, feedback, and control is asserted without a systematic analysis of representational fidelity or loss of relevant detail; the single-task contrast does not test whether the graph encoding preserves or distorts these dynamics across varied experimental settings.
minor comments (1)
  1. [conclusion] The abstract and closing commentary reference governance tensions (novelty, replication, validity, diversity, accountability) but the main text would benefit from explicit mapping of how specific SEED operations (description, novelty evaluation, constrained generation) mitigate each tension.
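The referee asks for inter-rater reliability. Cohen's kappa is one standard statistic for this. It compares how often two blinded raters agree with the agreement expected by chance from their label frequencies: kappa = (p_o - p_e) / (1 - p_e). The labels below are hypothetical, since the paper does not report a rating scheme.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical labels from two blinded raters on four candidate designs.
rater_1 = ["clear", "clear", "unclear", "clear"]
rater_2 = ["clear", "unclear", "unclear", "clear"]
kappa = cohens_kappa(rater_1, rater_2)  # 0.5
```

Here the raters agree on three of four designs, yet kappa is only 0.5. Half of that agreement would be expected by chance, given how often each rater uses each label.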

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the scope and evidentiary standards appropriate for introducing a design grammar. We address each major comment below and indicate where revisions will be incorporated to improve transparency and precision.

Point-by-point responses
  1. Referee: [empirical feasibility test] Empirical feasibility test section: The diagnostic contrast relies on an informal qualitative judgment that SEED-guided designs show 'clearer actor-flow changes, assumptions, and governance checks' without reporting quantitative metrics, pre-specified scoring criteria, blinding procedures, inter-rater reliability, or statistical comparison to the graph-blind baseline. This leaves the central feasibility claim dependent on unverified author assessment rather than reproducible evidence.

    Authors: We agree that the presentation relies on qualitative author judgment without formal metrics or procedures. The test was designed as a lightweight diagnostic contrast to illustrate feasibility rather than as a controlled empirical study. In revision we will (1) articulate explicit qualitative criteria used to assess clarity of actor-flow changes, assumptions, and governance checks, (2) include the actual generated designs as supplementary material so readers can inspect them directly, and (3) add an explicit limitations paragraph acknowledging the absence of blinding, inter-rater reliability, and statistical testing. These changes will make the evidence more transparent while preserving the illustrative intent of the section. revision: yes

  2. Referee: [SEED framework] Framework definition: The claim that typed actor-flow graphs adequately capture mechanisms such as delegation, feedback, and control is asserted without a systematic analysis of representational fidelity or loss of relevant detail; the single-task contrast does not test whether the graph encoding preserves or distorts these dynamics across varied experimental settings.

    Authors: The manuscript presents SEED as an initial structural grammar and employs the medical-triage task as a single illustrative case. We accept that a broader systematic analysis of representational fidelity would strengthen the framework claims. In the revised manuscript we will add a dedicated subsection discussing how delegation, feedback, and control are encoded, together with acknowledged limitations such as the loss of fine-grained temporal sequencing or implicit contextual cues. We will also state explicitly that the single-task contrast is not offered as exhaustive validation and will outline directions for multi-domain testing in future work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; framework introduced as independent encoding without reduction to inputs or self-referential definitions.

Full rationale

The paper presents SEED as a novel structural encoding framework that represents experimental conditions as typed actor-flow graphs to support description, novelty evaluation, and constrained generation. The feasibility claim rests on a qualitative contrast between graph-blind and SEED-guided outputs in a single medical-triage task, described as showing clearer actor-flow changes and governance checks. No equations, fitted parameters, or derivations are provided that would make any result equivalent to its inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked in the reported chain. The derivation remains self-contained as an independent design grammar and diagnostic test.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on the modeling choice that actor-flow graphs can represent experimental conditions. It introduces no free parameters, and its one invented entity has no independent evidence.

axioms (1)
  • domain assumption: Experimental conditions for human-AI and agentic workflows can be adequately represented as typed actor-flow graphs.
    This is the foundational representation choice invoked to enable the three design functions.
invented entities (1)
  • Typed actor-flow graphs · no independent evidence
    purpose: To encode experimental conditions structurally for description, comparison, and generation.
    New representational construct introduced by the framework.

pith-pipeline@v0.9.0 · 5789 in / 1132 out tokens · 34914 ms · 2026-05-20T11:18:43.792886+00:00 · methodology


