arxiv: 2004.07213 · v2 · pith:JESFYQN6 · submitted 2020-04-15 · 💻 cs.CY

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

classification 💻 cs.CY
keywords: claims, development, mechanisms, systems, they, make, need, stakeholders

With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions

    cs.MA 2026-05 unverdicted novelty 6.0

    LLM agents make collective belief dynamics programmable, with simulations showing coordinated agents induce stable belief shifts, and four structural properties that complicate detection and defense.

  2. Ethics Testing: Proactive Identification of Generative AI System Harms

    cs.SE 2026-04 unverdicted novelty 6.0

    Ethics testing is introduced as a systematic approach to generate tests that identify software harms induced by unethical behavior in generative AI outputs.

  3. Precautionary Governance of Autonomous AI: Legal Personhood as Functional Instrument

    cs.CY 2026-03 unverdicted novelty 6.0

    Limited legal personhood for AI, implemented via purpose-bound operating companies within human-controlled holding structures, serves as a precautionary governance instrument that enables transparency and accountabili...

  4. "Show Me You Comply... Without Showing Me Anything": Zero-Knowledge Software Auditing for AI-Enabled Systems

    cs.SE 2025-10 unverdicted novelty 6.0

    ZKMLOps is an MLOps framework that uses zero-knowledge proofs to generate verifiable cryptographic evidence of AI model compliance without revealing confidential information.

  5. Output-Constrained Decision Trees

    cs.LG 2024-05 unverdicted novelty 6.0

    Presents three new training procedures for regression trees that enforce convex output constraints at training time and validates them on synthetic and hierarchical time-series data.

  6. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    cs.CL 2022-08 accept novelty 6.0

    RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.

  7. The AI Evaluability Gap: The Missing Layer for Managing Risk and Sustaining Value

    cs.AI 2026-06 unverdicted novelty 5.0

    Introduces the AI Evaluability Gap and Evaluability framework to address missing evidentiary foundations in AI risk and value governance decisions.

  8. CoT-Guard: Small Models for Strong Monitoring

    cs.CR 2026-05 unverdicted novelty 5.0

    CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.

  9. Developing an AI Concept Envisioning Toolkit to Support Reflective Juxtaposition of Values and Harms

    cs.HC 2026-04 conditional novelty 5.0

    A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, and was shown to be valuable in designer surveys and interviews.

  10. Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    Intent compilation turns vague human goals into verifiable artifacts, using closure-gap vectors and delegation envelopes to separate open-world agent challenges from closed-world solvers and to benchmark closure fixes...

  11. What Should Frontier AI Developers Disclose About Internal Deployments?

    cs.CY 2026-04 unverdicted novelty 5.0

    A framework recommending that frontier AI developers disclose information on capabilities, usage, safety mitigations, and governance of internal model deployments.

  12. Assessing High-Risk AI Systems under the EU AI Act: From Legal Requirements to Technical Verification

    cs.CY 2025-12 unverdicted novelty 5.0

    A structured mapping translates EU AI Act requirements into implementable verification activities for high-risk AI systems.

  13. MalGEN: A Testbed for Modeling and Evaluating Malware Behaviors

    cs.CR 2025-06 unverdicted novelty 5.0

    MalGEN generates 977 executable malware samples across 1920 settings, with 45.71% evading existing detection engines and exposing gaps in current defenses.

  14. What Should Frontier AI Developers Disclose About Internal Deployments?

    cs.CY 2026-04 unverdicted novelty 4.0

    A four-category disclosure framework for internal frontier AI deployments, covering capabilities, usage, safety mitigations, and governance.

  15. AI Identification: An Integrated Framework for Sustainable Governance in Digital Enterprises

    cs.CR 2026-04 unverdicted novelty 4.0

    The paper introduces a dual-layer AI identification framework that integrates cryptographic, blockchain, and zero-knowledge techniques with governance checkpoints to support lifecycle accountability in digital enterprises.

  16. Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles

    cs.AI 2025-10 unverdicted novelty 3.0

    A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems...

  17. Large Language Model Agent: A Survey on Methodology, Applications and Challenges

    cs.CL 2025-03 accept novelty 3.0

    A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

  18. Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding

    cs.AI 2026-04 unverdicted novelty 2.0

    Squirrel behaviors supply a comparative template for a hierarchical control model that integrates latent dynamics, episodic memory, observer beliefs, and delayed verification in agentic AI.

  19. Automation and AI Technology in Surface Mining With a Brief Introduction to Open-Pit Operations in the Pilbara

    cs.CY 2023-01 unverdicted novelty 1.0

    The paper surveys open-pit mining processes in the Pilbara and highlights AI/automation challenges and opportunities across nine steps from geological assessment to ore shipment.