Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.
Forward citations
Cited by 19 Pith papers
-
LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions
LLM agents make collective belief dynamics programmable, with simulations showing coordinated agents induce stable belief shifts, and four structural properties that complicate detection and defense.
-
Ethics Testing: Proactive Identification of Generative AI System Harms
Ethics testing is introduced as a systematic approach to generate tests that identify software harms induced by unethical behavior in generative AI outputs.
-
Precautionary Governance of Autonomous AI: Legal Personhood as Functional Instrument
Limited legal personhood for AI, implemented via purpose-bound operating companies within human-controlled holding structures, serves as a precautionary governance instrument that enables transparency and accountabili...
-
"Show Me You Comply... Without Showing Me Anything": Zero-Knowledge Software Auditing for AI-Enabled Systems
ZKMLOps is an MLOps framework that uses zero-knowledge proofs to generate verifiable cryptographic evidence of AI model compliance without revealing confidential information.
-
Output-Constrained Decision Trees
Presents three new training procedures for regression trees that enforce convex output constraints at training time and validates them on synthetic and hierarchical time-series data.
-
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.
-
The AI Evaluability Gap: The Missing Layer for Managing Risk and Sustaining Value
Introduces the AI Evaluability Gap and Evaluability framework to address missing evidentiary foundations in AI risk and value governance decisions.
-
CoT-Guard: Small Models for Strong Monitoring
CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.
-
Developing an AI Concept Envisioning Toolkit to Support Reflective Juxtaposition of Values and Harms
A new toolkit with cards and maps enables AI designers to juxtapose values and harms in early concept stages, shown valuable in designer surveys and interviews.
-
Toward a Science of Intent: Closure Gaps and Delegation Envelopes for Open-World AI Agents
Intent compilation turns vague human goals into verifiable artifacts, using closure-gap vectors and delegation envelopes to separate open-world agent challenges from closed-world solvers and to benchmark closure fixes...
-
What Should Frontier AI Developers Disclose About Internal Deployments?
A framework recommending that frontier AI developers disclose information on capabilities, usage, safety mitigations, and governance of internal model deployments.
-
Assessing High-Risk AI Systems under the EU AI Act: From Legal Requirements to Technical Verification
A structured mapping translates EU AI Act requirements into implementable verification activities for high-risk AI systems.
-
MalGEN: A Testbed for Modeling and Evaluating Malware Behaviors
MalGEN generates 977 executable malware samples across 1920 settings, with 45.71% evading existing detection engines and exposing gaps in current defenses.
-
What Should Frontier AI Developers Disclose About Internal Deployments?
A four-category disclosure framework for internal frontier AI deployments, covering capabilities, usage, safety mitigations, and governance.
-
AI Identification: An Integrated Framework for Sustainable Governance in Digital Enterprises
The paper introduces a dual-layer AI identification framework that integrates cryptographic, blockchain, and zero-knowledge techniques with governance checkpoints to support lifecycle accountability in digital enterprises.
-
Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles
A scoping review of AIES and FAccT literature concludes that AI trustworthiness research prioritizes technical precision over social, ethical, and institutional factors, leaving the sociotechnical nature of AI systems...
-
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.
-
Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding
Squirrel behaviors supply a comparative template for a hierarchical control model that integrates latent dynamics, episodic memory, observer beliefs, and delayed verification in agentic AI.
-
Automation and AI Technology in Surface Mining With a Brief Introduction to Open-Pit Operations in the Pilbara
The paper surveys open-pit mining processes in the Pilbara and highlights AI/automation challenges and opportunities across nine steps from geological assessment to ore shipment.