Senate Committee on the Judiciary, Subcommittee on Privacy, Technology, and the Law)
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1)
Citation polarities: background (1)
citing papers explorer
- Unsteady Metrics and Benchmarking Cultures of AI Model Builders
  AI model builders mostly highlight unique benchmarks that act as flexible narrative tools for market positioning rather than standardized scientific measurements.
- The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
  The 2025 AI Agent Index catalogs technical and safety details for 30 deployed AI agents and finds low developer transparency on safety, evaluations, and societal impacts.
- Multi-agent Self-triage System with Medical Flowcharts
  A multi-agent conversational system using AMA flowcharts achieves 95.29% top-3 retrieval accuracy and 99.10% navigation accuracy on large synthetic medical conversation datasets.
- Designing Incident Reporting Systems for Harms from General-Purpose AI
  A framework with seven dimensions for AI incident reporting systems is developed from literature and case studies in safety-critical industries to guide institutional design choices.