{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2021:3W6OWNBDN5XXVX7QEVJJUIODVF","short_pith_number":"pith:3W6OWNBD","schema_version":"1.0","canonical_sha256":"ddbceb34236f6f7adff025529a21c3a94cf19d20a37c0276d159bdd99b0dbe03","source":{"kind":"arxiv","id":"2109.13916","version":5},"attestation_state":"computed","paper":{"title":"Unsolved Problems in ML Safety","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Machine learning safety should focus on four research areas as models scale and deploy in critical settings.","cross_cats":["cs.AI","cs.CL","cs.CV"],"primary_cat":"cs.LG","authors_text":"Dan Hendrycks, Jacob Steinhardt, John Schulman, Nicholas Carlini","submitted_at":"2021-09-28T17:59:36Z","abstract_excerpt":"Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards (\"Robustness\"), identifying hazards (\"Monitoring\"), reducing inherent model ha"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2109.13916","kind":"arxiv","version":5},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2021-09-28T17:59:36Z","cross_cats_sorted":["cs.AI","cs.CL","cs.CV"],"title_canon_sha256":"5a4ab8343ed3723538f3941013ab53844ebf2c24c850b5375261ae1bfc81ae59","abstract_canon_sha256":"ec2de830542c9c602b251629731834dfbf6d2dfc1a53e75a7221d1c506944fdc"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:46.628019Z","signature_b64":"AzBAPIT5uh0IqPXkZQwNGC8F8Jaq+H/qOtHJtBuBkKzQAd+SZQKJM8ufP+d7Pp9pEFAG87xWOmQI9rj+C/rnAA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"ddbceb34236f6f7adff025529a21c3a94cf19d20a37c0276d159bdd99b0dbe03","last_reissued_at":"2026-05-17T23:38:46.627509Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:46.627509Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Unsolved Problems in ML Safety","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Machine learning safety should focus on four research areas as models scale and deploy in critical settings.","cross_cats":["cs.AI","cs.CL","cs.CV"],"primary_cat":"cs.LG","authors_text":"Dan Hendrycks, Jacob Steinhardt, John Schulman, Nicholas Carlini","submitted_at":"2021-09-28T17:59:36Z","abstract_excerpt":"Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards (\"Robustness\"), identifying hazards (\"Monitoring\"), reducing inherent model ha"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We present four problems ready for research, namely withstanding hazards (Robustness), identifying hazards (Monitoring), reducing inherent model hazards (Alignment), and reducing systemic hazards (Systemic Safety).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the four categories comprehensively capture the primary safety challenges without major omissions or overlaps that would require a different organizing structure.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Machine learning safety should focus on four research areas as models scale and deploy in critical settings.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"50b09283156682c4430d6abb6d29e8cbdf1634ce34f2d84a17582b3fa30c5b5a"},"source":{"id":"2109.13916","kind":"arxiv","version":5},"verdict":{"id":"d2189e3c-86b6-4abf-9f81-8c8df98e94c3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T20:42:44.186395Z","strongest_claim":"We present four problems ready for research, namely withstanding hazards (Robustness), identifying hazards (Monitoring), reducing inherent model hazards (Alignment), and reducing systemic hazards (Systemic Safety).","one_line_summary":"The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the four categories comprehensively capture the primary safety challenges without major omissions or overlaps that would require a different organizing structure.","pith_extraction_headline":"Machine learning safety should focus on four research areas as models scale and deploy in critical settings."},"references":{"count":228,"sample":[{"doi":"","year":2000,"title":"Asilomar AI Principles","work_id":"03f6baad-25b0-48b1-8a06-a27fa1400be4","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2015,"title":"Autonomous Weapons: An Open Letter from AI and Robotics Researchers","work_id":"d7272ec7-47d2-4699-a4ee-a9f04aa6d6d1","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Deep Learning with Differential Privacy","work_id":"4eef2d5a-f5d7-41db-bcc1-267fd6da556f","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Network intrusion detection system: A systematic study of machine learning and deep learning approaches","work_id":"ef8751e2-9b84-4735-b05f-ad68f7914fee","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Concrete Problems in AI Safety","work_id":"08cbe17e-9d7e-44fb-858c-a0ac0590f206","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":228,"snapshot_sha256":"bf208f6b914ca6cfebba77df22cacd0f8a0d3b6842a507897bfba419a19076c0","internal_anchors":6},"formal_canon":{"evidence_count":1,"snapshot_sha256":"c1d24e7e2dfba039bb7d821d511fc92df8c3ea44e02c07a98474796c051d4f00"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2109.13916","created_at":"2026-05-17T23:38:46.627576+00:00"},{"alias_kind":"arxiv_version","alias_value":"2109.13916v5","created_at":"2026-05-17T23:38:46.627576+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2109.13916","created_at":"2026-05-17T23:38:46.627576+00:00"},{"alias_kind":"pith_short_12","alias_value":"3W6OWNBDN5XX","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"3W6OWNBDN5XXVX7Q","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"3W6OWNBD","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2306.12001","citing_title":"An Overview of Catastrophic AI Risks","ref_index":139,"is_internal_anchor":true},{"citing_arxiv_id":"2602.13372","citing_title":"MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2509.21173","citing_title":"Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2201.03544","citing_title":"The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2506.05171","citing_title":"Towards provable probabilistic safety for scalable embodied AI systems","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2507.06419","citing_title":"Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2509.21173","citing_title":"Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2406.10162","citing_title":"Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models","ref_index":175,"is_internal_anchor":true},{"citing_arxiv_id":"2512.21110","citing_title":"Beyond Context: Large Language Models' Failure to Grasp Users' Intent","ref_index":76,"is_internal_anchor":true},{"citing_arxiv_id":"2602.05353","citing_title":"AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2212.03827","citing_title":"Discovering Latent Knowledge in Language Models Without Supervision","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12809","citing_title":"Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces","ref_index":235,"is_internal_anchor":true},{"citing_arxiv_id":"2309.00614","citing_title":"Baseline Defenses for Adversarial Attacks Against Aligned Language Models","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02686","citing_title":"Beyond Semantic Manipulation: Token-Space Attacks on Reward Models","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2211.00593","citing_title":"Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26409","citing_title":"Sparsity as a Key: Unlocking New Insights from Latent Structures for Out-of-Distribution Detection","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10822","citing_title":"Benchmarking Sensor-Fault Robustness in Forecasting","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2604.25684","citing_title":"Think Before You Act -- A Neurocognitive Governance Model for Autonomous AI Agents","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2202.03286","citing_title":"Red Teaming Language Models with Language Models","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2112.00861","citing_title":"A General Language Assistant as a Laboratory for Alignment","ref_index":224,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11174","citing_title":"EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems","ref_index":96,"is_internal_anchor":true},{"citing_arxiv_id":"2206.07682","citing_title":"Emergent Abilities of Large Language Models","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07728","citing_title":"SARC: A Governance-by-Architecture Framework for Agentic AI Systems","ref_index":53,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09689","citing_title":"Face Density as a Proxy for Data Complexity: Quantifying the Hardness of Instance Count","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02765","citing_title":"U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning","ref_index":40,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF","json":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF.json","graph_json":"https://pith.science/api/pith-number/3W6OWNBDN5XXVX7QEVJJUIODVF/graph.json","events_json":"https://pith.science/api/pith-number/3W6OWNBDN5XXVX7QEVJJUIODVF/events.json","paper":"https://pith.science/paper/3W6OWNBD"},"agent_actions":{"view_html":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF","download_json":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF.json","view_paper":"https://pith.science/paper/3W6OWNBD","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2109.13916&json=true","fetch_graph":"https://pith.science/api/pith-number/3W6OWNBDN5XXVX7QEVJJUIODVF/graph.json","fetch_events":"https://pith.science/api/pith-number/3W6OWNBDN5XXVX7QEVJJUIODVF/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF/action/timestamp_anchor","attest_storage":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF/action/storage_attestation","attest_author":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF/action/author_attestation","sign_citation":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF/action/citation_signature","submit_replication":"https://pith.science/pith/3W6OWNBDN5XXVX7QEVJJUIODVF/action/replication_record"}},"created_at":"2026-05-17T23:38:46.627576+00:00","updated_at":"2026-05-17T23:38:46.627576+00:00"}