{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:JUP2Q7U5OJMPVWDHDC4JXKLRS5","short_pith_number":"pith:JUP2Q7U5","schema_version":"1.0","canonical_sha256":"4d1fa87e9d7258fad86718b89ba97197530ba15f60f9eb57cc3f5466b683aef2","source":{"kind":"arxiv","id":"2505.11831","version":2},"attestation_state":"computed","paper":{"title":"ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Bryan Landers, Francois Chollet, Gregory Kamradt, Henry Pinkard, Mike Knoop","submitted_at":"2025-05-17T04:34:48Z","abstract_excerpt":"The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), introduced in 2019, established a challenging benchmark for evaluating the general fluid intelligence of artificial systems via a set of unique, novel tasks only requiring minimal prior knowledge. While ARC-AGI has spurred significant research activity over the past five years, recent AI progress calls for benchmarks capable of finer-grained evaluation at higher levels of cognitive complexity. We introduce ARC-AGI-2, an upgraded version of the benchmark. ARC-AGI-2 preserves the input-output pair task format of "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2505.11831","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.AI","submitted_at":"2025-05-17T04:34:48Z","cross_cats_sorted":[],"title_canon_sha256":"597cec621d6cb1f3586f0bdb55926f5783a602101251a516d1835619aa5bbd88","abstract_canon_sha256":"c666d70bf85b4a5989194f51c6434f3e73d3397e9c1081c732e66dc8f0811ec8"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:50.833892Z","signature_b64":"Xz6mDPujL5SHXXWAz7DKkOMGR4OBN4TBb9Dx3im1aM2iIJ0cAMYNgmH4Uu3LxXIPZXciUVGTaK1FN8i7j7YaBA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"4d1fa87e9d7258fad86718b89ba97197530ba15f60f9eb57cc3f5466b683aef2","last_reissued_at":"2026-05-17T23:38:50.833466Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:50.833466Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Bryan Landers, Francois Chollet, Gregory Kamradt, Henry Pinkard, Mike Knoop","submitted_at":"2025-05-17T04:34:48Z","abstract_excerpt":"The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), introduced in 2019, established a challenging benchmark for evaluating the general fluid intelligence of artificial systems via a set of unique, novel tasks only requiring minimal prior knowledge. While ARC-AGI has spurred significant research activity over the past five years, recent AI progress calls for benchmarks capable of finer-grained evaluation at higher levels of cognitive complexity. We introduce ARC-AGI-2, an upgraded version of the benchmark. ARC-AGI-2 preserves the input-output pair task format of "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ARC-AGI-2 incorporates a newly curated and expanded set of tasks specifically designed to provide a more granular signal to assess abstract reasoning and problem-solving abilities at higher levels of fluid intelligence.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The newly selected tasks genuinely require higher levels of fluid intelligence with only minimal prior knowledge, and the human testing protocol produces a reliable and representative baseline.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ARC-AGI-2 adds a larger, more complex set of tasks to the original ARC-AGI benchmark to give finer-grained measurement of fluid intelligence in AI.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a67d66da04baded035b7fa20132dcde441bbfb28ea7a45f96eb6681492283edd"},"source":{"id":"2505.11831","kind":"arxiv","version":2},"verdict":{"id":"51689693-5cd4-4e9c-9783-4e24d71b017c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T16:45:48.667064Z","strongest_claim":"ARC-AGI-2 incorporates a newly curated and expanded set of tasks specifically designed to provide a more granular signal to assess abstract reasoning and problem-solving abilities at higher levels of fluid intelligence.","one_line_summary":"ARC-AGI-2 adds a larger, more complex set of tasks to the original ARC-AGI benchmark to give finer-grained measurement of fluid intelligence in AI.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The newly selected tasks genuinely require higher levels of fluid intelligence with only minimal prior knowledge, and the human testing protocol produces a reliable and representative baseline.","pith_extraction_headline":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems."},"references":{"count":13,"sample":[{"doi":"","year":null,"title":"ARC Prize - Leaderboard.https://arcprize.org/leaderboard","work_id":"554fed08-e4ae-4231-bd02-0429d9a3a2dd","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"ARC Prize - Policy.https://arcprize.org/policy","work_id":"6db1d41d-6cb8-4ebf-83ed-b6446c01cb39","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Kaggle competition","work_id":"f2b15558-bb48-42b5-8df2-e1a2b24d0dc0","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Lab42 competi- tion","work_id":"ff6dbefe-0344-49fd-993a-56618353547c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Lab42 competi- tion","work_id":"612eb557-b344-4e61-ba1a-a7cfb956ca8e","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":13,"snapshot_sha256":"26c8c782e89c6f0fab091170754f0008a278e39827819865d1f325adefa0e17a","internal_anchors":0},"formal_canon":{"evidence_count":3,"snapshot_sha256":"54f0f11ce90d1acbf158e729524215e13365c12af772479d07a4466ff1fc4f06"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2505.11831","created_at":"2026-05-17T23:38:50.833532+00:00"},{"alias_kind":"arxiv_version","alias_value":"2505.11831v2","created_at":"2026-05-17T23:38:50.833532+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2505.11831","created_at":"2026-05-17T23:38:50.833532+00:00"},{"alias_kind":"pith_short_12","alias_value":"JUP2Q7U5OJMP","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"JUP2Q7U5OJMPVWDH","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"JUP2Q7U5","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":30,"internal_anchor_count":30,"sample":[{"citing_arxiv_id":"2602.18600","citing_title":"MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19376","citing_title":"Generative Recursive Reasoning","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20520","citing_title":"Open-World Evaluations for Measuring Frontier AI Capabilities","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02028","citing_title":"Language models fail at extended rule following","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19376","citing_title":"Generative Recursive Reasoning","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19943","citing_title":"Probabilistic Tiny Recursive Model","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2506.22899","citing_title":"Neural Cellular Automata: From Cells to Pixels","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2509.14448","citing_title":"VCBench: Benchmarking LLMs in Venture Capital","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2511.04570","citing_title":"Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2511.20814","citing_title":"SPHINX: A Synthetic Environment for Visual Perception and Reasoning","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2602.18600","citing_title":"MapTab: Are MLLMs Ready for Multi-Criteria Route Planning in Heterogeneous Graphs?","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2506.06941","citing_title":"The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2506.21734","citing_title":"Hierarchical Reasoning Model","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2510.04871","citing_title":"Less is More: Recursive Reasoning with Tiny Networks","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13821","citing_title":"Harnessing Agentic Evolution","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02486","citing_title":"VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03406","citing_title":"SASAV: Self-Directed Agent for Scientific Analysis and Visualization","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2506.01939","citing_title":"Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2503.09567","citing_title":"Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models","ref_index":136,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08498","citing_title":"MathConstraint: Automated Generation of Verified Combinatorial Reasoning Instances for LLMs","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24317","citing_title":"Don't Pause! Every prediction matters in a streaming video","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01359","citing_title":"Structural Ranking of the Cognitive Plausibility of Computational Models of Analogy and Metaphors with the Minimal Cognitive Grid","ref_index":180,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07904","citing_title":"Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07725","citing_title":"Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.13521","citing_title":"C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions","ref_index":2,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5","json":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5.json","graph_json":"https://pith.science/api/pith-number/JUP2Q7U5OJMPVWDHDC4JXKLRS5/graph.json","events_json":"https://pith.science/api/pith-number/JUP2Q7U5OJMPVWDHDC4JXKLRS5/events.json","paper":"https://pith.science/paper/JUP2Q7U5"},"agent_actions":{"view_html":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5","download_json":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5.json","view_paper":"https://pith.science/paper/JUP2Q7U5","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2505.11831&json=true","fetch_graph":"https://pith.science/api/pith-number/JUP2Q7U5OJMPVWDHDC4JXKLRS5/graph.json","fetch_events":"https://pith.science/api/pith-number/JUP2Q7U5OJMPVWDHDC4JXKLRS5/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5/action/timestamp_anchor","attest_storage":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5/action/storage_attestation","attest_author":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5/action/author_attestation","sign_citation":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5/action/citation_signature","submit_replication":"https://pith.science/pith/JUP2Q7U5OJMPVWDHDC4JXKLRS5/action/replication_record"}},"created_at":"2026-05-17T23:38:50.833532+00:00","updated_at":"2026-05-17T23:38:50.833532+00:00"}