{"paper":{"title":"ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems.","cross_cats":[],"primary_cat":"cs.AI","authors_text":"Bryan Landers, Francois Chollet, Gregory Kamradt, Henry Pinkard, Mike Knoop","submitted_at":"2025-05-17T04:34:48Z","abstract_excerpt":"The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), introduced in 2019, established a challenging benchmark for evaluating the general fluid intelligence of artificial systems via a set of unique, novel tasks only requiring minimal prior knowledge. While ARC-AGI has spurred significant research activity over the past five years, recent AI progress calls for benchmarks capable of finer-grained evaluation at higher levels of cognitive complexity. We introduce ARC-AGI-2, an upgraded version of the benchmark. ARC-AGI-2 preserves the input-output pair task format of "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"ARC-AGI-2 incorporates a newly curated and expanded set of tasks specifically designed to provide a more granular signal to assess abstract reasoning and problem-solving abilities at higher levels of fluid intelligence.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The newly selected tasks genuinely require higher levels of fluid intelligence with only minimal prior knowledge, and the human testing protocol produces a reliable and representative baseline.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"ARC-AGI-2 adds a larger, more complex set of tasks to the original ARC-AGI benchmark to give finer-grained measurement of fluid intelligence in AI.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a67d66da04baded035b7fa20132dcde441bbfb28ea7a45f96eb6681492283edd"},"source":{"id":"2505.11831","kind":"arxiv","version":2},"verdict":{"id":"51689693-5cd4-4e9c-9783-4e24d71b017c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T16:45:48.667064Z","strongest_claim":"ARC-AGI-2 incorporates a newly curated and expanded set of tasks specifically designed to provide a more granular signal to assess abstract reasoning and problem-solving abilities at higher levels of fluid intelligence.","one_line_summary":"ARC-AGI-2 adds a larger, more complex set of tasks to the original ARC-AGI benchmark to give finer-grained measurement of fluid intelligence in AI.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The newly selected tasks genuinely require higher levels of fluid intelligence with only minimal prior knowledge, and the human testing protocol produces a reliable and representative baseline.","pith_extraction_headline":"ARC-AGI-2 introduces an expanded set of tasks to evaluate higher levels of abstract reasoning in AI systems."},"references":{"count":13,"sample":[{"doi":"","year":null,"title":"ARC Prize - Leaderboard.https://arcprize.org/leaderboard","work_id":"554fed08-e4ae-4231-bd02-0429d9a3a2dd","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"ARC Prize - Policy.https://arcprize.org/policy","work_id":"6db1d41d-6cb8-4ebf-83ed-b6446c01cb39","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Kaggle competition","work_id":"f2b15558-bb48-42b5-8df2-e1a2b24d0dc0","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Lab42 competition","work_id":"ff6dbefe-0344-49fd-993a-56618353547c","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Lab42 competition","work_id":"612eb557-b344-4e61-ba1a-a7cfb956ca8e","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":13,"snapshot_sha256":"26c8c782e89c6f0fab091170754f0008a278e39827819865d1f325adefa0e17a","internal_anchors":0},"formal_canon":{"evidence_count":3,"snapshot_sha256":"54f0f11ce90d1acbf158e729524215e13365c12af772479d07a4466ff1fc4f06"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}