{"work":{"id":"2957f2ae-a92d-479e-a1ff-b0b522445d0b","openalex_id":null,"doi":null,"arxiv_id":"2505.11831","raw_key":null,"title":"ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems","authors":null,"authors_text":"Francois Chollet, Mike Knoop, Gregory Kamradt, Bryan Landers, Henry Pinkard","year":2025,"venue":"cs.AI","abstract":"The Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI), introduced in 2019, established a challenging benchmark for evaluating the general fluid intelligence of artificial systems via a set of unique, novel tasks only requiring minimal prior knowledge. While ARC-AGI has spurred significant research activity over the past five years, recent AI progress calls for benchmarks capable of finer-grained evaluation at higher levels of cognitive complexity. We introduce ARC-AGI-2, an upgraded version of the benchmark. ARC-AGI-2 preserves the input-output pair task format of its predecessor, ensuring continuity for researchers. It incorporates a newly curated and expanded set of tasks specifically designed to provide a more granular signal to assess abstract reasoning and problem-solving abilities at higher levels of fluid intelligence. To contextualize the difficulty and characteristics of ARC-AGI-2, we present extensive results from human testing, providing a robust baseline that highlights the benchmark's accessibility to human intelligence, yet difficulty for current AI systems. ARC-AGI-2 aims to serve as a next-generation tool for rigorously measuring progress towards more general and human-like AI capabilities.","external_url":"https://arxiv.org/abs/2505.11831","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-22T10:31:25.115025+00:00","pith_arxiv_id":"2505.11831","created_at":"2026-05-09T05:45:21.589814+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems","render_title":"ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems"},"hub":{"state":{"work_id":"2957f2ae-a92d-479e-a1ff-b0b522445d0b","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":27,"external_cited_by_count":null,"distinct_field_count":5,"first_pith_cited_at":"2025-03-12T17:35:03+00:00","last_pith_cited_at":"2026-05-19T21:42:32+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-09T15:35:00.618013+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":7},{"context_role":"dataset","n":3}],"polarity_counts":[{"context_polarity":"background","n":7},{"context_polarity":"use_dataset","n":2},{"context_polarity":"unclear","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}