The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain

Arseny Moskvichev, Victor Vikram Odouard, Melanie Mitchell · 2023 · arXiv 2305.07141

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Gradient-Based Program Synthesis with Neurally Interpreted Languages

cs.LG · 2026-04-20 · unverdicted · novelty 8.0

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.

VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

VisAnalog is a new controlled benchmark showing VLMs substantially underperform humans on visual concept transfer under one- to four-step deterministic transformations, with relation inference as the main failure mode.

Less is More: Recursive Reasoning with Tiny Networks

cs.LG · 2025-10-06 · unverdicted · novelty 7.0

TRM with 7M parameters achieves 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, surpassing most LLMs with under 0.01% of their parameters.

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

RefusalGuard constrains updates in hidden representation space to preserve safety-relevant geometric structure during fine-tuning, maintaining low attack success rates on safety benchmarks while preserving task performance.

citing papers explorer

Gradient-Based Program Synthesis with Neurally Interpreted Languages

cs.LG · 2026-04-20 · unverdicted · none · ref 59

NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prior methods on combinatorial generalization tasks.

Less is More: Recursive Reasoning with Tiny Networks

cs.LG · 2025-10-06 · unverdicted · none · ref 12

TRM with 7M parameters achieves 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, surpassing most LLMs with under 0.01% of their parameters.

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs

cs.LG · 2026-05-03 · unverdicted · none · ref 9

RefusalGuard constrains updates in hidden representation space to preserve safety-relevant geometric structure during fine-tuning, maintaining low attack success rates on safety benchmarks while preserving task performance.
