
Slots, Transitions, Loops: Learning Composable World Models for ARC

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

ARC tests in-context rule induction: given a few input-output demonstrations, a model must infer the hidden rule and apply it to a new query. While many approaches express ARC rules through language, code, or symbolic programs, ARC itself is visual-symbolic: rules appear as grid transitions over objects, colors, shapes, and spatial relations. We introduce Loop-OWM, an object-centric world-modeling architecture that learns these rules as composable transitions over structured states. It combines color-prototype slots, demonstration-conditioned task summaries, and a looped transition model with dense propagation and slot-conditioned correction. On both ARC-1 and ARC-2, Loop-OWM outperforms non-looped and looped baselines with comparable or fewer parameters. These results suggest that ARC rules can be learned not only as language descriptions or searched programs, but also as transitions over visual-symbolic world states.
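As a rough illustration of two ideas named in the abstract, the toy sketch below groups grid cells into one slot per color and applies a single transition repeatedly in a loop. It is not the Loop-OWM architecture: the names `color_slots`, `spread_right`, and `loop` are hypothetical, the transition is hand-written rather than learned, and the integer color-index grid encoding is an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple

Grid = List[List[int]]   # assumed encoding: rows of color indices, 0 = background
Cell = Tuple[int, int]   # (row, column)


def color_slots(grid: Grid) -> Dict[int, Set[Cell]]:
    """Group cells by color: one hypothetical 'slot' per color in the grid."""
    slots: Dict[int, Set[Cell]] = {}
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            slots.setdefault(color, set()).add((r, c))
    return slots


def spread_right(grid: Grid) -> Grid:
    """Toy hand-written transition (not learned): every colored cell copies
    its color into an empty right-hand neighbor. Reads from the old grid and
    writes to a copy, so each call advances the state by exactly one step."""
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        for c in range(len(row) - 1):
            if row[c] != 0 and row[c + 1] == 0:
                out[r][c + 1] = row[c]
    return out


def loop(step: Callable[[Grid], Grid], grid: Grid, n_steps: int) -> Grid:
    """Apply the same transition n_steps times (the 'looped' part)."""
    for _ in range(n_steps):
        grid = step(grid)
    return grid


query = [[3, 0, 0, 0],
         [0, 0, 5, 0]]
slots = color_slots(query)
result = loop(spread_right, query, n_steps=3)
print(result)  # [[3, 3, 3, 3], [0, 0, 5, 5]]
```

Reusing one step function across iterations is what lets a short local rule produce a long-range effect here; a learned model would replace `spread_right` with a trained transition conditioned on the demonstrations.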

fields

cs.CV 2

years

2026 2

verdicts

UNVERDICTED 2

representative citing papers

Object-centric LeJEPA

cs.CV · 2026-07-02 · unverdicted · novelty 6.0


citing papers explorer


  • Object-centric LeJEPA · cs.CV · 2026-07-02 · unverdicted · none · ref 10 · internal anchor

    Object-centric LeJEPA uses SAM object masks to extend LeJEPA's distributional objective to variable object sets and adds an instance-separating loss, outperforming image-level LeJEPA on DAVIS tracking, ImageNet classification, ADE20k segmentation, and NAVI re-identification across 10-100% of COCO da

  • Trajectory Forcing: Structure-First Generation with Controllable Semantic Trajectories · cs.CV · 2026-06-21 · unverdicted · none · ref 14 · internal anchor

    Trajectory Forcing makes generative image synthesis trajectory-centric by organizing it into decodable semantic stages derived from clustered visual representations and trained with one-step flow-matching models.