Advances in Neural Information Processing Systems , volume=

Video pretraining (vpt): Learning to act by watching unlabeled online videos , author=

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

representative citing papers

Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding

cs.AI · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

LC-MAPF uses multi-round local communication between neighboring agents in a pre-trained model to outperform prior learning-based MAPF solvers on diverse unseen scenarios while preserving scalability.

From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discrete tokens proving most effective.

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.

TD-MPC2: Scalable, Robust World Models for Continuous Control

cs.LG · 2023-10-25 · conditional · novelty 6.0

TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.

Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

cs.AI · 2026-04-22 · unverdicted · novelty 4.0

SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · novelty 4.0

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

citing papers explorer

Showing 6 of 6 citing papers.

Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding cs.AI · 2026-05-08 · unverdicted · none · ref 54 · 2 links
LC-MAPF uses multi-round local communication between neighboring agents in a pre-trained model to outperform prior learning-based MAPF solvers on diverse unseen scenarios while preserving scalability.
From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models cs.RO · 2026-05-06 · unverdicted · none · ref 12
A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discrete tokens proving most effective.
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning cs.LG · 2026-05-01 · unverdicted · none · ref 14
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
TD-MPC2: Scalable, Robust World Models for Continuous Control cs.LG · 2023-10-25 · conditional · none · ref 151
TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.
Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning cs.AI · 2026-04-22 · unverdicted · none · ref 4
SuperIgor uses iterative co-training of a language model planner and a goal-conditional RL agent to self-generate and refine plans, resulting in stricter instruction adherence and better generalization to unseen instructions.
Agent AI: Surveying the Horizons of Multimodal Interaction cs.AI · 2024-01-07 · unverdicted · none · ref 121
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

Advances in Neural Information Processing Systems , volume=

fields

years

verdicts

representative citing papers

citing papers explorer