Canonical reference

Toward general-purpose robots via foundation models: A survey and meta-analysis,

· 2023 · arXiv 2312.08782

Canonical reference. 83% of citing Pith papers cite this work as background.

12 Pith papers citing it

Background 83% of classified citations

read on arXiv browse 12 citing papers

citation-role summary

background 6

citation-polarity summary

background 5 unclear 1

representative citing papers

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Introduces the Grounded Observer framework that applies robotics-inspired formal constructs for runtime constraint enforcement on foundation model interaction trajectories in socially sensitive domains.

ASH: Agents that Self-Hone via Embodied Learning

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

cs.RO · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.

[Emerging Ideas] Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

ATI is a tripartite bio-inspired architecture for physical AI that co-designs sensing and inference, shown in a camera prototype to raise accuracy from 53.8% to 88% and cut remote invocations by 43.3%.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

cs.CV · 2025-09-16 · unverdicted · novelty 7.0

MapAnything is a unified feed-forward transformer that regresses metric 3D scene geometry and cameras from images using a factored representation of depth maps, ray maps, poses, and scale.

ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

cs.RO · 2024-09-03 · conditional · novelty 7.0

ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.

From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning

cs.AI · 2026-04-12 · unverdicted · novelty 6.0

EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.

Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

cs.RO · 2026-03-12 · unverdicted · novelty 6.0

Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.

DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies

cs.RO · 2025-05-12 · unverdicted · novelty 6.0

DexWild co-trains dexterous robot policies on in-the-wild human hand interactions recorded with a low-cost system and limited robot data, achieving 68.5% success in unseen environments and 5.8x better cross-embodiment generalization.

A Survey on Vision-Language-Action Models for Embodied AI

cs.RO · 2024-05-23 · unverdicted · novelty 6.0

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

citing papers explorer

Showing 12 of 12 citing papers.

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains cs.AI · 2026-05-19 · unverdicted · none · ref 18
Introduces the Grounded Observer framework that applies robotics-inspired formal constructs for runtime constraint enforcement on foundation model interaction trajectories in socially sensitive domains.
ASH: Agents that Self-Hone via Embodied Learning cs.AI · 2026-05-14 · unverdicted · none · ref 2
ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.
Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models cs.RO · 2026-05-12 · unverdicted · none · ref 4 · 2 links
Pace-and-Path Correction decomposes a quadratic cost minimization into orthogonal pace and path channels to correct chunked actions in VLA models, raising success rates by up to 28.8% in dynamic settings.
[Emerging Ideas] Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI cs.AI · 2026-04-15 · unverdicted · none · ref 30
ATI is a tripartite bio-inspired architecture for physical AI that co-designs sensing and inference, shown in a camera prototype to raise accuracy from 53.8% to 88% and cut remote invocations by 43.3%.
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis cs.RO · 2026-04-08 · unverdicted · none · ref 4
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos cs.RO · 2026-02-06 · unverdicted · none · ref 39
DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.
MapAnything: Universal Feed-Forward Metric 3D Reconstruction cs.CV · 2025-09-16 · unverdicted · none · ref 19
MapAnything is a unified feed-forward transformer that regresses metric 3D scene geometry and cameras from images using a factored representation of depth maps, ray maps, poses, and scale.
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation cs.RO · 2024-09-03 · conditional · none · ref 90
ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.
From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning cs.AI · 2026-04-12 · unverdicted · none · ref 6
EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies cs.RO · 2026-03-12 · unverdicted · none · ref 2
Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies cs.RO · 2025-05-12 · unverdicted · none · ref 18
DexWild co-trains dexterous robot policies on in-the-wild human hand interactions recorded with a low-cost system and limited robot data, achieving 68.5% success in unseen environments and 5.8x better cross-embodiment generalization.
A Survey on Vision-Language-Action Models for Embodied AI cs.RO · 2024-05-23 · unverdicted · none · ref 14
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

Toward general-purpose robots via foundation models: A survey and meta-analysis,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer