Foundation models in robotics: Applications, challenges, and the future

· 2023 · arXiv 2312.07843

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.

SemiFA: An Agentic Multi-Modal Framework for Autonomous Semiconductor Failure Analysis Report Generation

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

SemiFA is a four-agent LangGraph pipeline that combines DINOv2 and LLaVA image analysis with SECS/GEM telemetry and vector retrieval to produce complete FA reports in 48 seconds.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

cs.RO · 2024-09-03 · conditional · novelty 7.0

ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.

Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies

cs.RO · 2026-03-12 · unverdicted · novelty 6.0

Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.

A Survey on Vision-Language-Action Models for Embodied AI

cs.RO · 2024-05-23 · unverdicted · novelty 6.0

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

Jump-Start Reinforcement Learning with Vision-Language-Action Regularization

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

VLAJS augments PPO with sparse annealed VLA guidance through directional regularization to cut required interactions by over 50% on manipulation tasks and enable zero-shot sim-to-real transfer.

Large Language Models for Multi-Robot Systems: A Survey

cs.RO · 2025-02-06 · unverdicted · novelty 4.0

A survey that categorizes LLM uses in multi-robot systems across task allocation, motion planning, action generation, and human interaction, while noting challenges and future research opportunities.

citing papers explorer

Showing 8 of 8 citing papers.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic cs.LG · 2026-05-12 · unverdicted · none · ref 128 · 2 links
Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.
SemiFA: An Agentic Multi-Modal Framework for Autonomous Semiconductor Failure Analysis Report Generation cs.CV · 2026-04-14 · unverdicted · none · ref 22
SemiFA is a four-agent LangGraph pipeline that combines DINOv2 and LLaVA image analysis with SECS/GEM telemetry and vector retrieval to produce complete FA reports in 48 seconds.
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis cs.RO · 2026-04-08 · unverdicted · none · ref 5
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation cs.RO · 2024-09-03 · conditional · none · ref 91
ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies cs.RO · 2026-03-12 · unverdicted · none · ref 1
Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
A Survey on Vision-Language-Action Models for Embodied AI cs.RO · 2024-05-23 · unverdicted · none · ref 12
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.
Jump-Start Reinforcement Learning with Vision-Language-Action Regularization cs.LG · 2026-04-15 · unverdicted · none · ref 9
VLAJS augments PPO with sparse annealed VLA guidance through directional regularization to cut required interactions by over 50% on manipulation tasks and enable zero-shot sim-to-real transfer.
Large Language Models for Multi-Robot Systems: A Survey cs.RO · 2025-02-06 · unverdicted · none · ref 25
A survey that categorizes LLM uses in multi-robot systems across task allocation, motion planning, action generation, and human interaction, while noting challenges and future research opportunities.

Foundation models in robotics: Applications, challenges, and the future

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer