Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.
Foundation models in robotics: Applications, challenges, and the future
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
SemiFA is a four-agent LangGraph pipeline that combines DINOv2 and LLaVA image analysis with SECS/GEM telemetry and vector retrieval to produce complete FA reports in 48 seconds.
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.
Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.
A survey that categorizes LLM uses in multi-robot systems across task allocation, motion planning, action generation, and human interaction, while noting challenges and future research opportunities.
A tutorial that unifies explicit and implicit world models through shared predictive structure for applications in physical AI such as robotics.