A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
Embodied task planning with large language models
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
SENTINEL reduces MLLM object hallucinations by over 90% via sentence-level early intervention with detector-bootstrapped preference data and C-DPO loss, outperforming prior SOTA on hallucination and capability benchmarks.
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
RePlan-Bot achieves state-of-the-art results on the ALFRED benchmark for embodied instruction following by integrating LLM-based auditing, commonsense map search, and ViT action correction.
TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.
RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.
GiG uses a Graph-in-Graph architecture with GNN-encoded states, experience memory retrieval, and bounded symbolic lookahead to improve LLM planning on embodied benchmarks with gains up to 37%.
Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.
citing papers explorer
-
Mitigating Object Hallucinations via Sentence-Level Early Intervention
SENTINEL reduces MLLM object hallucinations by over 90% via sentence-level early intervention with detector-bootstrapped preference data and C-DPO loss, outperforming prior SOTA on hallucination and capability benchmarks.