Hierarchical task learning from language instructions with unified transformers and self-monitoring

3 Pith papers cite this work. Polarity classification is still indexing.

citing papers by year: 2026 (2), 2023 (1)

representative citing papers

PaLM-E: An Embodied Multimodal Language Model

cs.LG · 2023-03-06 · conditional · novelty 6.0

PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive transfer from joint training on language and robotics data.

citing papers explorer

Showing 3 of 3 citing papers.

  • ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints · cs.AI · 2026-04-16 · unverdicted · none · ref 23

    ADAPT augments planners with affordance reasoning to raise task success in environments with unspecified and time-varying object affordances, and a LoRA-finetuned VLM backend beats GPT-4o on the new DynAfford benchmark.

  • PaLM-E: An Embodied Multimodal Language Model · cs.LG · 2023-03-06 · conditional · none · ref 40

    PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive transfer from joint training on language and robotics data.

  • Environmental Understanding Vision-Language Model for Embodied Agent · cs.CV · 2026-04-21 · unverdicted · none · ref 48

    EUEA fine-tunes VLMs on object perception, task planning, action understanding and goal recognition, with recovery and GRPO, to raise ALFRED success rates by 11.89% over behavior cloning.