Pith Number

pith:BDDQ3SU5

pith:2026:BDDQ3SU5C4FNMMBT2LSZ5F54N4
not attested · not anchored · not stored · refs resolved

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

Anna Rohrbach, Christian Bialas, Georgia Chalvatzaki, Marcus Rohrbach, Nishad Singhi, Snehal Jauhri, Vignesh Prasad

A test-time verifier trained on synthesized failures helps MLLM agents pick reliable actions from multiple candidates.

arxiv:2605.12620 v1 · 2026-05-12 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{BDDQ3SU5C4FNMMBT2LSZ5F54N4}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim: open
4 Citations: open
5 Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Across embodied reasoning benchmarks spanning the Habitat and ALFRED environments, VeGAS consistently improves generalization, achieving up to a 36% relative performance gain over strong CoT baselines on the most challenging multi-object, long-horizon tasks.

C2 · weakest assumption

The assumption that training a verifier on failure cases automatically synthesized by an LLM will produce a model that reliably identifies good actions in out-of-distribution scenarios where the base MLLM fails.

C3 · one-line summary

VeGAS improves MLLM-based embodied agents by sampling action ensembles and using a verifier trained on LLM-synthesized failure cases, yielding up to 36% relative gains on hard multi-object long-horizon tasks in Habitat and ALFRED.
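
The summary above describes sampling several candidate actions and letting a verifier choose among them. A minimal Python sketch of that selection step follows. It is not the authors' implementation: `select_action`, `make_cycling_sampler`, and `toy_verifier` are hypothetical names, the sampler stands in for an MLLM policy, and the verifier is a hand-written scoring stub rather than a trained model.

```python
from itertools import cycle
from typing import Callable, Hashable, Iterable, Tuple


def select_action(
    observation: object,
    sample_action: Callable[[object], Hashable],
    verifier_score: Callable[[object, Hashable], float],
    num_candidates: int = 5,
) -> Tuple[Hashable, float]:
    """Sample candidates, score each with the verifier, return the best one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    # Draw an ensemble of candidate actions from the (stochastic) policy.
    candidates = [sample_action(observation) for _ in range(num_candidates)]
    # Drop duplicates while keeping first-seen order, so each distinct
    # action is scored only once.
    unique = list(dict.fromkeys(candidates))
    scored = [(verifier_score(observation, action), action) for action in unique]
    # max() returns the first maximal element, so ties go to the earliest sample.
    best_score, best_action = max(scored, key=lambda pair: pair[0])
    return best_action, best_score


def make_cycling_sampler(proposals: Iterable[str]) -> Callable[[object], str]:
    """Hypothetical stand-in for an MLLM policy: cycles through fixed proposals."""
    iterator = cycle(list(proposals))
    return lambda observation: next(iterator)


def toy_verifier(observation: object, action: str) -> float:
    """Hypothetical stand-in for a trained verifier: prefers one fixed action."""
    return 1.0 if action == "pick up mug" else 0.2


if __name__ == "__main__":
    sampler = make_cycling_sampler(
        ["open fridge", "pick up mug", "open fridge", "go to sink"]
    )
    print(select_action("kitchen view", sampler, toy_verifier, num_candidates=4))
```

In this sketch the extra cost is paid only at test time: the policy is queried `num_candidates` times per step and the verifier once per distinct candidate.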

References

65 extracted · 65 resolved · 7 Pith anchors

[1] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances 2022 · arXiv:2204.01691
[2] Critique-out-loud reward models 2024
[3] Qwen2.5-VL Technical Report · arXiv:2502.13923
[4] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling 2024 · arXiv:2407.21787
[5] Training Verifiers to Solve Math Word Problems 2021 · arXiv:2110.14168

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-18T03:10:00.447469Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

08c70dca9d170ad63033d2e59e97bc6f28bb589f8e93b171468152d78bb4ad37

Aliases

arxiv: 2605.12620 · arxiv_version: 2605.12620v1 · doi: 10.48550/arxiv.2605.12620 · pith_short_12: BDDQ3SU5C4FN · pith_short_16: BDDQ3SU5C4FNMMBT · pith_short_8: BDDQ3SU5
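
The identifiers on this page appear to be related: the 26-character Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the short aliases are prefixes of it. This is inferred from the values shown here, not from a Pith specification, so treat the sketch below as an assumption. The function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "08c70dca9d170ad63033d2e59e97bc6f28bb589f8e93b171468152d78bb4ad37"
PITH_NUMBER = "BDDQ3SU5C4FNMMBT2LSZ5F54N4"


def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest and strip '=' padding.

    Assumption: this mirrors how the number on this page relates to its hash.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_alias(pith_number: str, length: int) -> str:
    """Return a prefix alias such as pith_short_8, pith_short_12, pith_short_16."""
    return pith_number[:length]


if __name__ == "__main__":
    derived = pith_number_from_hash(CANONICAL_HASH)
    print(derived, derived == PITH_NUMBER)
    for n in (8, 12, 16):
        print(f"pith_short_{n}:", short_alias(derived, n))
```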
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BDDQ3SU5C4FNMMBT2LSZ5F54N4 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 08c70dca9d170ad63033d2e59e97bc6f28bb589f8e93b171468152d78bb4ad37
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "357ae961b7ad0cd33cf52aba688de6cfeb3ccedb2b2a1e58e3e3dc61b353b0ae",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-12T18:08:24Z",
    "title_canon_sha256": "df1ac8fdff8401667f177b8bd831d1159e04ca4b32cb55a1738c26c46f7d3af5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12620",
    "kind": "arxiv",
    "version": 1
  }
}
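
The canonicalization used in the shell command above (sorted keys, compact separators, UTF-8, then SHA-256) can also be run offline against a saved copy of the record. The sketch below is a Python equivalent of that one-liner. `canonical_sha256` is a hypothetical helper name, and `RECORD` is the JSON shown above transcribed by hand. Whether the digest matches the canonical hash depends on this page showing the complete record, so no expected value is asserted here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the verification one-liner does."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "357ae961b7ad0cd33cf52aba688de6cfeb3ccedb2b2a1e58e3e3dc61b353b0ae",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-12T18:08:24Z",
        "title_canon_sha256": "df1ac8fdff8401667f177b8bd831d1159e04ca4b32cb55a1738c26c46f7d3af5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12620", "kind": "arxiv", "version": 1},
}

if __name__ == "__main__":
    # Compare this value by eye with the canonical hash listed on the page.
    print(canonical_sha256(RECORD))
```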