Task success is not enough: Investigating the use of video-language models as behavior critics for catching undesirable agent behaviors. arXiv preprint arXiv:2402.04210, 2024

Lin Guan, Yifan Zhou, Denis Liu, Yantian Zha, Heni Ben Amor, Subbarao Kambhampati · 2024 · arXiv 2402.04210

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

DriveJudge: Rethinking Autonomous Driving Evaluation with Vision-Language Models

cs.CV · 2026-06-15 · unverdicted · novelty 6.0 · ref 14

DriveJudge combines VLM reasoning with rule functions on a new 33,577-sample human-annotated dataset, outperforming EPDMS by 21.23 AUC on quality classification and DriveCritic by 6.5% on trajectory preference.
