Qwen2.5-1M Technical Report, 2025
3 Pith papers cite this work.
citing papers explorer
-
Using large language models for embodied planning introduces systematic safety risks
LLM planners for robots often produce dangerous plans even when planning succeeds; scaling up the model improves planning ability but leaves safety awareness flat.
-
BEDTime: A Unified Benchmark for Automatically Describing Time Series
The BEDTime benchmark tests 17 models on describing time-series structure and finds that vision-language models outperform both dedicated time-series-language models and language-only approaches, although all models are fragile under robustness tests.
-
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Rule-based RL on 5K logic puzzles induces advanced reasoning in a 7B model, and that reasoning transfers to the AIME and AMC math benchmarks.