CaTR applies value-decomposed RL with hierarchical conflict-aware observations to achieve better safety-efficiency trade-offs than planning, optimization, and standard RL baselines in a realistic airport taxiway simulation.
hub
Artificial intelligence , volume=
12 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Explanations scored higher by an LLM-plus-planner model are judged more helpful by people and produce measurably better navigation performance in uncertain environments than lower-scored or no explanations.
NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.
UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Scout-Assisted Planning uses UAV scouts and a GNN to predict information gain for pruning actions, cutting UGV travel costs by 31.9-37.7% versus the Canadian Traveler Problem baseline in partially known environments.
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
Introduces the Agent State-Markov Policy Gradient (ASMPG) algorithm and a policy gradient theorem for non-Markovian decision processes by jointly optimizing agent state dynamics and control policy.
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
MATE uses permutation-invariant sum-aggregated memory of transition embeddings to solve CMDPs with online adaptation and computational advantages over Transformers and RNNs.
Recurrent RL policies can have their hidden states aligned with PMP co-states through a derived loss, yielding robust performance on partially observable control tasks.
citing papers explorer
-
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.