Artificial Intelligence

Planning and acting in partially observable stochastic domains · 1998

12 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Value-Decomposed Reinforcement Learning Framework for Taxiway Routing with Hierarchical Conflict-Aware Observations

cs.AI · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

CaTR applies value-decomposed RL with hierarchical conflict-aware observations to achieve better safety-efficiency trade-offs than planning, optimization, and standard RL baselines in a realistic airport taxiway simulation.

Effective Explanations Support Planning Under Uncertainty

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

Explanations scored higher by an LLM-plus-planner model are judged more helpful by people and produce measurably better navigation performance in uncertain environments than lower-scored or no explanations.

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

Scout-Assisted Planning for Heterogeneous Robot Teams under Partially Known Environments

cs.RO · 2026-05-21 · unverdicted · novelty 6.0

Scout-Assisted Planning uses UAV scouts and a GNN to predict information gain for pruning actions, cutting UGV travel costs by 31.9-37.7% versus the Canadian Traveler Problem baseline in partially known environments.

Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.

Policy Gradient Methods for Non-Markovian Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Introduces the Agent State-Markov Policy Gradient (ASMPG) algorithm and a policy gradient theorem for non-Markovian decision processes by jointly optimizing agent state dynamics and control policy.

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

cs.AI · 2025-07-01 · conditional · novelty 6.0

Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.

MATE: Solving Contextual Markov Decision Processes with Memory of Accumulated Transition Embeddings

cs.LG · 2026-05-17 · unverdicted · novelty 5.0

MATE uses permutation-invariant sum-aggregated memory of transition embeddings to solve CMDPs with online adaptation and computational advantages over Transformers and RNNs.

Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning

cs.LG · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

Recurrent RL policies can have their hidden states aligned with PMP co-states through a derived loss, yielding robust performance on partially observable control tasks.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05

Human-Guided Harm Recovery for Computer Use Agents

cs.AI · 2026-04-20

citing papers explorer

Showing 12 of 12 citing papers.

Value-Decomposed Reinforcement Learning Framework for Taxiway Routing with Hierarchical Conflict-Aware Observations cs.AI · 2026-05-09 · unverdicted · none · ref 28 · 2 links
Effective Explanations Support Planning Under Uncertainty cs.CL · 2026-05-08 · unverdicted · none · ref 10
Learning Interactive Real-World Simulators cs.AI · 2023-10-09 · conditional · none · ref 119
Goal-Conditioned Agents that Learn Everything All at Once cs.LG · 2026-05-22 · unverdicted · none · ref 97
Scout-Assisted Planning for Heterogeneous Robot Teams under Partially Known Environments cs.RO · 2026-05-21 · unverdicted · none · ref 7
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making cs.LG · 2026-05-15 · unverdicted · none · ref 56
Policy Gradient Methods for Non-Markovian Reinforcement Learning cs.LG · 2026-05-11 · unverdicted · none · ref 40
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning cs.AI · 2025-07-01 · conditional · none · ref 69
MATE: Solving Contextual Markov Decision Processes with Memory of Accumulated Transition Embeddings cs.LG · 2026-05-17 · unverdicted · none · ref 27
Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning cs.LG · 2026-05-06 · unverdicted · none · ref 47 · 2 links
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unreviewed · ref 216
Human-Guided Harm Recovery for Computer Use Agents cs.AI · 2026-04-20 · unreviewed · ref 10
