Planning and acting in partially observable stochastic domains

Leslie Pack Kaelbling, Michael L. Littman, Anthony R. Cassandra · 1998 · Artificial Intelligence · DOI 10.1016/s0004-3702(98)00023-x

Canonical reference. All 7 classified citations from Pith papers cite this work as background.

13 Pith papers citing it

2,353 external citations · Crossref

Background 100% of classified citations

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

Learning POMDP World Models from Observations with Language-Model Priors

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.

Stochastic Differential Dynamic Programming for Trajectory Optimization under Partial Observability

eess.SY · 2026-05-08 · unverdicted · novelty 7.0

A new stochastic differential dynamic programming method optimizes coupled trajectory design and orbit determination under partial observability, producing navigation-aware solutions with lower fuel consumption than deterministic local optimization in examples like the circular restricted three-body

Prediction and Empowerment: A Theory of Agency through Bridge Interfaces

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

In deterministic partially observable worlds, perfect prediction requires either identifying the relevant hidden quotient or achieving overwrite control, while high empowerment alone is insufficient.

Partially Observed Structural Causal Models

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

POSCMs extend structural causal models to latent contexts that co-determine both graph structure and mechanisms, supported by an identifiability theory and validation in a retina simulator.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

Engagement Process: Rethinking the Temporal Interface of Action and Observation

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

Engagement Process decouples actions and observations into separate time-based event streams within a POMDP structure to explicitly model timing mismatches, deliberation latency, and multi-rate interactions.

Multimodal Latent Reasoning via Predictive Embeddings

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

Pearl learns predictive embeddings from multimodal tool trajectories in latent space to enable efficient reasoning that matches or exceeds supervised fine-tuning and reconstruction-based methods without explicit tool invocation at inference.

Artifacts as Memory Beyond the Agent Boundary

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

Artifacts in the environment can reduce the memory an RL agent needs to represent its history, as shown by a mathematical proof and experiments with spatial paths.

Reinforcement learning for adaptive interior point methods in convex quadratic programming

math.OC · 2025-09-09 · unverdicted · novelty 5.0

Reinforcement learning learns a policy that adapts control parameters of a regularized interior-point method, accelerating high-accuracy solutions for convex quadratic programs and generalizing across problem classes after lightweight training.

Gymnasium: A Standard Interface for Reinforcement Learning Environments

cs.LG · 2024-07-24 · accept · novelty 5.0

Gymnasium establishes a standardized API for RL environments to improve interoperability, reproducibility, and ease of development in reinforcement learning.

Optimal sequential decision-making for error propagation mitigation in digital twins

cs.LG · 2026-04-24 · unverdicted · novelty 4.0

Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

math.OC · 2026-04-13 · unverdicted · novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.

Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding

cs.AI · 2026-04-03 · unverdicted · novelty 2.0

Squirrel behaviors supply a comparative template for a hierarchical control model that integrates latent dynamics, episodic memory, observer beliefs, and delayed verification in agentic AI.

citing papers explorer

Showing 13 of 13 citing papers.

Learning POMDP World Models from Observations with Language-Model Priors · cs.LG · 2026-05-13 · unverdicted · none · ref 5
Stochastic Differential Dynamic Programming for Trajectory Optimization under Partial Observability · eess.SY · 2026-05-08 · unverdicted · none · ref 13
Prediction and Empowerment: A Theory of Agency through Bridge Interfaces · cs.AI · 2026-05-07 · unverdicted · none · ref 15
Partially Observed Structural Causal Models · cs.LG · 2026-05-05 · unverdicted · none · ref 119
Latent State Design for World Models under Sufficiency Constraints · cs.AI · 2026-05-03 · unverdicted · none · ref 36
Engagement Process: Rethinking the Temporal Interface of Action and Observation · cs.AI · 2026-05-12 · unverdicted · none · ref 20
Multimodal Latent Reasoning via Predictive Embeddings · cs.LG · 2026-04-09 · unverdicted · none · ref 2
Artifacts as Memory Beyond the Agent Boundary · cs.AI · 2026-04-09 · unverdicted · none · ref 32
Reinforcement learning for adaptive interior point methods in convex quadratic programming · math.OC · 2025-09-09 · unverdicted · none · ref 25
Gymnasium: A Standard Interface for Reinforcement Learning Environments · cs.LG · 2024-07-24 · accept · none · ref 19
Optimal sequential decision-making for error propagation mitigation in digital twins · cs.LG · 2026-04-24 · unverdicted · none · ref 13
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers · math.OC · 2026-04-13 · unverdicted · none · ref 66
Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding · cs.AI · 2026-04-03 · unverdicted · none · ref 13
