Planning and acting in partially observable stochastic domains
Canonical reference. 100% of citing Pith papers cite this work as background.
roles
background 7
polarities
background 7
citing papers explorer
- Learning POMDP World Models from Observations with Language-Model Priors
Pinductor leverages language-model priors to learn POMDP world models from limited trajectories, matching privileged-access methods in performance and exceeding tabular baselines in sample efficiency.
- Stochastic Differential Dynamic Programming for Trajectory Optimization under Partial Observability
A new stochastic differential dynamic programming method optimizes coupled trajectory design and orbit determination under partial observability, producing navigation-aware solutions with lower fuel consumption than deterministic local optimization in examples like the circular restricted three-body
- Prediction and Empowerment: A Theory of Agency through Bridge Interfaces
In deterministic partially observable worlds, perfect prediction requires either identifying the relevant hidden quotient or achieving overwrite control, while high empowerment alone is insufficient.
- Partially Observed Structural Causal Models
POSCMs extend structural causal models to latent contexts that co-determine both graph structure and mechanisms, supported by an identifiability theory and validation in a retina simulator.
- Latent State Design for World Models under Sufficiency Constraints
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
- Engagement Process: Rethinking the Temporal Interface of Action and Observation
Engagement Process decouples actions and observations into separate time-based event streams within a POMDP structure to explicitly model timing mismatches, deliberation latency, and multi-rate interactions.
- Multimodal Latent Reasoning via Predictive Embeddings
Pearl learns predictive embeddings from multimodal tool trajectories in latent space to enable efficient reasoning that matches or exceeds supervised fine-tuning and reconstruction-based methods without explicit tool invocation at inference.
- Artifacts as Memory Beyond the Agent Boundary
Artifacts in the environment can reduce the memory an RL agent needs to represent its history, as shown by a mathematical proof and experiments with spatial paths.
- Reinforcement learning for adaptive interior point methods in convex quadratic programming
Reinforcement learning learns a policy that adapts control parameters of a regularized interior-point method, accelerating high-accuracy solutions for convex quadratic programs and generalizing across problem classes after lightweight training.
- Gymnasium: A Standard Interface for Reinforcement Learning Environments
Gymnasium establishes a standardized API for RL environments to improve interoperability, reproducibility, and ease of development in reinforcement learning.
- Optimal sequential decision-making for error propagation mitigation in digital twins
Error propagation mitigation in digital twins is cast as an MDP/POMDP with HMM-derived regimes as states, where the MDP policy maximizes reward and the POMDP recovers 95% of that performance.
- Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
- Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding
Squirrel behaviors supply a comparative template for a hierarchical control model that integrates latent dynamics, episodic memory, observer beliefs, and delayed verification in agentic AI.