Goal inference with Rao-Blackwellized Particle Filters

Dan P. Guralnik; Warren E. Dixon; Yixuan Wang

arxiv: 2512.09269 · v2 · pith:FEHKJQ7Pnew · submitted 2025-12-10 · 💻 cs.LG · cs.IR

Pith reviewed 2026-05-21 18:08 UTC · model grok-4.3

classification 💻 cs.LG cs.IR

keywords goal inference · Rao-Blackwellized particle filter · intent inference · KL divergence · mobile agent · trajectory estimation · information leakage

The pith

A Rao-Blackwellized particle filter infers a mobile agent's goal by analytically marginalizing the linear-Gaussian part of the agent's dynamics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a Rao-Blackwellized Particle Filter (RBPF) to infer the eventual goal of a mobile agent from noisy observations of its trajectory. It assumes the agent follows closed-loop dynamics that admit a closed-form expression due to a provable practical stability property. The filter analytically marginalizes the linear-Gaussian substructure and updates only the particle weights over the discrete intent variable, yielding higher sample efficiency than a standard particle filter. Two estimators of the intent distribution are presented: a Gaussian mixture built from the full set of RBPF weights, and a reduced version restricted to the effective sample. Information-theoretic leakage is quantified through computable lower bounds on the Kullback-Leibler divergence between the true intent distribution and the filter estimates.
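The factorization behind this kind of marginalization can be written out explicitly. The block below is the textbook Rao-Blackwellization identity, not a formula quoted from the paper. The symbols $\theta$ (intent), $x_k$ (agent state), $y_k$ (noisy observation), $w_k^{(i)}$ (weight of particle $i$), and the Kalman quantities $\hat{y}_{k\mid k-1}^{(i)}$ and $S_k^{(i)}$ are notation assumed here for illustration, and the weight recursion assumes the intent is static and sampled once from its prior.

```latex
% Assumed notation: \theta = intent, x_k = state, y_k = observation at step k.
% The joint posterior splits into a conditionally Gaussian part and an intent part:
p(\theta, x_k \mid y_{1:k}) = p(x_k \mid \theta, y_{1:k})\, p(\theta \mid y_{1:k}),
\qquad
p(\theta \mid y_{1:k}) \approx \sum_{i=1}^{N} w_k^{(i)}\, \delta_{\theta^{(i)}}(\theta).
% Only the weights are updated; the factor p(x_k | \theta^{(i)}, y_{1:k}) is tracked
% exactly by a Kalman filter with predicted observation \hat{y}_{k|k-1}^{(i)}
% and innovation covariance S_k^{(i)}:
w_k^{(i)} \propto w_{k-1}^{(i)}\, p\big(y_k \mid \theta^{(i)}, y_{1:k-1}\big)
= w_{k-1}^{(i)}\, \mathcal{N}\big(y_k;\ \hat{y}_{k\mid k-1}^{(i)},\ S_k^{(i)}\big).
```

Because the state is integrated out in closed form, the Monte Carlo error comes only from sampling the intent, which is the usual argument for why a Rao-Blackwellized filter needs fewer particles than a filter that samples the full state.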

Core claim

Under the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property, a Rao-Blackwellized Particle Filter analytically marginalizes the linear-Gaussian substructure of the assumed closed-form agent dynamics and updates particle weights only, improving sample efficiency over a standard particle filter. Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. Computable lower bounds on the Kullback-Leibler divergence between the true intent distribution and the RBPF estimates are provided via Gaussian-mixture KL upper

What carries the argument

Rao-Blackwellized Particle Filter that analytically marginalizes the linear-Gaussian substructure of closed-form closed-loop agent dynamics while sampling only over the discrete goal variable.
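As a rough illustration of this mechanism, the sketch below runs a weight-only update over a handful of candidate goals, each paired with a scalar Kalman filter. Everything specific in it is an assumption made for the example rather than the paper's model: the one-dimensional dynamics `x_next = x + a * (goal - x) + noise`, the parameters `a`, `q`, `r`, the three candidate goals, and the function names `rbpf_step` and `infer_goal`.

```python
import math
import random


def gaussian_pdf(y, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def rbpf_step(particles, weights, y, a, q, r):
    """One update. Each particle is (goal, mean, var): a fixed goal hypothesis
    plus the Kalman mean/variance of the position conditioned on that goal."""
    updated, unnormalized = [], []
    for (goal, mean, var), w in zip(particles, weights):
        # Kalman prediction under the assumed model x_next = x + a * (goal - x) + noise.
        mean_pred = mean + a * (goal - mean)
        var_pred = (1.0 - a) ** 2 * var + q
        # Predictive likelihood of the observation y = x + measurement noise.
        innov_var = var_pred + r
        unnormalized.append(w * gaussian_pdf(y, mean_pred, innov_var))
        # Kalman correction of the conditional position estimate.
        gain = var_pred / innov_var
        updated.append((goal, mean_pred + gain * (y - mean_pred), (1.0 - gain) * var_pred))
    total = sum(unnormalized)
    return updated, [w / total for w in unnormalized]


def infer_goal(true_goal=5.0, goals=(-5.0, 0.0, 5.0), steps=40,
               a=0.2, q=0.01, r=0.04, seed=0):
    """Simulate a compliant agent and return the posterior over candidate goals."""
    rng = random.Random(seed)
    particles = [(g, 0.0, 1.0) for g in goals]  # assumed position prior N(0, 1)
    weights = [1.0 / len(goals)] * len(goals)
    x = 0.0
    for _ in range(steps):
        x = x + a * (true_goal - x) + rng.gauss(0.0, math.sqrt(q))
        y = x + rng.gauss(0.0, math.sqrt(r))
        particles, weights = rbpf_step(particles, weights, y, a, q, r)
    return dict(zip(goals, weights))
```

Calling `infer_goal()` returns a dictionary mapping each candidate goal to its posterior weight. The goal hypotheses are never moved or resampled in this sketch; only their weights change, which mirrors the "updates particle weights only" description. A production filter would typically work with log-weights to avoid underflow on long trajectories.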

If this is right

The RBPF updates only particle weights after analytic marginalization, directly increasing sample efficiency relative to a standard particle filter.
A full Gaussian-mixture estimator and a reduced effective-sample estimator are supplied, together with a bound showing the reduced version performs nearly as well.
Computable lower bounds on KL divergence between true intent and filter output quantify the information an adversary can extract.
Experiments confirm fast and accurate goal recovery when the agent remains compliant with the assumed stable closed-loop dynamics.
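The effective sample size mentioned above has a standard definition, N_eff = 1 / Σ w_i², and a reduced estimator can be sketched from it. The truncation rule used below (keep the ceil(N_eff) heaviest particles and renormalize) is an assumption for illustration; the summary does not spell out the paper's exact rule, and the function names are hypothetical.

```python
import math


def effective_sample_size(weights):
    """N_eff = 1 / sum_i w_i^2 for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def reduce_to_effective_sample(weights):
    """Keep the ceil(N_eff) heaviest particles (assumed rule) and renormalize.

    Returns (kept, mass): kept maps particle index -> renormalized weight,
    mass is the total original weight carried by the kept particles."""
    n_keep = min(len(weights), math.ceil(effective_sample_size(weights)))
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = order[:n_keep]
    mass = sum(weights[i] for i in kept)
    return {i: weights[i] / mass for i in kept}, mass
```

For uniform weights the effective sample size equals the number of particles and nothing is discarded. For a skewed vector such as `[0.7, 0.1, 0.1, 0.1]` the effective sample size is a little under 2, so two particles are kept, carrying about 0.8 of the original mass.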

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The leakage bounds motivate the design of intent-obfuscating controllers that deliberately increase the KL divergence between the true intent distribution and the observer's estimate.
The performance gap bound between the two estimators indicates that computational savings from the reduced version can be obtained with little loss in inference quality.
The marginalization technique could extend to other hybrid linear-nonlinear state estimation tasks that admit similar closed-form substructures.

Load-bearing premise

The agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property that permits closed-form dynamics.

What would settle it

Run the RBPF and a standard particle filter on simulated trajectories where the closed-loop controller lacks the assumed practical stability property and measure whether the RBPF loses its sample-efficiency advantage or produces inaccurate intent estimates.

Figures

Figures reproduced from arXiv: 2512.09269 by Dan P. Guralnik, Warren E. Dixon, Yixuan Wang.

**Figure 1.** Left: Hypothetical situation with three particles estimating x(θ*) with near-equal weights. The particle A, with weight 1/3 + ε, is a highest-weight estimate of θ* according to the PF, but it represents a false positive with high probability (2/3 − ε). At the same time, the weighted average of the particles may not be relevant at all, since its location is not contained in any of the predicted goal regi…

**Figure 2.** RBPF intent-inference process. The upper-left panel shows the agent's trajectory, and the upper-right panel plots the KL divergence for the complete and reduced estimators. The four lower panels display, from left to right, the effective sample size N_eff, the position estimation error, the estimated goal radius, and the estimated arrival time.

… and there exist constants C_x, C_r, C_t such that for all k: ∆Hlo…

Original abstract

Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.
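To make the phrase "Gaussian-mixture KL bounds" concrete, the sketch below shows why bounds are needed at all: the KL divergence between two Gaussians has a closed form, but the divergence between two mixtures does not. It computes a simple upper bound that follows from the joint convexity of KL when both mixtures share the same weights, and compares it with a Monte Carlo estimate. This convexity bound is a standard fact chosen for illustration; it is not the bound derived in the paper, and the function names and the scalar setting are assumptions.

```python
import math
import random


def kl_gauss(m0, v0, m1, v1):
    """Closed-form KL( N(m0, v0) || N(m1, v1) ) for scalar Gaussians (v = variance)."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


def mixture_pdf(x, weights, means, variances):
    """Density of a scalar Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )


def kl_convexity_bound(weights, means_f, vars_f, means_g, vars_g):
    """Upper bound on KL(f || g) when f and g share weights and are paired componentwise."""
    return sum(
        w * kl_gauss(mf, vf, mg, vg)
        for w, mf, vf, mg, vg in zip(weights, means_f, vars_f, means_g, vars_g)
    )


def kl_monte_carlo(weights, means_f, vars_f, means_g, vars_g, n=20000, seed=0):
    """Monte Carlo estimate of KL(f || g) using samples drawn from f."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        i = rng.choices(range(len(weights)), weights=weights)[0]
        x = rng.gauss(means_f[i], math.sqrt(vars_f[i]))
        total += math.log(
            mixture_pdf(x, weights, means_f, vars_f)
            / mixture_pdf(x, weights, means_g, vars_g)
        )
    return total / n
```

With weights `[0.5, 0.5]`, unit variances, and component means `[-1.0, 1.0]` for f and `[-0.5, 1.5]` for g, the convexity bound evaluates to 0.125, and the Monte Carlo estimate should fall below it. The gap between the two illustrates that componentwise bounds can be loose, which is one reason tighter mixture-specific bounds are of interest.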

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes a Rao-Blackwellized Particle Filter (RBPF) for inferring a mobile agent's goal from noisy trajectory observations. Under the assumption that the agent's intent produces closed-loop dynamics with a provable practical stability property (yielding exact linear-Gaussian structure), the RBPF analytically marginalizes the substructure and updates only particle weights. Two difference estimators are introduced—a full Gaussian mixture model (GMM) using RBPF weights and a reduced version restricted to the effective sample—along with information-theoretic leakage metrics, computable lower bounds on the KL divergence between the true intent distribution and the RBPF estimate (via Gaussian-mixture KL bounds), and a performance-difference bound showing the reduced estimator nearly matches the full one. Experiments demonstrate fast, accurate intent recovery for agents satisfying the stability assumption.

Significance. If the stability assumption holds, the approach yields a concrete improvement in sample efficiency for goal inference while supplying explicit, computable bounds on estimation error and information leakage. The derivation of the RBPF marginalization directly from the closed-form dynamics, the two estimators with their performance bound, and the KL lower bounds are strengths that make the claims falsifiable and reproducible in the compliant setting. This could inform both intent-aware planning and the design of intent-obfuscating controllers, though the restriction to agents with the stated stability property limits immediate generality.

major comments (2)

[§4.2] §4.2, the definition of the reduced estimator: the claim that it 'performs almost as well' as the full GMM is supported by the performance-difference bound, but the bound is stated only in expectation; a high-probability version or explicit dependence on effective sample size N_eff would strengthen the result for finite-particle regimes.
[§5] §5, Experiments: all reported trials use agents whose closed-loop dynamics exactly satisfy the provable practical stability property; the central efficiency claim would be more convincing with at least one ablation on agents that only approximately satisfy the assumption (e.g., with small nonlinear perturbations) to quantify degradation.

minor comments (3)

[Introduction] The specific theorem or reference establishing the 'state-of-the-art provable practical stability property' is cited in the text but should be repeated in the introduction for readers who skip the preliminaries.
Notation for the intent distribution p(g) versus the RBPF estimate p̂(g) is clear in the main derivation but becomes ambiguous in the leakage-metric definitions; a short table of symbols would help.
[Figure 3] Figure 3 (KL bound vs. number of particles) would benefit from error bars over multiple random seeds to illustrate variability of the lower bound.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive suggestions. We respond to each major comment below.

Referee: [§4.2] §4.2, the definition of the reduced estimator: the claim that it 'performs almost as well' as the full GMM is supported by the performance-difference bound, but the bound is stated only in expectation; a high-probability version or explicit dependence on effective sample size N_eff would strengthen the result for finite-particle regimes.

Authors: We thank the referee for this observation. The performance-difference bound is stated in expectation, which guarantees that the reduced estimator is close to the full GMM in an average sense. To strengthen the finite-particle analysis, we will revise §4.2 to make the dependence on the effective sample size N_eff explicit, showing that the expected performance gap scales inversely with N_eff. This directly ties the bound to particle degeneracy and provides a clearer justification for using the reduced estimator in practice. A high-probability version would require additional concentration arguments and is left for future work.

revision: partial

Referee: [§5] §5, Experiments: all reported trials use agents whose closed-loop dynamics exactly satisfy the provable practical stability property; the central efficiency claim would be more convincing with at least one ablation on agents that only approximately satisfy the assumption (e.g., with small nonlinear perturbations) to quantify degradation.

Authors: We agree that robustness to approximate satisfaction of the stability assumption is a natural question. However, the RBPF's analytical marginalization and the resulting sample-efficiency gains rely on the exact linear-Gaussian closed-loop structure that the practical stability property provides. Introducing even small nonlinear perturbations would destroy this structure, forcing a return to a standard particle filter and removing the central technical contribution. We have therefore confined the experiments to the exact-compliance regime for which the method is derived. In the revision we will expand the discussion in §5 and the conclusion to acknowledge this scope limitation and to describe qualitatively how performance is expected to degrade under perturbations.

revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper states its core assumption upfront: agent intent produces closed-loop dynamics possessing a provable practical stability property that yields an exact closed-form linear-Gaussian structure. All subsequent steps—the Rao-Blackwellized marginalization, particle-weight updates only, the two difference estimators (full GMM and reduced effective-sample), the Gaussian-mixture KL lower bounds, and the performance-difference bound—are derived directly from this assumed structure via standard filtering and information-theoretic identities. No parameter is fitted to the target leakage metric and then re-labeled as a prediction; no uniqueness theorem or ansatz is imported via self-citation to force the result; and experiments are restricted to agents that satisfy the stated assumption. The derivation chain is therefore self-contained and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central approach rests on the assumption of closed-loop agent dynamics with provable stability, which enables analytic marginalization; no explicit free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: Agent intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property.
This assumption is invoked to leverage closed-form dynamics for RBPF marginalization.

pith-pipeline@v0.9.0 · 5738 in / 1172 out tokens · 27921 ms · 2026-05-21T18:08:19.933758+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only... Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version... computable lower bounds on the Kullback-Leibler (KL) divergence

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Quantifying Trade-Offs Between Stability and Goal-Obfuscation
eess.SY · 2026-05 · unverdicted · novelty 6.0

The authors introduce probabilistic control barrier functions to enforce a minimum information leakage threshold with high probability while preserving tracking stability under bounded disturbances.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] A. Kulkarni, M. Klenk, S. Rane, and H. Soroush. Resource bounded secure goal obfuscation. In AAAI Fall Symp. Integr. Plan. Diagn. Causal Reason., 2018.
[2] Sara Bernardini, Fabio Fagnani, and Santiago Franco. An optimization approach to robust goal obfuscation. In Int. Conf. Princ. Knowl. Represent. Reason., pages 119–129, 2020.
[3] Peta Masters and Sebastian Sardina. Deceptive path-planning. In Int. Jt. Conf. Artif. Intell. (IJCAI), pages 4368–4375, 2017.
[4] Yagiz Savas, Melkior Ornik, Murat Cubuktepe, Mustafa O Karabag, and Ufuk Topcu. Entropy maximization for Markov decision processes under temporal logic constraints. IEEE Trans. Autom. Control, 65(4):1552–1567, 2019.
[5] Michael Hibbard, Yagiz Savas, Zhe Xu, and Ufuk Topcu. Minimizing the information leakage regarding high-level task specifications. IFAC-PapersOnLine, 53(2):15388–15395, 2020.
[6] Parham Gohari, Matthew Hale, and Ufuk Topcu. Privacy-preserving policy synthesis in Markov Decision Processes. In IEEE Conf. Decis. Control (CDC), pages 6266–6271, 2020.
[7] Austin Jones, Kevin Leahy, and Matthew Hale. Towards differential privacy for symbolic systems. In Am. Control Conf. (ACC), pages 372–377, 2019.
[8] Bo Chen, Kevin Leahy, Austin Jones, and Matthew Hale. Differential privacy for symbolic systems with application to Markov Chains. Automatica, 152:110908, 2023.
[9] Timothy L Molloy and Girish N Nair. Smoother entropy for active state trajectory estimation and obfuscation in POMDPs. IEEE Trans. Autom. Control, 68(6):3557–3572, 2023.
[10] Kevin Murphy and Stuart Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Seq. Monte Carlo Methods Pract., pages 499–515. Springer, 2001.
[11] M.S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 50(2):174–188, 2002.
[12] Graeme Best and Robert Fitch. Bayesian intention inference for trajectory prediction with an unknown goal destination. In IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), pages 5817–5823, 2015.
[13] J. P. Aubin and H. Frankowska. Set-valued analysis. Birkhäuser, 2008.
[14] J.-L. Durrieu, J.-Ph. Thiran, and F. Kelly. Lower and upper bounds for approximation of the Kullback-Leibler divergence between Gaussian Mixture Models. In IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pages 4833–4836, 2012.
[15] Yixuan Wang, Dan Guralnik, Saiedeh Akbari, and Warren Dixon. Effective model pruning. arXiv preprint arXiv:2509.25606, 2025.

A. Proof of Lemma 1. Let e ≜ x − x(θ) denote the error state and consider the Lyapunov function candidate V(e) ≜ ‖e‖². Following [13, Theorem 10.1.3], let x: [0, T] → R^n, T ∈ (0, ∞], be a complete solution of the differential inclusion (3). Appl...
[16] If $N_{\mathrm{eff}} = N$ then $\omega_{\mathrm{eff}} = 1$. Otherwise,

$$\omega_{\mathrm{eff}} \ge \frac{N_{\mathrm{eff}}}{N} + \frac{N - N_{\mathrm{eff}}}{N} \sqrt{\frac{N - N_{\mathrm{eff}} - 1}{(N_{\mathrm{eff}} + 1)(N - 1)}}. \tag{34}$$

The bounds are sharp. Then, applying Proposition 2 to $\omega_{\mathrm{eff}}$, the bound for the effective weight becomes $\omega_{\mathrm{eff}} \in \left[\tfrac{1}{2}\left(1 + \tfrac{1}{\sqrt{2N}}\right),\, 1\right]$. Hence, the additional information $\Delta H^{\mathrm{low}}_k$ can be bounded by:

$$0 \le \Delta H^{\mathrm{low}}_k \le \sum_{\nu \in \{x, r, t\}} \left(\frac{1}{2} - \frac{1}{2\sqrt{2N}}\right) \log C_\nu. \tag{35}$$
