pith. machine review for the scientific record.

arxiv: 2604.13788 · v1 · submitted 2026-04-15 · 💻 cs.RO · cs.CV

Recognition: unknown

Failure Identification in Imitation Learning Via Statistical and Semantic Filtering

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:36 UTC · model grok-4.3

classification 💻 cs.RO cs.CV
keywords imitation learning · failure detection · anomaly detection · vision-language models · robotics · conformal prediction · optimal transport · failure identification

The pith

FIDeL identifies genuine robot failures by pairing statistical anomaly scores from demonstrations with vision-language model semantic checks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Imitation learning policies for robots perform well in training environments but break when rare events like hardware faults or unexpected actions occur outside the training data. Standard vision-based anomaly detection flags every deviation yet cannot separate harmless variations from actual failures that should halt execution. FIDeL builds a compact representation of successful demonstrations and uses optimal transport to align new observations, producing anomaly scores and heatmaps. Spatio-temporal thresholds come from an extension of conformal prediction, after which a vision-language model performs semantic filtering to keep only genuine failures. The method is policy-independent and is evaluated on a new multimodal dataset of real-world tasks called BotFails.
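
The statistical front end is compact enough to sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: patch features of one observation are aligned to a memory of nominal demonstration features with entropic optimal transport (plain Sinkhorn iterations, uniform marginals), and the resulting transport cost serves as the anomaly score. The feature dimensionality, memory construction, and regularization scale are all placeholders.

```python
import numpy as np

def sinkhorn_cost(C, reg, n_iter=200):
    """Entropic-regularized optimal transport cost between uniform marginals
    over the rows and columns of cost matrix C (plain Sinkhorn iterations)."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float((P * C).sum())      # transport cost under that plan

def anomaly_score(obs_feats, memory_feats, reg_scale=0.1):
    """Anomaly score of one observation: OT cost of aligning its patch
    features (rows) to the nominal demonstration memory (rows)."""
    # pairwise squared Euclidean costs between observation and memory patches
    C = ((obs_feats[:, None, :] - memory_feats[None, :, :]) ** 2).sum(-1)
    # regularization scaled to the cost magnitude to keep the kernel stable
    return sinkhorn_cost(C, reg=reg_scale * C.mean())
```

A higher score means the observation's features are harder to transport onto the nominal memory; this is the raw signal that the conformal thresholds later binarize.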

Core claim

FIDeL is a policy-independent failure detection module that leverages recent anomaly detection techniques to build compact representations of nominal demonstrations and aligns incoming observations via optimal transport matching to generate anomaly scores and heatmaps. It then derives spatio-temporal thresholds with an extension of conformal prediction and applies a vision-language model as a semantic filter that discriminates benign anomalies from genuine failures.

What carries the argument

The FIDeL module, which produces anomaly scores and heatmaps through optimal transport matching to nominal demonstrations, sets thresholds via conformal prediction, and applies vision-language model semantic filtering to remove benign cases.
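
The threshold derivation builds on conformal prediction; the paper's spatio-temporal extension is not specified here, but the standard split-conformal base case it extends can be sketched as follows (the finite-sample rank rule below is the textbook version, assumed rather than taken from the paper):

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold from nominal calibration scores: a fresh
    nominal score exceeds it with probability at most alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # not enough calibration data for this alpha
    return sorted(cal_scores)[k - 1]
```

With 100 calibration scores and alpha = 0.1, the threshold is the 91st smallest score; anomaly scores above it are flagged before the semantic filter runs.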

If this is right

  • Enables safer real-world deployment of imitation learning policies by reducing false positives from benign deviations.
  • Provides policy-independent monitoring that requires no changes to the underlying learned controller.
  • Generates interpretable spatial heatmaps that localize where anomalies occur in the visual input.
  • Establishes BotFails as a multimodal benchmark for comparing failure detection methods on real robot tasks.
  • Yields measurable gains in both anomaly detection AUROC and failure-detection accuracy over prior statistical baselines.
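
On the heatmap point: the paper derives its maps from optimal transport alignment, but the general mechanism, per-patch feature distances reshaped onto the patch grid, can be illustrated with a simpler nearest-neighbor variant in the spirit of memory-bank anomaly detectors. Everything below (grid size, feature dimension) is illustrative, not the paper's configuration.

```python
import numpy as np

def anomaly_heatmap(patch_feats, memory_feats, grid_hw):
    """Per-patch anomaly map: distance of each observation patch to its
    nearest nominal patch, reshaped onto the spatial patch grid."""
    # pairwise squared distances between observation and memory patches
    d2 = ((patch_feats[:, None, :] - memory_feats[None, :, :]) ** 2).sum(-1)
    per_patch = np.sqrt(d2.min(axis=1))  # nearest-neighbor distance per patch
    h, w = grid_hw
    return per_patch.reshape(h, w)       # heatmap over the patch grid
```

Patches close to the nominal memory stay dark while an off-distribution patch lights up, which is what makes the map interpretable as a localization of the anomaly.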

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The dual statistical-semantic approach may transfer to other sequential control domains such as autonomous navigation where false alarms carry high costs.
  • Performance will likely track advances in general vision-language models without requiring retraining on robot data.
  • If the conformal prediction thresholds prove robust across tasks, the method could reduce the need for extensive labeled failure data in new environments.
  • Extending the dataset to include a wider range of hardware faults would test whether the current gains generalize beyond the BotFails scenarios.

Load-bearing premise

The vision-language model can reliably discriminate benign anomalies from genuine failures without systematic misclassifications or task-specific fine-tuning.
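
Structurally, the filtering this premise rests on is a conjunction: a frame counts as a failure only if the statistical score exceeds its threshold and the VLM agrees. A minimal sketch, with `vlm_is_failure` as a hypothetical stand-in for the yes/no VLM query (not the paper's API):

```python
def filter_failures(scores, frames, threshold, vlm_is_failure):
    """Return indices of genuine failures: statistically anomalous frames
    that the semantic check also confirms."""
    failures = []
    for t, (score, frame) in enumerate(zip(scores, frames)):
        # statistical gate first (cheap), semantic gate second (VLM call)
        if score > threshold and vlm_is_failure(frame):
            failures.append(t)
    return failures
```

Because the VLM has veto power over every flag, systematic misclassification at this gate directly caps end-to-end failure-detection accuracy.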

What would settle it

Apply FIDeL to a held-out set of robot executions on which the vision-language model is independently shown to mislabel the semantic content of failures versus benign anomalies; if the accuracy gain over baselines vanishes on that set, the semantic-filtering claim does not hold.

Figures

Figures reproduced from arXiv: 2604.13788 by Fabrice Mayran de Chamisso, Jean-Baptiste Mouret, Quentin Rolland.

Figure 1
Figure 1. We introduce FIDeL, a framework for detecting failures in imitation learning (IL) policies. Offline: expert demonstrations are first encoded and stored in a memory buffer M. 0. Conformal Prediction Calibration — a decision threshold is computed from M using Conformal Prediction to determine when a score should be considered anomalous. Online: 1. Anomaly Detection — during policy execution, anomaly sc… view at source ↗
Figure 2
Figure 2. BotFails dataset illustration; all images are illustrations of BotFails tasks executed by the expert. Domestic: 1. clear away the dishes, 2. make coffee, 3. set the table, 4. pour coffee, 5. sort groceries, 6. sort fruits and vegetables. Industrial: 7. sort screws, 8. measure voltage, 9. press buttons, 10. solder. view at source ↗
Figure 3
Figure 3. Score illustration, Real-π soldering, episode 15 of the 20 episodes in the evaluation set; score obtained with the Representation module and temporal CP. The Anomaly Detection score corresponds to the output of the AD module D_A(φ(Y_N), φ_t*). The statistic µ_t^p is obtained when computing D_A(φ(Y_N), φ(E_A)) and allows computing the Threshold values; see section III-E. When the Anomaly Detection score is below the Threshold, no anomaly… view at source ↗
Figure 4
Figure 4. Heatmaps illustration, obtained using Representation AD and temporal/spatial CP. Real-π: (1) expert, (2) benign anomaly: screws present in the work plan, (3) failure: dropping the iron. BotFails: (4) expert, (5) benign anomaly: someone walking in the background (top-left corner), (6) failure: spilling the cup. view at source ↗
Figure 5
Figure 5. Anomaly Detection evaluation with thresholding. view at source ↗
Figure 6
Figure 6. End-to-end system evaluation (including semantic filtering). view at source ↗
original abstract

Imitation learning (IL) policies in robotics deliver strong performance in controlled settings but remain brittle in real-world deployments: rare events such as hardware faults, defective parts, unexpected human actions, or any state that lies outside the training distribution can lead to failed executions. Vision-based Anomaly Detection (AD) methods emerged as an appropriate solution to detect these anomalous failure states but do not distinguish failures from benign deviations. We introduce FIDeL (Failure Identification in Demonstration Learning), a policy-independent failure detection module. Leveraging recent AD methods, FIDeL builds a compact representation of nominal demonstrations and aligns incoming observations via optimal transport matching to produce anomaly scores and heatmaps. Spatio-temporal thresholds are derived with an extension of conformal prediction, and a Vision-Language Model (VLM) performs semantic filtering to discriminate benign anomalies from genuine failures. We also introduce BotFails, a multimodal dataset of real-world tasks for failure detection in robotics. FIDeL consistently outperforms state-of-the-art baselines, yielding +5.30% AUROC in anomaly detection and +17.38% failure-detection accuracy on BotFails compared to existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces FIDeL, a policy-independent failure detection module for imitation learning in robotics. It combines anomaly detection via optimal transport matching on nominal demonstrations to produce anomaly scores and heatmaps, derives spatio-temporal thresholds using an extension of conformal prediction, and applies an off-the-shelf vision-language model for semantic filtering to distinguish genuine failures from benign anomalies. The authors also present the BotFails multimodal dataset of real-world robotic tasks and claim that FIDeL outperforms state-of-the-art baselines by +5.30% AUROC in anomaly detection and +17.38% in failure-detection accuracy on BotFails.

Significance. If the empirical claims hold after proper validation, the work would address a practical limitation in vision-based anomaly detection for imitation learning by adding a semantic layer to filter benign deviations, which is relevant for safe real-world robotic deployment. The introduction of the BotFails dataset is a positive contribution as a benchmark resource. The pipeline integrates established techniques (optimal transport, conformal prediction, VLMs) in a straightforward manner, but the overall significance remains provisional given the absence of detailed experimental support.

major comments (3)
  1. Abstract: The headline performance claims (+5.30% AUROC and +17.38% failure-detection accuracy) are presented without any reference to the specific baselines, number of runs, dataset splits, or statistical significance tests. This information is load-bearing for the central claim of consistent outperformance and must be supplied with full experimental details.
  2. Methods (VLM semantic filtering stage): The final step relies on an off-the-shelf VLM to convert anomaly scores/heatmaps into binary failure labels without task-specific fine-tuning, yet the manuscript supplies no quantitative validation (e.g., precision, recall, or agreement metrics) of this component on BotFails. Because the reported accuracy gains depend directly on correct discrimination between benign anomalies and genuine failures, the lack of such validation makes the superiority claim difficult to assess.
  3. Experimental evaluation: No ablation results, baseline descriptions, or implementation details for the optimal transport matching and conformal-prediction thresholds are referenced, despite these being core to the anomaly scoring pipeline. Without these, it is impossible to determine whether the gains arise from the proposed combination or from unstated implementation choices.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We will revise the manuscript to address the concerns regarding experimental details and validations, as outlined in our point-by-point responses below.

point-by-point responses
  1. Referee: Abstract: The headline performance claims (+5.30% AUROC and +17.38% failure-detection accuracy) are presented without any reference to the specific baselines, number of runs, dataset splits, or statistical significance tests. This information is load-bearing for the central claim of consistent outperformance and must be supplied with full experimental details.

    Authors: We agree with the referee that the abstract should be more informative regarding the experimental setup supporting the performance claims. In the revised manuscript, we will update the abstract to explicitly mention the specific baselines used for comparison, the number of independent runs performed, the dataset splits employed for BotFails, and any statistical significance measures such as standard deviations across runs or p-values. revision: yes

  2. Referee: Methods (VLM semantic filtering stage): The final step relies on an off-the-shelf VLM to convert anomaly scores/heatmaps into binary failure labels without task-specific fine-tuning, yet the manuscript supplies no quantitative validation (e.g., precision, recall, or agreement metrics) of this component on BotFails. Because the reported accuracy gains depend directly on correct discrimination between benign anomalies and genuine failures, the lack of such validation makes the superiority claim difficult to assess.

    Authors: We acknowledge that the manuscript does not provide isolated quantitative metrics for the VLM-based semantic filtering stage on the BotFails dataset. To address this, we will conduct and report additional analysis in the revised version, including precision, recall, and agreement metrics (e.g., with human labels) for the VLM component. This will help isolate and validate its contribution to the overall failure detection accuracy improvement. revision: yes

  3. Referee: Experimental evaluation: No ablation results, baseline descriptions, or implementation details for the optimal transport matching and conformal-prediction thresholds are referenced, despite these being core to the anomaly scoring pipeline. Without these, it is impossible to determine whether the gains arise from the proposed combination or from unstated implementation choices.

    Authors: We will expand the experimental section in the revision to include detailed descriptions of the baselines, full implementation details for the optimal transport matching (such as the distance metric, regularization parameters, and solver) and the conformal prediction thresholds (including the calibration set construction and choice of significance level). Additionally, we will present comprehensive ablation studies that evaluate the impact of each component, including the optimal transport, conformal thresholds, and VLM filtering, on the final performance metrics. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical pipeline built from independent established components

full rationale

The paper describes FIDeL as a composite module that applies existing anomaly detection techniques, optimal transport matching, an extension of conformal prediction for thresholds, and an off-the-shelf VLM for semantic filtering. Performance metrics are obtained by direct comparison against baselines on the newly introduced BotFails dataset. No equations, derivations, or load-bearing steps are shown that reduce by construction to self-defined quantities, fitted parameters renamed as predictions, or self-citation chains. The method chain remains independent of the reported numerical gains.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The approach relies on standard techniques (optimal transport, conformal prediction) whose assumptions are external to the paper; no new entities are postulated and free parameters appear limited to threshold derivation.

free parameters (1)
  • spatio-temporal thresholds
    Derived via extension of conformal prediction; specific values may involve data-driven choices not detailed in abstract.

pith-pipeline@v0.9.0 · 5502 in / 1133 out tokens · 72189 ms · 2026-05-10T13:36:26.914276+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 12 canonical work pages · 2 internal anchors

  1. [1]

    $\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization

    Physical Intelligence, K. Black, et al., “π0.5: a vision-language-action model with open-world generalization,” 2025. [Online]. Available: https://arxiv.org/abs/2504.16054

  2. [2]

    Diffusion policy: Visuomotor policy learning via action diffusion,

    C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” The International Journal of Robotics Research, 2024

  3. [3]

    Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware,

    T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware,” in Proceedings of Robotics: Science and Systems, Daegu, Republic of Korea, July 2023

  4. [4]

    Dropout as a bayesian approximation: Representing model uncertainty in deep learning,

    Y. Gal et al., “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in Proc. of ICML, 2016

  5. [5]

    What uncertainties do we need in bayesian deep learning for computer vision?

    A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” in Proc. of NeurIPS, 2017

  6. [6]

    A review of uncertainty for deep reinforcement learning,

    O. Lockwood and M. Si, “A review of uncertainty for deep reinforcement learning,” in Proc. of AAAI, 2022

  7. [7]

    Uncertainty quantification and exploration for reinforcement learning,

    Y. Zhu, J. Dong, and H. Lam, “Uncertainty quantification and exploration for reinforcement learning,” Operations Research, 2024

  8. [8]

    Towards total recall in industrial anomaly detection,

    K. Roth et al., “Towards total recall in industrial anomaly detection,” in Proc. of CVPR, 2022

  9. [9]

    Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder,

    Y. Fan et al., “Video anomaly detection and localization via gaussian mixture fully convolutional variational autoencoder,” Computer Vision and Image Understanding, 2020

  10. [10]

    Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis,

    A. Hafez, A. N. Akhormeh, A. Hegazy, and A. Alanwar, “Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis,” 2025. [Online]. Available: https://arxiv.org/abs/2503.03911

  11. [11]

    Unpacking failure modes of generative policies: Runtime monitoring of consistency and progress,

    C. Agia et al., “Unpacking failure modes of generative policies: Runtime monitoring of consistency and progress,” in Proc. of CoRL, 2025

  12. [12]

    Can we detect failures without failure data? uncertainty-aware runtime failure detection for imitation learning policies,

    C. Xu et al., “Can we detect failures without failure data? uncertainty-aware runtime failure detection for imitation learning policies,” arXiv preprint arXiv:2503.08558, 2025

  13. [13]

    A survey on unsupervised anomaly detection algorithms for industrial images,

    Y. Cui, Z. Liu, and S. Lian, “A survey on unsupervised anomaly detection algorithms for industrial images,” IEEE Access, 2023

  14. [14]

    Video anomaly detection in 10 years: a survey and outlook,

    M. Abdalla, S. Javed, M. Al Radi, A. Ulhaq, and N. Werghi, “Video anomaly detection in 10 years: a survey and outlook,” Neural Computing and Applications, vol. 37, no. 32, pp. 26321–26364, Nov. 2025. [Online]. Available: https://doi.org/10.1007/s00521-025-11659-8

  15. [15]

    Deep learning for video anomaly detection: A review,

    P. Wu, C. Pan, Y. Yan, G. Pang, P. Wang, and Y. Zhang, “Deep learning for video anomaly detection: A review,” 2024. [Online]. Available: https://arxiv.org/abs/2409.05383

  16. [16]

    Deep Learning for Anomaly Detection: A Survey

    R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” 2019. [Online]. Available: https://arxiv.org/abs/1901.03407

  17. [17]

    Support vector domain description,

    D. M. J. Tax and R. P. W. Duin, “Support vector domain description,” Pattern Recognition Letters, 1999

  18. [18]

    Distribution-free predictive inference for regression,

    J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, “Distribution-free predictive inference for regression,” 2017. [Online]. Available: https://arxiv.org/abs/1604.04173

  19. [19]

    The importance of being a band: Finite-sample exact distribution-free prediction sets for functional data,

    J. Diquigiovanni, M. Fontana, and S. Vantini, “The importance of being a band: Finite-sample exact distribution-free prediction sets for functional data,” 2021. [Online]. Available: https://arxiv.org/abs/2102.06746

  20. [20]

    Lerobot: State-of-the-art machine learning for real-world robotics in pytorch,

    R. Cadene, S. Alibert, A. Soare, Q. Gallouedec, A. Zouitine, and T. Wolf, “Lerobot: State-of-the-art machine learning for real-world robotics in pytorch,” https://github.com/huggingface/lerobot, 2024

  21. [21]

    Cutpaste: Self-supervised learning for anomaly detection and localization,

    C.-L. Li, K. Sohn, J. Yoon, and T. Pfister, “Cutpaste: Self-supervised learning for anomaly detection and localization,” in Proc. of CVPR, 2021

  22. [22]

    Draem – a discriminatively trained reconstruction embedding for surface anomaly detection,

    V. Zavrtanik, M. Kristan, and D. Skočaj, “Draem – a discriminatively trained reconstruction embedding for surface anomaly detection,” in Proc. of ICCV, 2021

  23. [23]

    Natural synthetic anomalies for self-supervised anomaly detection and localization,

    H. M. Schlüter, J. Tan, B. Hou, and B. Kainz, “Natural synthetic anomalies for self-supervised anomaly detection and localization,” in Proc. of ICCV, 2022

  24. [24]

    Attention guided anomaly localization in images,

    S. Venkataramanan, K.-C. Peng, R. V. Singh, and A. Mahalanobis, “Attention guided anomaly localization in images,” in Proc. of ECCV, 2020

  25. [25]

    Unsupervised two-stage anomaly detection,

    Y. Liu, C. Zhuang, and F. Lu, “Unsupervised two-stage anomaly detection,” 2021. [Online]. Available: https://arxiv.org/abs/2103.11671

  26. [26]

    Unsupervised anomaly segmentation via deep feature reconstruction,

    Y. Shi, J. Yang, and Z. Qi, “Unsupervised anomaly segmentation via deep feature reconstruction,” Neurocomputing, 2021

  27. [27]

    Student-teacher feature pyramid matching for anomaly detection,

    G. Wang, S. Han, E. Ding, and D. Huang, “Student-teacher feature pyramid matching for anomaly detection,” in Proc. of BMVC, 2021

  28. [28]

    Reconstructed student-teacher and discriminative networks for anomaly detection,

    S. Yamada, S. Kamiya, and K. Hotta, “Reconstructed student-teacher and discriminative networks for anomaly detection,” in Proc. of IROS, 2022

  29. [29]

    Asymmetric student-teacher networks for industrial anomaly detection,

    M. Rudolph, T. Wehrbein, B. Rosenhahn, and B. Wandt, “Asymmetric student-teacher networks for industrial anomaly detection,” in Proc. of the IEEE/CVF winter conference on applications of computer vision, 2023

  30. [30]

    Self-supervised normalizing flows for image anomaly detection and localization,

    L.-L. Chiu and S.-H. Lai, “Self-supervised normalizing flows for image anomaly detection and localization,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023, pp. 2927–2936

  31. [31]

    Autoencoders for anomaly detection are unreliable,

    R. Bouman and T. Heskes, “Autoencoders for anomaly detection are unreliable,” 2025. [Online]. Available: https://arxiv.org/abs/2501.13864

  32. [32]

    Variational inference with normalizing flows,

    D. J. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Proc. of ICML, 2015

  33. [33]

    Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,

    D. Gudovskiy, S. Ishizaka, and K. Kozuka, “Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows,” in Proc. of the IEEE/CVF winter conference on applications of computer vision, 2021

  34. [34]

    Fully convolutional cross-scale-flows for image-based defect detection,

    M. Rudolph, T. Wehrbein, B. Rosenhahn, and B. Wandt, “Fully convolutional cross-scale-flows for image-based defect detection,” in Proc. of the IEEE/CVF winter conference on applications of computer vision, 2021

  35. [35]

    Fastflow: Unsupervised anomaly detection and localization via 2d normalizing flows,

    J. Yu et al., “Fastflow: Unsupervised anomaly detection and localization via 2d normalizing flows,” 2021. [Online]. Available: https://arxiv.org/abs/2111.07677

  36. [36]

    Why normalizing flows fail to detect out-of-distribution data,

    P. Kirichenko, P. Izmailov, and A. G. Wilson, “Why normalizing flows fail to detect out-of-distribution data,” in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc., 2020, pp. 20578–20589. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2020/file/ecb9fe2fbb99c31f567e9823e884dbec-Paper.pdf

  37. [37]

    Padim: a patch distribution modeling framework for anomaly detection and localization,

    T. Defard, A. Setkov, A. Loesch, and R. Audigier, “Padim: a patch distribution modeling framework for anomaly detection and localization,” in Proc. of ICPR, 2020

  38. [38]

    Glancing at the patch: Anomaly localization with global and local feature comparison,

    S. Wang, L. Wu, L. Cui, and Y. Shen, “Glancing at the patch: Anomaly localization with global and local feature comparison,” in Proc. of CVPR, 2021

  39. [39]

    Focus your distribution: Coarse-to-fine non-contrastive learning for anomaly detection and localization,

    Y. Zheng et al., “Focus your distribution: Coarse-to-fine non-contrastive learning for anomaly detection and localization,” in Proc. of ICME, 2022

  40. [40]

    Model-based runtime monitoring with interactive imitation learning,

    H. Liu, S. Dass, R. Martín-Martín, and Y. Zhu, “Model-based runtime monitoring with interactive imitation learning,” in Proc. of ICRA, 2024

  41. [41]

    Asking for help: Failure prediction in behavioral cloning through value approximation,

    C. Gokmen, D. Ho, and M. Khansari, “Asking for help: Failure prediction in behavioral cloning through value approximation,” 2023. [Online]. Available: https://arxiv.org/abs/2302.04334

  42. [42]

    Multi-Task Interactive Robot Fleet Learning with Visual World Models,

    H. Liu et al., “Multi-Task Interactive Robot Fleet Learning with Visual World Models,” in Proc. of CoRL, 2024

  43. [43]

    Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers,

    A. Gupta, K. Chakraborty, and S. Bansal, “Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers,” in Proc. of ICRA, 2024

  44. [44]

    Grounding language plans in demonstrations through counterfactual perturbations,

    Y. Wang, T.-H. Wang, J. Mao, M. Hagenow, and J. Shah, “Grounding language plans in demonstrations through counterfactual perturbations,” in Proc. of ICLR, 2024

  45. [45]

    Rediffuser: Reliable decision-making using a diffuser with confidence estimation,

    N. He et al., “Rediffuser: Reliable decision-making using a diffuser with confidence estimation,” in Proc. of ICML, 2024

  46. [46]

    Conformal prediction for uncertainty-aware planning with diffusion dynamics model,

    J. Sun et al., “Conformal prediction for uncertainty-aware planning with diffusion dynamics model,” in Proc. of NeurIPS, 2023

  47. [47]

    Neural ordinary differential equations,

    R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, “Neural ordinary differential equations,” in Proc. of NeurIPS, 2018

  48. [48]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. of CVPR, 2016

  49. [49]

    Dinov2: Learning robust visual features without supervision,

    M. Oquab et al., “Dinov2: Learning robust visual features without supervision,” TMLR, 2024

  50. [50]

    Imitation learning with sinkhorn distances,

    G. Papagiannis and Y. Li, “Imitation learning with sinkhorn distances,” in ECML/PKDD, 2022. [Online]. Available: https://arxiv.org/abs/2008.09167

  51. [51]

    Primal wasserstein imitation learning,

    R. Dadashi, L. Hussenot, M. Geist, and O. Pietquin, “Primal wasserstein imitation learning,” 2021. [Online]. Available: https://arxiv.org/abs/2006.04678

  52. [52]

    Qwen2.5 Technical Report

    Qwen Team, A. Yang, B. Yang, et al., “Qwen2.5 technical report,” 2025. [Online]. Available: https://arxiv.org/abs/2412.15115

  53. [53]

    Normalizing flow neural networks by jko scheme,

    C. Xu, X. Cheng, and Y. Xie, “Normalizing flow neural networks by jko scheme,” in Proc. of NeurIPS, 2023

  54. [54]

    Learning temporal regularity in video sequences,

    M. Hasan et al., “Learning temporal regularity in video sequences,” in Proc. of CVPR, 2016

  55. [55]

    Generative neural networks for anomaly detection in crowded scenes,

    T. Wang et al., “Generative neural networks for anomaly detection in crowded scenes,” IEEE TIFS, 2018