Wavelet Policy: Imitation Learning in the Scale Domain with World Prior Memory

Changchuan Yang; Guanzhong Tian; Haizhou Ge; Hongrui Zhu; Yuhang Dong

arxiv: 2504.04991 · v4 · submitted 2025-04-07 · 💻 cs.RO

Pith reviewed 2026-05-22 20:45 UTC · model grok-4.3

classification 💻 cs.RO

keywords wavelet policy · imitation learning · world prior memory · robotic manipulation · scale domain · visuomotor policy · long-horizon tasks · wavelet transform

The pith

Wavelet Policy encodes persistent scene structure from background images into memory tokens and decomposes actions in the wavelet domain to improve long-horizon robot manipulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Wavelet Policy as a lightweight imitation learning method that pairs World Prior Memory with wavelet-based multi-scale action modeling. Standard time-domain action prediction often lacks durable scene awareness and memory across long sequences, whereas full world-model approaches add heavy overhead. The method extracts compact tokens from static background images to capture persistent physical structure, injects them as world-prior tokens into the encoder, decomposes horizon-aligned latent action tokens across scales using a Single-Encoder Multiple-Decoder architecture, and reconstructs executable actions via inverse wavelet transform. A world-prior adaptation loss keeps the memory encoder lightweight and stable. Experiments across four simulated and six real-world manipulation tasks show consistent gains over strong baselines.
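The decompose-then-reconstruct step can be illustrated with a minimal sketch. It assumes a one-level orthonormal Haar wavelet applied to a single made-up action dimension; the paper applies the decomposition to horizon-aligned latent action tokens, and this summary does not state which wavelet family or how many levels it uses. The names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One-level orthonormal Haar transform of an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Approximation: scaled pairwise sums (coarse trend).
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Detail: scaled pairwise differences (fine, step-to-step change).
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the recovered even and odd samples."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal


# Hypothetical action chunk: one joint target over a horizon of 8 steps.
chunk = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 1.0, 1.0]
approx, detail = haar_dwt(chunk)
recon = haar_idwt(approx, detail)
```

The approximation subband carries the slow trend of the chunk and the detail subband its step-to-step changes, which is the sense in which separate decoders could each handle one temporal scale.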

Core claim

The central claim is that encoding persistent physical scene structure from static background images into compact memory tokens, fusing them as world-prior tokens during encoding, and performing wavelet-domain decomposition on horizon-aligned latent action tokens with a Single-Encoder Multiple-Decoder architecture yields reconstructed actions that improve performance on long-horizon embodied manipulation tasks while remaining efficient.

What carries the argument

World Prior Memory (WPM) fused into the encoder together with wavelet-based multi-scale decomposition of latent action tokens via a Single-Encoder Multiple-Decoder (SE2MD) architecture.

If this is right

Outperforms strong baselines on four simulated and six real-world robotic manipulation tasks.
Delivers better physical scene awareness and long-horizon memory than direct time-domain prediction.
Avoids the substantial computation overhead of full world-model-based policies.
Maintains a lightweight and stable background encoder through the world-prior adaptation loss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the memory tokens prove stable across lighting or minor layout changes, the same background-encoding step could be reused across multiple tasks without retraining.
The wavelet scale separation may transfer to other sequence-generation domains where actions unfold at multiple time resolutions.
Replacing the static background encoder with a slow-updating module could extend the method to mildly dynamic scenes without increasing inference cost.

Load-bearing premise

Persistent physical scene structure can be reliably encoded from static background images into compact memory tokens that remain lightweight and stable while improving policy performance on manipulation tasks.

What would settle it

An ablation study that removes the world-prior memory tokens or the wavelet decomposition and finds no measurable drop in success rate on long-horizon manipulation tasks would falsify the central claim.
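Whether an ablation shows a "measurable drop" depends on the number of rollouts. As a rough sketch, a two-sided two-proportion z-test can compare the success counts of the full model and an ablated variant. The counts below are invented for illustration and are not results from the paper; the function name is hypothetical.

```python
import math


def two_proportion_z_test(success_a, trials_a, success_b, trials_b):
    """Two-sided z-test for a difference between two success rates."""
    p_a = success_a / trials_a
    p_b = success_b / trials_b
    # Pooled success rate under the null hypothesis of equal rates.
    pooled = (success_a + success_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
    if se == 0.0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: full model 45/50 successes, ablated model 33/50.
z, p_value = two_proportion_z_test(45, 50, 33, 50)
```

With few rollouts per task the normal approximation is coarse, so an exact test may be a better choice in practice.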

Figures

Figures reproduced from arXiv: 2504.04991 by Changchuan Yang, Guanzhong Tian, Haizhou Ge, Hongrui Zhu, Yuhang Dong.

**Figure 2.** Wavelet Policy employs a modular design with three key components: a FE module for initial visual processing, the

**Figure 3.** Left: The MuJoCo tasks Transfer Cube, Bimanual Insertion, Transfer Plus, and Stack Two Blocks. Right: The real-world tasks Stack Block, Store Strawberry, Store Lemon, Store Items, Assist Sewing, and Stack Blocks.

**Figure 4.** Multi-scale error comparison between ACT and our Wavelet Policy on the four tasks.

**Figure 5.** Progressive accumulation of success rates across sub

**Figure 6.** The relationship between the success rate of the task and the value of N for the LSDF.

Original abstract

Conventional visuomotor imitation learning usually predicts future robot actions directly in the time domain. Such formulations often have limited physical scene awareness and weak long-horizon memory. In contrast, world-model-based perception and memory-augmented policies can improve world awareness, but with substantial computation overhead. In this work, we propose Wavelet Policy, a lightweight imitation learning framework that combines World Prior Memory (WPM) with wavelet-based multi-scale action modeling. Our key idea is to encode persistent physical scene structure from static background images into compact memory tokens, which are fused into world-prior tokens and injected into the encoder during forward propagation. Based on this memory-conditioned representation, we further perform wavelet-domain decomposition over horizon-aligned latent action tokens and adopt a Single-Encoder Multiple-Decoder (SE2MD) architecture to model latent components at different temporal scales. The resulting latent subbands are reconstructed through inverse wavelet transform and finally projected into executable action chunks. To facilitate efficient world prior learning, we introduce a world-prior adaptation loss, encouraging the background encoder to retain persistent scene knowledge while remaining lightweight and stable. Extensive experiments on four simulated and six real-world robotic manipulation tasks show that Wavelet Policy consistently outperforms strong baselines. These results demonstrate that combining scale-domain action modeling with world-prior memory provides an effective and efficient solution for long-horizon embodied manipulation. We release the source code, data, and model checkpoints for the simulation tasks at https://github.com/lurenjia384/Wavelet_Policy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Wavelet Policy, a lightweight imitation learning framework for robotic manipulation that encodes persistent physical scene structure from static background images into compact World Prior Memory (WPM) tokens. These tokens are fused into world-prior representations and injected into the encoder; actions are then modeled via wavelet-domain decomposition of horizon-aligned latent tokens using a Single-Encoder Multiple-Decoder (SE2MD) architecture, with reconstruction via inverse wavelet transform. A world-prior adaptation loss is introduced to keep the background encoder lightweight and stable. The paper reports consistent outperformance over strong baselines on four simulated and six real-world long-horizon manipulation tasks and releases code, data, and checkpoints for the simulation tasks.

Significance. If the empirical gains hold under rigorous controls, the work demonstrates that scale-domain action modeling combined with a lightweight world-prior memory mechanism can improve long-horizon performance without the computational overhead of full world models. The release of code and checkpoints is a clear strength that supports reproducibility and further investigation of the WPM and wavelet components.

major comments (2)

The central empirical claim of consistent outperformance on ten tasks rests on the stability of WPM tokens derived from static backgrounds. The skeptic note and abstract description indicate that no ablations isolate whether performance gains survive object motion or occlusion changes that alter the scene after the background image is captured; this is load-bearing for the claim that WPM remains 'lightweight, stable, and performance-improving' in realistic manipulation.
Experiments section (and abstract): the reported outperformance lacks accompanying details on experimental controls, error bars, statistical significance tests, or data exclusion rules. Without these, it is not possible to assess whether the gains are robust or could be explained by implementation differences rather than the proposed WPM + wavelet combination.
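For the error bars requested above, one common choice for success rates measured over a small number of rollouts is the Wilson score interval. The sketch below is an illustration only: the counts are invented, the function name is hypothetical, and nothing here reflects how the paper reports uncertainty.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical result: 45 successful rollouts out of 50.
low, high = wilson_interval(45, 50)
```

Unlike the plain normal approximation, this interval stays inside [0, 1] and remains usable when the observed rate is close to 0 or 1.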

minor comments (2)

Clarify the exact wavelet family and decomposition levels used in the SE2MD architecture, as these choices directly affect the temporal scale modeling.
The world-prior adaptation loss weight is listed as a free parameter; report its value and sensitivity analysis in the experimental setup.
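The first minor comment matters because the number of decomposition levels fixes how many subbands exist and how long each one is. A small sketch, assuming a dyadic transform without boundary padding (as with Haar or a periodized wavelet) and a horizon divisible by 2 to the power of the level count; `subband_lengths` and the horizon of 64 are hypothetical.

```python
def subband_lengths(horizon, levels):
    """Lengths of the detail and final approximation subbands of a dyadic DWT."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if horizon % (2 ** levels) != 0:
        raise ValueError("horizon must be divisible by 2**levels")
    lengths = {}
    n = horizon
    for level in range(1, levels + 1):
        n //= 2  # each level halves the sequence it decomposes
        lengths[f"detail_{level}"] = n
    lengths[f"approx_{levels}"] = n
    return lengths


# Hypothetical action horizon of 64 steps decomposed over 3 levels.
lengths = subband_lengths(64, 3)
```

The subband lengths always sum to the horizon, so the choice of levels changes how the prediction budget is split across scales rather than its total size. Wavelet families with longer filters and non-periodic boundary handling produce slightly longer subbands than this sketch assumes.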

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback. We address each major comment below.

Referee: The central empirical claim of consistent outperformance on ten tasks rests on the stability of WPM tokens derived from static backgrounds. The skeptic note and abstract description indicate that no ablations isolate whether performance gains survive object motion or occlusion changes that alter the scene after the background image is captured; this is load-bearing for the claim that WPM remains 'lightweight, stable, and performance-improving' in realistic manipulation.

Authors: The WPM is explicitly designed to encode persistent physical scene structure from a static background image captured before task execution, as described in the manuscript. Our simulated and real-world experiments use manipulation tasks in which the background remains fixed while only foreground objects are moved. The reported gains are therefore demonstrated under the method's stated assumptions. We did not conduct ablations involving post-capture background alterations because such changes lie outside the intended scope of WPM. In the revision we will add an explicit statement of this scope and a brief discussion of the limitation for dynamic backgrounds. revision: partial
Referee: Experiments section (and abstract): the reported outperformance lacks accompanying details on experimental controls, error bars, statistical significance tests, or data exclusion rules. Without these, it is not possible to assess whether the gains are robust or could be explained by implementation differences rather than the proposed WPM + wavelet combination.

Authors: We agree that the current manuscript would benefit from greater transparency on these points. In the revised version we will expand the Experiments section to describe the experimental controls, report error bars computed over multiple random seeds, include statistical significance tests, and state any data exclusion rules applied. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical method with independent experimental validation

The paper presents an empirical imitation learning framework whose central claims rest on experimental outperformance across simulated and real-world tasks rather than any closed mathematical derivation. No equations are shown that define a quantity in terms of itself or rename a fitted parameter as a prediction. The world-prior adaptation loss and wavelet decomposition are architectural choices whose performance impact is measured externally via baselines and ablations; the released code further allows independent reproduction. Self-citations, if present, are not load-bearing for the core result. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the empirical effectiveness of the proposed SE2MD architecture and world-prior adaptation loss; the paper introduces no new physical entities or unproven mathematical axioms beyond standard wavelet properties.

free parameters (1)

world-prior adaptation loss weight
Hyperparameter balancing the background encoder loss against the main imitation objective; value not specified in abstract.
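One plausible reading of this weight, assuming the two objectives are combined additively (the abstract does not give the form), is the following. The symbols are illustrative labels, not the paper's notation.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{imitation}} + \lambda \, \mathcal{L}_{\text{WPA}}, \qquad \lambda \ge 0
```

Under this assumed form, setting \lambda = 0 recovers the plain imitation objective, which is why a sensitivity sweep over \lambda would show how much the adaptation term contributes.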

axioms (1)

standard math: The wavelet transform allows perfect reconstruction via the inverse transform
Invoked when reconstructing latent subbands into action chunks.
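Perfect reconstruction applies to exact subbands, whereas a policy decodes predicted ones. As a short worked note, assume the transform is orthonormal (true of Haar and of periodized Daubechies wavelets; the family used in the paper is not stated here) and write it as a square matrix W acting on a sequence x, with predicted coefficients \hat{c}:

```latex
\begin{aligned}
c &= W x, \qquad W^{\top} W = W W^{\top} = I, \\
\hat{x} &= W^{\top} \hat{c}, \\
\lVert \hat{x} - x \rVert_2^2 &= \lVert W^{\top} (\hat{c} - c) \rVert_2^2 = \lVert \hat{c} - c \rVert_2^2 .
\end{aligned}
```

Under that assumption the inverse transform neither amplifies nor hides prediction error: the squared error of the reconstructed sequence equals the total squared error across the predicted subbands.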

invented entities (1)

World Prior Memory (WPM) tokens (no independent evidence)
purpose: Compact encoding of persistent scene structure from static background images
New component introduced to inject world awareness into the policy encoder.

pith-pipeline@v0.9.0 · 5808 in / 1260 out tokens · 38413 ms · 2026-05-22T20:45:44.084029+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SkiP: When to Skip and When to Refine for Efficient Robot Manipulation
cs.RO · 2026-05 · unverdicted · novelty 7.0

SkiP introduces action relabeling and Motion Spectrum Keying to skip redundant steps in robot trajectories, cutting executed steps by 15-40% while maintaining success rates across 72 simulated and 3 real tasks.


[1] [1]

Diffusion policy: Visuomotor policy learning via ac- tion diffusion,

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion policy: Visuomotor policy learning via ac- tion diffusion,” The International Journal of Robotics Research , p. 02783649241273668, 2023

work page 2023

[2] [2]

Integrating natural language instructions into the action chunking transformer for multi-task robotic manipulation

K. Rohling, “Integrating natural language instructions into the action chunking transformer for multi-task robotic manipulation.” [Online]. Available: https://github.com/krohling

work page

[3] [3]

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

T. Z. Zhao, V . Kumar, S. Levine, and C. Finn, “Learning fine-grained bimanual manipulation with low-cost hardware,” arXiv preprint arXiv:2304.13705, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

Exploring embodied intelligence in soft robotics: a re- view,

Z. Zhao, Q. Wu, J. Wang, B. Zhang, C. Zhong, and A. A. Zhilenkov, “Exploring embodied intelligence in soft robotics: a re- view,” Biomimetics, vol. 9, no. 4, p. 248, 2024

work page 2024

[5] [5]

Time series representation learning: A survey on deep learning techniques for time series forecasting,

T. Schmieg and C. Lanquillon, “Time series representation learning: A survey on deep learning techniques for time series forecasting,” in International Conference on Human-Computer Interaction. Springer, 2024, pp. 422–435

work page 2024

[6] [6]

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Z. Fu, T. Z. Zhao, and C. Finn, “Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,” arXiv preprint arXiv:2401.02117, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

Interact: Inter-dependency aware action chunking with hierarchical attention transformers for bimanual manipulation,

A. C.-W. Lee, I. Chuang, L.-Y . Chen, and I. Soltani, “Interact: Inter-dependency aware action chunking with hierarchical attention transformers for bimanual manipulation,” in Conference on Robot Learning. PMLR, 2025, pp. 1730–1743

work page 2025

[8] [8]

Hierarchical action chunking transformer: Learning tempo- ral multimodality from demonstrations with fast imitation behavior,

J. H. Park, W. Choi, S. Hong, H. Seo, J. Ahn, C. Ha, H. Han, and J. Kwon, “Hierarchical action chunking transformer: Learning tempo- ral multimodality from demonstrations with fast imitation behavior,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 12 648–12 654

work page 2024

[9] [9]

Wavelet transform,

D. Zhang and D. Zhang, “Wavelet transform,” Fundamentals of image data mining: Analysis, Features, Classification and Retrieval , pp. 35– 44, 2019

work page 2019

[10] [10]

Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning,

J. Hua, L. Zeng, G. Li, and Z. Ju, “Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning,” Sensors, vol. 21, no. 4, p. 1278, 2021

work page 2021

[11] [11]

Leveraging imitation learning in agricultural robotics: a comprehensive survey and comparative analysis,

S. Mahmoudi, A. Davar, P. Sohrabipour, R. B. Bist, Y . Tao, and D. Wang, “Leveraging imitation learning in agricultural robotics: a comprehensive survey and comparative analysis,”Frontiers in Robotics and AI, vol. 11, p. 1441312, 2024

work page 2024

[12] [12]

Keypoint action tokens enable in-context imitation learning in robotics,

N. Di Palo and E. Johns, “Keypoint action tokens enable in-context imitation learning in robotics,”arXiv preprint arXiv:2403.19578, 2024

work page arXiv 2024

[13] K. Panaganti, Z. Xu, D. Kalathil, and M. Ghavamzadeh, “Distributionally robust behavioral cloning for robust imitation learning,” in 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023, pp. 1342–1347.

[14] B. Lin, “Behavioral cloning and imitation learning,” in Reinforcement Learning Methods in Speech and Language Technology. Springer, 2024, pp. 63–67.

[15] Z. Li, R. Pérez-Dattari, R. Babuska, C. Della Santina, and J. Kober, “Beyond behavior cloning: Robustness through interactive imitation and contrastive learning,” arXiv preprint arXiv:2502.07645, 2025.

[16] K. Ruan, J. Zhang, X. Di, and E. Bareinboim, “Causal imitation learning via inverse reinforcement learning,” in The Eleventh International Conference on Learning Representations, 2023.

[17] M. Zare, P. M. Kebria, A. Khosravi, and S. Nahavandi, “A survey of imitation learning: Algorithms, recent developments, and challenges,” IEEE Transactions on Cybernetics, 2024.

[18] M. Seo, S. Han, K. Sim, S. H. Bang, C. Gonzalez, L. Sentis, and Y. Zhu, “Deep imitation learning for humanoid loco-manipulation through human teleoperation,” in 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids). IEEE, 2023, pp. 1–8.

[19] Y. Hu, F. J. Abu-Dakka, F. Chen, X. Luo, Z. Li, A. Knoll, and W. Ding, “Fusion dynamical systems with machine learning in imitation learning: A comprehensive overview,” Information Fusion, p. 102379, 2024.

[20] J. W. Kim, T. Z. Zhao, S. Schmidgall, A. Deguet, M. Kobilarov, C. Finn, and A. Krieger, “Surgical robot transformer (srt): Imitation learning for surgical tasks,” arXiv preprint arXiv:2407.12998, 2024.

[21] C. Ai, H. Yang, X. Liu, R. Dong, Y. Ding, and F. Guo, “Mtmol-gpt: De novo multi-target molecular generation with transformer-based generative adversarial imitation learning,” PLoS computational biology, vol. 20, no. 6, p. e1012229, 2024.

[22] A. Hu, G. Corrado, N. Griffiths, Z. Murez, C. Gurau, H. Yeo, A. Kendall, R. Cipolla, and J. Shotton, “Model-based imitation learning for urban driving,” Advances in Neural Information Processing Systems, vol. 35, pp. 20703–20716, 2022.

[23] Y. Cai, J. Gao, C. Pohl, and T. Asfour, “Visual imitation learning of task-oriented object grasping and rearrangement,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 364–371.

[24] A. Saadati, M. T. Masouleh, and A. Kalhor, “Deep learning-based imitation of human actions for autonomous pick-and-place tasks,” in 2024 32nd International Conference on Electrical Engineering (ICEE). IEEE, 2024, pp. 1–7.

[25] J. Hu, F. Wang, X. Li, Y. Qin, F. Guo, and M. Jiang, “Trajectory tracking control for robotic manipulator based on soft actor–critic and generative adversarial imitation learning,” Biomimetics, vol. 9, no. 12, p. 779, 2024.

[26] J. Lv, Q. Li, Q. Sun, and X. Wang, “T-conv: A convolutional neural network for multi-scale taxi trajectory prediction,” in 2018 IEEE international conference on big data and smart computing (bigcomp). IEEE, 2018, pp. 82–89.

[27] J. Lv, Q. Sun, Q. Li, and L. Moreira-Matias, “Multi-scale and multi-scope convolutional neural networks for destination prediction of trajectories,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 8, pp. 3184–3195, 2019.

[28] Z. Liu, C. Li, N. Yang, Y. Wang, J. Ma, G. Cheng, and X. Zhao, “Mstf: Multiscale transformer for incomplete trajectory prediction,” in 2024 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2024, pp. 573–580.

[29] Z. Liu, C. Li, Y. Wang, N. Yang, X. Fan, J. Ma, and X. Zhao, “Multi-scale temporal fusion transformer for incomplete vehicle trajectory prediction,” IEEE Transactions on Intelligent Vehicles, 2024.

[30] Y. Ze, G. Zhang, K. Zhang, C. Hu, M. Wang, and H. Xu, “3d diffusion policy: Generalizable visuomotor policy learning via simple 3d representations,” in ICRA 2024 Workshop on 3D Visual Representations for Robot Manipulation, 2024.

[31] Z. Zhang, D. Guo, S. Zhou, J. Zhang, and Y. Lin, “Flight trajectory prediction enabled by time-frequency wavelet transform,” Nature Communications, vol. 14, no. 1, p. 5258, 2023.

[32] R. Azad, A. Kazerouni, A. Sulaiman, A. Bozorgpour, E. K. Aghdam, A. Jose, and D. Merhof, “Unlocking fine-grained details with wavelet-based high-frequency enhancement in transformers,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2023, pp. 207–216.

[33] W. Zou, M. Jiang, Y. Zhang, L. Chen, Z. Lu, and Y. Wu, “Sdwnet: A straight dilated network with wavelet transformation for image deblurring,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 1895–1904.

[34] M. L. A. Sarna, M. R. Hossain, and M. A. Islam, “Comparative analysis of stft and wavelet transform in time-frequency analysis of non-stationary signals,” International Journal of Novel Research in Engineering and Science, 2024.

[35] M. C. Yesilli, J. Chen, F. A. Khasawneh, and Y. Guo, “Automated surface texture analysis via discrete cosine transform and discrete wavelet transform,” Precision Engineering, vol. 77, pp. 141–152, 2022.

[36] S. Wagner, C. Ewald, D. Freitag, K.-H. Herrmann, A. Koch, J. Bauer, T. J. Vogl, A. Kemmling, and H. Gufler, “Effects of tetrahydrolipstatin on glioblastoma in mice: Mri-based morphologic and texture analysis correlated with histopathology and immunochemistry findings—a pilot study,” Cancers, vol. 16, no. 8, p. 1591, 2024.

[37] G. S. Kumar and M. L. P. Rani, “Image compression using discrete wavelet transform and convolution neural networks,” Journal of Electrical Engineering & Technology, vol. 19, no. 6, pp. 3713–3721, 2024.

[38] A. K. Umam, P. T. B. Ngastiti, A. Alfan, Z. Shahadah, and A. F. Muamalah, “The application of dicrete wavelet transform for digital image compression,” Jurnal Matematika Sains dan Teknologi, vol. 25, no. 1, pp. 01–08, 2024.

[39] D. Grochała, R. Grzejda, A. Parus, and S. Berczyński, “The wavelet transform for feature extraction and surface roughness evaluation after micromachining,” Coatings, vol. 14, no. 2, p. 210, 2024.

[40] B. Cansiz, C. U. Kilinc, and G. Serbes, “Tunable q-factor wavelet transform based lung signal decomposition and statistical feature extraction for effective lung disease classification,” Computers in Biology and Medicine, vol. 178, p. 108698, 2024.

[41] A. Kasoju and T. Vishwakarma, “Optimizing transformer models for low-latency inference: Techniques, architectures, and code implementations,” International Journal of Science and Research (IJSR), vol. 14, pp. 857–866, 2025.

[42] G. Di Gennaro, A. Buonanno, F. Verolla, G. Fioretti, F. A. Palmieri, and K. R. Pattipati, “Imitation learning through prior injection in markov decision processes,” in Applications of Artificial Intelligence and Neural Systems to Data Science. Springer, 2023, pp. 103–113.