
arxiv: 2606.29570 · v1 · pith:FDMRFXO2new · submitted 2026-06-28 · 💻 cs.RO

Hierarchical Policy Learning via Spectral Decomposition

Pith reviewed 2026-06-30 06:52 UTC · model grok-4.3

classification 💻 cs.RO
keywords robot manipulation · spectral decomposition · discrete cosine transform · hierarchical policy learning · causal action generation · precision tasks · teleoperation noise

The pith

Robot action sequences decompose into low-frequency task intent and high-frequency execution details via the discrete cosine transform, enabling a causal policy that first generates coarse motion and then generates fine corrections conditioned on it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper observes that applying the discrete cosine transform to robot action sequences reveals a consistent separation: low-frequency components encode global motion trajectories while high-frequency components capture precise timing, alignment, and contact. This structure motivates Causal Spectral Policy, which first predicts coarse motion from current observation and language instruction, then produces fine corrections conditioned on the realized coarse trajectory. The resulting approach is evaluated on precision-sensitive manipulation tasks in both simulation and real-world settings, where it outperforms strong baselines. The same frequency view also supports a data-augmentation technique that injects human-like teleoperation noise into demonstrations, under which the policy remains robust.
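As a minimal sketch of the frequency separation described above, the following Python splits one action chunk into a coarse and a fine part using SciPy's DCT. The chunk length, action dimension, cutoff index, and the synthetic trajectory are illustrative assumptions, not values or data taken from the paper.

```python
import numpy as np
from scipy.fft import dct, idct

def split_bands(actions, cutoff):
    """Split a (K, D) action chunk into coarse and fine parts along the time axis.

    `cutoff` is the number of low-frequency DCT coefficients kept in the coarse part.
    """
    coeffs = dct(actions, type=2, norm="ortho", axis=0)
    low = coeffs.copy()
    low[cutoff:] = 0.0            # keep only the first `cutoff` coefficients
    high = coeffs - low           # everything at or above the cutoff
    coarse = idct(low, type=2, norm="ortho", axis=0)
    fine = idct(high, type=2, norm="ortho", axis=0)
    return coarse, fine

# Synthetic stand-in for a demonstration chunk (assumed sizes, not real robot data).
K, D = 64, 7
t = np.linspace(0.0, 1.0, K)[:, None]
rng = np.random.default_rng(0)
actions = np.sin(np.pi * t) * np.ones((1, D)) + 0.01 * rng.standard_normal((K, D))

coarse, fine = split_bands(actions, cutoff=8)
print("max |coarse + fine - actions| =", np.abs(coarse + fine - actions).max())
```

Because the transform is linear and orthonormal, the two parts sum back to the original chunk; where to place the cutoff is a modeling choice that this sketch does not settle.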

Core claim

Action sequences admit a semantic frequency decomposition in which low-frequency DCT coefficients represent task-level motion intent and high-frequency coefficients represent execution-level refinements; modeling generation as a causal coarse-to-fine process—coarse prediction from observation and language followed by conditional fine correction—yields a policy that improves performance on precision manipulation tasks.

What carries the argument

Causal Spectral Policy (CSP), which uses the discrete cosine transform to split action generation into a causally ordered coarse-motion stage and a fine-correction stage conditioned on the realized coarse trajectory.
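The two-stage ordering can be sketched as follows. This is an illustrative outline only: `coarse_predictor` and `fine_predictor` are hypothetical stand-ins that return random coefficients, and the sizes `K`, `D`, and `N_LOW` are assumptions. The paper's actual networks, inputs, and cutoff are not reproduced here.

```python
import numpy as np
from scipy.fft import dct, idct

K, D, N_LOW = 64, 7, 8  # chunk length, action dim, low-frequency count (all assumed)

def coarse_predictor(obs_embedding, rng):
    """Hypothetical first stage: (N_LOW, D) low-frequency DCT coefficients."""
    return rng.standard_normal((N_LOW, D))

def fine_predictor(obs_embedding, coarse_trajectory, rng):
    """Hypothetical second stage: (K - N_LOW, D) high-frequency coefficients.

    It receives the coarse trajectory as an input, which is the conditioning step.
    """
    scale = 0.01 * (1.0 + np.abs(coarse_trajectory).mean())
    return scale * rng.standard_normal((K - N_LOW, D))

def generate_chunk(obs_embedding, rng):
    low = coarse_predictor(obs_embedding, rng)
    padded = np.concatenate([low, np.zeros((K - N_LOW, D))], axis=0)
    coarse_trajectory = idct(padded, type=2, norm="ortho", axis=0)
    high = fine_predictor(obs_embedding, coarse_trajectory, rng)
    coeffs = np.concatenate([low, high], axis=0)   # concatenate frequency bands
    actions = idct(coeffs, type=2, norm="ortho", axis=0)
    return actions, coarse_trajectory

rng = np.random.default_rng(0)
obs = np.zeros(16)  # placeholder observation/language embedding
actions, coarse = generate_chunk(obs, rng)

# The fine stage leaves the low-frequency band of the final chunk untouched.
low_final = dct(actions, type=2, norm="ortho", axis=0)[:N_LOW]
low_coarse = dct(coarse, type=2, norm="ortho", axis=0)[:N_LOW]
print("low band preserved:", np.allclose(low_final, low_coarse))
```

The design point the sketch illustrates is that the fine stage only writes to the high-frequency band, so the coarse motion is fixed before any correction is generated.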

If this is right

  • CSP produces higher success rates than standard policies on precision-sensitive manipulation in both simulation and real hardware.
  • The coarse-to-fine causal structure allows fine corrections to adapt to actual execution deviations rather than assuming perfect coarse realization.
  • Under human-inspired teleoperation noise injection, used as a data augmentation method, CSP remains robust to noisy demonstrations.
  • The same spectral split can be applied at inference time to inspect or intervene on task-level versus execution-level components separately.
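The noise-injection point in the list above can be illustrated with a small sketch. The paper's noise model is not described in this review, so the code substitutes a generic assumption: Gaussian perturbations added only to the high-frequency DCT coefficients of a demonstration chunk. The function name, `cutoff`, and `sigma` are hypothetical.

```python
import numpy as np
from scipy.fft import dct, idct

def inject_high_frequency_noise(actions, cutoff, sigma, rng):
    """Perturb only DCT coefficients at index >= cutoff of a (K, D) chunk.

    ASSUMPTION: Gaussian noise in the high band stands in for teleoperation
    noise; this is not the noise model used in the paper.
    """
    coeffs = dct(actions, type=2, norm="ortho", axis=0)
    coeffs[cutoff:] += sigma * rng.standard_normal(coeffs[cutoff:].shape)
    return idct(coeffs, type=2, norm="ortho", axis=0)

K, D, cutoff = 64, 7, 8  # assumed sizes
t = np.linspace(0.0, 1.0, K)[:, None]
clean = np.sin(np.pi * t) * np.ones((1, D))  # synthetic demonstration chunk
rng = np.random.default_rng(0)
noisy = inject_high_frequency_noise(clean, cutoff, sigma=0.02, rng=rng)

low_clean = dct(clean, type=2, norm="ortho", axis=0)[:cutoff]
low_noisy = dct(noisy, type=2, norm="ortho", axis=0)[:cutoff]
print("low band unchanged:", np.allclose(low_clean, low_noisy))
```

Under this assumption the coarse motion of each demonstration is left intact while the execution-level detail is corrupted, which is the regime in which a coarse-to-fine policy would be expected to help.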

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the frequency split generalizes, similar decompositions could be tested on other sequential decision domains such as navigation or locomotion where global path and local gait adjustments are naturally separable.
  • One could measure whether the DCT basis remains optimal by comparing it against learned frequency-like bases on the same robot datasets.
  • The causal conditioning step suggests a natural way to incorporate online feedback: after each coarse segment is executed, the policy could replan the next coarse segment using updated observations.
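The basis comparison suggested in the list above could be set up along these lines. This is a sketch on synthetic data: the "learned" basis is a plain SVD of the chunk matrix (a linear stand-in, not a method from the paper), and the dataset shape and number of retained components `m` are assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def dct_truncation_error(chunks, m):
    """Mean squared error after keeping the first m DCT coefficients per chunk."""
    coeffs = dct(chunks, type=2, norm="ortho", axis=1)
    coeffs[:, m:] = 0.0
    recon = idct(coeffs, type=2, norm="ortho", axis=1)
    return float(np.mean((chunks - recon) ** 2))

def learned_basis_error(chunks, m):
    """Mean squared error after projecting onto the top-m right singular vectors."""
    _, _, vt = np.linalg.svd(chunks, full_matrices=False)
    basis = vt[:m]                        # (m, K), orthonormal rows
    recon = chunks @ basis.T @ basis
    return float(np.mean((chunks - recon) ** 2))

# Synthetic single-dimension action chunks: a few smooth modes plus noise.
rng = np.random.default_rng(0)
N, K, m = 200, 64, 8
t = (np.arange(K) + 0.5) / K
freqs = np.arange(1, 5)
weights = rng.standard_normal((N, freqs.size))
chunks = weights @ np.cos(np.pi * freqs[:, None] * t[None, :])
chunks += 0.05 * rng.standard_normal((N, K))

err_dct = dct_truncation_error(chunks, m)
err_learned = learned_basis_error(chunks, m)
print(f"DCT MSE: {err_dct:.6f}  learned-basis MSE: {err_learned:.6f}")
```

On the data it was fitted to, the SVD basis can never do worse than DCT truncation at the same rank, so a fair comparison would evaluate both on held-out chunks; the interesting question is how small the gap is.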

Load-bearing premise

The frequency separation observed in the evaluated action sequences reflects a general semantic distinction between task intent and execution details rather than a task-specific pattern.

What would settle it

A controlled ablation in which the policy is forced to predict fine corrections without conditioning on the realized coarse trajectory, or in which the low- and high-frequency bands are swapped, with performance on the same precision tasks dropping to baseline levels.

Figures

Figures reproduced from arXiv: 2606.29570 by Animesh Garg, Liquan Wang, Shuxin Cao, Walker Byrnes, Yilun Du, Yiye Chen.

Figure 1. Action sequences admit a coarse-to-fine structure in the spectral domain. CSP predicts low-frequency motion from observation and language, then generates high-frequency corrections conditioned on the coarse trajectory.
… these methods struggle to explicitly capture the hierarchical temporal structure in robot actions, particularly in speed and precision-sensitive manipulation tasks. To confirm our intuition…
Figure 2. Action reconstruction under different frequency cutoffs λ in the real-robot dart insertion task. As λ decreases, coarse motion toward the target is preserved while fine alignment and contact accuracy degrade, highlighting the role of high-frequency components in precision execution.
We collect a single successful demonstration and analyze the action sequence using a fixed chunk size of K = 64. By progressi…
Figure 3. Overview of the proposed hierarchical spectral policy. Given observation o and language instruction l, the policy first predicts low-frequency action components that capture coarse task-level motion. Conditioned on the realized low-frequency trajectory, a second module predicts high-frequency corrective components. The final action sequence is reconstructed by concatenating frequency coefficients and apply…
Figure 4. Baseline action prediction paradigms. Left: chunk-based joint prediction. Right: autoregressive prediction. Both lack an explicit representation of coarse temporal structure, motivating our coarse-to-fine formulation.
4.2. Action Representation in the Spectral Domain. For the coarse-to-fine factorization in Eq. (1) to be effective, the action representation must make temporal structure at different resoluti…
Figure 6. Real-robot qualitative results across four manipulation tasks. For each task, we show the initial state, successful executions by CSP (middle), and representative failure cases from baseline policies (right). Tasks require precise alignment and contact, including keyboard pressing, block stacking, and lid closing, highlighting the importance of accurate fine-grained execution.
Figure 7. Baseline and ablation architectures used to isolate temporal abstraction, spectral representation, and coarse-to-fine conditioning
Figure 8. LIBERO noise-injection results. For single tasks and subsets of LIBERO-90, CSP is more resistant to noise injection than all evaluated baselines. The noise model used here models real-world teleoperation noise, and this result shows that while previous architectures capture cleaner data better, they perform rather poorly in the presence of the high-frequency noise found in real data.
Figure 10. Single-task noise-injection results. Evaluation of CSP and baselines on select single tasks from LIBERO-90.
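The cutoff analysis described in the Figure 2 caption can be mimicked on synthetic data. The sketch below uses the chunk size K = 64 stated there, but the trajectory (a smooth reach plus a brief late adjustment) and the list of cutoffs are invented for illustration and are not the paper's dart-insertion demonstration.

```python
import numpy as np
from scipy.fft import dct, idct

def reconstruct_with_cutoff(actions, lam):
    """Rebuild a (K, D) chunk from its first `lam` DCT coefficients only."""
    coeffs = dct(actions, type=2, norm="ortho", axis=0)
    coeffs[lam:] = 0.0
    return idct(coeffs, type=2, norm="ortho", axis=0)

K = 64
t = np.linspace(0.0, 1.0, K)
reach = 0.5 * (1.0 - np.cos(np.pi * t))                 # smooth approach from 0 to 1
correction = 0.05 * np.exp(-((t - 0.9) / 0.02) ** 2)    # brief adjustment near the end
actions = (reach + correction)[:, None]                 # shape (K, 1)

cutoffs = [64, 32, 16, 8, 4]
errors = {}
for lam in cutoffs:
    recon = reconstruct_with_cutoff(actions, lam)
    errors[lam] = float(np.sqrt(np.mean((actions - recon) ** 2)))
    print(f"lambda={lam:2d}  RMSE={errors[lam]:.6f}")
```

With an orthonormal DCT the reconstruction error can only grow as the cutoff shrinks, so the informative part of such a sweep is where the lost detail sits in time, for example around contact.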

In this paper, we identify a semantic decomposition in robot action sequences, separating task-level motion intent from execution-level refinements. By analyzing actions in the spectral domain using the discrete cosine transform (DCT), we observe that low-frequency components capture global motion trajectories, while high-frequency components encode precise timing, alignment, and contact behaviors. Motivated by this structure, we propose Causal Spectral Policy (CSP), which models action generation as a causal coarse-to-fine process: coarse motion is predicted from observation and language, and fine corrections are generated conditionally on the realized trajectory. Across simulation and real-world evaluations, CSP consistently outperforms strong baselines on precision-sensitive manipulation tasks. Additionally, we propose human-inspired teleoperation noise injection as a data augmentation method, under which our approach demonstrates strong robustness to noisy demonstrations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies a semantic decomposition in robot action sequences using the discrete cosine transform (DCT), where low-frequency components capture task-level motion intent and high-frequency components encode execution-level refinements. Motivated by this, it proposes the Causal Spectral Policy (CSP) that generates actions in a causal coarse-to-fine manner: coarse motion from observation and language, then fine corrections conditionally. The method is evaluated on precision-sensitive manipulation tasks in simulation and real-world settings, showing consistent outperformance over baselines, and includes a human-inspired teleoperation noise injection for data augmentation demonstrating robustness.

Significance. If the spectral decomposition holds as a general property rather than a task-specific artifact, this work provides a principled spectral basis for hierarchical policy learning in robotics, potentially advancing precision in manipulation tasks. Strengths include the combination of simulation and real-world evaluations, and the proposal of noise injection augmentation. The approach could influence future hierarchical methods if the frequency split is validated more broadly.

major comments (2)
  1. [Abstract] The claim that low-frequency DCT components capture global motion trajectories as task-level intent (while high-frequency encode refinements) is presented as a general semantic decomposition motivating the CSP architecture, but the manuscript reports this observation only on the evaluated precision-sensitive manipulation tasks; no cross-domain experiments, theoretical derivation, or parameter-free justification is supplied to establish domain-independence, which is load-bearing for the causal coarse-to-fine design.
  2. [§4 (Experiments)] The central empirical claim that 'CSP consistently outperforms strong baselines' is reported without visible error bars, number of random seeds, or statistical tests in the abstract summary; this undermines assessment of whether gains are robust or could be explained by the noise-injection augmentation alone rather than the spectral hierarchy.
minor comments (2)
  1. [Abstract] 'Strong baselines' are referenced but not named; this should be expanded for immediate clarity on the comparison.
  2. The notation for DCT frequency components and the exact conditioning in the coarse-to-fine process could be introduced with an equation in the method section for precision.
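
One plausible form of the notation requested in the second minor comment is given below. It is a sketch under assumptions: the orthonormal DCT-II convention is chosen for convenience, and the factorization is the generic chain rule over the two frequency bands, written with the symbols o, l, K, and λ that appear in the figure captions. The paper's own Eq. (1) is not reproduced in this review and may differ.

```latex
% Orthonormal DCT-II of a length-K action chunk a_0, ..., a_{K-1} (one action dimension)
c_k = \alpha_k \sum_{t=0}^{K-1} a_t \cos\!\left[\frac{\pi}{K}\left(t + \tfrac{1}{2}\right) k\right],
\qquad \alpha_0 = \sqrt{1/K}, \quad \alpha_k = \sqrt{2/K} \ \ (k \ge 1)

% Assumed coarse-to-fine factorization with frequency cutoff \lambda
p(c_{0:K-1} \mid o, l) = p(c_{0:\lambda-1} \mid o, l)\; p(c_{\lambda:K-1} \mid c_{0:\lambda-1}, o, l)
```

The first factor corresponds to the coarse stage and the second to the fine stage conditioned on it.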

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and indicate where revisions will be made to strengthen the manuscript.

  1. Referee: [Abstract] The claim that low-frequency DCT components capture global motion trajectories as task-level intent (while high-frequency encode refinements) is presented as a general semantic decomposition motivating the CSP architecture, but the manuscript reports this observation only on the evaluated precision-sensitive manipulation tasks; no cross-domain experiments, theoretical derivation, or parameter-free justification is supplied to establish domain-independence, which is load-bearing for the causal coarse-to-fine design.

    Authors: We agree that the semantic decomposition is an empirical observation drawn from the precision-sensitive manipulation tasks used in our evaluations. The abstract presents this as an identification in robot action sequences without explicitly qualifying the domain. To address this, we will revise the abstract and introduction to clarify that the decomposition was observed in the context of these tasks and that the CSP design is motivated by this finding rather than claiming a proven general or domain-independent property. No cross-domain experiments or theoretical derivation were performed, as the work focuses on precision manipulation. revision: partial

  2. Referee: [§4 (Experiments)] The central empirical claim that 'CSP consistently outperforms strong baselines' is reported without visible error bars, number of random seeds, or statistical tests in the abstract summary; this undermines assessment of whether gains are robust or could be explained by the noise-injection augmentation alone rather than the spectral hierarchy.

    Authors: We will revise the experimental section to include error bars on all reported results, explicitly state the number of random seeds used for each experiment, and incorporate statistical significance tests (e.g., paired t-tests) comparing CSP against baselines. This will allow readers to better assess the robustness of the gains and separate the contribution of the spectral hierarchy from the noise-injection augmentation. revision: yes
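
The kind of reporting promised in this response could look like the sketch below, which computes a mean difference with its standard error and a paired t-test across seeds using `scipy.stats.ttest_rel`. The per-seed success rates are placeholder numbers invented for illustration; they are not results from the paper, and `method_a` / `method_b` are hypothetical labels.

```python
import numpy as np
from scipy import stats

# PLACEHOLDER per-seed success rates (same seeds for both methods).
# These numbers are made up for the example and do not come from the paper.
method_a = np.array([0.80, 0.85, 0.78, 0.82, 0.88])
method_b = np.array([0.70, 0.74, 0.72, 0.69, 0.75])

diff = method_a - method_b
mean_diff = float(diff.mean())
sem_diff = float(diff.std(ddof=1) / np.sqrt(diff.size))  # standard error of the mean

# Paired test: each seed contributes one matched pair of runs.
result = stats.ttest_rel(method_a, method_b)

print(f"mean difference = {mean_diff:.3f} +/- {sem_diff:.3f} (s.e.m., n={diff.size})")
print(f"paired t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A paired test is only appropriate when both methods are evaluated on matched seeds and task instances; with few seeds, reporting the per-seed values alongside the test is more informative than the p-value alone.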

Circularity Check

0 steps flagged

No circularity: empirical observation motivates architecture without self-referential reduction


The paper's derivation begins with an empirical observation on action sequences (low-frequency DCT components for global trajectories, high-frequency for refinements) and uses this to motivate the CSP coarse-to-fine architecture. This is presented as a data-driven finding rather than a mathematical derivation. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and no ansatz is smuggled through prior work. The architecture choice follows directly from the stated observation without tautological closure. The method remains self-contained against external benchmarks as a standard hierarchical policy with spectral conditioning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based solely on abstract; full text unavailable so ledger is minimal and provisional.

axioms (1)
  • domain assumption: Low-frequency DCT components capture global motion trajectories while high-frequency components encode precise timing, alignment, and contact behaviors as a semantic decomposition.
    Stated as observed structure motivating the method in the abstract.

pith-pipeline@v0.9.1-grok · 5670 in / 1172 out tokens · 24991 ms · 2026-06-30T06:52:13.563023+00:00 · methodology

