GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks

Kuan-Ting Yu; Yen-Ling Tai; Yi-Ru Yang; Yi-Ting Chen; Yu-Wei Chao

arXiv: 2510.00573 · v2 · submitted 2025-10-01 · 💻 cs.RO

Yen-Ling Tai, Yi-Ru Yang, Kuan-Ting Yu, Yu-Wei Chao, Yi-Ting Chen

Pith reviewed 2026-05-18 11:03 UTC · model grok-4.3

classification 💻 cs.RO

keywords robot manipulation · diffusion policy · food scooping · spillage reduction · guided sampling · sim-to-real · robotic learning

The pith

A spillage predictor guides diffusion policies to achieve 82 percent success and 4 percent spillage when robots scoop unseen foods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a learned spillage predictor can steer diffusion-based robot policies toward low-spill trajectories during food scooping. The predictor is built from simulation data on four basic shapes and then used at test time as a differentiable signal that biases the sampling process. This combination is meant to cut spillage sharply while still letting the robot complete the transfer of food to the target container. If the claim holds, service robots could manage a wider range of real foods with less waste and fewer failures in everyday kitchen settings.

Core claim

GRITS trains a spillage predictor on simulated scooping episodes built from spheres, cubes, cones, and cylinders that vary in mass, friction, and particle size. At inference the predictor supplies a guidance gradient that shifts the diffusion sampling distribution toward action sequences with lower predicted spill probability. On a real robot platform the resulting policy reaches 82 percent task success and 4 percent spillage across ten food categories never seen in training, cutting spillage by more than 40 percent relative to diffusion baselines that lack the guidance term.
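One common way to write such a guidance step is the classifier-guidance form of Dhariwal and Nichol [21]. The page does not reproduce the paper's own equation, so the version below is only a sketch under assumed notation: $\tau_k$ is the noisy action trajectory at denoising step $k$, $o$ is the observation, $\mu_\theta$ and $\Sigma_k$ are the mean and covariance of the unguided reverse step, $p_\phi$ is the spillage predictor, and $\lambda$ is a guidance scale. None of these symbols are taken from the paper.

```latex
\tau_{k-1} \sim \mathcal{N}\!\Big(
  \mu_\theta(\tau_k, o, k)
  + \lambda\, \Sigma_k\, \nabla_{\tau_k} \log\big(1 - p_\phi(\mathrm{spill} \mid o, \tau_k)\big),\;
  \Sigma_k \Big)
```

With $\lambda = 0$ this reduces to the unguided diffusion policy. Larger $\lambda$ pushes samples harder toward low predicted spill probability, at the likely cost of drifting from the demonstrated behavior.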

What carries the argument

The spillage predictor: it estimates spill probability from the current observation and a planned action rollout, and it supplies the gradient used to steer diffusion sampling.
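To make the role of that gradient concrete, here is a minimal NumPy sketch. The logistic model is a toy stand-in for the learned predictor (the paper's network takes point clouds, per the Figure 2 caption), and every name, shape, and scale below is a hypothetical choice for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spill_probability(actions, w, b):
    """Toy stand-in for the learned predictor: a logistic model
    over the flattened action rollout (hypothetical, not the paper's network)."""
    return sigmoid(actions.ravel() @ w + b)

def no_spill_log_grad(actions, w, b):
    """Gradient of log(1 - p_spill) with respect to the actions.
    For a logistic model this is analytic: -p_spill * w."""
    p = spill_probability(actions, w, b)
    return (-p * w).reshape(actions.shape)

def guide(actions, w, b, scale=0.1):
    """One guidance nudge: move the rollout toward lower predicted spill."""
    return actions + scale * no_spill_log_grad(actions, w, b)

rng = np.random.default_rng(0)
horizon, action_dim = 8, 3                      # hypothetical rollout shape
w = 0.3 * rng.normal(size=horizon * action_dim)  # hypothetical predictor weights
b = 0.0
actions = rng.normal(size=(horizon, action_dim))

p_before = spill_probability(actions, w, b)
guided = guide(actions, w, b)
p_after = spill_probability(guided, w, b)
print(f"predicted spill probability: {p_before:.3f} -> {p_after:.3f}")
```

In a real guided sampler this nudge would be applied inside each denoising step rather than once to a finished rollout; the sketch isolates only the gradient signal.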

If this is right

The same guidance approach allows the robot to handle varied quantities and shapes after training on only six food categories.
Spillage drops more than 40 percent while task completion stays at 82 percent on ten new categories.
Simulation data on simple shapes transfers to produce measurable gains on physical robot hardware.
Differentiable guidance preserves task success while directly reducing an undesired side effect.
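The "more than 40 percent" figure can be read as a relative reduction or as a drop in percentage points, and this page does not say which is meant. The short sketch below works out what each reading would imply for the unguided baseline's spillage rate, using only the 4 percent spillage figure and the outcome rates quoted in the Figure 6 caption. The function names are hypothetical.

```python
import math

GRITS_SPILLAGE = 0.04      # from the abstract
REPORTED_REDUCTION = 0.40  # "over 40%", reading not specified on this page

def min_baseline_if_relative(guided, reduction):
    """(b - guided) / b > reduction  implies  b > guided / (1 - reduction)."""
    return guided / (1.0 - reduction)

def min_baseline_if_points(guided, reduction):
    """b - guided > reduction  implies  b > guided + reduction."""
    return guided + reduction

relative_floor = min_baseline_if_relative(GRITS_SPILLAGE, REPORTED_REDUCTION)
points_floor = min_baseline_if_points(GRITS_SPILLAGE, REPORTED_REDUCTION)
print(f"relative reading: baseline spillage above {relative_floor:.1%}")
print(f"percentage-point reading: baseline spillage above {points_floor:.1%}")

# The three outcome rates in the Figure 6 caption partition all trials.
outcomes = {"success": 0.82, "spillage": 0.04, "scoop_failure": 0.14}
assert math.isclose(sum(outcomes.values()), 1.0)
```

The two readings imply very different baselines (roughly 6.7 percent versus 44 percent), so the per-method numbers in the paper's Figure 6 are needed to tell them apart.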

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same style of predictor could be trained for other loose-material tasks such as pouring or sorting to reduce loss without new demonstrations.
Combining the guidance term with real-time visual feedback might further adapt trajectories when food state changes mid-scoop.
Expanding the set of simulated primitives could widen the range of foods the method covers without collecting new real data.

Load-bearing premise

The predictor trained only on four primitive shapes in simulation will give accurate enough guidance signals for real foods of different shapes and quantities without lowering the rate at which scoops succeed.

What would settle it

Real-robot trials on foods whose shapes or flow properties lie outside the four simulated primitives, in which the guided policy shows either higher actual spillage or lower success than the unguided baseline.

Figures

Figures reproduced from arXiv: 2510.00573 by Kuan-Ting Yu, Yen-Ling Tai, Yi-Ru Yang, Yi-Ting Chen, Yu-Wei Chao.

**Figure 1.** Spillage-aware trajectory generation with GRITS. Robotic food scooping demands exact and delicate control, as small deviations can result in spillage. GRITS addresses this challenge by leveraging predicted spillage probabilities to adaptively refine trajectories, leading to safer and more reliable manipulation. The adjustments are subtle, with an average displacement of only 0.3 cm between consecutive traj…

**Figure 2.** The architecture of GRITS. GRITS is a guided diffusion policy designed for robotic food scooping. Given an RGB-D image and an initial noisy trajectory, the diffusion policy denoises it into a refined trajectory. A spillage predictor, which takes segmented point clouds as input to reduce the sim-to-real gap, estimates the probability of spillage for a given candidate trajectory. This probability provides a gu…

**Figure 3.** Simulated Food Scooping Data Collection. We construct a scooping dataset in simulation to train the spillage predictor. Simulated foods are composed of four primitive shapes: spheres, cubes, cones, and cylinders, with varied physical properties, including mass, friction, and particle size. This design enables diverse and controllable scooping and spillage cases under different rollouts, which are impractic…

**Figure 4.** Food categories for training and testing sets. The training set includes six food items (top row) varying in sphere size from small to large. The testing set (bottom row) covers ten additional food categories with diverse shapes and material properties. Numbers below each food item indicate quantities: small-particle foods are measured by weight (g), and large-particle foods are measured by count (pieces).…

**Figure 5.** Real-World Experimental Platform. We set up a 35 × 30 cm workspace (indicated by the red bounding box) for the experiments. Section IV-A (Experiment Setup) adds: in the real-world setup, we use a 7-DoF Franka Emika Panda robot with a spoon attachment and two fixed Orbbec Femto Bolt RGB-D cameras.

**Figure 6.** Real-world experiment results. GRITS achieves the highest task success rate (82%) while simultaneously maintaining the lowest spillage rate (4%) and a low scoop failure rate (14%). These results highlight the effectiveness of integrating spillage-aware guidance into diffusion models for robust and reliable food scooping. Notably, the scoop failure rate is slightly higher than vanilla DP, reflecting a trade-off w…

**Figure 7.** Per-food comparison between vanilla diffusion policy (left) and GRITS (right). The figure clearly highlights the effectiveness of GRITS in reducing spillage and ensuring scooping success. Unlike the previous figure, here we classify a trial as a spillage failure if spillage occurs at any point before a scoop is completed, regardless of whether food is eventually lifted, providing a more intuitive compariso…

**Figure 8.** Real-world rollout of GRITS. This figure illustrates GRITS scooping penne, where the original trajectory is refined into a safer one through spillage-aware guidance, reducing high-risk states. The robot then transfers the scooped items and places them into the target container. For more examples, please refer to our supplementary video. … and dense packing, while penne is more stable and achieves higher succ…

**Figure 9.** Ablation study of spillage predictor. We compare three variants of the spillage predictor: (1) using raw point cloud as input during real-world inference, (2) replacing the encoder with original PointNet++ [38], and (3) our GRITS method with composite point cloud (food from raw camera data, spoon and bowl from known object models) input and DP3 [32] encoder. The results show that GRITS achieves the highest…
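The composite point cloud mentioned in the Figure 9 caption (food points from the camera, spoon and bowl points from known object models) could be assembled roughly as follows. This is a sketch under assumptions: the function names, the per-point labels, and the idea that spoon and bowl poses arrive as 4 × 4 homogeneous transforms are all hypothetical and not taken from the paper.

```python
import numpy as np

def transform_points(points, pose):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    return points @ pose[:3, :3].T + pose[:3, 3]

def composite_point_cloud(food_points, spoon_model, spoon_pose,
                          bowl_model, bowl_pose):
    """Merge segmented food points (camera frame) with spoon and bowl
    points sampled from known models and placed at their estimated poses.
    Returns the merged points and a label per point:
    0 = food, 1 = spoon, 2 = bowl (hypothetical labeling)."""
    spoon = transform_points(spoon_model, spoon_pose)
    bowl = transform_points(bowl_model, bowl_pose)
    points = np.concatenate([food_points, spoon, bowl], axis=0)
    labels = np.concatenate([
        np.full(len(food_points), 0),
        np.full(len(spoon), 1),
        np.full(len(bowl), 2),
    ])
    return points, labels

# Tiny usage example with made-up geometry.
rng = np.random.default_rng(0)
food = rng.uniform(-0.02, 0.02, size=(100, 3))   # stand-in for segmented food
spoon_model = rng.uniform(-0.01, 0.01, size=(50, 3))
bowl_model = rng.uniform(-0.05, 0.05, size=(80, 3))
spoon_pose = np.eye(4)
spoon_pose[2, 3] = 0.10                          # spoon 10 cm above the origin
cloud, labels = composite_point_cloud(food, spoon_model, spoon_pose,
                                      bowl_model, np.eye(4))
print(cloud.shape, np.bincount(labels))
```

Replacing noisy camera points for rigid, known objects with clean model points is one plausible way to make real inputs resemble the simulated ones the predictor was trained on.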

Original abstract

Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from the initial to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given current observation and action rollout. The predictor is trained on a simulated dataset with food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness. More details are available on our project website: https://hcis-lab.github.io/GRITS/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

GRITS adds a sim-trained spillage predictor as differentiable guidance inside a diffusion policy and shows clear spillage cuts on real-robot tests with unseen foods, though the predictor's transfer from primitive shapes is the main open question.

The main point is that this paper trains a spillage predictor on simulation data from four basic shapes, then uses it as a differentiable signal to steer diffusion sampling toward lower-spillage trajectories during robot scooping. The real-robot tests report 82% success and 4% spillage on ten unseen food categories, with more than 40% less spillage than the no-guidance baselines.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces GRITS, a spillage-aware guided diffusion policy for robotic food scooping tasks. It trains a differentiable spillage predictor exclusively in simulation on four primitive shapes (spheres, cubes, cones, cylinders) with varied mass, friction, and particle size to estimate spillage probability from observations and action rollouts. This predictor is used as guidance to steer the diffusion sampling process toward low-spillage trajectories at inference time. The policy is trained on six food categories and evaluated on a real robotic platform on ten unseen categories with different shapes and quantities. It reports an 82% task success rate and a 4% spillage rate, a reduction in spillage of over 40% relative to baselines without guidance.

Significance. If the reported gains hold under proper validation, the work offers a practical advance in reliable sim-to-real transfer for robotic manipulation of variable, particle-based food items. The use of a simulation-trained predictor to provide differentiable guidance within a diffusion policy is a targeted contribution that could inform safety-aware robot learning methods in food service applications.

major comments (1)

[Abstract and Evaluation] The central performance claim (82% success, 4% spillage, >40% reduction on ten unseen real foods) depends on the spillage predictor, trained only on four primitive shapes in simulation, producing accurate guidance that transfers to irregular real-world items without trading off task success. No predictor accuracy metrics on real data, no sim-to-real validation of the predictor, and no ablation isolating the guidance contribution are referenced, which directly undermines the generalization and effectiveness assertions.

minor comments (2)

[Abstract] The reported numeric results lack error bars, confidence intervals, or statistical tests comparing GRITS to baselines.
[Abstract] Clarify whether the six training food categories were used only in simulation or also on the real robot, and provide details on baseline implementations.
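To illustrate what the first minor comment is asking for, the sketch below computes a 95% Wilson score interval around an 82% success rate. The trial counts are hypothetical, since this page does not state how many trials the paper ran, and `wilson_interval` is an illustrative helper, not something from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials
                         + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

# Hypothetical trial counts: the same 82% rate is far less certain at n=50.
for trials in (50, 100, 200):
    successes = round(0.82 * trials)
    low, high = wilson_interval(successes, trials)
    print(f"n={trials}: {successes}/{trials} successes, "
          f"95% interval [{low:.3f}, {high:.3f}]")
```

Reporting the interval alongside the point estimate would let readers judge whether the gap between GRITS and each baseline exceeds sampling noise.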

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for identifying areas where additional validation would strengthen the claims. We address the major comment below and have incorporated revisions to provide clearer evidence on the guidance contribution.

Referee: The central performance claim (82% success, 4% spillage, >40% reduction on ten unseen real foods) depends on the spillage predictor—trained only on four primitive shapes in simulation—producing accurate guidance that transfers to irregular real-world items without trading off task success. No predictor accuracy metrics on real data, no sim-to-real validation of the predictor, and no ablation isolating the guidance contribution are referenced, which directly undermines the generalization and effectiveness assertions.

Authors: We agree that the manuscript would benefit from more explicit validation of the predictor's role. The reported real-world results on ten unseen categories already demonstrate that the guided policy achieves higher success and substantially lower spillage than unguided baselines, indicating effective transfer. However, we acknowledge the absence of standalone predictor accuracy metrics on real data and a dedicated sim-to-real predictor study. To address this, we will add an ablation study that directly compares the diffusion policy with and without the spillage-predictor guidance under identical conditions, isolating its contribution to the observed 40%+ spillage reduction. We will also expand the simulation validation section to report the predictor's accuracy on held-out primitive-shape rollouts and discuss how the four primitives were selected to span relevant physical properties. Direct real-world predictor accuracy is difficult to obtain without additional instrumentation for ground-truth spillage labels, so we rely on end-to-end task metrics; this limitation will be noted explicitly in the revised text.

revision: yes
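A guided-versus-unguided ablation of the kind promised here could be summarized with a one-sided Fisher exact test on spill counts. The sketch below uses only the standard library; the counts are hypothetical placeholders chosen for illustration, not results from the paper, and `fisher_one_sided` is an illustrative helper.

```python
from math import comb

def fisher_one_sided(spill_guided, n_guided, spill_base, n_base):
    """One-sided Fisher exact test: probability of seeing this few (or fewer)
    spills in the guided arm if both arms shared a single spill rate."""
    total_spills = spill_guided + spill_base
    total = n_guided + n_base
    denom = comb(total, n_guided)
    lowest = max(0, total_spills - n_base)  # fewest spills the guided arm can hold
    p_value = 0.0
    for k in range(lowest, spill_guided + 1):
        p_value += (comb(total_spills, k)
                    * comb(total - total_spills, n_guided - k)) / denom
    return p_value

# Hypothetical counts: 2 spills in 50 guided trials vs 22 in 50 unguided trials.
p = fisher_one_sided(spill_guided=2, n_guided=50, spill_base=22, n_base=50)
print(f"one-sided p-value: {p:.2e}")
```

Running both arms on the same foods, quantities, and initial bowl states, as the authors propose, is what makes such a comparison attributable to the guidance term alone.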

Circularity Check

0 steps flagged

No circularity: empirical results independent of training inputs

The paper trains a spillage predictor exclusively on simulation data generated from four primitive shapes with varied physical properties, then deploys it as a differentiable guidance signal during diffusion sampling at inference time. Task success (82%) and spillage (4%) rates are obtained from separate real-robot experiments on ten unseen food categories, with no equations, fitted parameters, or self-citations that reduce these measured outcomes to the simulation training data by construction. The derivation chain from predictor training through guidance to real-world evaluation is externally falsifiable and does not collapse into self-definition or tautological renaming.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are explicitly introduced beyond standard diffusion sampling and simulation assumptions; the central claim rests on empirical generalization from primitive shapes.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: spillage predictor that estimates the probability of spillage given current observation and action rollout... trained on a simulated dataset... four primitive shapes (spheres, cubes, cones, and cylinders)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: guided diffusion policy... steering the diffusion sampling process toward safer trajectories

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

[1] A. Singletary, W. Guffey, T. G. Molnar, R. Sinnet, and A. D. Ames, “Safety-Critical Manipulation for Collision-Free Food Preparation,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10954–10961, 2022.
[2] Z. Xu, Z. Xian, X. Lin, C. Chi, Z. Huang, C. Gan, and S. Song, “RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects,” Robotics: Science and Systems (RSS), 2023.
[3] R. Ye, Y. Hu, Y. A. Bian, L. Kulm, and T. Bhattacharjee, “MORPHeus: a Multimodal One-armed Robot-assisted Peeling System with Human Users In-the-loop,” in IEEE International Conference on Robotics and Automation (ICRA), 2024.
[4] K. Zhang, M. Sharma, M. Veloso, and O. Kroemer, “Leveraging multimodal haptic sensory data for robust cutting,” in IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2019.
[5] A. Bhaskar, R. Liu, V. D. Sharma, G. Shi, and P. Tokekar, “LAVA: Long-horizon Visual Action based Food Acquisition,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
[6] R. Feng, Y. Kim, G. Lee, E. K. Gordon, M. Schmittle, S. Kumar, T. Bhattacharjee, and S. S. Srinivasa, “Robot-Assisted Feeding: Generalizing Skewering Strategies across Food Items on a Realistic Plate,” in The International Symposium of Robotics Research, 2019.
[7] J. Grannen, Y. Wu, S. Belkhale, and D. Sadigh, “Learning Bimanual Scooping Policies for Food Acquisition,” in Conference on Robot Learning (CoRL), 2022.
[8] R. K. Jenamani, P. Sundaresan, M. Sakr, T. Bhattacharjee, and D. Sadigh, “FLAIR: Feeding via Long-horizon AcquIsition of Realistic Dishes,” Robotics: Science and Systems (RSS), 2024.
[9] M. Keely, B. Franco, C. Grothoff, R. K. Jenamani, T. Bhattacharjee, D. P. Losey, and H. Nemlekar, “Kiri-Spoon: A Soft Shape-Changing Utensil for Robot-Assisted Feeding,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.
[10] R. Liu, A. Bhaskar, and P. Tokekar, “Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types,” in International Conference on Robotics and Automation (ICRA) - Assistive Systems: Lab to Patient Care, 2024.
[11] R. Liu, Z. Mahammad, A. Bhaskar, and P. Tokekar, “IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition,” arXiv preprint arXiv:2409.12092, 2024.
[12] P. Sundaresan, S. Belkhale, and D. Sadigh, “Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding,” in Conference on Robot Learning (CoRL), 2022.
[13] P. Sundaresan, J. Wu, and D. Sadigh, “Learning Sequential Acquisition Policies for Robot-Assisted Feeding,” in Conference on Robot Learning (CoRL), 2023.
[14] Y.-L. Tai, Y. C. Chiu, Y.-W. Chao, and Y.-T. Chen, “Scone: A Food Scooping Robot Learning Framework with Active Perception,” in Conference on Robot Learning (CoRL), 2023.
[15] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion,” The International Journal of Robotics Research, 2024.
[16] H. Xue, J. Ren, W. Chen, G. Zhang, Y. Fang, G. Gu, H. Xu, and C. Lu, “Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation,” in Robotics: Science and Systems (RSS), 2025.
[17] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine, “Planning with Diffusion for Flexible Behavior Synthesis,” in International Conference on Machine Learning (ICML), 2022.
[18] H. Li, Q. Feng, Z. Zheng, J. Feng, and A. Knoll, “Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation,” arXiv preprint arXiv:2407.00451, 2024.
[19] J. Carvalho, A. T. Le, M. Baierl, D. Koert, and J. Peters, “Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023.
[20] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023.
[21] P. Dhariwal and A. Nichol, “Diffusion Models Beat GANs on Image Synthesis,” in Conference on Neural Information Processing Systems (NeurIPS), 2021.
[22] D. Park, Y. Hoshi, H. P. Mahajan, H. K. Kim, Z. Erickson, W. A. Rogers, and C. C. Kemp, “Active Robot-Assisted Feeding with a General-Purpose Mobile Manipulator: Design, Evaluation, and Lessons Learned,” in Robotics and Autonomous Systems (RSS), 2020.
[23] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[24] C. G. Atkeson and S. Schaal, “Robot learning from demonstration,” in ICML, vol. 97, 1997, pp. 12–20.
[25] J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” in Advances in Neural Information Processing Systems (NIPS), 2020.
[26] A. Q. Nichol and P. Dhariwal, “Improved Denoising Diffusion Probabilistic Models,” in International Conference on Machine Learning (ICML), 2021.
[27] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics,” in International Conference on Machine Learning (ICML), 2015.
[28] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-Based Generative Modeling Through Stochastic Differential Equations,” International Conference on Learning Representations (ICLR), 2021.
[29] A. Ajay, Y. Du, A. Gupta, J. Tenenbaum, T. Jaakkola, and P. Agrawal, “Is conditional generative modeling all you need for decision-making?” in International Conference on Learning Representations (ICLR), 2023.
[30] M. Reuss, M. Li, X. Jia, and R. Lioutikov, “Goal-Conditioned Imitation Learning Using Score-Based Diffusion Policies,” Robotics: Science and Systems (RSS), 2023.
[31] Z. Wang, J. J. Hunt, and M. Zhou, “Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning,” in International Conference on Learning Representations (ICLR), 2023.
[32] Y. Ze, G. Zhang, K. Zhang, C. Hu, M. Wang, and H. Xu, “3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations,” Robotics: Science and Systems (RSS), 2024.
[33] S. Huang, Z. Wang, P. Li, B. Jia, T. Liu, Y. Zhu, W. Liang, and S.-C. Zhu, “Diffusion-Based Generation, Optimization, and Planning in 3D Scenes,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[34] K. M. Lee, S. Ye, Q. Xiao, Z. Wu, Z. Zaidi, D. B. D’Ambrosio, P. R. Sanketi, and M. Gombolay, “Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance,” arXiv preprint arXiv:2409.15528, 2024.
[35] K. Saha, V. Mandadi, J. Reddy, A. Srikanth, A. Agarwal, B. Sen, A. Singh, and M. Krishna, “EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning,” in IEEE International Conference on Robotics and Automation (ICRA), 2024.
[36] X. Xu, H. Ha, and S. Song, “Dynamics-Guided Diffusion Model for Robot Manipulator Design,” in Conference on Robot Learning (CoRL), 2024.
[37] “SAM 2: Segment Anything in Images and Videos,” arXiv preprint arXiv:2408.00714, 2024.
[38] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space,” in Advances in Neural Information Processing Systems (NIPS), 2017.
[39] J. Song, C. Meng, and S. Ermon, “Denoising Diffusion Implicit Models,” arXiv preprint arXiv:2010.02502, 2020.

[1] [1]

Safety-Critical Manipulation for Collision-Free Food Preparation,

A. Singletary, W. Guffey, T. G. Molnar, R. Sinnet, and A. D. Ames, “Safety-Critical Manipulation for Collision-Free Food Preparation,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10 954– 10 961, 2022

work page 2022

[2] [2]

RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects,

Z. Xu, Z. Xian, X. Lin, C. Chi, Z. Huang, C. Gan, and S. Song, “RoboNinja: Learning an Adaptive Cutting Policy for Multi-Material Objects,”Robotics: Science and Systems (RSS), 2023

work page 2023

[3] [3]

MORPHeus: a Multimodal One-armed Robot-assisted Peeling System with Human Users In-the-loop,

R. Ye, Y . Hu, Y . A. Bian, L. Kulm, and T. Bhattacharjee, “MORPHeus: a Multimodal One-armed Robot-assisted Peeling System with Human Users In-the-loop,” inIEEE International Conference on Robotics and Automation (ICRA), 2024

work page 2024

[4] [4]

Leveraging multimodal haptic sensory data for robust cutting,

K. Zhang, M. Sharma, M. Veloso, and O. Kroemer, “Leveraging multimodal haptic sensory data for robust cutting,” inIEEE-RAS International Conference on Humanoid Robots (Humanoids), 2019

work page 2019

[5] [5]

LA V A: Long-horizon Visual Action based Food Acquisition,

A. Bhaskar, R. Liu, V . D. Sharma, G. Shi, and P. Tokekar, “LA V A: Long-horizon Visual Action based Food Acquisition,”IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems (IROS), 2024

work page 2024

[6] [6]

Robot-Assisted Feeding: Gen- eralizing Skewering Strategies across Food Items on a Realistic Plate,

R. Feng, Y . Kim, G. Lee, E. K. Gordon, M. Schmittle, S. Kumar, T. Bhattacharjee, and S. S. Srinivasa, “Robot-Assisted Feeding: Gen- eralizing Skewering Strategies across Food Items on a Realistic Plate,” inThe International Symposium of Robotics Research, 2019

work page 2019

[7] [7]

Learning Bimanual Scooping Policies for Food Acquisition,

J. Grannen, Y . Wu, S. Belkhale, and D. Sadigh, “Learning Bimanual Scooping Policies for Food Acquisition,” inConference on Robot Learning (CoRL), 2022

work page 2022

[8] [8]

FLAIR: Feeding via Long-horizon AcquIsition of Re- alistic Dishes,

R. K. Jenamani, P. Sundaresan, M. Sakr, T. Bhattacharjee, and D. Sadigh, “FLAIR: Feeding via Long-horizon AcquIsition of Re- alistic Dishes,”Robotics: Science and Systems (RSS), 2024

work page 2024

[9] [9]

Kiri-Spoon: A Soft Shape-Changing Utensil for Robot-Assisted Feeding,

M. Keely, B. Franco, C. Grothoff, R. K. Jenamani, T. Bhattacharjee, D. P. Losey, and H. Nemlekar, “Kiri-Spoon: A Soft Shape-Changing Utensil for Robot-Assisted Feeding,” inIEEE/RSJ International Con- ference on Intelligent Robots and Systems (IROS), 2024

work page 2024

[10] [10]

Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configu- rations and Food Types,

R. Liu, A. Bhaskar, and P. Tokekar, “Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configu- rations and Food Types,” inInternational Conference on Robotics and Automation (ICRA) - Assistive Systems: Lab to Patient Care, 2024

work page 2024

[11] R. Liu, Z. Mahammad, A. Bhaskar, and P. Tokekar, “IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition,” arXiv preprint arXiv:2409.12092, 2024

[12] P. Sundaresan, S. Belkhale, and D. Sadigh, “Learning Visuo-Haptic Skewering Strategies for Robot-Assisted Feeding,” in Conference on Robot Learning (CoRL), 2022

[13] P. Sundaresan, J. Wu, and D. Sadigh, “Learning Sequential Acquisition Policies for Robot-Assisted Feeding,” in Conference on Robot Learning (CoRL), 2023

[14] Y.-L. Tai, Y. C. Chiu, Y.-W. Chao, and Y.-T. Chen, “Scone: A Food Scooping Robot Learning Framework with Active Perception,” in Conference on Robot Learning (CoRL), 2023

[15] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song, “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion,” The International Journal of Robotics Research, 2024

[16] H. Xue, J. Ren, W. Chen, G. Zhang, Y. Fang, G. Gu, H. Xu, and C. Lu, “Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation,” in Robotics: Science and Systems (RSS), 2025

[17] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine, “Planning with Diffusion for Flexible Behavior Synthesis,” in International Conference on Machine Learning (ICML), 2022

[18] H. Li, Q. Feng, Z. Zheng, J. Feng, and A. Knoll, “Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation,” arXiv preprint arXiv:2407.00451, 2024

[19] J. Carvalho, A. T. Le, M. Baierl, D. Koert, and J. Peters, “Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

[20] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A unified simulation framework for interactive robot learning environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, 2023

[21] P. Dhariwal and A. Nichol, “Diffusion Models Beat GANs on Image Synthesis,” in Conference on Neural Information Processing Systems (NeurIPS), 2021

[22] D. Park, Y. Hoshi, H. P. Mahajan, H. K. Kim, Z. Erickson, W. A. Rogers, and C. C. Kemp, “Active Robot-Assisted Feeding with a General-Purpose Mobile Manipulator: Design, Evaluation, and Lessons Learned,” Robotics and Autonomous Systems, 2020

[23] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, “A survey of robot learning from demonstration,” Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009

[24] C. G. Atkeson and S. Schaal, “Robot learning from demonstration,” in ICML, vol. 97, 1997, pp. 12–20

[25] J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” in Advances in Neural Information Processing Systems (NIPS), 2020

[26] A. Q. Nichol and P. Dhariwal, “Improved Denoising Diffusion Probabilistic Models,” in International Conference on Machine Learning (ICML), 2021

[27] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics,” in International Conference on Machine Learning (ICML), 2015

[28] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-Based Generative Modeling Through Stochastic Differential Equations,” International Conference on Learning Representations (ICLR), 2021

[29] A. Ajay, Y. Du, A. Gupta, J. Tenenbaum, T. Jaakkola, and P. Agrawal, “Is conditional generative modeling all you need for decision-making?” in International Conference on Learning Representations (ICLR), 2023

[30] M. Reuss, M. Li, X. Jia, and R. Lioutikov, “Goal-Conditioned Imitation Learning Using Score-Based Diffusion Policies,” Robotics: Science and Systems (RSS), 2023

[31] Z. Wang, J. J. Hunt, and M. Zhou, “Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning,” in International Conference on Learning Representations (ICLR), 2023

[32] Y. Ze, G. Zhang, K. Zhang, C. Hu, M. Wang, and H. Xu, “3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations,” Robotics: Science and Systems (RSS), 2024

[33] S. Huang, Z. Wang, P. Li, B. Jia, T. Liu, Y. Zhu, W. Liang, and S.-C. Zhu, “Diffusion-Based Generation, Optimization, and Planning in 3D Scenes,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

[34] K. M. Lee, S. Ye, Q. Xiao, Z. Wu, Z. Zaidi, D. B. D’Ambrosio, P. R. Sanketi, and M. Gombolay, “Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance,” arXiv preprint arXiv:2409.15528, 2024

[35] K. Saha, V. Mandadi, J. Reddy, A. Srikanth, A. Agarwal, B. Sen, A. Singh, and M. Krishna, “EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning,” in IEEE International Conference on Robotics and Automation (ICRA), 2024

[36] X. Xu, H. Ha, and S. Song, “Dynamics-Guided Diffusion Model for Robot Manipulator Design,” in Conference on Robot Learning (CoRL), 2024

[37] “SAM 2: Segment Anything in Images and Videos,” arXiv preprint arXiv:2408.00714, 2024

[38] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space,” in Advances in Neural Information Processing Systems (NIPS), 2017

[39] J. Song, C. Meng, and S. Ermon, “Denoising Diffusion Implicit Models,” arXiv preprint arXiv:2010.02502, 2020