Seam-to-Graph Reconstruction for Garment Configuration Alignment

Fuyuki Tokuda; Kai Tang; Kazuhiro Kosuge; Norman C. Tien; Xuzhao Huang

arxiv: 2606.15171 · v2 · pith:CI3QMMIBnew · submitted 2026-06-13 · 💻 cs.RO

Xuzhao Huang, Kai Tang, Fuyuki Tokuda, Norman C. Tien, Kazuhiro Kosuge

Pith reviewed 2026-06-27 04:31 UTC · model grok-4.3

classification 💻 cs.RO

keywords garment manipulation · seam reconstruction · graph neural networks · visual servoing · bimanual robotics · deformable objects · configuration alignment

The pith

Seam observations are mapped to a structural skeleton graph to enable precise robotic garment alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how a neural network can turn partial seam observations on a garment into a complete topology-encoded graph that represents its structural skeleton. This graph then feeds a visual servoing controller that lets a bimanual robot load and align the garment to a target configuration. Real-robot trials confirm the method reaches human-level accuracy while cutting variance in error and working across different garments. The central idea is that seams carry enough hidden structure to support closed-loop control even when only fragments are visible.

Core claim

A Seam-to-Graph network based on graph neural networks and attention mechanisms converts unstructured, partially visible seam data into a topology-encoded structural skeleton graph that supports real-time state estimation and deformation-aware hierarchical visual servoing for garment configuration alignment on a bimanual robot.

What carries the argument

The Seam-to-Graph network, which reconstructs partial seam observations into a topology-encoded structural skeleton graph using graph neural networks and attention.
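As a rough illustration of what attention-weighted message passing over a skeleton graph can look like, the sketch below runs one aggregation step on a toy graph. Everything in it is an assumption made for illustration: the function name `attention_step`, the four-vertex chain, the feature values, and the use of plain dot-product scores with no learned parameters. It is not the paper's architecture, which is not specified in the text available here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_step(features, neighbors):
    """One attention-weighted aggregation step over a graph.

    features:  dict vertex -> feature vector (list of floats)
    neighbors: dict vertex -> list of adjacent vertices (no self-loops)
    Returns (new_features, weights). Each vertex attends to itself
    and to its neighbors.
    """
    new_features, weights = {}, {}
    for v, h_v in features.items():
        group = [v] + neighbors[v]
        # Dot-product score between the vertex and each neighborhood member
        # (an assumption; a learned scoring function would replace this).
        scores = [sum(a * b for a, b in zip(h_v, features[u])) for u in group]
        alpha = softmax(scores)
        weights[v] = dict(zip(group, alpha))
        # New feature is the attention-weighted sum of neighborhood features.
        new_features[v] = [
            sum(w * features[u][k] for w, u in zip(alpha, group))
            for k in range(len(h_v))
        ]
    return new_features, weights

# Hypothetical 4-vertex skeleton (a chain); features are 2-D positions.
features = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [2.0, 0.5], 3: [3.0, 0.5]}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
updated, attn = attention_step(features, neighbors)
```

The connectivity in `neighbors` is what makes the output "topology-encoded" in this toy setting: a vertex only mixes information with the vertices it is connected to.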

If this is right

The graph-based state estimate allows the controller to handle garment deformation during alignment.
The same pipeline achieves consistent performance across multiple garment types without retraining.
Alignment error variance drops below human demonstration levels while mean accuracy stays comparable.
The approach runs in real time on physical hardware for screen-printing platen loading.
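To make the closed-loop idea concrete, here is a minimal sketch of a proportional servoing loop that drives estimated skeleton vertices toward target vertices. It follows the spirit of classical visual servo control [21], [22] only loosely. The names `servo_step` and `run_servo`, the gain, the tolerance, and the vertex coordinates are all hypothetical, and the sketch assumes each vertex moves exactly as commanded, which real fabric does not do. The paper's controller is described as hierarchical and deformation-aware; none of that is modeled here.

```python
def servo_step(current, target, gain=0.5):
    """Per-vertex velocity command v = gain * (target - current)."""
    return [
        [gain * (t - c) for c, t in zip(cur_pt, tgt_pt)]
        for cur_pt, tgt_pt in zip(current, target)
    ]

def mean_error(current, target):
    """Mean Euclidean distance between corresponding vertices."""
    dists = [
        sum((t - c) ** 2 for c, t in zip(cur_pt, tgt_pt)) ** 0.5
        for cur_pt, tgt_pt in zip(current, target)
    ]
    return sum(dists) / len(dists)

def run_servo(current, target, gain=0.5, dt=1.0, tol=1e-3, max_iters=100):
    """Iterate until the mean error drops below tol.

    Returns (final_points, error_history).
    """
    history = [mean_error(current, target)]
    for _ in range(max_iters):
        if history[-1] < tol:
            break
        velocity = servo_step(current, target, gain)
        # Idealized plant (assumption): vertices move exactly as commanded.
        current = [
            [c + dt * v for c, v in zip(pt, vel)]
            for pt, vel in zip(current, velocity)
        ]
        history.append(mean_error(current, target))
    return current, history

# Hypothetical vertex positions in metres on the platen plane.
start = [[0.10, 0.05], [0.40, -0.02], [0.70, 0.03]]
goal = [[0.0, 0.0], [0.4, 0.0], [0.8, 0.0]]
final, history = run_servo(start, goal)
```

With `gain * dt = 0.5` the error halves on every iteration in this idealized setting. On a real garment the state estimate would need refreshing from the camera at each step, which is the role the skeleton graph plays in the pipeline.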

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The graph representation could be reused as input for learning-based policies in other deformable-object tasks.
Adding depth or tactile cues to the seam observations might further reduce partial-visibility failures.
The hierarchical servoing structure might generalize to multi-stage garment folding sequences.

Load-bearing premise

Seams carry enough structural information about a garment to be mapped reliably into a graph that supports closed-loop control, even when only partially visible.

What would settle it

A controlled robot trial in which removing the seam-to-graph reconstruction step produces no measurable drop in alignment accuracy or increase in error variance compared with the full method.
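A sketch of how such an ablation could be summarized, assuming per-trial alignment errors are logged for the full method and for the variant without the reconstruction step. The function names and the numbers are placeholders chosen for easy arithmetic, not measurements from the paper. A real analysis would also apply a proper statistical test rather than comparing raw summaries.

```python
import statistics

def summarize(errors):
    """Trial count, mean, and sample variance of per-trial alignment errors."""
    return {
        "n": len(errors),
        "mean": statistics.mean(errors),
        "variance": statistics.variance(errors),
    }

def compare(full, ablated):
    """Compare the ablated variant against the full method.

    Assumes the full method's sample variance is non-zero.
    """
    f, a = summarize(full), summarize(ablated)
    return {
        "mean_increase": a["mean"] - f["mean"],
        "variance_ratio": a["variance"] / f["variance"],
    }

# Placeholder values only (hypothetical errors in millimetres).
full_errors = [1.0, 2.0, 3.0]
ablated_errors = [2.0, 4.0, 6.0]
result = compare(full_errors, ablated_errors)
```

If `mean_increase` were near zero and `variance_ratio` near one over enough trials, the reconstruction step would not be carrying the result.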

Figures

Figures reproduced from arXiv: 2606.15171 by Fuyuki Tokuda, Kai Tang, Kazuhiro Kosuge, Norman C. Tien, Xuzhao Huang.

**Figure 2.** Overall pipeline. Four modules are included to achieve garment loading and alignment using a bimanual robot. (a) We use SAM2 [20] and a seam …

**Figure 3.** The architecture of the Seam-to-Graph network. The network is …

**Figure 4.** Unfolding operation to regional skeleton vertices. (a)(b)(c) A …

**Figure 5.** Hardware platform for experiments. A screen printing platen is …

**Figure 6.** Ablation study on real-world repeatability and segment-to-skeleton alignment. The Full group uses our proposed complete method with a dual-branch …

**Figure 7.** Clothing and qualitative results. The first row shows the garments used during the experiments. Our Seam-to-Graph network is only trained on …

**Figure 8.** Evaluation of garment loading and alignment performance. We …

**Figure 9.** Illustration of initial grasping positions. Yellow points indicate initial …

Original abstract

Seams encode rich structural information about garments but are frequently partially observable in robotic manipulation scenarios. To robustly leverage seam information, we propose a Seam-to-Graph network based on graph neural networks and attention mechanisms. This network maps unstructured seam observations to a topology-encoded structural skeleton graph for real-time garment state estimation. Using this skeleton-graph-based state estimation, we design a deformation-aware, hierarchical visual servoing controller for garment configuration alignment. We implement this controller on a bimanual robot system to load a garment onto a screen printing platen and to align it to the desired configuration precisely. Real-robot experiments demonstrate that the robot using the proposed method not only achieves human-level alignment accuracy with reduced variance in alignment error but is also robust to different garments. These results demonstrate that the use of seam information is effective for garment manipulation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper describes a seam-to-graph pipeline with GNNs for mapping partial seam observations to a skeleton for bimanual garment alignment control, but the abstract supplies no metrics or comparisons so the performance claims stay uncheckable.

The paper's main contribution is a Seam-to-Graph network that takes partial seam observations, runs them through graph neural networks and attention, and outputs a topology-encoded skeleton graph. That graph then drives a deformation-aware hierarchical visual servoing controller on a bimanual robot for precise garment alignment on a screen-printing platen. The approach targets a concrete manufacturing step where fabric deforms but seams still carry usable structure.

The choice to encode seams as a graph makes sense because it preserves connectivity even under partial views and deformation. Integrating that state estimate directly into the controller is a practical step for closed-loop handling of soft objects. The claim of robustness across different garments also fits the industrial setting they describe.

The soft spot is the complete absence of numbers. The abstract states that the method reaches human-level accuracy with lower variance and works on varied garments, yet it gives no error values, trial counts, baselines, or training details. Without those, it is impossible to judge whether the graph mapping actually improves control or whether the controller design is doing most of the work. The assumption that partial seams can be mapped reliably enough for real-time servoing is central, but it remains untested in the provided text.

This work is aimed at roboticists who build systems for textile manipulation in factories. Someone already working on visual servoing or graph-based perception for deformables could extract the architecture and controller structure for their own experiments.

I would send it to peer review. The task is well-scoped and the method is a direct response to a real constraint, so referees can check the missing quantitative evidence and see whether the results hold up.

Referee Report

1 major / 0 minor

Summary. The paper proposes a Seam-to-Graph network based on graph neural networks and attention mechanisms that maps partial seam observations to a topology-encoded structural skeleton graph for real-time garment state estimation. This graph is then used to design a deformation-aware hierarchical visual servoing controller implemented on a bimanual robot system for precise garment alignment on a screen printing platen, with claims that real-robot experiments show human-level accuracy, reduced variance in alignment error, and robustness across different garments.

Significance. If the experimental claims hold with proper quantitative validation, the work could contribute to robotic handling of deformable objects by demonstrating effective use of seam-based structural information for state estimation and closed-loop control, an area where partial observability often limits performance.

major comments (1)

Abstract: The central claim that the method 'achieves human-level alignment accuracy with reduced variance in alignment error' and is 'robust to different garments' is presented without any quantitative metrics, baselines, trial counts, error bars, statistical tests, or validation procedures, rendering the experimental outcomes unverifiable and the soundness of the primary result impossible to assess.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We agree that the primary claims require quantitative support in the abstract itself to allow immediate assessment of the results.

Referee: [—] Abstract: The central claim that the method 'achieves human-level alignment accuracy with reduced variance in alignment error' and is 'robust to different garments' is presented without any quantitative metrics, baselines, trial counts, error bars, statistical tests, or validation procedures, rendering the experimental outcomes unverifiable and the soundness of the primary result impossible to assess.

Authors: We agree that the abstract should include quantitative metrics to substantiate its claims. In the revised manuscript we will update the abstract to report key experimental results, including mean alignment error and standard deviation (with trial count), direct comparison to human performance, and evidence of robustness across garment types. The full experimental section already provides baselines, error bars, trial counts, and statistical details; the revision will ensure the abstract summarizes these findings concisely without altering the underlying data or analysis.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The provided abstract and context contain no equations, derivations, fitted parameters presented as predictions, or self-citations. The method description (Seam-to-Graph network using GNN/attention to produce a skeleton graph, followed by visual servoing) is presented at a high level without any reduction of outputs to inputs by construction. No load-bearing steps can be identified that match the enumerated circularity patterns. The derivation chain is not inspectable from the given text and appears self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Only abstract available; no explicit free parameters, axioms, or invented entities detailed beyond the high-level proposal of the network itself.

invented entities (1)

Seam-to-Graph network · no independent evidence
purpose: Maps unstructured seam observations to topology-encoded structural skeleton graph
Core proposed component stated in abstract; no independent evidence supplied.

pith-pipeline@v0.9.1-grok · 5678 in / 1141 out tokens · 44524 ms · 2026-06-27T04:31:46.127481+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 5 canonical work pages · 3 internal anchors

[1] J. Maitin-Shepard, M. Cusumano-Towner, J. Lei, and P. Abbeel, “Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding,” in 2010 IEEE International Conference on Robotics and Automation. IEEE, 2010, pp. 2308–2315.

[2] X. Wang, J. Zhao, X. Jiang, and Y.-H. Liu, “Learning-based fabric folding and box wrapping,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5703–5710, 2022.

[3] Y. Liu, L. Han, E. Gu, and H. Wang, “Learning a general model: Folding clothing with topological dynamics,” arXiv preprint arXiv:2504.20720, 2025.

[4] H. Cheng, F. Tokuda, and K. Kosuge, “Automated action generation based on action field for robotic garment smoothing and alignment,” IEEE Transactions on Automation Science and Engineering, vol. 23, pp. 5884–5896, 2026.

[5] X. Huang et al., “SIS: Seam-informed strategy for t-shirt unfolding,” IEEE Robotics and Automation Letters, vol. 10, no. 7, pp. 7342–7349, 2025.

[6] S. Kotsovolis and Y. Demiris, “Garment diffusion models for robot-assisted dressing,” IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 1217–1224, 2025.

[7] Y. Zhang, D. Chen, W. He, A. E. Petrilli Barceló, J. V. Salazar Luces, and Y. Hirata, “Multi-critic reinforcement learning for garment handling: Addressing unpredictability in temporal-phase continuous contact tasks,” IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 10 741–10 752, 2025.

[8] W. Chen, K. Li, D. Lee, X. Chen, R. Zong, and P. Kormushev, “Graphgarment: Learning garment dynamics for bimanual cloth manipulation tasks,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, pp. 7615–7621.

[9] W. Chen, D. Lee, D. Chappell, and N. Rojas, “Learning to grasp clothing structural regions for garment manipulation tasks,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 4889–4895.

[10] B. Ai, S. Tian, H. Shi, Y. Wang, T. Pfaff, C. Tan, H. I. Christensen, H. Su, J. Wu, and Y. Li, “A review of learning-based dynamics models for robotic manipulation,” Science Robotics, vol. 10, no. 106, p. eadt1497, 2025.

[11] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter et al., “$\pi_0$: A vision-language-action flow model for general robot control,” arXiv preprint arXiv:2410.24164, 2024.

[12] C. Chi and S. Song, “Garmentnets: Category-level pose estimation for garments via canonical space shape completion,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3304–3313.

[13] E. Lo, X. Huang, K. Tang, A. Seino, F. Tokuda, and K. Kosuge, “Fabric flattening and alignment system using real-time mesh-based state estimation and visual servoing,” in 2025 IEEE International Conference on Mechatronics and Automation (ICMA), 2025, pp. 328–334.

[14] K. Tang, D. Bhattacharya, H. Xu, F. Tokuda, N. C. Tien, and K. Kosuge, “RTFF: Random-to-target fabric flattening policy using dual-arm manipulator,” 2025. [Online]. Available: https://arxiv.org/abs/2510.00814

[15] R. Wu, H. Lu, Y. Wang, Y. Wang, and H. Dong, “Unigarmentmanip: A unified framework for category-level garment manipulation via dense visual correspondence,” in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16 340–16 350.

[16] H. Kondo, J. V. S. Luces, and Y. Hirata, “Development and control of robot hand with finger camera for garment handling tasks,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 8940–8947.

[17] A. Longhini, M. Büsching, B. P. Duisterhof, J. Lundell, J. Ichnowski, M. Björkman, and D. Kragic, “Cloth-splatting: 3d state estimation from RGB supervision,” in 8th Annual Conference on Robot Learning, 2024. [Online]. Available: https://openreview.net/forum?id=WmWbswjTsi

[18] B. Kerbl, G. Kopanas, T. Leimkühler, G. Drettakis et al., “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 1–14, 2023.

[19] Y. Li, X. Wang, X. Song, C. Fan, B. Zhu, J. Peng, Q. Liang, J. Lam, K.-Y. Sze, and K.-W. Kwok, “Graphneuralcloth: A graph-neural-network-based framework for non-skinning cloth simulation,” Advanced Intelligent Systems, vol. 8, no. 4, p. e202501120, 2026. [Online]. Available: https://advanced.onlinelibrary.wiley.com/doi/abs/10.1002/aisy.202501120

[20] N. Ravi et al., “SAM 2: Segment anything in images and videos,” in International Conference on Learning Representations, vol. 2025, 2025, pp. 28 085–28 128. [Online]. Available: https://proceedings.iclr.cc/paper_files/paper/2025/file/45c1f6a8cbf2da59ebf2c802b4f742cd-Paper-Conference.pdf

[21] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control,” IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.

[22] F. Chaumette and S. Hutchinson, “Visual servo control. I. Basic approaches,” IEEE Robotics & Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.

[23] W. Chen and N. Rojas, “Trakdis: A transformer-based knowledge distillation approach for visual reinforcement learning with application to cloth manipulation,” IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2455–2462, 2024.

[24] X. Jiang, H. Wei, Z. Liu, W. Liao, and W. Ran, “Vision guided cable installation in constraint environments utilizing parametric curve representation,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, pp. 1418–1424.

[25] G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics

[26] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.

[27] A. Vaswani et al., “Attention is all you need,” in Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

[28] P. Veličković et al., “Graph attention networks,” in International Conference on Learning Representations, vol. 6. Ithaca, 2018, p. 2.

[29] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.

[30] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[31] W. Kabsch, “A discussion of the solution for the best rotation to relate two sets of vectors,” Acta Crystallogr. A, vol. 34, no. 5, pp. 827–828, Sep 1978.

[32] S. Macenski, T. Foote, B. Gerkey, C. Lalancette, and W. Woodall, “Robot operating system 2: Design, architecture, and uses in the wild,” Sci. Robot., vol. 7, no. 66, p. eabm6074, 2022.

[33] R. Narain, A. Samii, and J. F. O’Brien, “Adaptive anisotropic remeshing for cloth simulation,” ACM Transactions on Graphics, vol. 31, no. 6, pp. 147:1–10, Nov. 2012, proceedings of ACM SIGGRAPH Asia 2012, Singapore. [Online]. Available: http://graphics.berkeley.edu/papers/Narain-AAR-2012-11/

[34] H. Wang, J. F. O’Brien, and R. Ramamoorthi, “Data-driven elastic models for cloth: modeling and measurement,” ACM Trans. Graph., vol. 30, no. 4, Jul. 2011. [Online]. Available: https://doi.org/10.1145/2010324.1964966

Xuzhao Huang (Graduate Student Member, IEEE) received his B.Eng. degree in mechanical design, manufacturing, and automation from Xiame...
