GarmNet: Improving Global with Local Perception for Robotic Laundry Folding

Daniel Fernandes Gomes; Luis F. Teixeira; Shan Luo

arxiv: 1907.00408 · v1 · pith:IOKQ2ET7new · submitted 2019-06-30 · 💻 cs.RO · cs.CV

Pith reviewed 2026-05-25 12:23 UTC · model grok-4.3

keywords: garment localization · landmark detection · robotic folding · end-to-end deep learning · multi-task perception · CloPeMa dataset · clothing manipulation

The pith

GarmNet performs garment localization and landmark detection together in one network, cutting localization error by 24.7 percent on the CloPeMa dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GarmNet as an end-to-end model that handles both global garment localization for category recognition and local landmark detection for grasping in a single network. Prior approaches treated these tasks separately, which limited how well robots could perceive varied garment states. Training and testing on 3,330 images from the CloPeMa Garment dataset shows that adding the landmark task improves localization accuracy. The result is presented as a scalable, efficient perception solution for robotic laundry folding.

Core claim

GarmNet simultaneously localizes the garment as a whole and detects landmarks for grasping. Localization supplies global information to recognize garment category, while landmark detection supports grasping actions. When landmark detection is included, garment localization error drops by 24.7 percent compared with localization alone.
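To make the headline figure concrete, the snippet below shows how a relative error reduction of this kind is computed. It is a minimal sketch: `relative_error_reduction` is a hypothetical helper name, and the two error values are placeholders chosen for illustration, not numbers taken from the paper.

```python
def relative_error_reduction(baseline_error: float, joint_error: float) -> float:
    """Return the fractional drop in error relative to the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - joint_error) / baseline_error


# Placeholder errors (NOT from the paper): localization-only vs. joint model.
baseline_error = 0.200
joint_error = 0.1506

reduction = relative_error_reduction(baseline_error, joint_error)
print(f"relative reduction: {reduction:.1%}")
```

A 24.7 percent relative reduction means the joint model's error is roughly three quarters of the baseline's. It says nothing about how large either error is in absolute terms.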

What carries the argument

GarmNet, an end-to-end deep learning model that jointly outputs garment localization and landmark detections.

If this is right

Robots obtain both category recognition and grasping cues from one forward pass, reducing separate processing steps.
The combined representation supports handling a wider range of crumpled garment configurations than single-task models.
Memory and compute stay low enough for deployment on robotic platforms that must run multiple domestic tasks.
The same joint-perception pattern can be applied to other garment types in the dataset without redesigning separate networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the joint-training benefit generalizes, similar multi-task networks could reduce error in other robotic perception problems that combine global scene understanding with local action points.
Real-robot folding trials would be needed to check whether the dataset error reduction produces higher end-to-end success rates under variable lighting and fabric stretch.
The approach leaves open whether adding more auxiliary tasks, such as grasp quality prediction, would yield further localization gains.

Load-bearing premise

The reported error reduction is produced by the joint training of localization and landmark detection rather than by differences in model size, training details, or data handling.

What would settle it

Training two models of identical capacity on the identical CloPeMa split, one with only localization and one with both tasks, and finding no meaningful difference in localization error.
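One way to read such a comparison is to summarize paired, per-seed differences in localization error. The sketch below assumes both models were already trained with matched capacity on the same split, once per seed; the function name `paired_summary` and every number in it are hypothetical placeholders rather than measured results.

```python
from statistics import mean, stdev


def paired_summary(single_task_errors, joint_errors):
    """Summarize per-seed differences (single-task minus joint) in localization error."""
    if len(single_task_errors) != len(joint_errors):
        raise ValueError("need one error per seed for each model")
    if len(single_task_errors) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    diffs = [s - j for s, j in zip(single_task_errors, joint_errors)]
    return {
        "mean_diff": mean(diffs),            # positive means the joint model is better
        "std_diff": stdev(diffs),            # spread of the difference across seeds
        "joint_wins": sum(d > 0 for d in diffs),
        "n_seeds": len(diffs),
    }


# Placeholder per-seed errors (NOT measured results).
single_task = [0.21, 0.19, 0.20, 0.22, 0.18]
joint = [0.16, 0.15, 0.17, 0.16, 0.14]

summary = paired_summary(single_task, joint)
print(summary)
```

If the mean difference is small relative to its spread across seeds, the joint-training benefit would not be supported by that experiment.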

Figures

Figures reproduced from arXiv: 1907.00408 by Daniel Fernandes Gomes, Luis F. Teixeira, Shan Luo.

**Figure 1.** GarmNet macro view, UML [3] component diagram. The architecture is broken into three blocks (components): Feature Extractor, Landmark Detector and Garment Localizer. These output intermediate features at two depths, landmark classes and localizations, and garment class and localization. Feature extractor: We implement the feature extraction module with a Fully Convolutional Neural Network (FCNN), a 50-layer ResNet…

**Figure 2.** Landmark detector component, UML [3] representation. After one intermediate branch, two separate branches output 18 × 18 landmark proposals (classification and location). This block, implemented with convolutional layers, can be interpreted as a small fully connected network sliding over the feature extractor output. Garment localizer: To perform the localization of the piece of clothing present in the image,…

**Figure 3.** Garment localizer component, UML [3] representation. Similar to the landmark detector (Figure 2), yet fully connected layers are used. The intermediate layer outputs a 512-d vector, the classifier a 9-d vector (one-hot encoded classes) and the regressor the 3-d (x, y, width and height) vector. 4 Experiments: Our implementation was performed using the Keras framework with the TensorFlow back-end. All experiments were carried out on …

**Figure 4.** Representative cases of the result of applying the spatial constraint loss. Top row: predictions with the composed loss; middle row: predictions without it; bottom row: the ground truth.

**Figure 5.** GarmNet-B (introduced in Section 4.5) representation using UML [3]. The output emitted by the classifier block of the Landmark Detector branch is concatenated with the Feature Extractor output before being fed into the intermediate layer.

Original abstract

Developing autonomous assistants to help with domestic tasks is a vital topic in robotics research. Among these tasks, garment folding is one of them that is still far from being achieved mainly due to the large number of possible configurations that a crumpled piece of clothing may exhibit. Research has been done on either estimating the pose of the garment as a whole or detecting the landmarks for grasping separately. However, such works constrain the capability of the robots to perceive the states of the garment by limiting the representations for one single task. In this paper, we propose a novel end-to-end deep learning model named GarmNet that is able to simultaneously localize the garment and detect landmarks for grasping. The localization of the garment represents the global information for recognising the category of the garment, whereas the detection of landmarks can facilitate subsequent grasping actions. We train and evaluate our proposed GarmNet model using the CloPeMa Garment dataset that contains 3,330 images of different garment types in different poses. The experiments show that the inclusion of landmark detection (GarmNet-B) can largely improve the garment localization, with an error rate of 24.7% lower. Solutions as ours are important for robotics applications, as these offer scalable to many classes, memory and processing efficient solutions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GarmNet combines localization and landmark detection for garments and reports a 24.7% error drop, but the abstract leaves the baseline comparison under-specified.

The main takeaway is that adding landmark detection to a garment localizer produces a 24.7% error reduction on the CloPeMa dataset, but the paper does not isolate whether that gain comes from joint training or from other differences in the models.

GarmNet is an end-to-end network that outputs both the garment bounding box or pose and the keypoint locations for grasping. The authors train it on 3,330 images covering different garment types and crumpled states. The joint version (GarmNet-B) beats the localization-only version. This is a direct extension of separate-task methods into a multi-task setup, which fits the needs of a folding robot that needs both global category info and local grasp points. The work is useful because it keeps the model lightweight and avoids running two separate networks at inference time. The motivation section lays out the problem clearly for domestic robotics.

The main weakness is the lack of detail on the baseline comparison. The abstract gives the percentage but no architecture table, parameter counts, or training schedule match. If GarmNet-B has a larger backbone or different hyperparameters, the improvement cannot be credited to the landmark loss. The paper also assumes the CloPeMa results will carry over to physical robots without any domain adaptation tests or real-robot trials. Those gaps make the central claim harder to evaluate.

This paper is aimed at applied roboticists who need perception for deformable manipulation. A reader looking for a ready-to-use component for laundry tasks will get value from the architecture and the dataset usage. It is not a methods breakthrough, but the practical framing is reasonable. I would recommend sending it to peer review. The experimental claim is important enough to check, even if revisions will be needed to tighten the controls.

Referee Report

3 major / 2 minor

Summary. The paper proposes GarmNet, an end-to-end CNN for simultaneous garment localization (global category recognition) and landmark detection (for grasping) in robotic laundry folding. It evaluates the model on the CloPeMa Garment dataset (3,330 images) and reports that adding landmark detection (GarmNet-B) reduces garment localization error by 24.7% compared to localization-only training.

Significance. If the reported improvement is attributable to joint training rather than capacity differences and generalizes beyond the fixed dataset, the multi-task formulation could provide a scalable, efficient perception module for domestic robotics tasks involving deformable objects. The work addresses a practical gap between separate global-pose and local-landmark pipelines.

major comments (3)

[Abstract / Experiments] Abstract and experiments section: The central claim of a 24.7% localization error reduction for GarmNet-B is presented without any description of the baseline architecture (GarmNet-A), parameter counts, backbone depth, training schedule, data augmentation, or loss-weighting scheme. Without an explicit statement that the only difference is the added landmark head and multi-task loss, the improvement cannot be isolated from confounding factors such as increased model capacity.
[Experiments] Experiments: No information is supplied on train/test splits, cross-validation, statistical significance testing, or variance across runs. The reported error reduction is therefore an empirical fit on a single fixed dataset whose robustness to different partitions or hyperparameter choices remains unverified.
[Introduction / Conclusion] Introduction and conclusion: The paper assumes that performance on the CloPeMa dataset will transfer to real robotic folding scenarios, yet no domain-shift, sim-to-real, or physical-robot experiments are described to support this transfer claim.

minor comments (2)

[Abstract] The abstract states the model is 'scalable to many classes, memory and processing efficient' but provides no supporting measurements (e.g., FLOPs, parameter counts, inference time) relative to single-task baselines.
[Methods] Notation for the two variants (GarmNet-A vs. GarmNet-B) is introduced only in the abstract; a clear definition and diagram in the methods section would improve readability.
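The kind of measurement the first minor comment asks for can start with simple parameter bookkeeping. The sketch below counts the weights of one fully connected layer using the sizes quoted in the Figure 3 caption (a 512-d intermediate vector and a 9-d class vector). That the classifier is a single dense layer fed directly by the intermediate layer is an assumption, and `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(n_inputs: int, n_outputs: int, use_bias: bool = True) -> int:
    """Number of trainable parameters in one fully connected layer."""
    return n_inputs * n_outputs + (n_outputs if use_bias else 0)


# Sizes from the Figure 3 caption: 512-d intermediate vector, 9-d class vector.
# Assumption: the classifier is a single dense layer fed by the intermediate layer.
classifier_params = dense_param_count(512, 9)
print(classifier_params)
```

Summing such counts over every layer of both variants would show how much capacity the landmark head adds relative to the shared backbone.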

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our multi-task approach. We address each point below and will revise the manuscript accordingly to improve clarity and rigor.

Referee: [Abstract / Experiments] Abstract and experiments section: The central claim of a 24.7% localization error reduction for GarmNet-B is presented without any description of the baseline architecture (GarmNet-A), parameter counts, backbone depth, training schedule, data augmentation, or loss-weighting scheme. Without an explicit statement that the only difference is the added landmark head and multi-task loss, the improvement cannot be isolated from confounding factors such as increased model capacity.

Authors: We agree that additional architectural and training details are needed to isolate the effect of joint training. In the revised manuscript, we will expand the experiments section with a table comparing GarmNet-A and GarmNet-B, explicitly stating that the backbone, parameter counts (except for the added landmark head), training schedule, data augmentation, and loss weighting remain identical, with the sole difference being the addition of the landmark detection head and its multi-task loss term. revision: yes
Referee: [Experiments] Experiments: No information is supplied on train/test splits, cross-validation, statistical significance testing, or variance across runs. The reported error reduction is therefore an empirical fit on a single fixed dataset whose robustness to different partitions or hyperparameter choices remains unverified.

Authors: We will add the train/test split details (proportions and any randomization seed) used for the 3,330-image CloPeMa dataset to the experiments section. The original evaluation was performed on a single fixed partition without multiple runs; we will note this limitation explicitly and, where possible, report results from additional runs with varied seeds to provide variance estimates. revision: partial
Referee: [Introduction / Conclusion] Introduction and conclusion: The paper assumes that performance on the CloPeMa dataset will transfer to real robotic folding scenarios, yet no domain-shift, sim-to-real, or physical-robot experiments are described to support this transfer claim.

Authors: The manuscript evaluates the perception module on the CloPeMa dataset and discusses its relevance to robotic applications. We will revise the introduction and conclusion to remove any implication of direct transfer, instead stating that the results demonstrate improved perception on this dataset and that validation on physical robots or under domain shift remains future work. revision: yes
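The seeded split that the second response promises to document can be sketched as follows. Only the dataset size of 3,330 images comes from the text; the 20 percent test fraction, the seeds, and the helper name `split_indices` are assumptions for illustration.

```python
import random


def split_indices(n_images: int, test_fraction: float, seed: int):
    """Shuffle image indices with a fixed seed and split them into train/test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched
    n_test = round(n_images * test_fraction)
    return indices[n_test:], indices[:n_test]


# 3,330 is the dataset size quoted above; the test fraction and seeds are assumptions.
for seed in (0, 1, 2):
    train_idx, test_idx = split_indices(3330, test_fraction=0.2, seed=seed)
    print(seed, len(train_idx), len(test_idx))
```

Reporting the seed alongside the proportions lets a reader regenerate the exact partition, and repeating the run over several seeds gives the variance estimate the referee asks for.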

Circularity Check

0 steps flagged

No circularity: empirical performance comparison on fixed dataset with no load-bearing derivations or self-citations

full rationale

The paper presents an empirical ML model (GarmNet) trained and evaluated on the CloPeMa Garment dataset. The central claim is a measured 24.7% error reduction when adding a landmark detection head, reported directly from experimental results rather than any mathematical derivation, prediction, or first-principles chain. No equations, ansatzes, uniqueness theorems, or self-citations are invoked as load-bearing steps. The result is a standard train/evaluate comparison on a fixed dataset and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim rests on empirical training of a deep network whose many weights are fitted to the CloPeMa images and on the untested assumption that the dataset captures the variability needed for robotic deployment.

free parameters (1)

multi-task loss weighting and backbone choice
Hyperparameters that control how localization and landmark losses are balanced and which CNN backbone is used; these are selected to produce the reported improvement.
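The loss actually used by the paper is not described on this page. As a hedged illustration of what "loss weighting" usually means, the sketch below combines two task losses as a weighted sum, a common multi-task form that is assumed here rather than confirmed; `multitask_loss` and the loss values are hypothetical.

```python
def multitask_loss(loc_loss: float, landmark_loss: float, landmark_weight: float) -> float:
    """Combine two task losses as loc_loss + landmark_weight * landmark_loss."""
    if landmark_weight < 0:
        raise ValueError("landmark_weight must be non-negative")
    return loc_loss + landmark_weight * landmark_loss


# Placeholder loss values; sweeping the weight shows how the balance shifts.
for w in (0.0, 0.5, 1.0):
    print(w, multitask_loss(loc_loss=0.8, landmark_loss=0.4, landmark_weight=w))
```

With a weight of zero the objective reduces to localization alone, which is why the weight is the natural knob for a controlled single-task versus joint comparison.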

axioms (1)

domain assumption: The CloPeMa Garment dataset of 3,330 images is representative of garment configurations encountered in robotic folding.
All training and evaluation occur on this dataset; generalization claims depend on it.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 6 internal anchors

[1] Corona, E., Alenyà, G., Gabas, A., Torras, C.: Active garment recognition and target grasping point detection using deep learning. Pattern Recognition 74, 629–641 (2018). https://doi.org/10.1016/j.patcog.2017.09.042, http://www.sciencedirect.com/science/article/pii/S0031320317303941

[2] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A Large-Scale Hierarchical Image Database. In: CVPR09 (2009)

[3] Engels, G., Heckel, R., Sauer, S.: UML - a universal modeling language? LNCS (10 2000). https://doi.org/10.1007/3-540-44988-4_3

[4] Everingham, M., Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88(2), 303–338 (Jun 2010). https://doi.org/10.1007/s11263-009-0275-4

[5] Girshick, R.B.: Fast R-CNN. CoRR abs/1504.08083 (2015), http://arxiv.org/abs/1504.08083

[6] Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR abs/1311.2524 (2013), http://arxiv.org/abs/1311.2524

[7] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015), http://arxiv.org/abs/1512.03385

[8] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (2012)

[9] Lecun, Y., Bengio, Y.: Convolutional networks for images, speech, and time-series. The Handbook of Brain Theory and Neural Networks (01 1995)

[10] Lee, J.T., Bollegala, D., Luo, S.: "Touching to See" and "Seeing to Feel": Robotic Cross-modal Sensory Data Generation for Visual-Tactile Perception. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2019)

[11] Li, Y., Chen, C.F., Allen, P.K.: Recognition of deformable object category and pose. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2014)

[12] Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)

[13] Luo, S., Bimbo, J., Dahiya, R., Liu, H.: Robotic tactile perception of object properties: A review. Mechatronics 48, 54–67 (2017)

[14] Luo, S., Mou, W., Althoefer, K., Liu, H.: iCLAP: Shape recognition by combining proprioception and touch sensing. Autonomous Robots pp. 1–12 (2018)

[15] Maitin-Shepard, J., Cusumano-Towner, M., Lei, J., Abbeel, P.: Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding. In: 2010 IEEE International Conference on Robotics and Automation. pp. 2308–2315 (May 2010). https://doi.org/10.1109/ROBOT.2010.5509439

[16] Mariolis, I., Peleka, G., Kargakos, A., Malassiotis, S.: Pose and category recognition of highly deformable objects using deep learning. In: 2015 International Conference on Advanced Robotics (ICAR). pp. 655–662. IEEE (jul 2015). https://doi.org/10.1109/ICAR.2015.7251526, http://ieeexplore.ieee.org/document/7251526/

[17] Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: Unified, real-time object detection. CoRR abs/1506.02640 (2015), http://arxiv.org/abs/1506.02640

[18] Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. CoRR abs/1612.08242 (2016), http://arxiv.org/abs/1612.08242

[19] Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. CoRR abs/1506.01497 (2015), http://arxiv.org/abs/1506.01497

[20] Seo, Y., shik Shin, K.: Hierarchical convolutional neural networks for fashion image classification. Expert Systems with Applications 116, 328–339 (2019). https://doi.org/10.1016/j.eswa.2018.09.022, http://www.sciencedirect.com/science/article/pii/S0957417418305992

[21] Wagner, L., K.D., Smutný, V.: Ctu color and depth image dataset of spread garments. Tech. Rep. CTUCMP201325, Center for Machine Perception, K13133 FEE Czech Technical University, Prague, Czech Republic (September 2013)

[22] Yamazaki, K.: Instance recognition of clumped clothing using image features focusing on clothing fabrics and wrinkles. 2015 IEEE International Conference on Robotics and Biomimetics, IEEE-ROBIO 2015 pp. 1102–1108 (2016). https://doi.org/10.1109/ROBIO.2015.7418919, http://dx.doi.org/10.1007/s10514-016-9559-z

[23] Yang, M., Yu, K.: Real-time clothing recognition in surveillance videos. In: Macq, B., Schelkens, P. (eds.) ICIP. pp. 2937–2940. IEEE (2011), http://dblp.uni-trier.de/db/conf/icip/icip2011.html#YangY11
