{"paper":{"title":"LACE: Latent Visual Representation for Cross-Embodiment Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"LACE aligns latent visual features of humans and robots using sparse body-part correspondences from one demonstration to enable effective cross-embodiment policy transfer.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Cristina Mata, Jorge Mendez-Mendez, Kanchana Ranasinghe, Michael S. Ryoo, Yichi Zhang, Yoo Sung Jang","submitted_at":"2026-05-16T01:50:18Z","abstract_excerpt":"Cross-embodiment learning from human demonstrations is hindered by the visual gap between human and robot embodiments. While self-supervised learning (SSL) backbones encode rich inter-class semantics of general objects, we show they fail to establish correspondence between human and robot hands. We propose LACE, a framework that aligns human and robot visual representations in the latent space of these backbones by leveraging correspondences between shared body parts across embodiments as sparse supervision. 
These annotations can be automatically obtained via forward kinematics, and single rob"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"In zero-shot transfer, policies using LACE-DINO outperform those using DINO by a large margin (65%), with consistent gains in low-data regimes and out-of-distribution environments.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That sparse correspondences between shared body parts (automatically obtained via forward kinematics from a single robot demonstration) are sufficient to lift patch-level supervision to reliable semantic-level alignment in the latent space without degrading the quality of the pretrained SSL backbone features.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LACE aligns human-robot visual features via semantic distribution matching on corresponding body parts plus Gram loss, yielding 65% better zero-shot policy transfer than baseline DINO.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"LACE aligns latent visual features of humans and robots using sparse body-part correspondences from one demonstration to enable effective cross-embodiment policy transfer.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"6090e455cd7680dff8f5bf093a09b947b87f744e90f901a420465a9add085419"},"source":{"id":"2605.16743","kind":"arxiv","version":1},"verdict":{"id":"0d8745d7-60a2-4e0e-962c-8c996b3c8d1e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T21:34:46.665590Z","strongest_claim":"In zero-shot transfer, policies using LACE-DINO outperform those using DINO by a large margin (65%), with consistent gains in low-data regimes and 
out-of-distribution environments.","one_line_summary":"LACE aligns human-robot visual features via semantic distribution matching on corresponding body parts plus Gram loss, yielding 65% better zero-shot policy transfer than baseline DINO.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That sparse correspondences between shared body parts (automatically obtained via forward kinematics from a single robot demonstration) are sufficient to lift patch-level supervision to reliable semantic-level alignment in the latent space without degrading the quality of the pretrained SSL backbone features.","pith_extraction_headline":"LACE aligns latent visual features of humans and robots using sparse body-part correspondences from one demonstration to enable effective cross-embodiment policy transfer."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16743/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T22:01:19.844242Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T21:40:55.234375Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T19:01:56.332605Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:26.462290Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"709c6275fb83b519c85dcd0336ca851e35ba0171025dfcd72559f291fba215dc"},"references":{"count":73,"sample":[{"doi":"10.1109/icra57147.2024.10611477","year":2024,"title":"Idd-x: A multi-view dataset for ego-relative important object localization and explanation in dense and unstructured traffic","work_id":"be79e919-e91f-4ecb-8b06-6b3091bc58b1","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.15607/rss.2024.xx.120","year":2024,"title":"DROID: A 
large-scale in-the-wild robot manipulation dataset","work_id":"975b2832-68f4-4b30-83de-f9cae34622a3","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Bridgedata v2: A dataset for robot learning at scale","work_id":"9e225a90-0e29-4a77-9ba1-bf950857562a","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Humanoid policy ~ human policy","work_id":"5105d696-60c7-4b64-97d8-a65aa9cebde8","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Kanchana Ranasinghe, Xiang Li, Cristina Mata, Jong Sung Park, and Michael S. Ryoo. Pixel motion as universal representation for robot control. ArXiv, 2025","work_id":"1c0db2af-1e44-457e-81e7-c19a68b7966a","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":73,"snapshot_sha256":"b5af80aeab6ce0b06e37e3aca79fe6ab7ba2f746b2e4ba616e1f6d2631ebb86b","internal_anchors":16},"formal_canon":{"evidence_count":2,"snapshot_sha256":"d483b6fd6bd18125979af8f6d1afabb83de09c4e7cd1e296921fcbc085813e76"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}