{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:SY66OVTCYN53FBFZDHRLC5CBLU","short_pith_number":"pith:SY66OVTC","schema_version":"1.0","canonical_sha256":"963de75662c37bb284b919e2b174415d2a3fb2bea17f1e521e4c02569ce876cb","source":{"kind":"arxiv","id":"2603.02115","version":2},"attestation_state":"computed","paper":{"title":"Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Robometer trains generalizable robot reward models by combining frame-level progress with inter-trajectory preferences.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.RO","authors_text":"Abhishek Gupta, Abrar Anwar, Aditya Shah, Alex S. Huang, Andreea Bobu, Anqi Li, Anthony Liang, Dieter Fox, Erdem Biyik, Jesse Zhang, Jiahui Zhang, Luke Zettlemoyer, Minyoung Hwang, Sidhant Kaushik, Stephen Tu, Yigit Korkmaz, Yu Xiang","submitted_at":"2026-03-02T17:38:58Z","abstract_excerpt":"General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2603.02115","kind":"arxiv","version":2},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.RO","submitted_at":"2026-03-02T17:38:58Z","cross_cats_sorted":["cs.AI","cs.LG"],"title_canon_sha256":"4761d45d1f37188f3842b83e93a2b0103d0ebfc4a4ca08f4246107d12684c2b2","abstract_canon_sha256":"a505b19a79e578f89d121e8e4ec6a3cd4397029daefce162deaa908f5549ae4a"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:15.891477Z","signature_b64":"1J4fwczMI/6rhKzGFRcwwzg+6cGyRNvszvGEuJlAX7Vgk3RC47Lm1xg01LopIJ8tSEHCuAdsFtBZgGrDxnnzDA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"963de75662c37bb284b919e2b174415d2a3fb2bea17f1e521e4c02569ce876cb","last_reissued_at":"2026-05-17T23:39:15.890639Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:15.890639Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Robometer trains generalizable robot reward models by combining frame-level progress with inter-trajectory preferences.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.RO","authors_text":"Abhishek Gupta, Abrar Anwar, Aditya Shah, Alex S. Huang, Andreea Bobu, Anqi Li, Anthony Liang, Dieter Fox, Erdem Biyik, Jesse Zhang, Jiahui Zhang, Luke Zettlemoyer, Minyoung Hwang, Sidhant Kaushik, Stephen Tu, Yigit Korkmaz, Yu Xiang","submitted_at":"2026-03-02T17:38:58Z","abstract_excerpt":"General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Across benchmarks and real-world evaluations, Robometer learns more generalizable reward functions than prior methods and improves robot learning performance across a diverse set of downstream applications.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That inter-trajectory preference supervision from comparisons imposes reliable global ordering constraints even on ambiguous suboptimal and failure trajectories without introducing significant labeling noise or bias.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Robometer combines intra-trajectory progress supervision with inter-trajectory preference supervision on a 1M-trajectory dataset to learn more generalizable robotic reward functions than prior methods.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Robometer trains generalizable robot reward models by combining frame-level progress with inter-trajectory preferences.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"36c6fa79879b23373564907bda335e214737710d5fdd7e5cd3b15ad9e6356b41"},"source":{"id":"2603.02115","kind":"arxiv","version":2},"verdict":{"id":"5435211c-cc25-4e3f-a609-f6481331bf6b","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T17:54:56.992930Z","strongest_claim":"Across benchmarks and real-world evaluations, Robometer learns more generalizable reward functions than prior methods and improves robot learning performance across a diverse set of downstream applications.","one_line_summary":"Robometer combines intra-trajectory progress supervision with inter-trajectory preference supervision on a 1M-trajectory dataset to learn more generalizable robotic reward functions than prior methods.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That inter-trajectory preference supervision from comparisons imposes reliable global ordering constraints even on ambiguous suboptimal and failure trajectories without introducing significant labeling noise or bias.","pith_extraction_headline":"Robometer trains generalizable robot reward models by combining frame-level progress with inter-trajectory preferences."},"references":{"count":166,"sample":[{"doi":"","year":1984,"title":"The relativity of ‘absolute’ judge- ments,","work_id":"125e8af7-4b41-47fe-b40c-97c16f5cc000","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2005,"title":"Absolute identification by relative judgment","work_id":"6a57dfa1-29dc-4103-b84b-33ed6f410df4","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"The effect of relative encoding on memory-based judgments,","work_id":"4ab862b3-f356-4444-bdb2-73957f39fc6c","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Rank2reward: Learning shaped reward func- tions from passive video,","work_id":"d75e0e5c-933d-46e2-a225-d757d0043a2a","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"ReWiND: Language-guided rewards teach robot policies without new demonstrations,","work_id":"08adf09c-4638-4abb-88b9-57ed78420d1a","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":166,"snapshot_sha256":"02b45c8f8aae15f44be5a3da7868eda243441e3dda861ac302a268d1e09ef801","internal_anchors":15},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7bb0b1ff64a5794adeb2398fa16f497c4b7c1f61672ab3780346bf0ca097eb64"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2603.02115","created_at":"2026-05-17T23:39:15.890808+00:00"},{"alias_kind":"arxiv_version","alias_value":"2603.02115v2","created_at":"2026-05-17T23:39:15.890808+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2603.02115","created_at":"2026-05-17T23:39:15.890808+00:00"},{"alias_kind":"pith_short_12","alias_value":"SY66OVTCYN53","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"SY66OVTCYN53FBFZ","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"SY66OVTC","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":8,"internal_anchor_count":8,"sample":[{"citing_arxiv_id":"2605.22123","citing_title":"Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12334","citing_title":"Reinforcing VLAs in Task-Agnostic World Models","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03037","citing_title":"ARM: Advantage Reward Modeling for Long-Horizon Manipulation","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11750","citing_title":"DreamAvoid: Critical-Phase Test-Time Dreaming to Avoid Failures in VLA Policies","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12334","citing_title":"Reinforcing VLAs in Task-Agnostic World Models","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03269","citing_title":"RLDX-1 Technical Report","ref_index":72,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11751","citing_title":"Grounded World Model for Semantically Generalizable Planning","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03269","citing_title":"RLDX-1 Technical Report","ref_index":72,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU","json":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU.json","graph_json":"https://pith.science/api/pith-number/SY66OVTCYN53FBFZDHRLC5CBLU/graph.json","events_json":"https://pith.science/api/pith-number/SY66OVTCYN53FBFZDHRLC5CBLU/events.json","paper":"https://pith.science/paper/SY66OVTC"},"agent_actions":{"view_html":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU","download_json":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU.json","view_paper":"https://pith.science/paper/SY66OVTC","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2603.02115&json=true","fetch_graph":"https://pith.science/api/pith-number/SY66OVTCYN53FBFZDHRLC5CBLU/graph.json","fetch_events":"https://pith.science/api/pith-number/SY66OVTCYN53FBFZDHRLC5CBLU/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU/action/timestamp_anchor","attest_storage":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU/action/storage_attestation","attest_author":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU/action/author_attestation","sign_citation":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU/action/citation_signature","submit_replication":"https://pith.science/pith/SY66OVTCYN53FBFZDHRLC5CBLU/action/replication_record"}},"created_at":"2026-05-17T23:39:15.890808+00:00","updated_at":"2026-05-17T23:39:15.890808+00:00"}