{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:SRJ44VVRY67BJ7Y7SLBCJ6QEV2","short_pith_number":"pith:SRJ44VVR","schema_version":"1.0","canonical_sha256":"9453ce56b1c7be14ff1f92c224fa04aeab080d6b66123f71a0865fa06c7174e2","source":{"kind":"arxiv","id":"2605.14350","version":1},"attestation_state":"computed","paper":{"title":"Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Adaptive sampling of hard tasks via a minimax objective improves worst-case performance in multi-task reinforcement learning.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Josiah P. Hanna, Nicholas E. Corrado, Wenyuan Huang","submitted_at":"2026-05-14T04:22:24Z","abstract_excerpt":"Multi-task reinforcement learning (MTRL) aims to train a single agent to efficiently optimize performance across multiple tasks simultaneously. However, jointly optimizing all tasks often yields imbalanced learning: agents quickly solve easy tasks but learn slowly on harder ones. While prior work primarily attributes this imbalance to conflicting task gradients and proposes gradient manipulation or specialized architectures to address it, we instead focus on a distinct and under-explored challenge: imbalanced data allocation. Standard MTRL allocates an equal number of environment interactions "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2605.14350","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2026-05-14T04:22:24Z","cross_cats_sorted":[],"title_canon_sha256":"a274d0337ed4a04fde459cca36a3ade46fc83684b67668132cd55949409bc954","abstract_canon_sha256":"f68d578e90315e981806d3165f42c93a3865aa17434c1e32d1ab1dd74e332aef"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:08.078550Z","signature_b64":"cWa2BD5nGwCzo3Re723KwrG9ZaDRaKgb0nNJSzlZtw8AeKdUwl+mRh7a8ZWjObuedFKVmRLWKAHGoDV9cHd4BA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"9453ce56b1c7be14ff1f92c224fa04aeab080d6b66123f71a0865fa06c7174e2","last_reissued_at":"2026-05-17T23:39:08.077945Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:08.077945Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Adaptive sampling of hard tasks via a minimax objective improves worst-case performance in multi-task reinforcement learning.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Josiah P. Hanna, Nicholas E. Corrado, Wenyuan Huang","submitted_at":"2026-05-14T04:22:24Z","abstract_excerpt":"Multi-task reinforcement learning (MTRL) aims to train a single agent to efficiently optimize performance across multiple tasks simultaneously. However, jointly optimizing all tasks often yields imbalanced learning: agents quickly solve easy tasks but learn slowly on harder ones. While prior work primarily attributes this imbalance to conflicting task gradients and proposes gradient manipulation or specialized architectures to address it, we instead focus on a distinct and under-explored challenge: imbalanced data allocation. Standard MTRL allocates an equal number of environment interactions "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"In benchmarks like MetaWorld-MT10 and MT50, DRATS improves data efficiency and increases worst-task performance compared to existing task sampling algorithms.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That adaptively sampling tasks furthest from being solved via the derived minimax objective will consistently lead to improved overall learning without causing instability or requiring additional assumptions about task difficulty.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Adaptive sampling of hard tasks via a minimax objective improves worst-case performance in multi-task reinforcement learning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"da541e7d9ebc1069b8e118c41330786fe20bfaffd5a661f236ff05831262e9a7"},"source":{"id":"2605.14350","kind":"arxiv","version":1},"verdict":{"id":"d53c1e52-cd1d-42d0-812f-14387066d11a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T03:01:28.806870Z","strongest_claim":"In benchmarks like MetaWorld-MT10 and MT50, DRATS improves data efficiency and increases worst-task performance compared to existing task sampling algorithms.","one_line_summary":"DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That adaptively sampling tasks furthest from being solved via the derived minimax objective will consistently lead to improved overall learning without causing instability or requiring additional assumptions about task difficulty.","pith_extraction_headline":"Adaptive sampling of hard tasks via a minimax objective improves worst-case performance in multi-task reinforcement learning."},"references":{"count":299,"sample":[{"doi":"","year":null,"title":"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=","work_id":"46bb83ad-97ac-47d0-8391-e979332d2cfd","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in Neural Information Processing Systems , volume=","work_id":"df139fe8-8a4a-4e57-b33b-cdc368a14ec9","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"arXiv preprint arXiv:2408.14037 , year=","work_id":"20771a43-2745-4d95-bfc6-da38aeea3731","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Under review , volume=","work_id":"be2838ff-703d-4db4-9ae4-1c38414d8cba","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1911,"title":"Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization","work_id":"b9385d0d-bafd-43d3-8948-4d2da8ee27a0","ref_index":5,"cited_arxiv_id":"1911.08731","is_internal_anchor":true}],"resolved_work":299,"snapshot_sha256":"ab232e0d5ebde4f5df82afd31e6c6a1b80a5940d6dceaa7154506863afed367c","internal_anchors":37},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1a837db42be13b7e6054f668ea9bcd5a8fff5a2f485aa346c41ddf99bd000c86"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14350","created_at":"2026-05-17T23:39:08.078036+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14350v1","created_at":"2026-05-17T23:39:08.078036+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14350","created_at":"2026-05-17T23:39:08.078036+00:00"},{"alias_kind":"pith_short_12","alias_value":"SRJ44VVRY67B","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"SRJ44VVRY67BJ7Y7","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"SRJ44VVR","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2","json":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2.json","graph_json":"https://pith.science/api/pith-number/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/graph.json","events_json":"https://pith.science/api/pith-number/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/events.json","paper":"https://pith.science/paper/SRJ44VVR"},"agent_actions":{"view_html":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2","download_json":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2.json","view_paper":"https://pith.science/paper/SRJ44VVR","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14350&json=true","fetch_graph":"https://pith.science/api/pith-number/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/graph.json","fetch_events":"https://pith.science/api/pith-number/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/action/timestamp_anchor","attest_storage":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/action/storage_attestation","attest_author":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/action/author_attestation","sign_citation":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/action/citation_signature","submit_replication":"https://pith.science/pith/SRJ44VVRY67BJ7Y7SLBCJ6QEV2/action/replication_record"}},"created_at":"2026-05-17T23:39:08.078036+00:00","updated_at":"2026-05-17T23:39:08.078036+00:00"}