{"paper":{"title":"Trajectory-Level Data Augmentation for Offline Reinforcement Learning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A trajectory-level augmentation technique lets offline reinforcement learning succeed from limited suboptimal trajectories by using geometric relationships between rewards, value functions, and logging policies.","cross_cats":["cs.RO","stat.ML"],"primary_cat":"cs.LG","authors_text":"Matthias Burkhardt, Tobias Schmähling, Tobias Windisch","submitted_at":"2026-05-13T11:57:17Z","abstract_excerpt":"We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. Particularly, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. 
"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies, enabling training of off-policy models from a limited number of suboptimal trajectories.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a usable geometric relationship between rewards, value functions, and logging policies exists and can be reliably exploited for augmentation without introducing bias that harms downstream policy performance.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A trajectory-level augmentation technique lets offline reinforcement learning succeed from limited suboptimal trajectories by using geometric relationships between rewards, value functions, and logging policies.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"86f98fd75c500002557da8afeb454ab3602e9d859390effe10f5a686c8175065"},"source":{"id":"2605.13401","kind":"arxiv","version":1},"verdict":{"id":"4dbc8cea-8d96-49d2-9a83-32f009772d4c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:03:28.860197Z","strongest_claim":"We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies, 
enabling training of off-policy models from a limited number of suboptimal trajectories.","one_line_summary":"Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a usable geometric relationship between rewards, value functions, and logging policies exists and can be reliably exploited for augmentation without introducing bias that harms downstream policy performance.","pith_extraction_headline":"A trajectory-level augmentation technique lets offline reinforcement learning succeed from limited suboptimal trajectories by using geometric relationships between rewards, value functions, and logging policies."},"references":{"count":40,"sample":[{"doi":"10.1016/j.optcom.2020.126685","year":2021,"title":"Alignment of decam-like large survey telescope for real-time active optics and error analysis","work_id":"1aebf031-6b68-40ef-8847-8a275e138a33","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Hindsight experience replay","work_id":"f6acfa8a-070d-4d5b-85c1-b4ae0a4fc6e4","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"J., Smith, L., Kostrikov, I., and Levine, S","work_id":"71da1712-318a-4407-a8d3-63416fed75f6","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1117/12.2041754","year":2014,"title":"Automated assembly of camera modules using active alignment with up to six degrees of freedom","work_id":"1d005457-34e0-43dc-b95f-9b7d8fb2ecc5","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Active alignments of lens systems with reinforcement learning, 
2025","work_id":"dce31f56-fb06-4ebe-a950-7c2dfb9ea2c2","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":40,"snapshot_sha256":"6c2b707068b4cf75de7fe217be9481ae753229ae8da0073eff2c14e19937eb03","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}