{"paper":{"title":"QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Systems","license":"http://creativecommons.org/licenses/by/4.0/","headline":"QuickLAP fuses language feedback as probabilistic observations with physical corrections to infer robot reward functions in real time.","cross_cats":["cs.RO"],"primary_cat":"cs.AI","authors_text":"Andreea Bobu, David Lee, Jordan Abi Nader, Nathaniel Dennler","submitted_at":"2025-11-22T00:45:33Z","abstract_excerpt":"Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpre"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines in a semi-autonomous driving simulator, with a 15-participant user study showing significantly higher understandability, collaboration, and preference for the learned behavior.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That large language models can reliably extract accurate reward feature attention masks and preference shifts from free-form user utterances without introducing systematic bias or hallucination that would degrade the Bayesian fusion.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"QuickLAP fuses language and 
physical feedback in a Bayesian update to learn reward functions in real time for semi-autonomous systems, reducing error by over 70% versus physical-only and heuristic baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"QuickLAP fuses language feedback as probabilistic observations with physical corrections to infer robot reward functions in real time.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"996802faa337dda7bea9194d4f6f1f75c7ec695618099cb0f70a5b2ff2b59122"},"source":{"id":"2511.17855","kind":"arxiv","version":2},"verdict":{"id":"f14c2d78-4f67-47e0-83ff-308866ad9117","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T06:46:57.341389Z","strongest_claim":"QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines in a semi-autonomous driving simulator, with a 15-participant user study showing significantly higher understandability, collaboration, and preference for the learned behavior.","one_line_summary":"QuickLAP fuses language and physical feedback in a Bayesian update to learn reward functions in real time for semi-autonomous systems, reducing error by over 70% versus physical-only and heuristic baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That large language models can reliably extract accurate reward feature attention masks and preference shifts from free-form user utterances without introducing systematic bias or hallucination that would degrade the Bayesian fusion.","pith_extraction_headline":"QuickLAP fuses language feedback as probabilistic observations with physical corrections to infer robot reward functions in real time."},"references":{"count":70,"sample":[{"doi":"10.1145/1015330.1015430","year":2004,"title":"In: Proceedings of the Twenty-First International Conference on 
Machine Learning (ICML)","work_id":"ec674945-89da-4bc1-99ff-754529936f07","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Henny Admoni and Brian Scassellati. 2017. Social eye gaze in human-robot interaction: a review. Journal of Human-Robot Interaction 6, 1 (2017), 25–63","work_id":"d4ce08e8-e337-4843-8274-c4887eb1ef2f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1145/3171221.3171267","year":2018,"title":"Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2018. Learning from Physical Human Corrections, One Feature at a Time. In Proceedings of the 2018 ACM/IEEE International Confere","work_id":"c44d6fe6-6eac-48ae-b2f6-b6dcf02269fc","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Andrea Bajcsy, Dylan P. Losey, Marcia K. O’Malley, and Anca D. Dragan. 2017. Learning Robot Objectives from Physical Human Interaction. In Proceedings of the 1st Annual Conference on Robot Learning (Pr","work_id":"07d7f2b3-2f6a-43d8-8cc4-0254a816b0aa","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2007,"title":"Chris L Baker, Joshua B Tenenbaum, and Rebecca R Saxe. 2007. Goal inference as inverse planning. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 29","work_id":"def66dda-2650-4e11-8504-3fd1a372237d","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":70,"snapshot_sha256":"e41889c37836b58e8f8e40fc8497f326ee82db6e1bfaac1520a0b601d5a7e801","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b7b67fa83e3f14d5849b551c720209c6107b1f0ec4394bf3184f56717422d8b3"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}