{"paper":{"title":"\"I'm Not Mad, Just Focused'': Understanding Human Emotions in Human-Robot Collaboration","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A vision-language model for emotion recognition aligns better with human judgments than convolutional networks and produces preferred robot adaptations in collaboration tasks.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Dana Kuli\\'c, Leimin Tian, Seung Chan Hong","submitted_at":"2026-05-16T05:18:22Z","abstract_excerpt":"Human-robot collaboration (HRC) can benefit from robots' abilities to interpret human emotional states. However, current emotion recognition (ER) models in HRC often fall short, particularly due to their reliance on acted datasets and single-modality inputs like facial expressions. We propose a novel vision language model (VLM)-based ER system that leverages contextual understanding to improve emotion interpretation in HRC. We first evaluate the VLM-ER system by assessing its semantic and sentiment similarity with human annotations on an existing HRC dataset. Then, in a user study with a servi"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The proposed VLM-ER system achieves higher semantic similarity and positive sentiment alignment with human annotations compared to a baseline convolutional neural network-based system. Further, participants in the user study preferred emotion-adaptive robot behaviour facilitated by the VLM-ER system.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That modulating robot behavior according to the VLM-inferred emotional state will produce measurable improvements in collaboration quality and user preference without introducing new sources of error or bias in real-world HRC settings.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A VLM-based emotion recognition system for human-robot collaboration achieves higher semantic and sentiment alignment with human annotations than a CNN baseline and results in preferred adaptive robot behavior in a user study.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A vision-language model for emotion recognition aligns better with human judgments than convolutional networks and produces preferred robot adaptations in collaboration tasks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3890ae390c4b6742d4e66e1029840d57e0dc95b0445414118159c10deb3ea4fd"},"source":{"id":"2605.16816","kind":"arxiv","version":1},"verdict":{"id":"fc8b131f-5ca8-4421-a488-ac1a5ae22ae5","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T21:29:28.395221Z","strongest_claim":"The proposed VLM-ER system achieves higher semantic similarity and positive sentiment alignment with human annotations compared to a baseline convolutional neural network-based system. Further, participants in the user study preferred emotion-adaptive robot behaviour facilitated by the VLM-ER system.","one_line_summary":"A VLM-based emotion recognition system for human-robot collaboration achieves higher semantic and sentiment alignment with human annotations than a CNN baseline and results in preferred adaptive robot behavior in a user study.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That modulating robot behavior according to the VLM-inferred emotional state will produce measurable improvements in collaboration quality and user preference without introducing new sources of error or bias in real-world HRC settings.","pith_extraction_headline":"A vision-language model for emotion recognition aligns better with human judgments than convolutional networks and produces preferred robot adaptations in collaboration tasks."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16816/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T22:01:19.596360Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T21:40:53.156556Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T19:01:56.273168Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:26.413052Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"cb3ee33ff900910f86fddb9e6b5a25fe3ff02605a335fc568decbf251fb5783e"},"references":{"count":28,"sample":[{"doi":"","year":2024,"title":"Li Liu, Fu Guo, Zishuai Zou, and Vincent G Duffy. Application, development and future opportunities of collaborative robots (cobots) in manufacturing: A literature review.International Journal of Huma","work_id":"cde3c74e-b7a4-4989-8972-f1f49576f5bc","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Mohammed, and Jose L","work_id":"a8f3d546-d36b-4cda-9b60-36831e909948","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"A review of the emotion recognition model of robots.Applied Intelligence, 55(6):1–33, 2025","work_id":"a9567ba8-baf8-49d8-816f-630014937353","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Detection of genuine and posed facial expressions of emotion: databases and methods.Frontiers in psychology, 11:580287, 2021","work_id":"9b16b9f3-25c3-4810-aa14-c2f9272fca3b","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Facial emotion expressions in human–robot interaction: A survey.International Journal of Social Robotics, 14(7):1583–1604, 2022","work_id":"e81a4fde-1c3b-44e6-8432-91e0baba6869","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":28,"snapshot_sha256":"b716fda343e6f46cf509ff8724f41184af1054c4cd20f7cb30fc92e116da1116","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"3be6baf8db50dfbbb3ba8e9ab0aef1ed5ae061d9a89db87ffbe4dd696376faf0"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}