SHREC is a new benchmark dataset of embodied human-robot conversations that shows substantial performance gaps in state-of-the-art foundation models on tasks involving social error detection and rationale generation.
arXiv preprint, arXiv:2404.11023
2 Pith papers cite this work. Polarity classification is still in progress.
verdicts
UNVERDICTED 2
representative citing papers
SAVOIR combines prospective expected utility valuation with Shapley values for fair credit assignment in social dialogue RL, achieving SOTA on SOTOPIA where a 7B model matches or exceeds GPT-4o and Claude-3.5-Sonnet.
citing papers explorer
- Social Human Robot Embodied Conversation (SHREC) Dataset: Benchmarking Foundational Models' Social Reasoning
  SHREC is a new benchmark dataset of embodied human-robot conversations that shows substantial performance gaps in state-of-the-art foundation models on tasks involving social error detection and rationale generation.
- SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
  SAVOIR combines prospective expected utility valuation with Shapley values for fair credit assignment in social dialogue RL, achieving SOTA on SOTOPIA where a 7B model matches or exceeds GPT-4o and Claude-3.5-Sonnet.