Multi-speaker attention alignment for multimodal social interaction

Liangyang Ouyang, Yifei Huang, Mingfang Zhang, Caixin Kang, Ryosuke Furuta, Yoichi Sato · 2025 · arXiv 2511.17952

3 Pith papers cite this work. Polarity classification is still in progress.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Introduces the Grounded Personality Reasoning task and MM-OCEAN dataset to show that MLLMs frequently produce correct Big Five personality ratings without grounding them in observable video evidence.

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

GRASP is a large-scale dataset and benchmark for social reasoning grounded in gaze and gesture events in multi-person videos, with Social Grounding Reward (SGR) proposed to improve model performance on GRASP-Bench.

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

SocialDirector uses spatiotemporal actor masking and directional reweighting on cross-attention maps to reduce actor-action mismatches and improve target-directed interactions in generated multi-person videos.

citing papers explorer

Showing 3 of 3 citing papers.

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? · cs.AI · 2026-05-21 · unverdicted · none · ref 53
GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions · cs.CV · 2026-05-15 · unverdicted · none · ref 64
SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation · cs.CV · 2026-05-11 · unverdicted · none · ref 48
