OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization
Socially intelligent AI systems must be able to reason across diverse human behavioral tasks and generalize to new contexts. However, AI has yet to achieve this level of social intelligence. Existing models remain fundamentally constrained by the imbalanced learning dynamics induced by training on behavioral data. Namely, behavioral data is inherently heterogeneous: it comprises diverse modalities and prediction targets that often produce uneven training signals across samples. To address this, we develop Omnisapiens-7B 2.0, a foundation model for social behavior processing that explicitly addresses learning from heterogeneous behavioral data. This is enabled by Heterogeneity-Aware Relative Policy Optimization, a novel reasoning RL method that explicitly rebalances learning signals across samples. The core insight is to approximate each sample's contribution to the policy update and use these contribution signals to inform geometrically centered and inertially smoothed advantage modulation. Results demonstrate that Omnisapiens-7B 2.0 achieves the best and most consistent performance across 10 diverse behavioral tasks, while also attaining the best performance on all five held-out zero-shot generalization benchmarks, with gains of up to +12.02% and +9.37%, respectively. Furthermore, Omnisapiens-7B 2.0 produces more consistent and interpretable reasoning traces, supporting reliable real-world behavioral applications. Our model and code are available at https://github.com/MIT-MI/human_behavior_atlas.
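The abstract names three ingredients (a per-sample contribution signal, geometric centering, and inertial smoothing of the advantage modulation) but does not define them. The sketch below is a hypothetical illustration of how such ingredients could fit together, not the authors' implementation; consult the linked repository for the actual method. Every choice here is an assumption: advantages are rewards minus their rollout-group mean, the contribution proxy is a task's mean absolute advantage, "inertial smoothing" is an exponential moving average (EMA) of log-contributions with a hypothetical coefficient `beta`, and "geometric centering" means the per-task weights have a geometric mean of 1. The class name `AdvantageModulator` and the task names are likewise illustrative.

```python
import math
from statistics import fmean


def centered_advantages(rewards):
    """Group-relative advantages: each reward minus its rollout group's mean."""
    mu = fmean(rewards)
    return [r - mu for r in rewards]


class AdvantageModulator:
    """Hypothetical sketch of heterogeneity-aware advantage modulation.

    Illustrative assumptions, not the paper's definitions:
      * contribution proxy = mean |advantage| of a task's rollout group;
      * inertial smoothing = EMA of log-contributions, kept per task;
      * geometric centering = weights whose geometric mean is 1.
    """

    def __init__(self, beta=0.9, eps=1e-8):
        self.beta = beta        # hypothetical inertia coefficient in [0, 1)
        self.eps = eps          # groups with mean |A| below this carry no signal
        self.log_contrib = {}   # task -> smoothed log-contribution

    def update(self, task, advantages):
        c = fmean([abs(a) for a in advantages])
        if c < self.eps:
            return  # all rollouts tied: no policy-gradient signal, keep old estimate
        new = math.log(c)
        prev = self.log_contrib.get(task)
        # Inertial smoothing: move only part of the way toward the new estimate.
        self.log_contrib[task] = new if prev is None else self.beta * prev + (1 - self.beta) * new

    def weights(self):
        if not self.log_contrib:
            return {}
        # Geometric centering in log space: the weights multiply to 1 across tasks,
        # so low-contribution tasks are scaled up and high-contribution ones down.
        center = fmean(list(self.log_contrib.values()))
        return {t: math.exp(center - v) for t, v in self.log_contrib.items()}

    def modulate(self, batch):
        """batch maps a task name to the rewards of one rollout group."""
        advs = {t: centered_advantages(r) for t, r in batch.items()}
        for t, a in advs.items():
            self.update(t, a)
        w = self.weights()
        # Tasks with no contribution estimate yet are left unscaled (weight 1).
        return {t: [w.get(t, 1.0) * x for x in a] for t, a in advs.items()}


# Usage: a binary-reward task next to a task with small, regression-style rewards.
mod = AdvantageModulator(beta=0.9)
out = mod.modulate({
    "binary_task": [1.0, 0.0, 1.0, 0.0],       # mean |A| = 0.5
    "regression_task": [0.60, 0.55, 0.50, 0.45],  # mean |A| is about 0.05
})
print(mod.weights())
```

On the first step, the binary task is scaled down by roughly sqrt(0.1) and the regression-style task is scaled up by roughly sqrt(10), so both end up with the same mean absolute advantage. Working in log space makes the centering multiplicative rather than additive, which keeps weights positive. On later steps the EMA lets each task's weight drift gradually instead of jumping with every noisy batch.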