SPIN-Bench: How well do LLMs plan strategically and reason socially?

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: dataset 1

citation-polarity summary

fields: cs.AI 2, cs.HC 1

years: 2026 2, 2025 1

roles: dataset 1

polarities: background 1

representative citing papers

TUX: Measuring Human-AI Tacit Understanding

cs.HC · 2026-05-29 · unverdicted · novelty 6.0

Profile-conditioned LLMs achieve higher tacit alignment with humans on subjective spectra when traits match, as quantified by the new Tacit Understanding Index (TUX) from 241 humans and 200 agents.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • TUX: Measuring Human-AI Tacit Understanding cs.HC · 2026-05-29 · unverdicted · none · ref 49

    Profile-conditioned LLMs achieve higher tacit alignment with humans on subjective spectra when traits match, as quantified by the new Tacit Understanding Index (TUX) from 241 humans and 200 agents.

  • MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs cs.AI · 2026-05-28 · unverdicted · none · ref 21

    Mindgames introduces a four-game evaluation platform for multi-agent LLM reasoning, runs a 944-agent competition, surfaces rule-adherence and error-survival limitations, and releases a 29k-game dataset with an offline scoring protocol.