Title resolution pending

Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R Lyu · 2024 · arXiv 2403.11807

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

cs.AI · 2026-06-02 · unverdicted · novelty 7.0

SMAC-Talk is a new benchmark that adds natural language messaging and deceptive-agent scenarios to SMAC for testing LLM coordination in multi-agent environments.

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

cs.AI · 2025-06-04 · unverdicted · novelty 7.0

Orak is a foundational benchmark providing training data, interfaces, and evaluation tools for LLM agents across diverse video game genres.

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

cs.LG · 2024-06-15 · unverdicted · novelty 7.0

MALLM-GAN uses multi-agent LLMs to emulate GAN architecture for generating higher-quality synthetic tabular data from small samples than prior models, while preserving privacy.

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

OmniGameArena is a unified UE5 benchmark with 12 games and the IDC harness for cold-start scores and improvement dynamics of VLM agents.

Common-agency Games for Multi-Objective Test-Time Alignment

cs.GT · 2026-05-08 · unverdicted · novelty 6.0

CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.

Understanding the Mechanism of Altruism in Large Language Models

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse

cs.CV · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems

cs.MA · 2026-01-15

citing papers explorer

Showing 9 of 9 citing papers after filters.

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models cs.AI · 2026-06-02 · unverdicted · none · ref 21
SMAC-Talk is a new benchmark that adds natural language messaging and deceptive-agent scenarios to SMAC for testing LLM coordination in multi-agent environments.
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games cs.AI · 2025-06-04 · unverdicted · none · ref 25
Orak is a foundational benchmark providing training data, interfaces, and evaluation tools for LLM agents across diverse video game genres.
MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data cs.LG · 2024-06-15 · unverdicted · none · ref 16
MALLM-GAN uses multi-agent LLMs to emulate GAN architecture for generating higher-quality synthetic tabular data from small samples than prior models, while preserving privacy.
OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics cs.CV · 2026-06-08 · unverdicted · none · ref 46
OmniGameArena is a unified UE5 benchmark with 12 games and the IDC harness for cold-start scores and improvement dynamics of VLM agents.
Common-agency Games for Multi-Objective Test-Time Alignment cs.GT · 2026-05-08 · unverdicted · none · ref 47
CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
Understanding the Mechanism of Altruism in Large Language Models econ.GN · 2026-04-21 · unverdicted · none · ref 251
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments cs.AI · 2025-06-03 · unverdicted · none · ref 29
VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of
Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse cs.CV · 2026-05-11 · unverdicted · none · ref 164 · 2 links
The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models cs.AI · 2025-03-12 · unverdicted · none · ref 288
The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer