Title resolution pending

Generative Agents: Interactive Simulacra of Human Behavior · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. This hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

representative citing papers

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

Agent Island is a new multiagent game environment that functions as a dynamic benchmark resistant to saturation and contamination, with Bayesian ranking showing OpenAI GPT-5.5 as the strongest performer among 49 models across 999 games.

Latent Preference Modeling for Cross-Session Personalized Tool Calling

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

Introduces the MPT benchmark and the PRefine method, which models user preferences as evolving hypotheses to improve personalized tool-calling accuracy at 1.24% of the full-history token cost.

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

cs.AI · 2025-07-01 · conditional · novelty 6.0

Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.

Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

cs.CL · 2026-04-22 · unverdicted · novelty 5.0

Cooperative profiles from behavioral economics games predict LLM team performance in AI-for-science workflows.

citing papers explorer

Showing 4 of 4 citing papers.

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

cs.AI · 2026-05-05 · unverdicted · none · ref 9

Latent Preference Modeling for Cross-Session Personalized Tool Calling

cs.CL · 2026-04-20 · unverdicted · none · ref 22

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

cs.AI · 2025-07-01 · conditional · none · ref 132

Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

cs.CL · 2026-04-22 · unverdicted · none · ref 7
