Some methods of classification and analysis of multivariate observations
4 Pith papers cite this work.
Citing papers by year: 2026 (4). Verdict status: unverdicted (4).
Citing papers
- Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
  Introduces the OmniBehavior benchmark, built from real-world data, and shows that LLMs exhibit hyper-activity, persona homogenization, and utopian bias when simulating human behavior.
- Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
  ECC calibrates semantic embeddings with posterior model comparisons and Bradley-Terry capability profiles to create flexible, mixed-membership query clusters that improve LLM capability ranking.
- MindLoom: Composing Thought Modes for Frontier-Level Reasoning Data Synthesis
  MindLoom synthesizes frontier-level reasoning data by decomposing solutions into thought-mode chains, training a retrieval model to select modes, composing new problems with distribution-aligned sampling, and applying rollout-based difficulty labeling for fine-tuning.
- ConformaDecompose: Explaining Uncertainty via Calibration Localization
  ConformaDecompose decomposes conformal prediction uncertainty by progressively localizing calibration sets, revealing reducible epistemic components that align with model limitations across tasks.
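Of the approaches summarized above, ECC's "Bradley-Terry capability profiles" name the most concrete building block. The summary does not say how ECC fits them, so the sketch below is only a generic illustration, not ECC's method: it fits Bradley-Terry strengths for a few models from pairwise win counts on a single query cluster, using the standard minorization-maximization (Zermelo) fixed-point iteration. The function name `fit_bradley_terry`, the win-count matrix format, and the example counts are all hypothetical.

```python
import numpy as np


def fit_bradley_terry(wins, iters: int = 500, tol: float = 1e-10) -> np.ndarray:
    """Fit Bradley-Terry strengths with the MM (Zermelo) fixed-point iteration.

    Hypothetical helper for illustration. wins[i, j] is the number of
    comparisons model i won against model j (diagonal assumed zero).
    Assumes every model wins at least once and the comparison graph is
    connected, so the maximum-likelihood strengths exist and are positive.
    Returns strengths p summing to 1, with P(i beats j) = p[i] / (p[i] + p[j]).
    """
    wins = np.asarray(wins, dtype=float)
    n = wins + wins.T                 # total comparisons between each pair
    total_wins = wins.sum(axis=1)     # W_i: total wins of model i
    p = np.full(wins.shape[0], 1.0 / wins.shape[0])  # uniform start
    for _ in range(iters):
        # n_ij / (p_i + p_j) for every pair; self-pairs contribute nothing
        denom = n / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), then renormalize
        new_p = total_wins / denom.sum(axis=1)
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical win counts for three models on one query cluster
# (each pair compared 10 times).
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])
strengths = fit_bradley_terry(wins)
ranking = np.argsort(-strengths)  # model indices, strongest first
print(strengths, ranking)
```

Fitting one such strength vector per query cluster would give a per-cluster capability profile; how ECC combines these with semantic embeddings and posterior comparisons is not described here.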