pith. sign in

Reasoning or reciting? exploring the capabilities and limitations of language models through counterfactual tasks

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

roles

background 1

polarities

background 1

clear filters

representative citing papers

DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

DICE formalizes multi-agent LLM coordination as discounted incomplete-information Markov games and introduces Heterogeneous Quantal Response Equilibrium (HQRE) to achieve unique stable equilibria with bounded regret, demonstrated via prompt-control and fine-tuning algorithms on eleven benchmarks.

CodeMind: Evaluating Large Language Models for Code Reasoning

cs.SE · 2024-02-15 · unverdicted · novelty 7.0

CodeMind evaluates ten LLMs on four benchmarks using three new code reasoning tasks, finding performance varies by model size and drops with complexity while showing no correlation with bug repair ability.

The CRISTAL Method: Neurosymbolic analysis from AI-synthesized world models

cs.AI · 2026-06-29 · unverdicted · novelty 5.0

CRISTAL is a neurosymbolic framework that synthesizes interpretable probabilistic world models from language priors for full Bayesian analysis and budget-aware data acquisition, claiming Bayes-optimal accuracy on synthetic equity classification with 5 examples.

citing papers explorer

Showing 8 of 8 citing papers after filters.