Indications of belief-guided agency and meta-cognitive monitoring in large language models. arXiv preprint arXiv:2602.02467, 2026

Noam Steinmetz Yalon, Ariel Goldstein, Liad Mudrik, Mor Geva · 2026 · arXiv 2602.02467

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

Introduces BeliefTrack benchmark diagnosing three CBM failures in LLMs and shows RL with belief-state rewards cuts failure rates by 70.9% while representation steering cuts them by 46.1%.

Can LLMs Introspect? A Reality Check

cs.AI · 2026-05-25 · conditional · novelty 6.0

Re-examination of two LLM introspection paradigms with new controls shows models lack privileged access to internal states, performing equivalently with input-only classifiers or near chance on relabeled tasks.

Emergent Language as an Approach to Conscious AI

cs.CL · 2026-06-04 · unverdicted · novelty 4.0

Agents in a minimal multi-agent RL setup develop self-referential communication and an echo-mismatch detection circuit that emerges from environmental affordances rather than task structure or architecture.

citing papers explorer

When Should Models Change Their Minds? Contextual Belief Management in Large Language Models · cs.AI · 2026-05-28 · unverdicted · none · ref 45
Can LLMs Introspect? A Reality Check · cs.AI · 2026-05-25 · conditional · none · ref 3
Emergent Language as an Approach to Conscious AI · cs.CL · 2026-06-04 · unverdicted · none · ref 96
