CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations

Qun Ma; Xiangning Yu; Xiao Xue; Yuqi Hou; Yuwei Guo

arXiv: 2604.14691 · v2 · submitted 2026-04-16 · cs.AI · cs.CL · cs.CY

Pith reviewed 2026-05-10 11:34 UTC · model grok-4.3

keywords: causal discovery · LLM agent simulations · micro-macro emergence · Markov boundary · counterfactual probing · agent-based modeling · social emergence

The pith

CAMO converts hypotheses about LLM agent interactions into computable causal graphs that trace micro behaviors to macro emergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CAMO to address the challenge of identifying causal mechanisms in LLM-powered agent simulations where emergence arises from intertwined interactions and nonlinear feedbacks. It establishes a process that turns mechanistic hypotheses into factors computable from simulation records, then constructs a compact causal model around an emergent target outcome. This yields a Markov boundary and minimal upstream subgraph that make causal chains explicit and point to specific intervention opportunities. Simulator-internal counterfactual tests further orient uncertain edges and update the model when new evidence appears. A sympathetic reader would care because clear causal representations could make agent simulations more reliable tools for studying and influencing social outcomes.

Core claim

CAMO converts mechanistic hypotheses into computable factors grounded in simulation records and learns a compact causal representation centered on an emergent target Y. CAMO outputs a computable Markov boundary and a minimal upstream explanatory subgraph, yielding interpretable causal chains and actionable intervention levers. It also uses simulator-internal counterfactual probing to orient ambiguous edges and revise hypotheses when evidence contradicts the current view.
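As an illustration of what orienting an ambiguous edge by simulator-internal probing could look like, here is a minimal Python sketch. It is not the paper's protocol, which this summary does not specify. The toy simulator (whose true structure is X -> Y), the function names `toy_simulator`, `effect`, and `orient_edge`, and the decision threshold are all assumptions made for the example.

```python
import random
from statistics import mean

def toy_simulator(do=None, n=2000, seed=0):
    """Hypothetical stand-in for a simulator whose true structure is X -> Y.

    `do` optionally forces a variable to a fixed value, e.g. {"X": 1.0}.
    Returns the mean of X and Y over n simulated units.
    """
    rng = random.Random(seed)
    do = do or {}
    xs, ys = [], []
    for _ in range(n):
        # Draw noise before applying interventions so that every run
        # shares the same random numbers and differences are attributable
        # to the intervention alone.
        x_noise = rng.gauss(0.0, 1.0)
        y_noise = rng.gauss(0.0, 1.0)
        x = do["X"] if "X" in do else x_noise
        y = do["Y"] if "Y" in do else 2.0 * x + y_noise
        xs.append(x)
        ys.append(y)
    return {"X": mean(xs), "Y": mean(ys)}

def effect(simulate, cause, outcome, low=0.0, high=1.0):
    """Change in the outcome's mean when the cause is forced from low to high."""
    return simulate(do={cause: high})[outcome] - simulate(do={cause: low})[outcome]

def orient_edge(simulate, a, b, threshold=0.5):
    """Orient the undirected edge a -- b by comparing both interventional effects."""
    a_to_b = abs(effect(simulate, a, b)) > threshold
    b_to_a = abs(effect(simulate, b, a)) > threshold
    if a_to_b and not b_to_a:
        return (a, b)
    if b_to_a and not a_to_b:
        return (b, a)
    return None  # undecided: no detectable effect, or effects in both directions

print(orient_edge(toy_simulator, "X", "Y"))  # ('X', 'Y')
```

Returning `None` when both or neither direction shows an effect keeps the sketch from forcing an orientation that the probes do not support, for example under feedback loops.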

What carries the argument

The CAMO process of hypothesis-to-factor conversion, followed by construction of a Markov boundary and minimal upstream subgraph, with simulator-internal counterfactual probing to orient edges and revise the model.
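For readers outside causal inference: in a directed acyclic graph whose distribution is faithful to it, the Markov boundary of a node consists of its parents, its children, and the other parents of its children. The short Python sketch below computes that set from an edge list. The function name `markov_boundary` and the toy graph are illustrative assumptions, not taken from the paper.

```python
def markov_boundary(edges, target):
    """Parents, children, and co-parents (spouses) of `target` in a DAG.

    `edges` is an iterable of (cause, effect) pairs.
    """
    edges = set(edges)
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    # Other parents of the target's children.
    spouses = {u for u, v in edges if v in children and u != target}
    return parents | children | spouses

# Hypothetical toy graph; node names are illustrative only.
toy_edges = [("E", "A"), ("A", "B"), ("B", "Y"), ("Y", "C"), ("D", "C")]
print(sorted(markov_boundary(toy_edges, "Y")))  # ['B', 'C', 'D']
```

Here `A` and `E` lie upstream of `Y` but outside its boundary, which is the kind of node an upstream explanatory subgraph would have to add back.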

Load-bearing premise

Mechanistic hypotheses about agent behaviors can be turned into factors that are reliably computable from simulation records, and internal counterfactual tests can correctly orient causal edges while revising hypotheses as needed.
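One way to picture "factors computable from simulation records" is a registry of named functions evaluated per time step over logged rows. The Python sketch below is a guess at the general shape only: the record fields (`opinion`, `messages_sent`), the factor names, and `compute_factors` are hypothetical and do not come from the paper.

```python
from statistics import mean

# Hypothetical simulation records: one dict per agent per time step.
records = [
    {"step": 0, "agent": "a1", "opinion": 0.2, "messages_sent": 1},
    {"step": 0, "agent": "a2", "opinion": 0.8, "messages_sent": 3},
    {"step": 1, "agent": "a1", "opinion": 0.1, "messages_sent": 2},
    {"step": 1, "agent": "a2", "opinion": 0.9, "messages_sent": 4},
]

# Each factor is a named function of the rows logged at a single step.
# Defining factors only in terms of logged fields (never in terms of the
# target outcome) is one simple guard against circular definitions.
FACTORS = {
    "mean_messages_sent": lambda rows: mean(r["messages_sent"] for r in rows),
    "opinion_spread": lambda rows: (
        max(r["opinion"] for r in rows) - min(r["opinion"] for r in rows)
    ),
}

def compute_factors(records, factors=FACTORS):
    """Return one row per step with every registered factor evaluated."""
    by_step = {}
    for row in records:
        by_step.setdefault(row["step"], []).append(row)
    return [
        {"step": step, **{name: fn(rows) for name, fn in factors.items()}}
        for step, rows in sorted(by_step.items())
    ]

for row in compute_factors(records):
    print(row)
```

The resulting per-step table is the kind of input a causal discovery routine could consume. Whether a factor faithfully captures the hypothesis it was derived from is a separate check that this sketch does not perform.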

What would settle it

Apply the discovered intervention levers in new simulation runs and check whether the observed changes in the emergent target Y align with the directions and strengths predicted by the causal graph.
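The test described above can be reduced to a simple direction check. The Python sketch below compares the sign of each lever's predicted effect on Y with the sign of the change observed in fresh runs. The lever names and all numbers are invented for illustration, and `direction_agreement` is a hypothetical helper. Comparing strengths as well would need an additional error measure.

```python
def sign(x, tol=1e-9):
    """Return -1, 0, or 1, treating values within `tol` of zero as zero."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1

def direction_agreement(predicted, observed):
    """Fraction of shared levers whose observed change in Y has the predicted sign."""
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no levers in common")
    hits = sum(sign(predicted[k]) == sign(observed[k]) for k in shared)
    return hits / len(shared)

# Invented numbers: predicted vs. observed change in Y per intervention lever.
predicted = {"lever_a": 0.4, "lever_b": -0.2, "lever_c": 0.1}
observed = {"lever_a": 0.3, "lever_b": 0.05, "lever_c": 0.2}
print(direction_agreement(predicted, observed))  # 2 of 3 levers agree in sign
```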

Figures

Figures reproduced from arXiv: 2604.14691 by Qun Ma, Xiangning Yu, Xiao Xue, Yuqi Hou, Yuwei Guo.

**Figure 1.** Causal representations recovered by CAMO. CAMO identifies a compact causal neighborhood around the target outcome Y that is sufficient for causal identification, and augments it with a minimal set of upstream pathways needed to explain and support intervention on micro-to-macro emergence.

**Figure 2.** Overview of CAMO. A fast–slow loop integrates textual worldviews, causal discovery, and simulator-internal interventions to recover a minimal causal interface and micro-to-macro explanation for the target outcome.

**Figure 3.** Qualitative comparison of recovered causal structures (O2O delivery simulation; projected). Full …

**Figure 4.** Estimated (p, C) across LLMs. Adjacent text from the paper: "… root-cause-analysis-style evaluation in Zheng et al. (2024); Shen et al. (2024), we run Random Walk with Restart from Y and use each candidate target node's RWR score (stationary visit probability) as the ranking score, where higher values indicate stronger graph-mediated relevance to Y. …"

**Figure 5.** Add/prune dynamics of CAMO (A2–A3) under different LLM backbones. We plot, per round, the A2 proposal size, …

**Figure 6.** Recovered causal structures for all methods on the O2O delivery simulation after applying the observed-variable projection. The figure provides a complete qualitative comparison complementing …

**Figure 7.** Qualitative comparison of recovered causal structures (O2O delivery simulation; without projection).

**Figure 8.** Timeline of an intervention trial in the agent coordination setting. The figure illustrates the emergence of …

**Figure 9.** Ablation study on the O2O delivery simulation.

**Figure 10.** Learned causal graph for agent coordination. Visualization of a representative causal graph learned by CAMO under the agent coordination emergent phenomenon. Node labels extracted alongside this caption: User Interaction Network; Network Topology Type; Persuasive Message Supply Intensity; Exposure Bias Level (Homophilous Share); Initial Opinion Level; Individual Opinion Level; Exposure Policy Type; Opinion Bimodality Index; Polarization Level.

**Figure 11.** Learned causal graph for opinion polarization. Visualization of a representative causal graph learned by CAMO under the opinion polarization emergent phenomenon. Node labels extracted alongside this caption: User Interaction Network; Network Topology Type; Inflammatory Posting Rate; Moderation Detection Sensitivity; User Emotion Level; Platform Moderation Policy Type; Moderation Action Type; Moderation Action Threshold; Effective Connectivity Level; Exposure …

**Figure 12.** Learned causal graph for inflammatory message spread. Visualization of a representative causal graph learned by CAMO under the spread of inflammatory messages emergent phenomenon. Adjacent text from the paper (C.1.4, Successful steps and estimating $\hat{p}$ and $\hat{C}$): "Because $\hat{F}_t$ is measured with finite samples, we introduce a small tolerance $\epsilon > 0$ and call step $t$ successful if the proxy difficulty decreases by at least $\epsilon$: $\hat{F}_{t+1} \le \hat{F}_t - \epsilon$ (Eq. 25). …"
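The text attached to Figure 4 describes ranking candidate nodes by their Random Walk with Restart (RWR) score from Y. A minimal Python sketch of RWR by power iteration follows. Several choices here are assumptions that the excerpt does not settle: the walk ignores edge direction, moves to neighbors uniformly, and restarts at Y with probability 0.3. The function name `rwr_scores` and the toy graph are illustrative.

```python
def rwr_scores(edges, start, restart=0.3, iters=200):
    """Random Walk with Restart by power iteration.

    Returns a dict mapping each node to its approximate stationary
    visit probability for a walk that restarts at `start`.
    """
    nodes = sorted({n for edge in edges for n in edge})
    if start not in nodes:
        raise ValueError(f"{start!r} does not appear in any edge")
    neighbors = {n: set() for n in nodes}
    for u, v in edges:  # assumption: the walk ignores edge direction
        neighbors[u].add(v)
        neighbors[v].add(u)
    score = {n: float(n == start) for n in nodes}
    for _ in range(iters):
        # Restart mass goes to the start node; the rest spreads to neighbors.
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        score = nxt
    return score

# Hypothetical chain B -> A -> Y; names are illustrative only.
toy_edges = [("B", "A"), ("A", "Y")]
scores = rwr_scores(toy_edges, "Y")
ranking = sorted((n for n in scores if n != "Y"), key=scores.get, reverse=True)
print(ranking)  # ['A', 'B']
```

Nodes closer to Y in the graph receive higher scores, which matches the excerpt's reading of the score as graph-mediated relevance to Y.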

Original abstract

LLM-empowered agent simulations are increasingly used to study social emergence, yet the micro-to-macro causal mechanisms behind macro outcomes often remain unclear. This is challenging because emergence arises from intertwined agent interactions and meso-level feedback and nonlinearity, making generative mechanisms hard to disentangle. To this end, we introduce **CAMO**, an automated **Ca**usal discovery framework from **M**icr**o** behaviors to **M**acr**o** Emergence in LLM agent simulations. CAMO converts mechanistic hypotheses into computable factors grounded in simulation records and learns a compact causal representation centered on an emergent target $Y$. CAMO outputs a computable Markov boundary and a minimal upstream explanatory subgraph, yielding interpretable causal chains and actionable intervention levers. It also uses simulator-internal counterfactual probing to orient ambiguous edges and revise hypotheses when evidence contradicts the current view. Experiments across four emergent settings demonstrate the promise of CAMO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CAMO sketches an integrated pipeline for causal discovery in LLM agent simulations using hypothesis grounding and internal counterfactual probing, but the description stays too high-level to judge if the steps actually deliver reliable outputs.

The main point is that CAMO turns mechanistic hypotheses into computable factors from simulation records, learns a compact causal representation around an emergent target, extracts a Markov boundary plus minimal upstream subgraph, and runs simulator-internal counterfactual probes to orient edges and revise hypotheses when needed. This targets the real difficulty of disentangling micro interactions and nonlinear feedbacks in multi-agent LLM setups to get interpretable causal chains and intervention levers. The combination of those pieces, especially the internal probing loop for emergence, is the new element here rather than a direct lift from standard causal discovery tools.

The paper does a clear job stating why existing simulation work leaves the generative mechanisms opaque. Experiments across four settings are at least referenced, which shows an attempt to test the idea in different emergence scenarios.

The soft spots are the missing details. No equations or algorithm steps appear for the factor conversion, the causal representation learning, or the probing protocol, and there are no reported metrics, baselines, or error checks from those experiments. Without that, it is impossible to check whether the factor grounding stays faithful to the records or whether the probes give unbiased orientation signals instead of just echoing the simulator's own structure. The central claims rest on those unshown pieces working as described.

This is for researchers in multi-agent systems and computational social science who want causal interpretability on top of their simulations. A reader already working on emergence or causal methods for agents could pick up the high-level structure as a starting point for their own thinking. It deserves a serious referee because the gap it names is genuine and the proposed pipeline is concrete enough to review, even if the current version will need substantial added technical content and validation to stand up.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces CAMO, an agentic framework for automated causal discovery in LLM agent simulations. It converts mechanistic hypotheses into computable factors grounded in simulation records, learns a compact causal representation for an emergent target Y that yields a Markov boundary and minimal upstream explanatory subgraph, and employs simulator-internal counterfactual probing to orient ambiguous edges and revise hypotheses when contradicted by evidence. The approach is evaluated across four emergent settings to demonstrate its promise for yielding interpretable causal chains and actionable interventions.

Significance. If the implementation details hold, CAMO could meaningfully advance causal understanding of micro-to-macro emergence in agent-based LLM simulations, a domain where nonlinearity, feedback, and intertwined interactions typically obscure mechanisms. The framework's strengths include grounding in external simulation records, use of internal probing for orientation, and hypothesis revision, which together address limitations of static causal discovery methods and could provide falsifiable, intervention-oriented outputs.

major comments (2)

§3 (Framework): The central claim that mechanistic hypotheses are converted into 'computable factors' without circularity or loss of fidelity is load-bearing for all downstream outputs (Markov boundary, subgraph, probing). The manuscript must specify the exact grounding procedure, including how simulation records are mapped to factors and what validation ensures non-self-referential definitions.

§5 (Experiments): The four settings are said to demonstrate promise, but without reported quantitative metrics (e.g., edge orientation accuracy, subgraph minimality scores, or comparison against standard causal discovery baselines such as PC or NOTEARS on the same records), it is impossible to assess whether the Markov boundary and revision logic improve over alternatives or merely reproduce simulation artifacts.

minor comments (2)

Abstract and §1: The four emergent settings are referenced but not named or briefly characterized; adding one sentence identifying them would improve accessibility.

Notation throughout: Symbols such as Y and the Markov boundary are introduced but should be accompanied by a short table or explicit first-use definitions to avoid ambiguity for readers outside causal inference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive referee report. We appreciate the recognition of CAMO's potential to advance causal understanding in LLM agent simulations. Below, we provide point-by-point responses to the major comments and describe the revisions we plan to implement.

Referee: §3 (Framework): The central claim that mechanistic hypotheses are converted into 'computable factors' without circularity or loss of fidelity is load-bearing for all downstream outputs (Markov boundary, subgraph, probing). The manuscript must specify the exact grounding procedure, including how simulation records are mapped to factors and what validation ensures non-self-referential definitions.

Authors: We agree that a precise specification of the grounding procedure is essential to substantiate the framework's claims. The current manuscript outlines the high-level process but does not provide sufficient implementation details. In the revised version, we will expand Section 3 with a dedicated subsection on 'Hypothesis Grounding and Factor Computation'. This will include: (1) the formal mapping from natural language hypotheses to computable factors using simulation record schemas, (2) examples of how specific record fields (e.g., agent states, interaction logs) are used to instantiate factors, and (3) validation criteria such as consistency checks against the original hypothesis text and cross-validation with held-out simulation runs to prevent self-referential or circular definitions. We will also include pseudocode for the grounding algorithm.

revision: yes

Referee: §5 (Experiments): The four settings are said to demonstrate promise, but without reported quantitative metrics (e.g., edge orientation accuracy, subgraph minimality scores, or comparison against standard causal discovery baselines such as PC or NOTEARS on the same records), it is impossible to assess whether the Markov boundary and revision logic improve over alternatives or merely reproduce simulation artifacts.

Authors: We acknowledge that the experimental section currently focuses on qualitative demonstration of interpretable causal chains across the four settings rather than quantitative benchmarks. This choice was made to highlight the framework's ability to handle complex, emergent phenomena where ground-truth causal structures are not always straightforward to define. However, we agree that quantitative evaluation would allow better assessment of performance. In the revision, we will augment §5 with quantitative metrics, including edge orientation accuracy where ground truth is available from the simulation design, measures of subgraph minimality (e.g., number of nodes/edges relative to full graph), and comparisons to baselines such as the PC algorithm and NOTEARS applied to aggregated simulation records. We will discuss any necessary adaptations for these methods to the dynamic, non-i.i.d. nature of agent simulation data. If certain metrics prove infeasible due to the lack of explicit ground truth in some settings, we will clearly state the limitations.

revision: yes
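The two metrics named in this exchange can be made concrete. The Python sketch below gives one plausible definition of each. Edge orientation accuracy is taken over true edges whose adjacency was recovered, and minimality is the node and edge count of the subgraph relative to the full graph. These definitions, the function names, and the toy graphs are assumptions, and the paper may define the metrics differently.

```python
def orientation_accuracy(predicted, truth):
    """Among true edges recovered in either direction, the fraction
    that the prediction orients correctly.

    Returns None if no true adjacency was recovered. An edge predicted
    in both directions is counted as correct under this definition.
    """
    predicted, truth = set(predicted), set(truth)
    recovered = [
        (u, v) for u, v in truth
        if (u, v) in predicted or (v, u) in predicted
    ]
    if not recovered:
        return None
    correct = sum((u, v) in predicted for u, v in recovered)
    return correct / len(recovered)

def minimality_ratio(subgraph_edges, full_edges):
    """Node and edge counts of the subgraph relative to the full graph."""
    sub_nodes = {n for edge in subgraph_edges for n in edge}
    full_nodes = {n for edge in full_edges for n in edge}
    return {
        "nodes": len(sub_nodes) / len(full_nodes),
        "edges": len(set(subgraph_edges)) / len(set(full_edges)),
    }

# Hypothetical ground truth and prediction; node names are illustrative.
truth = [("A", "B"), ("B", "Y"), ("C", "Y")]
predicted = [("A", "B"), ("Y", "B"), ("D", "Y")]
print(orientation_accuracy(predicted, truth))   # 0.5
print(minimality_ratio([("B", "Y")], truth))
```

Restricting orientation accuracy to recovered adjacencies separates direction errors from missed edges, which would otherwise need their own precision and recall figures.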

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents CAMO as a framework that converts mechanistic hypotheses into computable factors grounded in external simulation records, learns a compact causal representation for an emergent target Y, and applies simulator-internal counterfactual probing to orient edges and revise hypotheses. No equations, self-citations, or derivation steps are shown that reduce the central outputs (Markov boundary, minimal upstream subgraph) to fitted parameters or prior self-referential definitions by construction. The claims remain dependent on the soundness of factor grounding and probing protocols rather than tautological equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the framework implicitly assumes hypotheses can be grounded in records and counterfactuals are reliable, but these are not formalized.
