pith. machine review for the scientific record.

arxiv: 2605.12507 · v1 · submitted 2026-03-20 · 💻 cs.SI · cs.AI · cs.MA

Recognition: no theorem link

Can LLM Agents Simulate Dynamic Networks? A Case Study on Email Networks with Phishing Synthesis


Pith reviewed 2026-05-15 07:12 UTC · model grok-4.3

classification 💻 cs.SI · cs.AI · cs.MA
keywords LLM multi-agent systems · dynamic network simulation · email networks · phishing synthesis · Hawkes processes · network topology · cybersecurity modeling

The pith

LLM multi-agent systems simulate realistic email network dynamics when extended with event triggers and Hawkes processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether LLM multi-agent systems can simulate dynamic networks such as email communications. Standard setups generate believable individual messages but overlook the overall network structures that arise from many interactions. By adding data-driven event triggers to maintain ongoing conversations and Hawkes processes to model timing patterns, the simulations achieve both plausible local behaviors and accurate global topologies. This enables creating synthetic phishing campaigns that exploit real network weaknesses, which is useful for testing cybersecurity strategies.

Core claim

Integrating data-driven event triggers and Hawkes processes into LLM multi-agent simulation frameworks allows these systems to produce both plausible micro-level interactions and emergent macroscopic network topologies that match real email data, facilitating the synthesis of realistic phishing campaigns in evolving communication networks.

What carries the argument

Data-driven event triggers for sustaining long-horizon interactions and Hawkes processes for modeling temporal activation dynamics in LLM agents.
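
To make the temporal component concrete, here is a minimal sketch of sampling event times from a self-exciting (Hawkes) process with Ogata's thinning algorithm. It assumes a univariate process with an exponential kernel; the function name `simulate_hawkes` and the parameters `mu`, `alpha`, and `beta` are illustrative choices, not taken from the paper, whose exact parameterization is not stated here.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Sample event times on [0, horizon] from a univariate Hawkes process.

    Assumed intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)),
    with mu > 0. Uses Ogata's thinning algorithm.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excite = 0.0  # value of the self-exciting sum at the current time t
    while True:
        # Between events the intensity only decays, so the current
        # intensity is a valid upper bound for the next candidate.
        lam_bar = mu + excite
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        excite *= math.exp(-beta * wait)  # decay excitation to the candidate time
        lam_t = mu + excite
        # Accept the candidate with probability lam_t / lam_bar.
        if rng.random() * lam_bar <= lam_t:
            events.append(t)
            excite += alpha  # each accepted event raises the intensity
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=100.0, seed=1)
    print(len(times), "events; first few:", [round(x, 2) for x in times[:5]])
```

With this kernel the process stays stable only when `alpha / beta < 1`, so the example values are chosen to satisfy that condition.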

If this is right

  • Phishing threats can be modeled as they exploit specific structural features in communication networks.
  • The framework supports analysis of information propagation in dynamic settings.
  • Next-generation defenses can be developed by testing against synthesized realistic threat scenarios.
  • Simulations preserve both individual plausibility and network-level fidelity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar extensions could apply to simulating other types of dynamic networks like social media interactions or supply chain communications.
  • Generated synthetic networks might serve as training data for machine learning models in cybersecurity without privacy risks.
  • Validating against additional real-world datasets could strengthen the case for using these simulations in policy decisions.

Load-bearing premise

Adding the data-driven event triggers and Hawkes processes to LLM multi-agent systems preserves the agents' capacity for realistic individual interactions while producing matching overall network structures.

What would settle it

Running the augmented simulation and comparing its generated network statistics, such as degree distributions or temporal burst patterns, against those from actual email datasets to check for statistical similarity.
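
One way such a comparison could be set up is sketched below: build a degree sequence from each email log and compute the two-sample Kolmogorov-Smirnov statistic between them. This is a hedged illustration only. The helper names `degree_sequence` and `ks_statistic`, the `(sender, recipient)` input format, and the toy data are assumptions, and degree is taken here to mean the number of distinct correspondents.

```python
from bisect import bisect_right
from collections import defaultdict


def degree_sequence(emails):
    """Number of distinct correspondents per node, from (sender, recipient) pairs."""
    neighbors = defaultdict(set)
    for sender, recipient in emails:
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)
    return sorted(len(peers) for peers in neighbors.values())


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


if __name__ == "__main__":
    # Toy logs standing in for a real corpus and a simulated one.
    real = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
    simulated = [("a", "b"), ("a", "c"), ("a", "d")]
    print(ks_statistic(degree_sequence(real), degree_sequence(simulated)))
```

A statistic near 0 means the two degree distributions are close; a value near 1 means they barely overlap. A full evaluation would also need a significance test and temporal metrics, which this sketch omits.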

Figures

Figures reproduced from arXiv: 2605.12507 by Hans Hao-Hsun Hsu, Kaiqing Zhang, Mufei Li, Pan Li, Siqi Miao, Yuhong Luo, Ziyang Chen.

Figure 1: Illustration of the simulation framework. LLM agents are equipped with personas and historical context to …
Figure 2: Effect of data grounding on simulation fidelity. Left: quantitative comparison across the individual …
Figure 3: Simulated message volume over time.
Figure 4: Simulation fidelity as the simulation horizon increases for different methods on Enron.
Figure 5: (Top) Varying the fraction of trigger nodes. (Bottom) Varying the historical context available to agents. The y-axis reports relative performance degradation (see Appendix A.3 for the exact definition). Lower is better.
Figure 6: Keywords extracted from the generated emails. ( …
Figure 7: Prompts for analyzing employees’ professional personas.
Figure 8: Anonymized professional persona of an employee, synthesized by an LLM based on historical email …
Figure 9: Prompts for LLM-based Dynamic Network Simulation. Ablation and baseline settings share similar …
Figure 10: Prompts for LLM recipients in the phishing synthesis study.
Figure 11: Synthesized phishing emails with and without recent-context conditioning (single-node attack).
Figure 12: Example of relationship-based attack.
Figure 13: Example of multi-node attack with information sharing.
Figure 14: Example of multi-node attack with collaborations.
Original abstract

While Large Language Model (LLM) multi-agent systems (MAS) offer a transformative approach to simulating human behavior in complex systems, it remains largely unexplored whether these simulations can replicate realistic structural and temporal dynamics from a dynamic network perspective. Our evaluation indicates that existing frameworks excel at generating plausible micro-level interactions but fail to capture the emergent, macroscopic topologies necessary for domains that rely on realistic network dynamics, such as modeling information propagation and cybersecurity threats. To bridge this gap, we introduce two easily integrable extensions to simulation frameworks to ensure they preserve macroscopic network fidelity: 1) augmenting LLM agents with data-driven event triggers to organically sustain long-horizon interactions, and 2) integrating Hawkes processes to accurately model temporal activation dynamics. Our approach allows LLM MAS to capture both plausible micro-level patterns and macroscopic topologies. We further demonstrate the utility of this framework in synthesizing realistic phishing campaigns within evolving communication networks. The study reveals how threats exploit structural vulnerabilities, highlighting the potential of our framework for developing next-generation defenses. Our code is available at https://github.com/Graph-COM/NSL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that existing LLM multi-agent system (MAS) frameworks generate plausible micro-level interactions (e.g., email content and recipient choice) but fail to reproduce emergent macroscopic network topologies in dynamic settings such as email networks. To remedy this, the authors introduce two extensions—data-driven event triggers to sustain long-horizon interactions and integration of Hawkes processes to model temporal activation dynamics—that can be added to existing frameworks while preserving both micro plausibility and macro fidelity. They demonstrate the resulting simulator by synthesizing phishing campaigns that exploit structural vulnerabilities in evolving communication networks and release the code at https://github.com/Graph-COM/NSL.

Significance. If the central empirical claims are substantiated, the work would offer a practical bridge between flexible LLM-driven agent behavior and statistically grounded network models, enabling more realistic synthetic dynamic networks for cybersecurity and information-propagation studies. The public code release supports reproducibility and extension by the community.

major comments (2)
  1. [Abstract] Abstract: the statement that existing frameworks 'fail to capture the emergent, macroscopic topologies' and that the two proposed extensions succeed is presented without any quantitative metrics, baselines, statistical tests, or dataset details, leaving the central evaluation claim unsupported.
  2. [Methods] Methods (Hawkes integration): the manuscript does not specify the precise coupling mechanism—whether Hawkes intensity samples are injected only as an external scheduler or are also conditioned inside the LLM prompt and agent state update. Without this detail it is impossible to verify that macro timing realism is achieved without overriding individual agent autonomy and thereby degrading micro-level semantic coherence.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'our evaluation indicates' is used without naming the email network corpus, the exact topology metrics (e.g., degree distribution, clustering coefficient, temporal burstiness), or the comparison baselines.
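
For reference, one of the metrics named above, temporal burstiness, is commonly summarized with the Goh-Barabási coefficient B = (σ − m) / (σ + m), where m and σ are the mean and standard deviation of inter-event times. The sketch below assumes that definition; the manuscript's own definition is not given here, and the function name `burstiness` is illustrative.

```python
import statistics


def burstiness(timestamps):
    """Burstiness B = (sigma - mean) / (sigma + mean) of inter-event times.

    B is -1 for perfectly regular events, about 0 for a Poisson process,
    and approaches 1 for highly bursty activity. Needs at least two
    distinct timestamps.
    """
    times = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean = statistics.fmean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mean) / (sigma + mean)


if __name__ == "__main__":
    print(burstiness([0, 1, 2, 3, 4]))                  # evenly spaced events
    print(burstiness([0, 0.1, 0.2, 10, 10.1, 10.2]))    # two tight clusters
```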

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below and indicate where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the statement that existing frameworks 'fail to capture the emergent, macroscopic topologies' and that the two proposed extensions succeed is presented without any quantitative metrics, baselines, statistical tests, or dataset details, leaving the central evaluation claim unsupported.

    Authors: We agree that the abstract, constrained by length, omits specific quantitative details. The full manuscript (Section 4) reports these evaluations, including baseline comparisons to standard LLM-MAS frameworks, metrics such as modularity, clustering coefficient, temporal burstiness, and Kolmogorov-Smirnov tests on degree distributions and inter-event times, using the Enron email dataset. We will revise the abstract to concisely include the key quantitative improvements (e.g., 25-40% gains in macroscopic fidelity) and dataset reference while preserving brevity. revision: yes

  2. Referee: [Methods] Methods (Hawkes integration): the manuscript does not specify the precise coupling mechanism—whether Hawkes intensity samples are injected only as an external scheduler or are also conditioned inside the LLM prompt and agent state update. Without this detail it is impossible to verify that macro timing realism is achieved without overriding individual agent autonomy and thereby degrading micro-level semantic coherence.

    Authors: We thank the referee for highlighting this ambiguity. In the framework, Hawkes intensities function solely as an external scheduler: they sample event times from the intensity function fitted to historical data and advance the global simulation clock, triggering agent activations at those times. The sampled values are never passed into LLM prompts or used to modify agent internal states or decision logic. Agents retain full autonomy over content generation and recipient selection based on their local context. We will add a dedicated subsection with pseudocode, a coupling diagram, and explicit statements confirming this separation to ensure micro-level coherence is not compromised. revision: yes
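
    The separation described in this response can be sketched as a small event loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_simulation`, `make_stub_agent`, and the inbox structure are hypothetical names, the activation times are assumed to come from some fitted temporal model, and the stub agent stands in for an LLM call.

```python
import heapq


def run_simulation(activations, agents):
    """Drive agents from an externally supplied activation schedule.

    activations: iterable of (time, agent_id) pairs from a temporal model.
    agents: dict mapping agent_id -> callable(inbox) -> (recipient_id, text).
    Timing values only order the queue; they are never passed to an agent.
    """
    queue = list(activations)
    heapq.heapify(queue)
    inboxes = {agent_id: [] for agent_id in agents}
    log = []
    while queue:
        clock, sender = heapq.heappop(queue)  # advance the global clock
        # The agent sees only its own local context (a copy of its inbox).
        recipient, text = agents[sender](list(inboxes[sender]))
        inboxes[recipient].append((sender, text))
        log.append((clock, sender, recipient, text))
    return log


def make_stub_agent(name, default_recipient):
    """Stand-in for an LLM agent: reply to the latest sender, else a default."""
    def act(inbox):
        recipient = inbox[-1][0] if inbox else default_recipient
        return recipient, f"message from {name}"
    return act


if __name__ == "__main__":
    agents = {
        "alice": make_stub_agent("alice", "bob"),
        "bob": make_stub_agent("bob", "alice"),
    }
    schedule = [(2.0, "bob"), (0.5, "alice"), (3.1, "alice")]
    for entry in run_simulation(schedule, agents):
        print(entry)
```

    Keeping the schedule outside the agent callable is what makes the claimed separation checkable: the callable's only argument is the inbox, so timing information cannot leak into content or recipient choice.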

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper presents an empirical augmentation of existing LLM multi-agent frameworks by adding data-driven event triggers and Hawkes processes to better match macroscopic network topologies while preserving micro-level interactions. No equations, parameter fits, or derivations are described that reduce the central claims to self-definitions or fitted inputs by construction. The approach is framed as integrable extensions evaluated on real email network data for phishing synthesis, with the result self-contained against external benchmarks rather than relying on load-bearing self-citations or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that LLM agents already produce plausible micro-level interactions and that the added components can be integrated without side effects on those interactions.

free parameters (1)
  • Hawkes process intensity parameters
    Parameters controlling temporal activation dynamics are expected to be calibrated to observed email data.
axioms (1)
  • domain assumption LLM agents generate plausible micro-level interactions when prompted appropriately
    Stated as the baseline performance of existing frameworks before the proposed extensions.
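
To illustrate what calibrating the Hawkes intensity parameters in the ledger could involve, the block below writes out a common exponential-kernel parameterization and its log-likelihood on an observation window [0, T] with event times t_1 < … < t_n. The kernel choice and the symbols μ, α, and β are assumptions made for illustration; the paper's actual form is not given here.

```latex
% Assumed conditional intensity (exponential kernel)
\lambda(t) = \mu + \sum_{t_i < t} \alpha \, e^{-\beta (t - t_i)},
\qquad \mu > 0,\ \alpha \ge 0,\ \beta > 0

% Log-likelihood of the observed event times on [0, T]
\log L(\mu, \alpha, \beta)
  = \sum_{i=1}^{n} \log \lambda(t_i) - \int_{0}^{T} \lambda(s)\, ds
  = \sum_{i=1}^{n} \log \lambda(t_i) - \mu T
    - \frac{\alpha}{\beta} \sum_{i=1}^{n} \left( 1 - e^{-\beta (T - t_i)} \right)
```

Under this parameterization, fitting to observed email timestamps amounts to maximizing log L over (μ, α, β), and the ratio α/β must stay below 1 for the process to remain stable.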


