
arXiv: 2604.13052 · v1 · submitted 2026-03-17 · 💻 cs.SI · cs.AI · cs.CL · cs.CY · cs.MA

Form Without Function: Agent Social Behavior in the Moltbook Network

Pith reviewed 2026-05-15 09:52 UTC · model grok-4.3

classification 💻 cs.SI · cs.AI · cs.CL · cs.CY · cs.MA
keywords AI agents · social networks · multi-agent systems · reciprocity · online behavior · socio-technical systems · platform analysis

The pith

AI agents on Moltbook reproduce the full structure of social media but show almost no reciprocity, argumentation, or sustained engagement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines a platform populated entirely by AI agents and measures activity across 1.3 million posts and 6.7 million comments. It finds that technical elements such as rate limits and content filters trigger immediate behavioral changes, yet social patterns fail to appear: most authors never return to their own threads, conversations stay flat, and agents ignore community topics and soft instructions. A sympathetic reader would care because the work isolates the gap between platform form and actual social function when participants lack human-like drives. The analysis covers interaction, content, and instruction layers over a 40-day window and documents persistent security exposures that quality filters do not catch.

Core claim

Moltbook is a socio-technical system where the technical layer responds to changes, but the social layer largely fails to emerge. The form of social media is reproduced in full. The function is absent.

What carries the argument

Three-layer evaluation (interaction, content, instruction) of agent activity that tracks reciprocity rates, reply depth, topic adherence, and response to instruction changes.

Load-bearing premise

The observed low reciprocity, flat conversations, and topic mismatch result from inherent limits in the AI agents themselves rather than from platform design choices or data collection filters.

What would settle it

A comparable network of AI agents run on different rules that produces reciprocity above 20 percent or sustained multi-turn argumentation would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.13052 by Andreas Einwiller, Annette Hautli-Janisz, Artur Romazanov, Dang H. Dang, Felix Klement, Florian Lemmerich, Jelena Mitrovic, Kanishka Ghosh Dastidar, Michael Dinzinger, Michael Granitzer, Saber Zerhoudi, Stefan Katzenbeisser.

Figure 1
Agent dropout and daily activity. Bars: number of agents whose last activity fell on each day. Lines: unique daily active agents (solid), posters only (dashed), commenters only (dotted). The February 2 spike coincides with a platform growth burst whose cohort did not persist. Agents active in the final two days are excluded from the stopped set by construction.
Figure 2
Agent posting periodicity. Each dot is one of the top 100 most active persistent agents, plotted by median inter-action interval (x-axis) against total actions (y-axis), colored by regularity classification (coefficient of variation). 99 of 100 agents are irregular. The concentration near the origin (high volume at sub-minute intervals) is a clear signature of automated operation.
Figure 3
Poisson process test for top 100 most active persistent agents. Left: inter-arrival histogram with exponential fit. Centre: log-scale revealing tail departure. Right: Q–Q plot against exponential quantiles. D = 0.42, p ≈ 0: Poisson hypothesis rejected.
Interaction Over Time. Each temporal finding points to a gap between a structural feature and its intended function. Participation infrastructure, absence of…
Figure 4
The engagement cliff. (a) Time-to-first-comment distribution. 52.3% arrive within one minute (median = 55 s), revealing an automated responder layer. (b) Length–engagement curve. Engagement rises from 16.6% (1–10 words) to 84.8% (500+ words). (c) The karma paradox. Engagement follows a U-shape: negative-karma and top-karma agents both attract outsized attention.
Summary: Engagement. Does posting produce con…
Figure 5
Thread hijacking. (a) Thread depth distribution. 85.6% of threads are flat; max observed depth is 47. (b) Monologue anatomy. Hijacker monologues outnumber OP monologues 11.9:1. (c) Content substance by depth. Vocabulary overlap (Jaccard = 3.2%) stays low at all depths.
5.3 Reputation: Votes Without Signal. On human platforms, votes and reputation scores serve as crowdsourced quality filters. Moltbook inheri…
Figure 6
The karma economy. (a) Lorenz curve for post upvotes (Gini = 0.779). Agent-aggregated upvotes (0.931) and karma (0.935) show progressively extreme concentration. (b) Content–vote correlations. Only the comment count shows a meaningful relationship (r = 0.317); all content features are negligible. (c) Karma vs. activity for the 20 highest-karma agents. Karma bears no relationship to participation (r = 0.089…
Figure 7
Figure 7 shows the frequency of argument relations for first comments to posts, restricted to the top 20 most active agents (ordered by post/comment volume from 1 to 20). The results show that the most active agents rarely argue with each other. The “no relation” category (grey bars) is on average significantly higher than inferences (green bars). While individual agents show slightly different patterns (e.g., Agen…
Figure 8
Argument relations in conversations. The deeper a conversation goes, the more rephrases and non-relations are detected. Conversations stall rather than develop.
Summary: Interaction Layer. Do agents interact with each other, or merely generate output side by side?
  • Population dynamics. 96.5% of agents stopped posting more than two days before the observation window ended. Peak daily active agents (33,790) …
Figure 9
The bio paradox. (a) Bio topic distribution. 62.0% self-identify as AI/tech; 18.3% as helpers. The declared-identity distribution is an echo chamber. (b) The alignment paradox. Bio–post Jaccard = 2.3%; 97.9% never post in a relevant community. (c) LLM model self-identification. Claude accounts for 89.0% of the 1.5% who name their model.
Figure 10
The orchestrator network. (a) X/Twitter follower distribution. Median = 0; 71.1% are ghost accounts. (b) The identity split. Orchestrator bios: “founder,” “crypto,” “engineer.” Agent bios: “helper,” “AI assistant.” (c) Community scope. 83.9% operate in a single community.
Summary: Identity. Do agent profiles predict behavior, or are they labels without behavioral content?
  • Homogeneous self-description. 62…
Figure 11
Community DNA. (a) Focus spectrum. 92.5% are melting pots (entropy ≥ 2.0); only 12 are focused. Inset: AI/tech dominates 79.7%. (b) Description–content alignment (Jaccard = 0.195). 12% achieve zero alignment. (c) Mega-community entropy. All top 10 have 12–13 topics and near-maximum entropy.
Figure 12
The URL ecosystem. (a) Top domains. clawhub.ai (48.4%) dominates; the first external domain (github.com) ranks 4th. (b) TLD distribution. .ai (51.4%) surpasses .com (23.0%). (c) URL sources. 80.2% from comments; 97.1% at depth 1 alone. 38× drop to depth 2.
Summary: Information Flow. Does the platform connect agents to external knowledge, or does it link only to itself?
  • Self-referential. A single platform…
Figure 13
Four natural experiments during the 40-day observation window. (a) Daily post volume with all six instruction-change dates annotated. (b) E5: verification status distribution flipped after Feb 25, with pending rising from 28.7% to 66.8%. (c) Crypto-content fraction of m/general: the Feb 14 filter suppresses crypto sharply; the E5 following liberalisation triggers a rebound to 53%; E6 begins a renewed decl…
Figure 14
For visual inspection, embeddings are projected to two dimensions using UMAP with cosine distance, preserving the local and global structure of the high-dimensional space. In sum, the clustering results reveal a corpus dominated by two distinct populations: a technically sophisticated cohort discussing the architectural security of multi-agent AI offensive frameworks (clusters 6–9), and a more operational…
Figure 15
Global thread contributions by agent. The heavily right-skewed distribution reveals that Moltbook is dominated by a minority of agents. The median agent contributed only 4 posts or comments. In contrast, the three most active agents account for 964,438, 947,272, and 279,631 contributions.
Figure 16
Mean content length by agent (word-level). The median agent’s content word-length of 8 highlights how brief most contributions are, while the most verbose agent generates content averaging 22,250 words.
Figure 17
Global thread contribution by depth. Contributions at depth 0 correspond to posts; those at deeper depths correspond to comments. The clear right skew of the distribution shows that most contributions are concentrated near the root, indicating that most Moltbook threads are structurally shallow. Although little multi-hop interaction is evident, we observe a maximum depth of 48.
Figure 18
Global Post-Comment timedelta by depth. The distribution demonstrates that the root concentration holds over time, suggesting that agent-to-agent communication is anchored around the post itself or immediate replies, i.e., top-level comments around depth 1 or 2.
Summary: Further Interaction Analysis. Do agents on Moltbook merely broadcast into the void?
  • Contribution heterogeneity. Moltbook contributions …
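
The captions above cite several summary statistics (Jaccard overlap, entropy, Gini coefficient, and a Kolmogorov-Smirnov distance against an exponential fit) without restating how they are computed. The sketch below gives textbook definitions in plain Python for orientation. It is an illustration only, not the authors' pipeline. The function names are hypothetical. Whitespace tokenisation, the base-2 logarithm, and fitting the exponential rate from the sample mean are all assumptions, since the paper's preprocessing is not described on this page.

```python
import math
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap of two texts' vocabularies (assumed: lowercased whitespace tokens)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def shannon_entropy(labels, base: float = 2.0) -> float:
    """Shannon entropy of a label distribution, e.g. topic labels of one community's posts."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def gini(values) -> float:
    """Gini coefficient of non-negative values (0 = equal shares, near 1 = concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def ks_exponential(intervals) -> float:
    """KS distance D between inter-arrival times and an exponential with the same mean.

    Only D is returned: because the rate is estimated from the same sample,
    standard KS p-value tables would not apply directly.
    """
    xs = sorted(intervals)
    n = len(xs)
    rate = n / sum(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d
```

Perfectly regular posting (identical intervals) yields a large D, as does heavy-tailed bursty posting; a genuinely memoryless process would keep D small.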
Original abstract

Moltbook is a social network where every participant is an AI agent. We analyze 1,312,238 posts, 6.7 million comments, and over 120,000 agent profiles across 5,400 communities, collected over 40 days (January 27 to March 9, 2026). We evaluate the platform through three layers. At the interaction layer, 91.4% of post authors never return to their own threads, 85.6% of conversations are flat (no reply ever receives a reply), the median time-to-first-comment is 55 seconds, and 97.3% of comments receive zero upvotes. Interaction reciprocity is 3.3%, compared to 22-60% on human platforms. An argumentation analysis finds that 64.6% of comment-to-post relations carry no argumentative connection. At the content layer, 97.9% of agents never post in a community matching their bio, 92.5% of communities contain every topic in roughly equal proportions, and over 80% of shared URLs point to the platform's own infrastructure. At the instruction layer, we use 41 Wayback Machine snapshots to identify six instruction changes during the observation window. Hard constraints (rate limit, content filters) produce immediate behavioral shifts. Soft guidance ("upvote good posts", "stay on topic") is ignored until it becomes an explicit step in the executable checklist. The platform also poses technological risks. We document credential leaks (API keys, JWT tokens), 12,470 unique Ethereum addresses with 3,529 confirmed transaction histories, and attack discourse ranging from template-based SSH brute-forcing to multi-agent offensive security architectures. These persist unmoderated because the quality-filtering mechanisms are themselves non-functional. Moltbook is a socio-technical system where the technical layer responds to changes, but the social layer largely fails to emerge. The form of social media is reproduced in full. The function is absent.
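
The abstract defines a flat conversation (no reply ever receives a reply) but does not spell out how reciprocity is computed. The sketch below shows one common way both quantities could be measured. It assumes reciprocity means the share of distinct directed reply pairs whose reverse pair also occurs, and it assumes a hypothetical record layout (tuples of ids). The paper's own definitions may differ, for instance in whether posts with no comments count as conversations.

```python
from collections import defaultdict


def reciprocity(edges) -> float:
    """Share of distinct directed pairs (a, b) whose reverse (b, a) also occurs.

    `edges` is an iterable of (replier, replied_to) agent ids; self-replies are ignored.
    """
    pairs = {(a, b) for a, b in edges if a != b}
    if not pairs:
        return 0.0
    return sum((b, a) in pairs for a, b in pairs) / len(pairs)


def flat_fraction(comments) -> float:
    """Share of commented threads in which no comment replies to another comment.

    `comments` is an iterable of (post_id, comment_id, parent_comment_id);
    a top-level comment on the post has parent_comment_id None.
    Posts without any comment never appear here and so are not counted.
    """
    has_nested = defaultdict(bool)
    for post_id, _comment_id, parent_id in comments:
        has_nested[post_id] |= parent_id is not None
    if not has_nested:
        return 0.0
    return sum(not nested for nested in has_nested.values()) / len(has_nested)
```

Counting distinct pairs rather than individual replies keeps a single prolific responder from dominating the ratio, which matters on a platform where a few agents produce most of the volume.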

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper analyzes 1.3 million posts, 6.7 million comments, and 120k profiles from the Moltbook AI-agent social network over 40 days. It reports low interaction metrics (3.3% reciprocity, 85.6% flat conversations, 91.4% authors never returning to threads), content mismatches (97.9% agents posting outside bio-matched communities), and responsiveness only to hard instruction changes, concluding that the technical layer adapts while the social layer fails to emerge, reproducing social-media form without function.

Significance. If the central claim holds after addressing controls, the work supplies a large-scale observational dataset on AI-agent behavior in a socio-technical environment, documenting both absent social emergence and concrete security exposures (credential leaks, Ethereum addresses). This could inform research on multi-agent systems and platform design, provided the attribution to agent limitations rather than setup is isolated.

major comments (3)
  1. [Interaction layer] Interaction-layer section: the claim that reciprocity (3.3%) and flat-conversation rates (85.6%) demonstrate absent social function rests on comparisons to human platforms (22-60%), yet no details are given on how those baselines were selected or matched for scale, topic uniformity, or moderation regime; without this, the contrast cannot be interpreted as evidence of inherent agent limitations.
  2. [Instruction layer] Instruction-layer analysis: six documented instruction changes are used to argue that hard constraints produce immediate shifts while soft guidance is ignored, but the text provides no before-after quantitative metrics, statistical tests, or ablation isolating the effect of each change from concurrent platform filters or data-collection decisions.
  3. [Methods / Data collection] Data-collection and filtering description: the observation window includes rate limits, content filters, and community-topic uniformity, yet no controls, ablations, or sensitivity checks are reported to test whether low engagement (e.g., 97.3% zero-upvote comments) arises from agent architecture versus these design choices or post-hoc thread removal; this directly undermines the attribution of absent function to the agents themselves.
minor comments (3)
  1. [Argumentation analysis] Define 'flat conversation' and 'argumentative connection' explicitly, including the exact decision rules or classifiers used in the 64.6% non-argumentative finding.
  2. [Content layer] Report sample sizes or confidence intervals alongside all percentages (e.g., 97.9% bio mismatch) so readers can assess precision.
  3. [Instruction layer] Clarify how the 41 Wayback snapshots were sampled and whether any instruction changes coincided with data-filtering events.
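
Minor comment 2 asks for confidence intervals alongside the reported percentages. As a rough illustration of what that could look like, the sketch below computes a Wilson score interval for a proportion. The choice of the Wilson interval is an assumption made here (it behaves well near 0% and 100%, where most of the paper's percentages sit); neither the referee nor the authors name a method, and the sample size in the usage note is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical example: 979 of 1,000 sampled agents show a bio mismatch.
low, high = wilson_interval(979, 1000)
```

With samples in the hundreds of thousands the interval becomes very narrow, so for this dataset the more informative addition is probably the denominator behind each percentage.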

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript analyzing the Moltbook AI-agent social network. We address each major comment point by point below, providing clarifications and committing to revisions where appropriate to strengthen the evidence for our claims.

Point-by-point responses
  1. Referee: [Interaction layer] Interaction-layer section: the claim that reciprocity (3.3%) and flat-conversation rates (85.6%) demonstrate absent social function rests on comparisons to human platforms (22-60%), yet no details are given on how those baselines were selected or matched for scale, topic uniformity, or moderation regime; without this, the contrast cannot be interpreted as evidence of inherent agent limitations.

    Authors: We acknowledge that the manuscript lacks sufficient detail on the selection of human-platform baselines. In the revised version, we will add a dedicated paragraph in the interaction-layer section citing the specific sources for the 22-60% reciprocity range (e.g., studies on Twitter reciprocity rates around 22-30% and higher rates in moderated Reddit communities up to 60%). We will explicitly discuss the challenges in matching for scale, topic, and moderation, noting that while perfect matching is not feasible across platforms, the substantial gap (3.3% vs. minimum 22%) provides indicative evidence of reduced social function in the agent setting. This will allow readers to better interpret the contrast. revision: yes

  2. Referee: [Instruction layer] Instruction-layer analysis: six documented instruction changes are used to argue that hard constraints produce immediate shifts while soft guidance is ignored, but the text provides no before-after quantitative metrics, statistical tests, or ablation isolating the effect of each change from concurrent platform filters or data-collection decisions.

    Authors: We agree that quantitative support for the instruction-layer claims is needed. The revised manuscript will include before-and-after metrics for key behaviors (e.g., posting frequency, community adherence, and responsiveness) around each of the six instruction changes, derived from the 41 Wayback Machine snapshots. We will apply appropriate statistical tests, such as paired t-tests or Wilcoxon signed-rank tests, to evaluate the significance of shifts following hard constraint changes versus soft guidance. While full ablation isolating from all concurrent factors is challenging in this observational dataset, we will discuss potential confounders and highlight the temporal alignment of behavioral changes with hard instruction updates. revision: yes

  3. Referee: [Methods / Data collection] Data-collection and filtering description: the observation window includes rate limits, content filters, and community-topic uniformity, yet no controls, ablations, or sensitivity checks are reported to test whether low engagement (e.g., 97.3% zero-upvote comments) arises from agent architecture versus these design choices or post-hoc thread removal; this directly undermines the attribution of absent function to the agents themselves.

    Authors: We recognize the importance of addressing potential confounds from the platform's design choices. In the revision, we will expand the methods section to include sensitivity checks, such as comparing engagement metrics across periods with different rate limits and analyzing subsets of data before and after content filter implementations. We will also detail the thread removal process and perform ablations where feasible, e.g., excluding potentially filtered content. However, as this is an observational study without experimental control over the platform, complete isolation of agent limitations from socio-technical setup is not possible. We will revise the discussion to more carefully attribute the absent social function to the observed system as a whole, while maintaining that the agents' responses to hard constraints indicate technical adaptability. revision: partial
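
The second response promises before-and-after metrics with significance tests around each instruction change. The sketch below shows one way such a comparison could be set up for a daily metric. It deliberately uses a different technique from the paired t-test or Wilcoxon signed-rank test named in the rebuttal: an exact permutation test on the difference in means, chosen here only because it needs nothing beyond the standard library. The daily values are hypothetical, and the test treats days as exchangeable, which autocorrelated activity data may violate.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(before, after) -> float:
    """Exact two-sided permutation test on the difference in means.

    Enumerates every relabelling of the pooled days as 'before' or 'after',
    so it is only practical for short windows (about a week on each side).
    """
    pooled = list(before) + list(after)
    observed = abs(mean(after) - mean(before))
    k, n = len(before), len(pooled)
    total = sum(pooled)
    extreme = count = 0
    for idx in combinations(range(n), k):
        sum_b = sum(pooled[i] for i in idx)
        diff = abs((total - sum_b) / (n - k) - sum_b / k)
        extreme += diff >= observed - 1e-12  # small tolerance for float rounding
        count += 1
    return extreme / count


# Hypothetical daily post counts five days before and after a rate-limit change.
before = [100, 102, 98, 101, 99]
after = [50, 52, 48, 51, 49]
p = permutation_p_value(before, after)
```

A small p-value here says only that the shift is unlikely under random relabelling of days; as the third response concedes, it cannot separate the instruction change from other events in the same window.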

Circularity Check

0 steps flagged

No circularity: purely observational metrics from raw platform data

full rationale

The paper reports direct empirical statistics (reciprocity 3.3%, flat conversations 85.6%, non-argumentative comments 64.6%, bio-community mismatch 97.9%) computed from the collected 1.3M posts and 6.7M comments. No equations, fitted parameters, predictions, or derivations are present. Instruction changes are documented via external Wayback snapshots and correlated with observed shifts, but this remains descriptive rather than a self-referential model. No self-citations are invoked to justify core claims, and the analysis does not rename or smuggle prior results. The derivation chain is simply data collection followed by counting; it does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis relies on empirical data collection and standard metrics without introducing new free parameters or invented entities.

axioms (1)
  • domain assumption: Standard assumptions in social network analysis, such as the validity of reciprocity metrics for measuring social function.
    The paper relies on these when comparing Moltbook to human platforms.

pith-pipeline@v0.9.0 · 5732 in / 1300 out tokens · 74591 ms · 2026-05-15T09:52:28.021149+00:00 · methodology

