pith. machine review for the scientific record.

arxiv: 2605.08380 · v1 · submitted 2026-05-08 · 💻 cs.SE · cs.AI

Recognition: no theorem link

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook


Pith reviewed 2026-05-12 01:17 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords AI agents · software engineering discourse · MoltBook · topic modeling · GitHub Discussions · autonomous technical communication · security and trust · workflow automation

The pith

AI agents produce coherent technical discussions that focus on security and trust while omitting concrete runtime details common in human developer exchanges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies what software-engineering topics autonomous AI agents discuss when they interact only with one another on MoltBook, an AI-agents-only network. It combines open coding of posts, topic modeling across thousands of entries, and a direct comparison to human developer posts on GitHub to show that the AI discourse is organized around twelve themes yet remains selective. A reader would care because these patterns reveal how AI agents approach engineering problems without human guidance or shared project context. The analysis finds that security, trust, memory management, tooling, debugging, workflow automation, and infrastructure dominate, while specific code artifacts, environment details, runtime failures, and reproduction steps appear far less often than in human samples. This selectivity may arise because the AI-only setting contains fewer grounded, environment-specific failures.

Core claim

Autonomous AI agents on MoltBook generate coherent but selective technical discourse that repeatedly returns to concerns such as security and trust, memory and context management, tooling and APIs, debugging and error handling, workflow automation, and infrastructure and operations. At the community level, activity concentrates heavily yet still yields stable sub-topics under topic analysis. Compared with matched GitHub Discussions posts, MoltBook entries contain fewer concrete cues such as code-formatted artifacts, environment details, runtime failures, and reproduction steps; social mimicry appears only in limited form, while idealization shows mainly through reduced hedging. Overall, the discourse is coherent but selective.

What carries the argument

The matched-instrument comparison of content features and topic distributions between MoltBook AI-only posts and GitHub human Discussions posts, supported by open coding and a stability-aware BERTopic pipeline.
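The "stability-aware" part of such a pipeline can be illustrated with a toy helper (illustrative only, not the authors' code): run the topic model more than once and score how well the topic word lists of one run can be matched to the other via best-match Jaccard overlap, where values near 1 suggest the sub-topics are reproducible rather than clustering noise.

```python
def topic_stability(run_a, run_b):
    """Mean best-match Jaccard overlap between the topic word lists of two
    topic-model runs -- a simple proxy for a stability-aware check."""
    def jaccard(x, y):
        x, y = set(x), set(y)
        return len(x & y) / len(x | y)
    # For each topic in run_a, find its closest counterpart in run_b.
    return sum(max(jaccard(t, u) for u in run_b) for t in run_a) / len(run_a)

# Toy example: two runs that mostly recover the same two topics.
run_a = [["auth", "token", "trust"], ["cache", "memory"]]
run_b = [["auth", "token"], ["memory", "cache"]]
score = topic_stability(run_a, run_b)  # ~0.83
```

The paper's actual procedure (BERTopic over sentence embeddings with outlier handling) is more involved; this only shows the matching idea behind a stability check.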

If this is right

  • AI agent teams may naturally emphasize high-level concerns such as security and workflows during autonomous collaboration.
  • Their exchanges could require external mechanisms to add concrete project grounding that humans supply through context and failures.
  • High concentration of activity in a few sub-communities suggests AI-only networks may form specialized clusters around particular themes.
  • Lower hedging in AI discourse implies a more direct or idealized tone that might influence how agents evaluate ideas among themselves.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This selectivity could limit how effectively multi-agent systems solve grounded engineering tasks without human-provided context or simulated runtime feedback.
  • Similar patterns might emerge on other AI-only communication platforms, offering a way to test whether the omission of concrete details is platform-specific or general to autonomous agents.
  • Adding controlled environment feedback or error logs to AI agent interactions could be tested to see whether discourse shifts toward including more runtime and reproduction details.

Load-bearing premise

That posts on MoltBook represent purely autonomous AI-agent interactions without human prompting or platform artifacts shaping the content.

What would settle it

Discovery of many MoltBook posts that include specific code snippets, detailed environment setups, runtime error traces, or reproduction steps at rates comparable to GitHub Discussions would indicate the claimed selectivity does not hold.

Figures

Figures reproduced from arXiv: 2605.08380 by Gouri Ginde, Junyu Huo, Zihao Wan, Ziqi Mao.

Figure 1. Research architecture of the study. [image omitted]
Figure 2. RQ1 summary distributions. [image omitted]
Figure 3. Primary inferential corpus for RQ2. [image omitted]
read the original abstract

AI agents are increasingly framed as software-engineering teammates, yet most research studies them inside human-centered workflows. Little is known about the software-engineering discourse autonomous AI agents produce when they interact primarily with one another. This paper examines what autonomous AI agents discuss in MoltBook, an AI-agents-only social network, how that discourse is organized, and how it differs from human developer discourse. We combine human open coding of a 500-post sample, a concentration-plus-check topic-analysis pipeline over 4,707 English-filtered MoltBook technology posts, and a matched-instrument comparison against 5,211 GitHub Discussions posts. MoltBook technology discourse spans 12 recurring themes and is led by Security and Trust (27.4%). At the community level, activity is highly concentrated: the largest submolt contains 63.5% of posts and the Gini coefficient is 0.88, yet a stability-aware BERTopic pipeline still yields 32 non-outlier sub-topics. Compared with the GitHub Discussions baseline, MoltBook discourse contains fewer concrete, context-rich cues such as code-formatted artifacts, environment details, runtime failures, and reproduction steps; social mimicry appears only in a limited way, while idealization is mainly reflected through lower hedging. Overall, AI-only technical discourse is coherent but selective. It repeatedly returns to concerns such as security and trust, memory and context management, tooling and APIs, debugging and error handling, workflow automation, and infrastructure/ops, while omitting much of the concrete runtime and project-local detail common in human developer discourse. This may be because MoltBook contains fewer environment-specific failures, reproduction steps, and other concrete grounding cues.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that autonomous AI agents on the MoltBook platform (an AI-agents-only network) produce coherent but selective technical discourse in software engineering, with 12 recurring themes led by Security and Trust (27.4% of posts). It combines open coding of 500 posts, a concentration-plus-check BERTopic analysis of 4,707 English-filtered technology posts, and a matched comparison to 5,211 GitHub Discussions posts, finding fewer concrete cues (code artifacts, runtime failures, reproduction steps) than human discourse while noting high community concentration (Gini 0.88) yet stable sub-topics.
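The reported concentration figure is easy to make concrete: a Gini coefficient of 0.88 over submolt post counts means activity is far from evenly spread. A minimal pure-Python sketch of the statistic (illustrative, not the paper's code):

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 = perfectly even,
    values near 1 = activity concentrated in a few communities."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with 1-based ranks i.
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * ranked / (n * total) - (n + 1) / n

# Even activity across four communities vs. one dominant community.
gini([25, 25, 25, 25])  # 0.0
gini([2, 3, 5, 90])     # ≈ 0.665
```

On this scale, 0.88 across submolts, with the largest holding 63.5% of posts, is a strongly skewed distribution.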

Significance. If the core findings hold after addressing data-source validity, the work offers a valuable first empirical baseline on AI-only SE discourse, distinguishing it from human patterns in ways that could guide agent design for collaboration, memory management, and tooling. The mixed-methods approach (qualitative coding plus topic modeling with matched baseline) and explicit reporting of theme concentrations provide a reproducible starting point for future studies.

major comments (2)
  1. [Methods] Methods (data collection and sampling description): No protocol is described for verifying that MoltBook posts originate from autonomous AI agents without human prompting, platform curation, or mixed authorship (e.g., via account metadata, prompt-artifact checks, or exclusion criteria). This assumption is load-bearing for the central claim of 'AI-only' discourse and the interpretation of selectivity versus the GitHub baseline, as unverified authorship could introduce confounds explaining reduced concrete runtime details.
  2. [Methods / Results] Topic analysis pipeline (4,707-post sample): The concentration-plus-check BERTopic procedure and open-coding sample lack reported validation metrics (topic coherence, inter-coder reliability), details on English filtering thresholds, and outlier removal criteria. These gaps directly affect the reliability of the 12 themes and the claim that discourse is 'coherent but selective.'
minor comments (2)
  1. [Methods] The abstract and results refer to a 'stability-aware BERTopic pipeline' yielding 32 non-outlier sub-topics, but the methods section provides insufficient implementation details (e.g., stability parameters or post-processing rules) for full reproducibility.
  2. [Discussion] The GitHub Discussions baseline is presented as a matched human comparator, but potential platform-norm differences (e.g., discussion format, moderation) are not explicitly addressed as possible confounds in the selectivity findings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments, which help clarify the methodological foundations of our study on AI-only technical discourse. We address each point below and will incorporate revisions to improve transparency and rigor.

read point-by-point responses
  1. Referee: [Methods] Methods (data collection and sampling description): No protocol is described for verifying that MoltBook posts originate from autonomous AI agents without human prompting, platform curation, or mixed authorship (e.g., via account metadata, prompt-artifact checks, or exclusion criteria). This assumption is load-bearing for the central claim of 'AI-only' discourse and the interpretation of selectivity versus the GitHub baseline, as unverified authorship could introduce confounds explaining reduced concrete runtime details.

    Authors: We agree that explicit documentation of the authorship assumption is essential. MoltBook is an AI-agents-only platform by design, with all posts generated through autonomous agent interactions as described in the platform documentation and our data collection section. Our sampling drew directly from the public technology post stream without human curation or intervention. However, we did not conduct post-hoc prompt-artifact detection or metadata verification beyond the platform's stated agent-only policy. In the revision we will expand the Methods section with a new subsection on data provenance, restate the platform's agent-only architecture, and add an explicit limitations paragraph noting that while the platform design supports the AI-only framing, independent verification of every post's generative origin was not performed. This will allow readers to assess potential confounds when comparing selectivity to the GitHub baseline. revision: yes

  2. Referee: [Methods / Results] Topic analysis pipeline (4,707-post sample): The concentration-plus-check BERTopic procedure and open-coding sample lack reported validation metrics (topic coherence, inter-coder reliability), details on English filtering thresholds, and outlier removal criteria. These gaps directly affect the reliability of the 12 themes and the claim that discourse is 'coherent but selective.'

    Authors: We concur that these metrics and procedural details are necessary for reproducibility. For the 500-post open-coding sample we will report inter-coder reliability (Cohen's kappa) in the revised Methods. For the BERTopic analysis of the 4,707 English-filtered posts we will add topic coherence scores (CV and NPMI), specify the language-detection threshold and library used for English filtering, and detail the outlier-removal rules within the concentration-plus-check pipeline. These additions will be placed in the Methods and Results sections to directly support the reliability of the 12 themes and the coherence claim. revision: yes
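The inter-coder reliability the authors promise to report has a standard form. A minimal sketch of Cohen's kappa over two coders' theme labels (illustrative only; by common convention, values above roughly 0.6 are read as substantial agreement):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if coders labelled independently with their own marginals.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1:  # both coders used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Two coders agree on 3 of 4 theme labels.
kappa = cohens_kappa(["security", "security", "tooling", "tooling"],
                     ["security", "security", "tooling", "security"])  # 0.5
```

Reporting the statistic alongside the 500-post sample would let readers judge whether the 12-theme codebook is reliably applicable.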

Circularity Check

0 steps flagged

No circularity in empirical discourse analysis

full rationale

The paper's central claims derive from direct empirical processing of external data sources: human open coding of 500 MoltBook posts, a BERTopic-based topic pipeline on 4,707 filtered posts, and a matched comparison against 5,211 GitHub Discussions posts. No self-referential equations, fitted parameters renamed as predictions, load-bearing self-citations, or ansatzes smuggled via prior work appear in the derivation. The reported themes, concentration metrics, and selectivity observations are produced by standard topic-modeling and coding pipelines applied to the sampled corpora, rendering the analysis self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The study relies on established qualitative and NLP methods without introducing new free parameters, axioms beyond standard assumptions, or invented entities.

axioms (2)
  • domain assumption Human open coding yields reliable theme labels for technical posts
    Used to derive the 12 recurring themes from the 500-post sample.
  • standard math BERTopic with stability-aware filtering produces meaningful non-outlier sub-topics
    Applied to the 4,707-post corpus to obtain 32 sub-topics.

pith-pipeline@v0.9.0 · 5623 in / 1364 out tokens · 36439 ms · 2026-05-12T01:17:30.552497+00:00 · methodology

