arxiv: 2504.02553 · v2 · submitted 2025-04-03 · 💻 cs.SE

Exploring Individual Factors in the Adoption of LLMs for Specific Software Engineering Purposes

Pith reviewed 2026-05-22 21:44 UTC · model grok-4.3

classification 💻 cs.SE
keywords LLM adoption, software engineering, UTAUT2, structural equation modeling, purpose-specific use, individual factors, technology acceptance

The pith

Software engineers adopt LLMs for different SE tasks based on distinct sets of individual factors, with some factors reducing adoption when examined alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys 188 software engineers and applies the UTAUT2 model through structural equation modeling to test how personal factors relate to LLM use across five specific purposes such as artifact generation, decision support, and information retrieval. It finds that each purpose draws on its own combination of factors and that some factors show negative effects when considered without their interactions. A reader would care because earlier work looked only at overall adoption, leaving tool builders and team leads without guidance on matching systems or encouragement tactics to particular workflows. The central object carrying the argument is the purpose-specific application of UTAUT2 constructs to actual reported usage.

Core claim

This study surveyed 188 software engineers and applied the Unified Theory of Acceptance and Use of Technology (UTAUT2) through structural equation modeling to identify the individual factors influencing LLM adoption for five distinct software engineering purposes. The results show that each purpose is affected by unique combinations of factors, and that certain factors negatively influence adoption when analyzed in isolation, highlighting the intricate nature of integrating LLMs into specific SE activities.

What carries the argument

UTAUT2 constructs tested via structural equation modeling on survey responses for each of five separate SE purposes.

If this is right

  • Tool developers can design purpose-tuned LLM features instead of one-size-fits-all agents.
  • Team leaders can apply different encouragement tactics for artifact generation than for decision-making support.
  • Adoption models must treat factors in combination rather than in isolation to avoid misleading negative signals.
  • Strategies for LLM integration need to be workflow-specific rather than organization-wide.
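
The third point can be illustrated with a small synthetic example. The sketch below is not the paper's analysis: it uses ordinary least squares on simulated scores rather than SEM, and the names factor_a, factor_b, and use are hypothetical. It shows one way a factor can look negatively related to usage when examined alone yet carry a positive coefficient once a correlated factor is modeled alongside it.

```python
import numpy as np


def simulate(n=188, seed=0):
    """Simulate two correlated factors and a usage score (all hypothetical)."""
    rng = np.random.default_rng(seed)
    z1, z2, noise = rng.standard_normal((3, n))
    factor_b = z2
    # corr(factor_a, factor_b) is about 0.8 by construction.
    factor_a = 0.8 * z2 + 0.6 * z1
    # True joint effects: factor_a helps (+0.5), factor_b hurts (-1.0).
    use = 0.5 * factor_a - 1.0 * factor_b + 0.5 * noise
    return factor_a, factor_b, use


def isolated_slope(x, y):
    """Slope of y on x alone (simple regression)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def joint_slopes(predictors, y):
    """Slopes of y on all predictors at once (intercept dropped)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


factor_a, factor_b, use = simulate()
print("factor_a alone:   ", round(isolated_slope(factor_a, use), 2))
print("factor_a combined:", round(float(joint_slopes([factor_a, factor_b], use)[0]), 2))
```

In this setup the isolated slope for factor_a is negative only because factor_a travels with factor_b, which lowers usage; the joint model separates the two. Whether the paper's negative coefficients arise this way is not something the abstract settles.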

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • General LLM training programs may show uneven results because the same factor can help one task and hinder another.
  • Objective usage metrics collected over time could test whether the reported negative effects persist beyond initial intentions.
  • The pattern may extend to other AI tools in engineering, suggesting purpose-specific studies rather than broad adoption surveys.

Load-bearing premise

Self-reported survey answers from 188 engineers accurately capture the causal links between UTAUT2 factors and real usage behavior for each purpose rather than just intentions or social bias.

What would settle it

A follow-up study that replaces self-reports with logged usage data from the same engineers and finds no differences in which factors predict use across the five purposes would undermine the claim.

Figures

Figures reproduced from arXiv: 2504.02553 by Daniel Russo, Fabio Palomba, Filomena Ferrucci, Gemma Catolino, Stefano Lambiase.

Figure 1: Research model and hypotheses.
Original abstract

Context: The advent of Large Language Models (LLMs) is transforming software development, significantly enhancing software engineering (SE) processes. Research has explored their role within development teams, focusing on the specific purposes for which LLMs are used within SE tasks, such as artifact generation, decision-making support, and information retrieval. Despite the growing body of work on LLMs in SE, most studies have centered on broad adoption trends, neglecting the nuanced relationship between individual cognitive and behavioral factors and their impact on purpose-specific adoption. While factors such as perceived effort and performance expectancy have been explored at a general level, their influence on distinct SE purposes remains underexamined. This gap hinders the development of tailored LLM-based systems (e.g., Generative AI Agents) that align with engineers' specific needs and limits the ability of team leaders to devise effective strategies for fostering LLM adoption in targeted workflows.

Objectives: For the reasons mentioned above, this study aims to study the individual factors that drive the choice to use LLMs for distinct SE purposes.

Methods: To achieve the above-mentioned objective, we surveyed 188 software engineers to test the relationship between individual attributes related to technology adoption and LLM adoption across five key purposes, using structural equation modeling (SEM). The Unified Theory of Acceptance and Use of Technology (UTAUT2) was applied to characterize individual adoption behaviors.

Results: The findings reveal that purpose-specific adoption is influenced by distinct factors, some of which negatively impact adoption when considered in isolation, underscoring the complexity of LLM integration in SE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper surveys 188 software engineers and applies structural equation modeling (SEM) with the UTAUT2 framework to test how individual factors relate to LLM adoption across five specific SE purposes (artifact generation, decision-making support, information retrieval, and two others). The central claim is that each purpose is influenced by a distinct set of UTAUT2 constructs, with some constructs exerting negative effects on adoption when examined in isolation.

Significance. If the SEM results prove robust after proper validation, the work advances technology-acceptance research in SE by shifting from broad adoption studies to purpose-specific analysis. This could inform the design of tailored generative-AI agents and help team leaders target interventions. The empirical survey-plus-SEM design is conventional for UTAUT2 studies, but its value hinges on transparent reporting of model diagnostics and explicit limits on causal inference.

major comments (2)
  1. [Methods] Methods section: The abstract and available description state that SEM was performed but supply no model-fit statistics (CFI, RMSEA, SRMR), no information on missing-data handling, no multicollinearity diagnostics, and no indication whether the five purpose-specific models were estimated separately or jointly. These omissions prevent verification that the reported path coefficients support the claim of distinct, including negative, effects.
  2. [Results] Results section: The claim that certain UTAUT2 constructs 'negatively impact adoption when considered in isolation' is presented as a substantive finding. With only cross-sectional self-report data, the paths capture associations among survey items; the manuscript does not report objective usage logs, longitudinal follow-up, or multi-source validation that would be required to treat the coefficients as causal influences on actual behavior.
minor comments (1)
  1. [Introduction] The manuscript should explicitly cite the original UTAUT2 reference (Venkatesh et al., 2012) when describing the constructs and should clarify whether any items were adapted for the LLM context.
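
For readers unfamiliar with the fit indices named in the first major comment, the following is a minimal sketch of their standard textbook definitions. It is not drawn from the paper; the chi-square values, degrees of freedom, and correlation matrices in the usage lines are hypothetical, and some software uses N rather than N - 1 in the RMSEA denominator.

```python
import math

import numpy as np


def rmsea(chi2, df, n):
    """Root mean square error of approximation from a model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index: model misfit relative to a baseline model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def srmr(observed_corr, implied_corr):
    """Root mean squared residual over the lower triangle (diagonal included)."""
    resid = np.asarray(observed_corr, dtype=float) - np.asarray(implied_corr, dtype=float)
    idx = np.tril_indices(resid.shape[0])
    return float(np.sqrt(np.mean(resid[idx] ** 2)))


# Hypothetical inputs for a sample of 188 respondents.
print(round(rmsea(chi2=120.0, df=80, n=188), 3))
print(round(cfi(120.0, 80, 900.0, 105), 3))
print(round(srmr([[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.4], [0.4, 1.0]]), 3))
```

Lower RMSEA and SRMR and higher CFI indicate closer fit; the cutoffs applied in practice vary by convention and by SEM variant.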

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of our manuscript. Below we respond point-by-point to the major comments, indicating the revisions we will undertake.

Point-by-point responses
  1. Referee: [Methods] Methods section: The abstract and available description state that SEM was performed but supply no model-fit statistics (CFI, RMSEA, SRMR), no information on missing-data handling, no multicollinearity diagnostics, and no indication whether the five purpose-specific models were estimated separately or jointly. These omissions prevent verification that the reported path coefficients support the claim of distinct, including negative, effects.

    Authors: We agree that these methodological details are essential for transparency. In the revised manuscript we will report the model-fit indices (CFI, RMSEA, SRMR) for each of the five models, describe missing-data handling (complete cases were used after listwise deletion of the small number of incomplete responses), provide multicollinearity diagnostics (VIF values for all predictors), and explicitly state that the five purpose-specific models were estimated separately rather than jointly. These additions will enable readers to evaluate the reported coefficients.

    Revision: yes

  2. Referee: [Results] Results section: The claim that certain UTAUT2 constructs 'negatively impact adoption when considered in isolation' is presented as a substantive finding. With only cross-sectional self-report data, the paths capture associations among survey items; the manuscript does not report objective usage logs, longitudinal follow-up, or multi-source validation that would be required to treat the coefficients as causal influences on actual behavior.

    Authors: We accept that the cross-sectional, self-reported nature of the data means the SEM paths reflect associations rather than causal effects. We will revise the Results and Discussion sections to replace causal language such as 'negatively impact' with 'are negatively associated with' and will add an explicit limitations paragraph noting the absence of objective usage logs, longitudinal data, or multi-source validation. These changes will better frame the findings as correlational insights within the UTAUT2 framework while preserving the observation that certain constructs show negative coefficients when examined in isolation.

    Revision: partial
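
The first response mentions listwise deletion and VIF values. A minimal sketch of both steps follows, assuming construct scores arrive as a numeric table with NaN marking a missing answer. The data are simulated and the column layout is hypothetical; nothing here reproduces the authors' pipeline.

```python
import numpy as np


def listwise_delete(responses):
    """Keep only respondents (rows) with no missing values."""
    responses = np.asarray(responses, dtype=float)
    return responses[~np.isnan(responses).any(axis=1)]


def vif(predictors):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(predictors, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Simulated scores for three made-up predictors.
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 3))
# Make column 2 overlap heavily with column 0.
scores[:, 2] = 0.9 * scores[:, 0] + 0.3 * scores[:, 2]
# Blank out one answer in every 25th row.
scores[::25, 1] = np.nan

complete = listwise_delete(scores)
print(complete.shape[0], "complete responses")
print(np.round(vif(complete), 2))
```

A VIF near 1 means a predictor shares little variance with the others; values above roughly 5 to 10 are commonly read as a collinearity warning, though the threshold depends on the convention followed.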

Circularity Check

0 steps flagged

No circularity: empirical SEM on external survey data

Full rationale

The paper reports an empirical study that surveys 188 software engineers and applies structural equation modeling (SEM) using the established UTAUT2 framework to identify associations between individual factors and purpose-specific LLM adoption. Results are obtained by fitting the model to collected survey responses; no mathematical derivation, first-principles prediction, or parameter fitted in one step is then relabeled as an independent prediction in another. The abstract and context contain no self-citation load-bearing steps, no uniqueness theorems imported from the authors' prior work, and no ansatz smuggled via citation. The chain is self-contained against standard external benchmarks for survey-based SEM research.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the validity of the UTAUT2 constructs as measured by the survey instrument and on the assumption that SEM can isolate purpose-specific effects from cross-purpose correlations. No free parameters are named in the abstract, but the fitted path coefficients in the SEM are implicitly data-driven. No new entities are postulated.

free parameters (1)
  • SEM path coefficients
    The structural paths between UTAUT2 constructs and adoption intention/use are estimated from the survey data; their specific values are not reported in the abstract.
axioms (2)
  • domain assumption: UTAUT2 constructs are appropriate and sufficient to characterize individual LLM adoption behavior in software engineering contexts.
    The abstract states that UTAUT2 was applied to characterize behaviors without additional justification or comparison to alternative models.
  • domain assumption: Self-reported survey data can be treated as a reliable proxy for actual usage behavior across the five purposes.
    The study relies on survey responses to test relationships; no behavioral logs or observational validation are mentioned.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. To Copilot and Beyond: 22 AI Systems Developers Want Built

    cs.SE · 2026-04 · unverdicted · novelty 5.0

    Survey of 860 developers reveals 22 desired AI systems for non-coding tasks with explicit constraints on authority, provenance, and quality signals, framed as bounded delegation where AI handles assembly work but not ...
