pith. machine review for the scientific record.

arxiv: 2605.04532 · v1 · submitted 2026-05-06 · 💻 cs.SE · cs.AI

Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

Pith reviewed 2026-05-08 18:03 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords AI coding assistants · Terms of Service · accountability · software engineering · autonomous agents · liability · responsibility · research roadmap

The pith

Terms of Service for AI coding tools consistently shift responsibility for correctness, safety, and legal compliance onto users.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines Terms of Service documents from major AI coding assistants and agent-based development tools to determine how they assign ownership, liability, and obligations. It identifies a repeated pattern where providers disclaim responsibility for errors, safety issues, and legal problems in generated code, while showing wide differences in areas like data reuse and indemnification. The authors conclude that these policies do not match the growing use of autonomous agents that act with less human oversight in software workflows. They then lay out a research agenda covering ways to model accountability, create governance documents, build supporting tools, and study developer experiences.

Core claim

A comparative analysis of Terms of Service for widely used AI coding assistants and agent-enabled tools reveals a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, along with substantial variation in how providers handle indemnification, data reuse, and acceptable use. Existing policy frameworks are poorly aligned with increasingly agent-mediated and autonomous software development workflows, creating the need for a research roadmap on modeling responsibility, designing governance artifacts, developing supportive tooling, and conducting empirical studies of developers' perceptions and practices.

What carries the argument

Comparative analysis of Terms of Service documents that allocate ownership, responsibility, liability, and disclosure between providers and developers.

If this is right

  • Developers using these tools must personally verify all outputs for correctness and compliance without reliable provider backing.
  • Divergent policies across tools create uncertainty when combining multiple AI agents in one project.
  • Autonomous agents that generate and modify code independently will continue to operate under user-borne risks unless policies change.
  • New research is required to model responsibility chains, design governance mechanisms, build accountability tooling, and survey developer practices.
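
To ground the last item: a minimal sketch, in Python, of what one modeled link in a responsibility chain could look like. Every type, field, and example value below is a hypothetical illustration, not a construct the paper defines.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ResponsibilityLink:
    """One link in a hypothetical responsibility chain for an agent action."""
    action: str              # what the agent did, e.g. "generated patch"
    acting_agent: str        # tool or agent that performed the action
    accountable_party: str   # "provider", "developer", or "organization"
    basis: str               # ToS clause or policy assigning responsibility
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# A chain is an ordered audit trail of such links; here, a single
# agent-generated change whose risk the ToS places on the developer.
chain = [
    ResponsibilityLink(
        action="generated refactoring patch",
        acting_agent="coding-agent-x",  # hypothetical tool name
        accountable_party="developer",
        basis="illustrative ToS clause: user assumes responsibility for output",
    ),
]
```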

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This policy misalignment may limit how quickly teams adopt fully autonomous coding agents in production environments.
  • Courts or regulators might eventually reinterpret current ToS when AI agents cause measurable harm, independent of user awareness.
  • Developers could benefit from standardized summaries of ToS clauses that highlight risk allocation before tool adoption (see the sketch after this list).
  • Empirical studies of actual code-review practices might show whether developers treat AI outputs differently based on perceived liability.
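
The third bullet imagines standardized, machine-readable ToS summaries. A minimal sketch of one possible shape, assuming invented field names and example values; nothing below is drawn from any real provider's terms.

```python
from dataclasses import dataclass

@dataclass
class ToSRiskSummary:
    """Hypothetical standardized summary of one tool's risk-allocation clauses."""
    tool: str
    retrieved: str             # ISO date the ToS was accessed
    liability_for_errors: str  # "user", "provider", or "shared"
    indemnification: str       # who indemnifies whom, if anyone
    code_ownership: str        # who owns generated output
    training_data_reuse: bool  # may prompts/outputs be reused for training?

def risk_flags(summary: ToSRiskSummary) -> list[str]:
    """Warnings a developer could review before adopting the tool."""
    flags = []
    if summary.liability_for_errors == "user":
        flags.append(f"{summary.tool}: you bear liability for errors in generated code")
    if summary.training_data_reuse:
        flags.append(f"{summary.tool}: prompts/outputs may be reused for training")
    return flags

# Usage with entirely invented values for a fictional tool:
print(risk_flags(ToSRiskSummary(
    tool="example-assistant",
    retrieved="2026-05-08",
    liability_for_errors="user",
    indemnification="user indemnifies provider",
    code_ownership="user",
    training_data_reuse=True,
)))
```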

Load-bearing premise

The examined Terms of Service documents give a complete and representative picture of accountability practices across the AI coding tool landscape.

What would settle it

Discovery of a major AI coding tool's Terms of Service in which the provider explicitly assumes liability for errors or legal problems in code generated by its agents would contradict the reported consistent shift of responsibility onto users.

Original abstract

AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and productivity impacts of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern the use of AI-powered development tools. In this vision paper, we present a comparative analysis of the Terms of Service for widely used AI coding assistants and agent-enabled development tools. We examine how these documents allocate ownership, responsibility, liability, and disclosure obligations between tool providers and software developers, and we identify common patterns and divergences between providers. Our analysis reveals a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, as well as substantial variation in how providers address issues such as indemnification, data reuse, and acceptable use. Based on these findings, we argue that existing policy frameworks are poorly aligned with increasingly agent-mediated and autonomous software development workflows. We outline a research roadmap for accountable agents in software engineering, identifying challenges and opportunities for modeling responsibility, designing governance artifacts, developing tooling that supports accountability, and conducting empirical studies of developers' perceptions and practices.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper is a vision paper presenting a comparative analysis of Terms of Service (ToS) documents for widely used AI coding assistants and agent-enabled development tools. It examines how these documents allocate ownership, responsibility, liability, and disclosure obligations, identifying a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, along with substantial variation in indemnification, data reuse, and acceptable use. The authors argue that existing policy frameworks are poorly aligned with agent-mediated and autonomous software development workflows and outline a research roadmap addressing modeling responsibility, governance artifacts, supporting tooling, and empirical studies of developer perceptions.

Significance. If the patterns identified hold and are representative, the work draws needed attention to accountability gaps in AI tools for software engineering, an area receiving less focus than capability and productivity impacts. The roadmap could help structure future research on responsibility modeling and governance in agentic SE workflows. The analysis of public ToS documents provides a timely, grounded starting point for discussion, though its descriptive nature limits immediate impact without stronger methodological grounding.

major comments (2)
  1. [Comparative analysis section / abstract] The central claim of a 'consistent tendency' to shift responsibility and 'substantial variation' across providers (abstract and comparative analysis section) rests on the sampled ToS documents being representative of the AI coding tool landscape. However, the manuscript provides no explicit selection criteria for the tools examined, no list or table of the specific documents and versions analyzed, and no details on the analysis method or evidence (e.g., key excerpts or coding scheme). This makes the step from sample observations to landscape-wide conclusions about misalignment with agent-mediated workflows under-supported, even though that step is load-bearing for the argument.
  2. [Analysis and findings sections] The soundness of the findings is limited by the absence of methodological details on how patterns were identified (e.g., qualitative coding process, inter-rater reliability if applicable, or handling of document updates). Without this, the patterns remain stated rather than evidenced in a way that allows readers to assess their robustness, which is essential for a vision paper making policy-alignment claims.
minor comments (3)
  1. [Comparative analysis section] Add a table in the analysis section listing all examined tools, ToS versions/dates accessed, and a high-level summary of key clauses to improve transparency and reproducibility.
  2. [Research roadmap section] The research roadmap section could include more concrete references to existing accountability frameworks (e.g., from AI ethics or legal informatics) to better position the proposed challenges and opportunities.
  3. [Throughout] Ensure consistent citation of all ToS sources with access dates throughout the paper.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review, positive assessment of the paper's significance, and helpful identification of areas for improved methodological transparency. We agree that these details will strengthen the manuscript and will incorporate revisions to address both major comments.

Point-by-point responses
  1. Referee: [Comparative analysis section / abstract] The central claim of a 'consistent tendency' to shift responsibility and 'substantial variation' across providers (abstract and comparative analysis section) rests on the sampled ToS documents being representative of the AI coding tool landscape. However, the manuscript provides no explicit selection criteria for the tools examined, no list or table of the specific documents and versions analyzed, and no details on the analysis method or evidence (e.g., key excerpts or coding scheme). This makes the step from sample observations to landscape-wide conclusions about misalignment with agent-mediated workflows under-supported, even though that step is load-bearing for the argument.

    Authors: We agree that the current version lacks explicit documentation of selection criteria and analysis methods, which limits the ability to evaluate the representativeness of our observations. In the revised manuscript, we will add a dedicated subsection 'Selection Criteria and Analysis Method' in the comparative analysis section. This will specify our criteria (e.g., popularity based on developer adoption metrics, public availability of ToS, and coverage of both coding assistants and agentic tools), include a table listing the exact tools, providers, and ToS document versions with retrieval dates, and summarize the qualitative approach with representative excerpts supporting the identified patterns of responsibility shifting and variation. revision: yes

  2. Referee: [Analysis and findings sections] The soundness of the findings is limited by the absence of methodological details on how patterns were identified (e.g., qualitative coding process, inter-rater reliability if applicable, or handling of document updates). Without this, the patterns remain stated rather than evidenced in a way that allows readers to assess their robustness, which is essential for a vision paper making policy-alignment claims.

    Authors: We concur that greater detail on pattern identification is needed to demonstrate robustness. In the revision, we will expand the relevant sections to describe the qualitative coding process (including the main themes coded such as ownership, liability, and compliance obligations), how document updates were handled via dated retrieval, and our approach to consistency (cross-author verification rather than formal inter-rater reliability, given the small team). We will also include additional supporting excerpts. These changes will provide the evidentiary grounding requested while maintaining the paper's vision-oriented focus on the research roadmap. revision: yes
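
For concreteness, a minimal sketch of the kind of record such a coding process might produce. The theme names follow the rebuttal (ownership, liability, compliance obligations); the rest of the structure, including every field name, is assumed for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Theme(Enum):
    OWNERSHIP = "ownership"
    LIABILITY = "liability"
    COMPLIANCE = "compliance obligations"

@dataclass
class CodedExcerpt:
    tool: str              # which provider's ToS the excerpt comes from
    retrieved: str         # dated retrieval, as the rebuttal proposes
    excerpt: str           # verbatim clause text
    theme: Theme           # assigned code
    shifts_to_user: bool   # does the clause move responsibility onto the user?

def share_shifting(excerpts: list[CodedExcerpt], theme: Theme) -> float:
    """Fraction of excerpts under a theme that shift responsibility to users."""
    coded = [e for e in excerpts if e.theme is theme]
    return sum(e.shifts_to_user for e in coded) / len(coded) if coded else 0.0
```

Pattern identification then reduces to aggregating such records, e.g. the share of LIABILITY excerpts that shift responsibility to users across providers.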

Circularity Check

0 steps flagged

No circularity: analysis relies on external public ToS documents with no self-referential derivations

Full rationale

The paper performs a comparative analysis of publicly available Terms of Service documents from AI coding tools. It identifies patterns in responsibility allocation and argues for misalignment with agent-mediated workflows, followed by a forward-looking research roadmap. No equations, fitted parameters, self-citations, or internal derivations are present that reduce claims to the paper's own inputs by construction. The central claims rest on external document content rather than any self-definitional loop, fitted prediction, or uniqueness theorem imported from prior author work. This is a standard descriptive policy analysis with no load-bearing internal logic that could be circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Qualitative policy analysis paper with no mathematical parameters, axioms, or new postulated entities.

pith-pipeline@v0.9.0 · 9381 in / 866 out tokens · 141924 ms · 2026-05-08T18:03:36.512457+00:00 · methodology
