The Spec Growth Engine: Spec-Anchored, Code-Coupled, Drift-Enforced Architecture for AI-Assisted Software Development

Hartwig Grabowski

arxiv: 2606.27045 · v1 · pith:UZLIEIUBnew · submitted 2026-06-25 · 💻 cs.SE · cs.AI

Pith reviewed 2026-06-26 03:37 UTC · model grok-4.3

keywords: spec graph, context explosion, spec-code drift, AI coding agents, drift gate, vertical slice, software architecture, context scoping

The pith

A spec graph with contract-design separation, ownership-path context scoping, hardest-first vertical slices, and a blocking drift gate prevents context explosion and silent spec-code drift for AI coding agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that AI coding agents suffer from context explosion when forced to reason over whole repositories and from silent spec-code drift when implementations diverge from specifications without detection. It proposes the Spec Growth Engine as a lightweight countermeasure built from a machine-readable spec graph whose nodes separate contracts from designs, a Spine context assembler that limits agent reasoning to an ownership path, a vertical-slice growth protocol that forces hardest-first ordering, and a drift gate that treats any spec-code mismatch as a merge blocker. The approach deliberately recombines familiar practices such as information hiding and fitness functions into a single code-coupled, machine-enforced system rather than layering on heavy process frameworks. If the claim holds, AI-assisted development could scale to larger codebases while keeping specifications and implementations visibly aligned throughout the process.

Core claim

The Spec Growth Engine is a lightweight framework that addresses context explosion and silent spec-code drift through a machine-readable spec graph whose nodes carry explicit contract/design separation, a Spine context assembler that scopes agent context to an ownership path, a vertical-slice growth protocol that enforces hardest-first ordering, and a drift gate that makes spec-code divergence a blocking merge condition; the design synthesises established principles such as Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, and Fitness Functions into a lean, code-coupled, machine-enforced whole without the overhead of heavyweight frameworks.

What carries the argument

The spec graph whose nodes separate contracts from designs, together with the Spine context assembler, the vertical-slice growth protocol, and the drift gate that blocks merges on divergence.
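The summary gives no schema for the spec graph or the Spine assembler, so the following is only a minimal sketch of how contract/design separation and ownership-path scoping might fit together. The `SpecNode` fields, the `parent` link, and the rule that ancestors contribute only their contracts while the target node also contributes its design are all assumptions made for illustration, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpecNode:
    """Hypothetical spec-graph node; every field name here is an assumption."""
    name: str
    contract: str                 # what the node promises to its callers
    design: str                   # how the node fulfils the contract
    parent: Optional[str] = None  # owning node, None for the root


def ownership_path(graph: Dict[str, SpecNode], target: str) -> List[SpecNode]:
    """Follow parent links from the target to the root and return them root-first."""
    path: List[SpecNode] = []
    seen = set()
    current: Optional[str] = target
    while current is not None:
        if current in seen:
            raise ValueError(f"ownership cycle at {current!r}")
        seen.add(current)
        node = graph[current]
        path.append(node)
        current = node.parent
    return list(reversed(path))


def assemble_context(graph: Dict[str, SpecNode], target: str) -> str:
    """Emit contracts along the ownership path, plus the design of the target only."""
    lines = []
    for node in ownership_path(graph, target):
        lines.append(f"[{node.name}] contract: {node.contract}")
        if node.name == target:
            lines.append(f"[{node.name}] design: {node.design}")
    return "\n".join(lines)


# Illustrative graph: "auth" is a sibling of "billing" and is not on the path to "invoice".
graph = {
    "system": SpecNode("system", "Serve the shop API", "Modular monolith"),
    "billing": SpecNode("billing", "Issue invoices", "Ledger tables", parent="system"),
    "auth": SpecNode("auth", "Authenticate users", "Session cookies", parent="system"),
    "invoice": SpecNode("invoice", "Render an invoice as PDF", "Template engine", parent="billing"),
}
context = assemble_context(graph, "invoice")
print(context)
```

Under these assumptions the context for `invoice` grows with the depth of the ownership path rather than with the number of nodes in the repository, which is the scoping effect the summary attributes to the Spine assembler.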

If this is right

AI agents can work across growing repositories without output quality dropping from full-context overload.
Specifications remain visibly coupled to code because divergence is treated as a blocking condition at merge time.
Development follows a hardest-first vertical-slice order that keeps structural decisions visible early.
The same machinery can be applied without adopting heavy process frameworks such as RUP or MDA.
Context supplied to agents is restricted to an ownership path, limiting the information each agent must process.
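The hardest-first rule in the list above could be checked mechanically along the following lines. This is a sketch under assumptions: the summary does not say how "hardest" is measured, so the integer `risk` score, the `Slice` type, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slice:
    """Hypothetical vertical slice; `risk` is an assumed 1-5 difficulty estimate."""
    name: str
    risk: int
    done: bool = False


def next_slice(backlog: List[Slice]) -> Slice:
    """Return the open slice with the highest risk score."""
    open_slices = [s for s in backlog if not s.done]
    if not open_slices:
        raise ValueError("no open slices")
    # max() returns the first maximal element, so ties keep backlog order.
    return max(open_slices, key=lambda s: s.risk)


def violates_hardest_first(backlog: List[Slice], proposed: str) -> bool:
    """True if the proposed slice is not the hardest one still open."""
    return proposed != next_slice(backlog).name


backlog = [
    Slice("login-form", risk=2),
    Slice("payment-sync", risk=5),
    Slice("pdf-report", risk=5, done=True),
    Slice("audit-log", risk=3),
]
print(next_slice(backlog).name)
print(violates_hardest_first(backlog, "login-form"))
```

A check like this would only be as good as the risk estimates fed into it; the protocol as summarised does not say who assigns them or when they are revised.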

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The drift-gate idea could be ported to conventional non-AI workflows to catch divergence earlier in review cycles.
The vertical-slice ordering rule might improve project predictability even when no AI agent is present.
Automated extraction of the initial spec graph from an existing codebase would be a natural next implementation step.
Regulated domains that require traceable alignment between requirements and code could adopt the drift gate as an audit point.

Load-bearing premise

Combining the spec graph, Spine assembler, vertical-slice protocol, and drift gate will remove context explosion and silent drift in real projects without creating new failure modes or unacceptable overhead.

What would settle it

Run the framework on a multi-module application with an AI agent, then check whether context-window usage stays low as the repository grows and whether any spec-code divergence reaches a merge without being rejected by the drift gate.
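The second half of that test, whether any divergence slips past the gate, can be pictured with a reflexion-model style comparison. The sketch below assumes that both the specification and the code can be reduced to sets of dependency edges; how the engine actually extracts them is not described in this summary, and `classify` and `drift_gate` are hypothetical names. The terms convergence, divergence, and absence follow Murphy et al.'s reflexion models.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source module, target module)


def classify(intent: Set[Edge], actual: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Split edges into the three reflexion-model categories."""
    return {
        "convergence": intent & actual,  # specified and implemented
        "divergence": actual - intent,   # implemented but not specified
        "absence": intent - actual,      # specified but not implemented
    }


def drift_gate(intent: Set[Edge], actual: Set[Edge]) -> int:
    """Return a process-style exit code: 0 if aligned, 1 if any drift exists."""
    report = classify(intent, actual)
    drift = report["divergence"] | report["absence"]
    for src, dst in sorted(drift):
        kind = "divergence" if (src, dst) in report["divergence"] else "absence"
        print(f"DRIFT ({kind}): {src} -> {dst}")
    return 1 if drift else 0


# Assumed inputs: edges declared in the spec versus edges found in the code.
intent_edges = {("billing", "ledger"), ("billing", "auth")}
code_edges = {("billing", "ledger"), ("billing", "mailer")}
exit_code = drift_gate(intent_edges, code_edges)
```

Wired into continuous integration, a non-zero exit code would fail the check and block the merge. The experiment then reduces to seeding known mismatches and confirming that none of them returns 0.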

Figures

Figures reproduced from arXiv: 2606.27045 by Hartwig Grabowski.

**Figure 2.** The silent drift cycle. A passing test suite does not guarantee that the specification and …

**Figure 3.** The two-layer growth rule. Layer 1 invariants are specified up front. Layer 2 features grow …

**Figure 4.** High-level architecture of the Spec Growth Engine. The Spec Graph feeds the Context …

**Figure 5.** The drift validation gate. The engine derives the Intent Graph from SPEC.md files and …

**Figure 6.** The spec graph growing with working software. Layer-1 invariants (coral) are seeded …

Original abstract

AI coding agents dramatically accelerate implementation speed but introduce two structural failure modes that existing spec-driven approaches do not fully solve: (1) context explosion -- the agent must reason over an entire repository at once, degrading output quality as the context window fills; and (2) silent spec-code drift -- code evolves, the specification does not, and the divergence becomes invisible until it is costly to repair. We present the Spec Growth Engine, a lightweight framework that addresses both failure modes through a machine-readable spec graph whose nodes carry explicit contract/design separation, a Spine context assembler that scopes agent context to an ownership path, a vertical-slice growth protocol that enforces hardest-first ordering, and a drift gate that makes spec-code divergence a blocking merge condition. The design synthesises well-established software engineering principles (Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, Fitness Functions) into a lean, code-coupled, machine-enforced whole -- without the overhead of heavy-weight frameworks such as RUP or MDA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Conceptual proposal for an AI coding framework that names real problems but supplies no evidence the described components solve them.

This paper proposes the Spec Growth Engine as a way to keep AI coding agents from losing the plot on large repos. It identifies context explosion and silent spec-code drift as the two main issues, then describes four pieces meant to fix them: a machine-readable spec graph that splits contracts from design, a Spine assembler that limits context to ownership paths, a vertical-slice growth protocol that forces hardest-first ordering, and a drift gate that blocks merges on divergence.

The contribution is the specific combination of these elements, built on top of established ideas such as Parnas information hiding, C4 models, ADRs, walking skeletons, and fitness functions. The paper does a straightforward job of naming why existing spec-driven methods fall short for AI agents and of sketching how each new component targets one failure mode.

The weakness is that none of the claims are supported. There is no worked example, no interaction diagram, no argument showing why this exact assembly produces the desired scoping and enforcement without adding overhead or new failure modes, and no implementation or measurements. The central assumption—that the synthesis will simply work—is left unexamined.

The paper is aimed at software engineering readers who are already thinking about processes for AI-assisted development. Someone in that group could extract a few process ideas from the framing, but there is nothing here that can be tested, cited as a result, or built upon.

I would not send it to peer review. It needs at least one concrete illustration or small pilot before it is worth referee time.

Referee Report

1 major / 0 minor

Summary. The paper claims that the Spec Growth Engine framework—consisting of a machine-readable spec graph with explicit contract/design separation, a Spine context assembler that scopes agent context to an ownership path, a vertical-slice growth protocol enforcing hardest-first ordering, and a drift gate that blocks merges on spec-code divergence—addresses context explosion and silent spec-code drift in AI-assisted development. It does so by synthesizing established principles (Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, Fitness Functions) into a lightweight, code-coupled, machine-enforced system without the overhead of heavy frameworks like RUP or MDA.

Significance. If the proposed synthesis of these components can be shown to deliver the claimed scoping and enforcement effects, the work could provide a practical, enforceable architecture for AI-assisted software development that maintains spec-code alignment and manages context without introducing unacceptable overhead or new failure modes.

major comments (1)

[Abstract] Abstract (second paragraph): the central claim that the spec graph, Spine assembler, vertical-slice protocol, and drift gate 'address both failure modes' rests entirely on an untested synthesis assumption; the manuscript supplies no reasoning, interaction analysis, worked example, or validation demonstrating why this specific combination produces the claimed reductions in context explosion and silent drift without new overhead or failure modes.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the need for stronger justification of the framework's claims. We address the major comment point by point below.

Referee: [Abstract] Abstract (second paragraph): the central claim that the spec graph, Spine assembler, vertical-slice protocol, and drift gate 'address both failure modes' rests entirely on an untested synthesis assumption; the manuscript supplies no reasoning, interaction analysis, worked example, or validation demonstrating why this specific combination produces the claimed reductions in context explosion and silent drift without new overhead or failure modes.

Authors: We agree that the current manuscript, being a conceptual design paper, does not include empirical validation or a detailed worked example demonstrating the interactions. The claims are grounded in the synthesis of established principles (Parnas information hiding for scoping, C4 and ADRs for structure, Walking Skeleton and vertical slices for growth, Reflexion Models and Fitness Functions for drift enforcement). However, we acknowledge the absence of explicit reasoning on their combined effects. In the revised manuscript, we will add a new section providing a step-by-step worked example of applying the framework to a sample project, including analysis of how the components interact to mitigate context explosion (via Spine scoping) and silent drift (via drift gate), and discuss why this synthesis does not introduce unacceptable overhead based on the lightweight nature of the components. This will strengthen the justification for the central claim.

revision: yes

Circularity Check

0 steps flagged

No circularity; purely conceptual synthesis of external principles

The manuscript proposes a framework by combining named external principles (Parnas information hiding, C4, ADRs, Walking Skeleton, Reflexion Models, Fitness Functions) into a spec graph, Spine assembler, vertical-slice protocol, and drift gate. No equations, fitted parameters, derivations, or predictions exist. No self-citations appear as load-bearing justification for the central claim; the synthesis is presented as an untested design assumption rather than a result forced by prior author work or definitional loops. The paper is self-contained as a proposal and does not reduce any claimed outcome to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 4 invented entities

The central claim rests on the premise that the four named components will solve the two failure modes. No free parameters are introduced. Four new components are postulated without independent evidence. One domain assumption is stated explicitly.

axioms (1)

domain assumption: AI coding agents suffer from context explosion and silent spec-code drift as primary structural failure modes that existing spec-driven approaches do not fully solve.
This premise is stated in the first sentence of the abstract and motivates the entire framework.

invented entities (4)

Spec graph with explicit contract/design separation (no independent evidence)
purpose: To structure specifications in a machine-readable form that supports drift detection
Introduced as a core node type in the framework; no independent evidence supplied.

Spine context assembler (no independent evidence)
purpose: To scope agent context to an ownership path and avoid context explosion
New assembler component proposed; no independent evidence supplied.

Vertical-slice growth protocol (no independent evidence)
purpose: To enforce hardest-first ordering during development
New protocol proposed; no independent evidence supplied.

Drift gate (no independent evidence)
purpose: To make spec-code divergence a blocking merge condition
New enforcement mechanism proposed; no independent evidence supplied.

pith-pipeline@v0.9.1-grok · 5711 in / 1598 out tokens · 66346 ms · 2026-06-26T03:37:29.193059+00:00 · methodology

Reference graph

Works this paper leans on

29 extracted references · 7 canonical work pages

[1] Amazon Web Services. AWS Kiro: Spec-driven AI IDE, 2025. https://kiro.dev
[2] Birgitta Böckeler. Exploring gen AI: The tools of spec-driven development. martinfowler.com, 2025. https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html
[3] Barry W. Boehm. A spiral model of software development and enhancement. ACM SIGSOFT Software Engineering Notes, 11(4):14–24, 1986. doi: 10.1145/12944.12948
[4] Simon Brown. Software Architecture for Developers, Vol. 2: Visualise, Document and Explore Your Software Architecture. Leanpub, 2018. https://c4model.com
[5] Chroma Research. Context rot: How increasing input tokens impacts LLM performance, 2025. https://www.trychroma.com/research/context-rot
[6] Alistair Cockburn. Crystal clear: A human-powered methodology for small teams, 2001. Walking Skeleton pattern.
[7] Robert G. Cooper. Stage-gate systems: A new tool for managing new products. Business Horizons, 33(3):44–54, 1990. doi: 10.1016/0007-6813(90)90040-I
[8] Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003. ISBN 978-0321125217
[9] Neal Ford, Rebecca Parsons, Patrick Kua, and Pramod Sadalage. Building Evolutionary Architectures, 2nd Edition. O’Reilly Media, 2022. ISBN 978-1492097532
[10] Nicole Forsgren, Jez Humble, and Gene Kim. Accelerate: The Science of Lean Software and DevOps. IT Revolution Press, 2018. ISBN 978-1942788331
[11] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999. ISBN 978-0201485677
[12] Steve Freeman and Nat Pryce. Growing Object-Oriented Software, Guided by Tests. Addison-Wesley, 2009. ISBN 978-0321503626
[13] GitHub. Spec kit: Toolkit for spec-driven development, 2025. https://github.com/github/spec-kit
[14] Dexter Horthy. No vibes allowed: Engineering with coding agents. Talk, AI Engineer, 2025. Popularises the “Dumb Zone” heuristic for coding agents. https://www.youtube.com/watch?v=rmvDxxNubIg
[15] Jez Humble and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley, 2010. ISBN 978-0321601919
[16] Andrew Hunt and David Thomas. The Pragmatic Programmer. Addison-Wesley, 1999. ISBN 978-0201616224
[17] IEEE Computer Society. Guide to the software engineering body of knowledge (SWEBOK), version 4.0, 2024. https://www.computer.org/education/bodies-of-knowledge/software-engineering
[18] Vlad Khononov. Learning Domain-Driven Design. O’Reilly Media, 2021. ISBN 978-1098100131
[19] Philippe Kruchten. The 4+1 view model of architecture. IEEE Software, 12(6):42–50, 1995. doi: 10.1109/52.469759
[20] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024. doi: 10.1162/tacl_a_00638
[21] Robert C. Martin. Agile Software Development: Principles, Patterns, and Practices. Prentice Hall, 2002. ISBN 978-0135974445
[22] Gail C. Murphy, David Notkin, and Kevin Sullivan. Software reflexion models: Bridging the gap between design and implementation. In Proceedings of the 3rd ACM SIGSOFT Symposium on Foundations of Software Engineering, pages 18–28, 1995. doi: 10.1145/222124.222136
[23] Michael T. Nygard. Documenting architecture decisions, 2011. https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions
[24] David L. Parnas. On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12):1053–1058, 1972. doi: 10.1145/361598.361623
[25] David L. Parnas et al. The modular structure of complex systems. Technical report, Naval Research Laboratory, 1979. A-7E project module guide.
[26] Dewayne E. Perry and Alexander L. Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, 17(4):40–52, 1992. doi: 10.1145/141874.141884
[27] Stefano Rando et al. LongCodeBench: Evaluating coding LLMs at 1m context windows. arXiv preprint arXiv:2505.07897, 2025. https://arxiv.org/abs/2505.07897
[28] Ken Schwaber and Jeff Sutherland. The scrum guide, 2020. https://scrumguides.org
[29] Tessl. Tessl framework, 2025. Private beta; see [2].
