arxiv: 2604.16208 · v1 · submitted 2026-04-17 · 💻 cs.SE

From Papers to Progress: Rethinking Knowledge Accumulation in Software Engineering

Pith reviewed 2026-05-10 08:02 UTC · model grok-4.3

classification 💻 cs.SE
keywords software engineering · knowledge accumulation · research artifacts · cumulative science · publication practices · survey analysis · incentive structures · provenance tracking

The pith

Software engineering research isolates claims in papers, loses context in the pipeline, and rewards novelty over synthesis, preventing cumulative progress.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes survey responses from 280 experienced researchers to diagnose why software engineering struggles to build lasting knowledge despite high output. It identifies four breakdowns: papers embed claims in prose, provenance disappears during publication, claims change without tracking, and incentives reward new work over consolidation. The authors argue that future research artifacts must follow four technology-agnostic principles: structured claim representations, inspectable methodological provenance, long-lived evolving substrates, and governance that aligns personal rewards with collective knowledge goals. If the diagnosis is right and the principles can be put into practice, the field could move from fragmented papers to reusable, trackable knowledge bases that support steady advancement.

Core claim

The central claim is that four interrelated structural breakdowns block cumulative knowledge in software engineering: papers serve as isolated units with claims buried in prose; context and provenance vanish as results move through the publication process; claims evolve without systematic tracking; and incentive structures favor novelty over consolidation. Addressing them requires reimagining research artifacts according to four principles: structured and interpretable claims with evidence; inspectable and provenance-aware methodological documentation; long-lived and reusable substrates that continue evolving after publication; and governance mechanisms that align individual incentives with collective knowledge-building goals.

What carries the argument

Four technology-agnostic principles for designing research artifacts that enforce structured claims and evidence, provenance-aware method documentation, long-lived evolving substrates, and incentive governance for collective knowledge building.
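
One way to picture the first three principles is a small data model. The sketch below is illustrative only: the paper is technology-agnostic and prescribes no schema, so every class and field name here (`Evidence`, `ClaimVersion`, `Claim`, `revise`) is a hypothetical choice, and the governance principle is not modeled. It assumes Python 3.9 or later.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Evidence:
    """One piece of support for a claim (all fields are hypothetical)."""
    source: str   # identifier of the study or dataset
    method: str   # how the evidence was produced (provenance)
    summary: str  # what was observed


@dataclass(frozen=True)
class ClaimVersion:
    """A claim as it was stated at one point in time."""
    statement: str
    stated_on: date
    evidence: tuple[Evidence, ...] = ()


@dataclass
class Claim:
    """A claim whose wording and evidence can change after publication."""
    claim_id: str
    history: list[ClaimVersion] = field(default_factory=list)

    def revise(
        self,
        statement: str,
        stated_on: date,
        evidence: tuple[Evidence, ...] = (),
    ) -> None:
        # Append instead of overwriting, so earlier versions stay inspectable.
        self.history.append(ClaimVersion(statement, stated_on, evidence))

    @property
    def current(self) -> ClaimVersion:
        # The most recently recorded version of the claim.
        return self.history[-1]


# Placeholder values, invented for illustration only.
claim = Claim("hypothetical-claim-001")
claim.revise(
    "Technique A reduces defects.",
    date(2024, 1, 1),
    (Evidence("hypothetical-study-1", "controlled experiment", "placeholder result"),),
)
claim.revise("Technique A reduces defects in small teams.", date(2025, 6, 1))
```

Appending versions instead of overwriting them is the design choice that matters here: earlier wording and its evidence remain inspectable, which is what tracking claim evolution would require.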

If this is right

  • Publication practices would shift from static papers to structured, inspectable artifacts that preserve claims and provenance.
  • Community infrastructure would need to host long-lived substrates capable of tracking claim evolution over time.
  • Incentive systems could be redesigned to reward consolidation and reuse of existing knowledge rather than only novel contributions.
  • Research practice would emphasize creating reusable substrates that continue to develop after initial publication.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption might require new conference tracks or repositories for evolving artifacts, an implementation detail left open by the paper.
  • The approach could connect to similar challenges in other empirical sciences where claims drift across papers without formal linkage.
  • A measurable test would involve tracking whether papers using the new artifact formats receive more integrative citations than traditional ones.

Load-bearing premise

The survey perceptions from 280 researchers represent field-wide structural problems and the proposed principles can be put into practice to support cumulative knowledge building.

What would settle it

The central claim would be falsified by a controlled trial, run over several years, in which research groups that adopt the four principles for new artifacts show no measurable increase in claim tracking, context retention, or synthesis of prior work.

Original abstract

Software engineering research has experienced rapid growth in both output and participation over the past decades. Yet concerns persist about the field's ability to accumulate, integrate, and reuse knowledge in ways that support long-term progress. To better understand how the community itself perceives these challenges, we analyze responses from the ICSE 2026 Future of Software Engineering pre-survey, which captures perspectives from 280 globally distributed and highly experienced researchers. Our analysis reveals a tension between increasing research productivity and the limited mechanisms available for synthesizing results, tracking evolving claims, and supporting cumulative understanding over time. Building on these observations, we diagnose four interrelated structural breakdowns: papers function as isolated knowledge units with claims embedded in prose; context and provenance are lost as knowledge moves through the publication pipeline; claims evolve without systematic tracking; and incentive structures favor novelty over consolidation. We argue that addressing these barriers requires rethinking the fundamental properties of research artifacts. We articulate four technology-agnostic principles for future research artifacts: structured and interpretable representations of claims and evidence; inspectable and provenance-aware documentation of methodological decisions; long-lived and reusable substrates that evolve beyond publication; and governance mechanisms that align individual incentives with collective knowledge-building goals. We discuss implications for research practice, publication norms, and community infrastructure, positioning FOSE as a venue for experimenting with alternative artifact designs that support cumulative scientific progress.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. This paper analyzes responses from the ICSE 2026 Future of Software Engineering pre-survey completed by 280 experienced researchers. It identifies a tension between increasing research productivity and limited mechanisms for synthesizing, tracking, and reusing knowledge. The authors diagnose four interrelated structural breakdowns: papers as isolated units with claims embedded in prose; loss of context and provenance through the publication pipeline; untracked evolution of claims; and incentive structures that favor novelty over consolidation. They propose four technology-agnostic principles for future research artifacts—structured claim and evidence representations, inspectable provenance-aware documentation, long-lived reusable substrates, and governance mechanisms aligning incentives with collective goals—and discuss implications for practice, norms, and infrastructure.

Significance. If the survey-derived diagnoses hold and the principles can be operationalized, the work could meaningfully influence how software engineering research artifacts are designed and governed, fostering better cumulative knowledge building. Drawing directly on community perspectives from a large, experienced sample provides a grounded basis for these reflections and could encourage experimentation with alternative artifact formats in venues such as FOSE.

major comments (2)
  1. Abstract and Survey Analysis section: The manuscript states that 'our analysis reveals' the four breakdowns but supplies no description of the qualitative methods, coding scheme, thematic derivation process, or inter-coder reliability checks applied to the 280 responses. Because the central diagnoses rest entirely on this interpretation, the absence of these details undermines assessment of the claims' reliability.
  2. Diagnoses section: The four breakdowns are asserted as objective structural issues on the basis of self-reported perceptions alone. No cross-validation against objective indicators (citation networks, claim reuse across follow-up papers, or reuse metrics) is provided, leaving open the possibility that the diagnoses reflect shared respondent narratives rather than field-wide mechanisms.
minor comments (1)
  1. A dedicated methods subsection reporting response rate, demographics, and exact analysis procedures would improve transparency and reproducibility without altering the paper's scope.
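
The inter-coder reliability check named in the first major comment is often reported as Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The paper does not report such a statistic. The sketch below only shows how one could be computed. The function name and the example labels are hypothetical and are not survey data. It assumes Python 3.9 or later.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")

    n = len(coder_a)
    # p_o: fraction of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # p_e: chance agreement, from each coder's own label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented theme labels for four responses, for illustration only.
coder_a = ["isolation", "provenance", "incentives", "incentives"]
coder_b = ["isolation", "provenance", "incentives", "tracking"]
kappa = cohens_kappa(coder_a, coder_b)
```

A value near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance.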

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving transparency and rigor, which we address point by point below. We indicate where revisions will be incorporated in the next version of the manuscript.

Point-by-point responses
  1. Referee: Abstract and Survey Analysis section: The manuscript states that 'our analysis reveals' the four breakdowns but supplies no description of the qualitative methods, coding scheme, thematic derivation process, or inter-coder reliability checks applied to the 280 responses. Because the central diagnoses rest entirely on this interpretation, the absence of these details undermines assessment of the claims' reliability.

    Authors: We agree that the current manuscript lacks sufficient detail on the qualitative analysis methods. This is an important omission given that the diagnoses derive from interpretation of the survey responses. In the revised version, we will add a new subsection in the Survey Analysis section that explicitly describes the thematic analysis approach, including how themes corresponding to the four breakdowns were identified and refined from the 280 responses, the process for deriving categories from the data, and any steps taken to support consistency in interpretation. This addition will enable readers to better evaluate the reliability of the findings.
    revision: yes

  2. Referee: Diagnoses section: The four breakdowns are asserted as objective structural issues on the basis of self-reported perceptions alone. No cross-validation against objective indicators (citation networks, claim reuse across follow-up papers, or reuse metrics) is provided, leaving open the possibility that the diagnoses reflect shared respondent narratives rather than field-wide mechanisms.

    Authors: The diagnoses are presented as insights emerging directly from the self-reported perceptions of 280 experienced researchers in the ICSE 2026 FOSE pre-survey, rather than as independently verified objective mechanisms. We will revise the Diagnoses section to more explicitly frame the findings in this way and to acknowledge the lack of cross-validation with bibliometric or reuse metrics as a limitation of the current work. We will also note potential directions for future studies that could incorporate such objective indicators. This maintains the paper's focus on community perspectives while addressing the concern about potential narrative bias.
    revision: partial

Circularity Check

0 steps flagged

No circularity: derivation relies on external survey data and logical analysis

Full rationale

The paper's chain proceeds from analysis of 280 ICSE 2026 pre-survey responses (external input data) to diagnosis of four structural breakdowns in publication practices, followed by articulation of four technology-agnostic principles. No equations, fitted parameters, self-definitional constructs, or load-bearing self-citations reduce any step to its own inputs by construction. The survey responses serve as independent empirical grounding rather than a tautology, and the principles are presented as argued implications rather than forced renamings or uniqueness theorems imported from prior author work. This is a standard non-circular position paper grounded in collected perceptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper's diagnoses and principles rest on the assumption that survey perceptions reflect systemic field issues and that redesigned artifacts can address them; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The ICSE 2026 Future of Software Engineering pre-survey responses from 280 researchers accurately reflect broader community perceptions of knowledge accumulation challenges.
    The paper uses these responses as primary evidence for the four structural breakdowns.
