
arxiv: 2604.08207 · v1 · submitted 2026-04-09 · 💻 cs.SE

Empirical Evaluation of Taxonomic Trace Links: A Case Study

Pith reviewed 2026-05-10 17:42 UTC · model grok-4.3

classification 💻 cs.SE
keywords traceability · taxonomic trace links · TTL · empirical study · industrial case study · requirements engineering · software engineering

The pith

Taxonomic Trace Links complement traditional methods to enable more traceability use cases in industrial software projects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports an empirical mixed-methods evaluation of Taxonomic Trace Links (TTL) at Ericsson. The authors established links among hundreds of business use cases, test cases, and requirements, and gathered practitioner feedback through focus groups. TTL connects artifacts through a domain-specific taxonomy and a classifier, targeting the challenges that cause traceability to be neglected in practice: trace granularity, lack of a common artifact structure, and unclear responsibility. Practitioners found the approach useful for one of the two identified scenarios and less useful for the other, with taxonomy development and classifier precision emerging as the major hurdles. If these can be resolved, TTL would support earlier and broader trace creation as a complement to conventional links rather than a substitute for them.

Core claim

In the Ericsson case study, TTL was established between 463 business use cases, 64 test cases, and 277 ISO-standard requirements across two traceability scenarios. Practitioners judged it useful for one scenario, but highlighted the effort of building and maintaining the domain taxonomy and the need for higher classifier precision before daily adoption. The paper concludes that TTL complements rather than replaces traditional trace links, enabling more traceability use cases and encouraging early link creation.

What carries the argument

Taxonomic Trace Links (TTL), which routes connections between source and target artifacts through a shared domain-specific taxonomy combined with automated classification.
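The routing idea can be illustrated with a minimal Python sketch. It assumes every artifact has already been assigned a set of taxonomy classes (by a classifier or by hand) and proposes a candidate link whenever a source and a target artifact share at least one class. The IDs R1, TC1, TC2 and the classes A1, B1 follow the example in the paper's Figure 1; the extra artifact TC3, the class C1, the function name `derive_links`, and the "at least one shared class" rule are illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def derive_links(sources, targets, min_shared=1):
    """Propose trace links between artifacts that share taxonomy classes.

    sources / targets map an artifact ID to the set of taxonomy classes
    assigned to it. The threshold min_shared is a hypothetical knob.
    """
    links = []
    for (s_id, s_classes), (t_id, t_classes) in product(sources.items(), targets.items()):
        shared = s_classes & t_classes
        if len(shared) >= min_shared:
            links.append((s_id, t_id, sorted(shared)))
    return links


# Figure 1 example: R1, TC1 and TC2 are all classified with A1 and B1.
requirements = {"R1": {"A1", "B1"}}
# TC3 / C1 are made up here to show an artifact that does not get linked.
test_cases = {"TC1": {"A1", "B1"}, "TC2": {"A1", "B1"}, "TC3": {"C1"}}

links = derive_links(requirements, test_cases)
for source_id, target_id, shared in links:
    print(f"{source_id} <-> {target_id} via {shared}")
```

Because the link is derived from the classification rather than stored directly, an artifact only needs to be classified once to become traceable to every other classified artifact, which is what allows links to be created early.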

If this is right

  • TTL enables traceability scenarios that traditional direct links do not readily support.
  • The method encourages creation of trace links at earlier development stages.
  • TTL works best when used together with existing trace link practices rather than in isolation.
  • Heterogeneous artifact structures remain a barrier that taxonomy design must address.
  • Classifier accuracy is the immediate technical limit on scaling TTL to daily use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If classifier precision improves through domain-specific training data, TTL could reduce manual tracing effort in large organizations.
  • Standardized taxonomy templates for common software domains might lower the entry cost for new adopters.
  • The two-scenario distinction observed at Ericsson suggests TTL value depends on the specific traceability goal rather than applying uniformly.

Load-bearing premise

A domain-specific taxonomy can be developed and maintained with reasonable effort and the classifier precision can reach a level high enough for routine practical work.

What would settle it

A follow-up measurement showing that total effort for taxonomy creation, updates, and correction of classification errors exceeds traceability time savings across a full product release cycle.
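The comparison described above amounts to simple arithmetic, sketched below in Python. The function name, the choice of hours as the unit, and every number in the usage example are hypothetical placeholders chosen only to show the calculation; the paper reports no such measurements.

```python
def ttl_net_benefit_hours(taxonomy_creation, taxonomy_updates,
                          error_correction, tracing_hours_saved):
    """Net hours gained by using TTL over one release cycle.

    A negative result means the effort spent on the taxonomy and on fixing
    misclassifications exceeds the tracing time saved.
    """
    total_effort = taxonomy_creation + taxonomy_updates + error_correction
    return tracing_hours_saved - total_effort


# Placeholder inputs, NOT data from the study.
net = ttl_net_benefit_hours(
    taxonomy_creation=120,
    taxonomy_updates=40,
    error_correction=60,
    tracing_hours_saved=200,
)
print(f"net benefit: {net} hours")
```

With these placeholder inputs the total effort (220 hours) exceeds the savings (200 hours), which is the outcome that would count against the load-bearing premise.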

Figures

Figures reproduced from arXiv: 2604.08207 by Krzysztof Wnuk, Michael Unterkalmsteiner, Parisa Yousefi, Peter Löwenadler, Waleed Abdeen.

Figure 1
Figure 1: Traditional vs. taxonomic trace links
… A1 - voice call and B1 - subscriber, we can say that R1, TC1, and TC2 can be classified using the classes A1 and B1. Consequently, the following trace link pairs [R1 <-> TC1], [R1 <-> TC2] exist between the artifacts, as they are both classified using the same classes. We argue that TTL addresses three key challenges of traditional traceability: 1. Difficulty identifyi…
Figure 2
Figure 2: Zero-Shot Classifier (Abdeen et al., 2025)
Figure 3
Figure 3: Case Study Design
The process began with defining the study’s goal, objectives, and primary …
Figure 4
Figure 4: TTL creation and usage
… resources); however, they are presented on a high level. Consequently, we developed a taxonomy for the purpose of this study. We evaluated two automated taxonomy generation approaches to build a taxonomy for the telecommunication domain: TaxoGen (Zhang et al., 2018), an unsupervised method that constructs topic taxonomies through embedding and clustering of domain corpora, and Tax…
Figure 5
Figure 5: Instructions provided to ChatGPT: All at once strategy
Figure 6
Figure 6: Instructions provided to ChatGPT: Bottom-up strategy
Figure 7
Figure 7: Instructions provided to ChatGPT: Level-Branch strategy
Figure 8
Figure 8: Performance of the Zero-Shot classifier to generate TTL
Figure 9
Figure 9: Questionnaire Quantitative Results
RQ2: What is the performance of the ZSL classifier in recovering trace links between use cases and test cases, measured in terms of precision, recall, and F1-score? The performance results of trace links recovery using the ZSL classifier are presented in …
Abstract

Context: Traceability is a key quality attribute of artifacts that are used in knowledge-intensive tasks and supports software engineers in producing higher-quality software. Despite its clear benefits, traceability is often neglected in practice due to challenges such as granularity of traces, lack of a common artifact structure, and unclear responsibility. The Taxonomic Trace Links (TTL) approach connects source and target artifacts through a domain-specific taxonomy, aiming to address these common traceability challenges. Objective: In this study, we empirically evaluate TTL in an industrial setting to identify its strengths and weaknesses for real-world adoption. Method: We conducted a mixed-methods study at Ericsson involving one of its software products. Quantitative and qualitative data were collected across two traceability use cases. We established trace links between 463 business use cases, 64 test cases, and 277 ISO-standard requirements. Additionally, we held three focus group sessions with practitioners. Results: We identified two practically relevant scenarios where traceability is required and evaluated TTL in each. Overall, practitioners found TTL to be a useful solution for one of the scenarios, while less useful for the other. However, developing a domain-specific taxonomy and managing heterogeneous artifact structures were noted as significant challenges. Moreover, the precision of the classifier that is used to create trace links needs to be improved to make the solution practical. Conclusion: TTL is a promising approach that can be adopted in practice and enables traceability use cases. However, TTL is not a replacement for traditional trace links, but rather complements them to enable more traceability use cases and encourage the early creation of trace links.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports a mixed-methods industrial case study evaluating the Taxonomic Trace Links (TTL) approach at Ericsson. It establishes trace links among 463 business use cases, 64 test cases, and 277 ISO-standard requirements across two traceability scenarios, supplemented by three practitioner focus groups. Results indicate TTL is useful in one scenario for enabling additional traceability use cases but faces challenges with domain-specific taxonomy development, heterogeneous artifact structures, and classifier precision; the conclusion positions TTL as a complement to traditional trace links that can encourage earlier link creation.

Significance. If the findings hold, the work supplies concrete empirical data from a real product on TTL's scenario-specific strengths and adoption barriers, including explicit artifact counts and focus-group input. This strengthens the evidence base for traceability research in software engineering by showing how a taxonomy-mediated approach can complement existing methods without replacing them, while surfacing actionable challenges for tool builders and practitioners.

major comments (3)
  1. [Conclusion] Conclusion: the assertion that TTL 'is a promising approach that can be adopted in practice' rests on the unquantified assumption that classifier precision can be improved and taxonomy effort kept reasonable, yet the study provides no measurements of taxonomy creation cost, classifier F1-score, or similar metrics despite establishing links among 463 use cases, 64 test cases, and 277 requirements and noting these as significant challenges in the Results.
  2. [Method] Method: the process for constructing the domain-specific taxonomy (participants, steps, and effort) and for training/evaluating the classifier that generated the reported trace links is not described. This information is load-bearing for interpreting the focus-group finding that taxonomy development is a 'significant challenge' and for assessing whether the 'reasonable effort' premise for practical adoption can be met.
  3. [Results] Results: no baseline performance metrics (precision, recall, or F1) are reported for the classifier on the 463+64+277 artifacts, nor is the qualitative analysis method for the three focus groups (e.g., coding scheme or inter-rater process) specified. These omissions limit the ability to evaluate the strength of the scenario-specific usefulness claims and the call for precision improvements.
minor comments (2)
  1. [Abstract] The two traceability scenarios are referenced but not explicitly labeled or contrasted in the abstract or early sections, making it harder for readers to map the usefulness findings to specific use cases.
  2. A table or figure summarizing the artifact counts, link types, and focus-group participant roles would improve clarity and allow quicker assessment of the scale of the quantitative component.
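For reference, the precision, recall, and F1 metrics named in major comment 3 can be computed for trace link recovery as in the Python sketch below. It treats each link as a (use case, test case) pair and compares the set of predicted links against a ground-truth set. The function name `link_metrics` and all artifact IDs are made up for illustration; this is not the paper's evaluation script or data.

```python
def link_metrics(predicted, ground_truth):
    """Return (precision, recall, f1) for a set of predicted trace links."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical links: four proposed by a classifier, three confirmed by experts.
predicted_links = {("UC1", "TC1"), ("UC1", "TC2"), ("UC2", "TC1"), ("UC3", "TC3")}
true_links = {("UC1", "TC1"), ("UC3", "TC3"), ("UC2", "TC2")}

precision, recall, f1 = link_metrics(predicted_links, true_links)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Low precision here means practitioners must discard many proposed links by hand, which is why the report treats precision as the practical bottleneck.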

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important areas for clarification in our mixed-methods industrial case study. We address each major comment below and have revised the manuscript to improve transparency and precision in the claims.

point-by-point responses
  1. Referee: [Conclusion] the assertion that TTL 'is a promising approach that can be adopted in practice' rests on the unquantified assumption that classifier precision can be improved and taxonomy effort kept reasonable, yet the study provides no measurements of taxonomy creation cost, classifier F1-score, or similar metrics

    Authors: We agree the conclusion should more explicitly qualify the scope of our claims. The study evaluates practitioner-perceived usefulness of TTL in two industrial scenarios via focus groups after establishing links with the classifier; it is not a performance benchmark of the classifier itself. In the revised version we will temper the conclusion to state that TTL shows promise as a complement in specific scenarios provided the identified challenges (including precision) are addressed, and we will add a dedicated limitations paragraph noting the absence of cost and F1 measurements. revision: partial

  2. Referee: [Method] the process for constructing the domain-specific taxonomy (participants, steps, and effort) and for training/evaluating the classifier that generated the reported trace links is not described

    Authors: We acknowledge that the Method section omitted granular details on taxonomy construction and classifier training to maintain focus on the evaluation outcomes. In the revision we will expand the Method section with: (1) the roles and number of practitioners who contributed to taxonomy development, (2) the iterative steps followed, and (3) a high-level description of the classifier training procedure and any internal validation performed, even though full performance metrics were not collected. revision: yes

  3. Referee: [Results] no baseline performance metrics (precision, recall, or F1) are reported for the classifier on the 463+64+277 artifacts, nor is the qualitative analysis method for the three focus groups specified

    Authors: The classifier served as a tool to generate candidate links for the focus-group discussions rather than as the primary object of evaluation; therefore precision/recall/F1 were not computed. We will add an explicit statement in the Results section clarifying this scope and noting that quantitative classifier evaluation remains future work. For the focus groups we will describe the analysis procedure, including the coding approach and any steps taken to ensure consistency across the three sessions. revision: yes

standing simulated objections not resolved
  • We do not possess quantitative classifier performance metrics (precision, recall, F1) because they were not collected as part of the original study design.

Circularity Check

0 steps flagged

No circularity: purely empirical case study with independent data

full rationale

The paper conducts a mixed-methods industrial case study collecting quantitative trace link data across 463 use cases, 64 test cases and 277 requirements plus three practitioner focus groups. No equations, fitted parameters, predictions derived from inputs, or self-referential definitions appear. Central claims about TTL usefulness rest on observed results and direct feedback rather than reducing to the study's own inputs by construction. Any self-citations are incidental and non-load-bearing for the evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

As an empirical evaluation study, the central claims rest on the representativeness of the selected software product, the validity of focus-group insights, and the assumption that the two chosen traceability use cases are practically relevant. No mathematical models or fitted parameters are involved.

axioms (1)
  • domain assumption: Focus group sessions with practitioners provide reliable qualitative insights into the practical usefulness of TTL
    The study draws conclusions about adoption potential directly from three focus group sessions.

pith-pipeline@v0.9.0 · 5597 in / 1396 out tokens · 152440 ms · 2026-05-10T17:42:51.620922+00:00 · methodology


