arxiv: 1906.11715 · v1 · pith:ER3HEX2Vnew · submitted 2019-06-27 · 💻 cs.SE

Evaluating data-flow coverage in spectrum-based fault localization

Pith reviewed 2026-05-25 14:30 UTC · model grok-4.3

classification 💻 cs.SE
keywords: spectrum-based fault localization · data-flow spectra · control-flow spectra · definition-use associations · fault ranking · software debugging · SFL metrics

The pith

Data-flow spectra place up to 50% more faults in the top-15 ranks than control-flow spectra in spectrum-based fault localization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares data-flow spectra, based on definition-use associations (DUAs), with control-flow spectra, based on lines, for spectrum-based fault localization (SFL) across ten ranking metrics. It evaluates them on 163 faults from five real-world open-source programs with large test suites. The results show that data-flow spectra improve fault ranking, placing more faults in the top positions. This suggests SFL can help developers locate bugs with less inspection effort, though collecting data-flow spectra costs more execution time. Data-flow spectra also provide information about suspicious variables.

Core claim

Using data-flow spectra, up to 50% more faults are ranked in the top-15 positions compared to control-flow spectra. Most SFL ranking metrics present better effectiveness using data-flow to inspect up to the top-40 positions. The execution cost of data-flow spectra is higher, with an average overhead of 353% compared to 102% for control-flow.
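As a rough illustration of how a top-n comparison of this kind can be computed, the sketch below counts the faults whose faulty element is ranked within the first n positions under each spectrum and reports the relative gain. The function names and the rank values are hypothetical, not data from the paper. Reading "50% more" as a relative gain is an assumption, though it matches the arithmetic in the Figure 4 excerpt (75 versus 40 faults, a difference of 87.5%).

```python
def faults_in_top_n(fault_ranks, n):
    """Count faults whose faulty element is ranked at position n or better."""
    return sum(1 for rank in fault_ranks if rank <= n)


def relative_gain(candidate_count, baseline_count):
    """Percentage by which candidate_count exceeds baseline_count."""
    return 100.0 * (candidate_count - baseline_count) / baseline_count


# Hypothetical ranks of the faulty element for six faults (not paper data).
dua_ranks = [3, 12, 18, 7, 40, 90]
line_ranks = [5, 30, 14, 22, 41, 88]

dua_top15 = faults_in_top_n(dua_ranks, 15)
line_top15 = faults_in_top_n(line_ranks, 15)
gain = relative_gain(dua_top15, line_top15)
print(f"top-15: DUA={dua_top15}, line={line_top15}, gain={gain:.1f}%")
```

The same two functions can be applied at any cut-off, for example n = 40, to reproduce the shape of the comparison on other data.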

What carries the argument

Definition-use association (DUA) spectra versus line spectra applied to ten SFL ranking metrics on 163 faults.
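To make the comparison concrete, here is a minimal sketch of how a ranking metric turns a spectrum into a suspiciousness ranking. It uses Ochiai, one of the metrics named in the Figure 4 excerpt. The element encoding is an assumption: a DUA is simplified to a `(variable, definition line, use line)` tuple, and the coverage data and test names are hypothetical. The same function works for line spectra if the elements are line numbers.

```python
import math


def ochiai(ef, ep, nf):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that cover the element
    ep: passing tests that cover the element
    nf: failing tests that do not cover the element
    """
    denominator = math.sqrt((ef + nf) * (ef + ep))
    return ef / denominator if denominator > 0 else 0.0


def rank_elements(coverage, outcomes):
    """Rank spectrum elements (lines or DUAs) by descending Ochiai score.

    coverage: dict mapping test name -> set of covered elements
    outcomes: dict mapping test name -> True if the test passed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    elements = set().union(*coverage.values())
    scores = {}
    for element in elements:
        ef = sum(1 for t, cov in coverage.items() if element in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if element in cov and outcomes[t])
        scores[element] = ochiai(ef, ep, total_failed - ef)
    # Sort by score (highest first); the string key only makes the order deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical DUA spectrum: each element is (variable, definition line, use line).
outcomes = {"t1": True, "t2": True, "t3": False}
dua_coverage = {
    "t1": {("max", 3, 8), ("i", 2, 4)},
    "t2": {("max", 3, 8), ("i", 2, 4), ("max", 6, 8)},
    "t3": {("i", 2, 4), ("max", 6, 8)},
}
ranking = rank_elements(dua_coverage, outcomes)
for element, score in ranking:
    print(element, round(score, 3))
```

Because each DUA names a variable, the top-ranked element here also points at the variable involved, which is the extra information the paper attributes to data-flow spectra.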

If this is right

  • Developers may need to inspect less code to find faults when using data-flow spectra.
  • Most ranking metrics perform better with data-flow up to the top-40 positions.
  • Data-flow spectra provide additional information about suspicious variables that can aid fault localization.
  • The extra execution time for data-flow, from 22 seconds to under 9 minutes, remains practical for use.
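The overhead percentages can be read as relative slowdowns of the test-suite run. The sketch below assumes overhead is defined as the extra time divided by the uninstrumented time; the page does not state the paper's exact definition, and the timings are hypothetical.

```python
def overhead_percent(baseline_seconds, instrumented_seconds):
    """Extra run time as a percentage of the uninstrumented run time (assumed definition)."""
    return 100.0 * (instrumented_seconds - baseline_seconds) / baseline_seconds


def mean_overhead(runs):
    """Average the per-program overheads; runs is a list of (baseline, instrumented) pairs."""
    overheads = [overhead_percent(base, instr) for base, instr in runs]
    return sum(overheads) / len(overheads)


# Hypothetical timings in seconds:
# (uninstrumented test-suite run, run with spectrum collection enabled).
runs = [(10.0, 40.0), (60.0, 300.0)]
print(f"mean overhead: {mean_overhead(runs):.0f}%")
```

Under this definition an overhead of 100% means the instrumented run takes twice as long as the plain run.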

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combining data-flow and control-flow spectra could yield even better results in hybrid SFL techniques.
  • Applying this to other types of faults or larger programs might reveal scalability limits.
  • Integration with variable-level analysis could further reduce the code developers need to review.

Load-bearing premise

The 163 faults and five open-source programs with their test suites are representative of typical software systems without systematic bias from data-flow instrumentation.

What would settle it

Running the same comparison on a new set of programs and faults and observing no increase in the number of faults ranked in the top-15 with data-flow spectra.

Figures

Figures reproduced from arXiv: 1906.11715 by Fabio Kon, Henrique Lemos Ribeiro, Higor Amario de Souza, Marcos Lordello Chaim, Roberto Paulo de Andrioli Araujo.

Figure 1
Figure 1: Code of max program.
Excerpt (B. Control-flow spectra): Fault localization techniques use different types of control-flow spectra: statements are executable lines of code; basic blocks (or simply blocks) are sets of statements that are always executed together; branches are statements that transfer the control-flow execution among blocks. Control-flow information of a program is represented by a graph with nodes and e…
Figure 3
Figure 3: Effectiveness of DUA and line spectra in fault localization using different ranking metrics.
Figure 4
Figure 4: Effectiveness of DUA and line spectra in fault localization using different ranking metrics.
Excerpt: … number of faults: 75 out of 163 using Ochiai and Zoltar, whilst line spectrum required the inspection of less code only for 40 and 39 faults, respectively, for the same ranking metrics (a difference of 87.5%). Around 20 out of 163 (12%) faults can be located by investigating only the top 5 lines, using either DUA or li…
Original abstract

Background: Debugging is a key task during the software development cycle. Spectrum-based Fault Localization (SFL) is a promising technique to improve and automate debugging. SFL techniques use control-flow spectra to pinpoint the most suspicious program elements. However, data-flow spectra provide more detailed information about the program execution, which may be useful for fault localization. Aims: We evaluate the effectiveness and efficiency of ten SFL ranking metrics using data-flow spectra. Method: We compare the performance of data- and control-flow spectra for SFL using 163 faults from 5 real-world open source programs, which contain from 468 to 4130 test cases. The data- and control-flow spectra types used in our evaluation are definition-use associations (DUAs) and lines, respectively. Results: Using data-flow spectra, up to 50% more faults are ranked in the top-15 positions compared to control-flow spectra. Also, most SFL ranking metrics present better effectiveness using data-flow to inspect up to the top-40 positions. The execution cost of data-flow spectra is higher than control-flow, taking from 22 seconds to less than 9 minutes. Data-flow has an average overhead of 353% for all programs, while the average overhead for control-flow is 102%. Conclusions: The results suggest that SFL techniques can benefit from using data-flow spectra to classify faults in better positions, which may lead developers to inspect less code to find bugs. The execution cost to gather data-flow is higher compared to control-flow, but it is not prohibitive. Moreover, data-flow spectra also provide information about suspicious variables for fault localization, which may improve the developers' performance using SFL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper evaluates spectrum-based fault localization (SFL) using data-flow spectra (definition-use associations, DUAs) versus control-flow spectra (program lines) across ten ranking metrics. On 163 faults from five open-source programs (468–4130 tests each), it reports that data-flow spectra place up to 50% more faults in the top-15 positions, yield better effectiveness for most metrics up to the top-40 positions, incur higher but feasible overhead (353% average vs. 102%), and supply additional variable-suspiciousness information.

Significance. If the empirical comparison holds after addressing methodological gaps, the result would be useful for SFL research by showing that richer execution spectra can improve ranking quality on real programs without prohibitive cost. The direct head-to-head measurement on actual faults and test suites is a concrete strength; however, the absence of statistical testing and the narrow subject selection weaken the general claim that 'SFL techniques can benefit from using data-flow spectra'.

major comments (4)
  1. [Method] Method section: no statistical significance tests (e.g., paired Wilcoxon or bootstrap) are reported for the top-15 and top-40 effectiveness differences that underpin the 'up to 50%' and 'better effectiveness' claims; without them the headline numbers cannot be distinguished from sampling variation.
  2. [Method] Method / Results: the paper supplies no description of tie-breaking rules or how programs containing multiple faults are counted when computing the 'faults ranked in top-15' metric; both choices directly affect the reported percentages.
  3. [Evaluation] Evaluation setup: the five programs and 163 faults are presented without explicit selection criteria, stratification by fault type, or threats-to-validity discussion of domain or language bias, making it impossible to assess whether the observed DUA advantage generalizes beyond the chosen subjects.
  4. [Method] Method: potential systematic bias introduced by the DUA instrumentation itself (e.g., altered execution timing or coverage) is not measured or bounded, yet the spectra comparison treats the two kinds of spectra as directly comparable.
minor comments (2)
  1. [Abstract] Abstract: the overhead sentence 'taking from 22 seconds to less than 9 minutes' should clarify whether these are per-program extremes or averages and should reference the corresponding table or figure.
  2. [Results] Results: tables or figures comparing the ten metrics should include the raw counts of faults localized at each rank threshold rather than only relative percentages, to allow independent verification.
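The paired Wilcoxon signed-rank test suggested in the first major comment can be sketched with the standard library alone. This is an illustrative implementation, not the paper's analysis: the rank data are hypothetical, zero differences are dropped, and the p-value uses the plain normal approximation without tie or continuity correction, so it is only indicative for small samples. In practice a statistics library routine such as `scipy.stats.wilcoxon` would be the usual choice.

```python
import math


def average_ranks(values):
    """Rank values ascending from 1, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(xs, ys):
    """Return (w_plus, w_minus, two_sided_p) for paired samples xs and ys."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w_plus, w_minus, p


# Hypothetical rank of the faulty element for eight faults under each spectrum.
line_ranks = [20, 35, 14, 50, 8, 41, 27, 60]
dua_ranks = [12, 30, 15, 22, 8, 25, 20, 33]
w_plus, w_minus, p_value = wilcoxon_signed_rank(line_ranks, dua_ranks)
print(w_plus, w_minus, round(p_value, 3))
```

A large `w_plus` relative to `w_minus` here means the line spectrum ranked the faulty element worse (higher rank number) than the DUA spectrum for most faults.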

Simulated Author's Rebuttal

4 responses · 0 unresolved

Thank you for the constructive feedback. We have revised the manuscript to strengthen its methodological transparency and address all major concerns raised.

Point-by-point responses
  1. Referee: [Method] Method section: no statistical significance tests (e.g., paired Wilcoxon or bootstrap) are reported for the top-15 and top-40 effectiveness differences that underpin the 'up to 50%' and 'better effectiveness' claims; without them the headline numbers cannot be distinguished from sampling variation.

    Authors: We agree this is a gap. The revised manuscript now includes paired Wilcoxon signed-rank tests on the per-metric effectiveness differences at top-15 and top-40 positions, with p-values and effect sizes reported in the Results section. Most differences remain statistically significant (p < 0.05).
    Revision: yes

  2. Referee: [Method] Method / Results: the paper supplies no description of tie-breaking rules or how programs containing multiple faults are counted when computing the 'faults ranked in top-15' metric; both choices directly affect the reported percentages.

    Authors: We have added an explicit subsection in Method describing tie-breaking (average rank assigned to tied elements, standard in SFL) and clarified that the study uses single-fault versions of the programs, consistent with the majority of prior SFL benchmarks. Multi-fault handling is noted as out of scope.
    Revision: yes

  3. Referee: [Evaluation] Evaluation setup: the five programs and 163 faults are presented without explicit selection criteria, stratification by fault type, or threats-to-validity discussion of domain or language bias, making it impossible to assess whether the observed DUA advantage generalizes beyond the chosen subjects.

    Authors: The revised Threats to Validity section now states the selection criteria (programs drawn from prior SFL studies with available test suites and real faults), notes lack of stratification by fault type, and explicitly discusses language (Java) and domain limitations on generalizability.
    Revision: yes

  4. Referee: [Method] Method: potential systematic bias introduced by the DUA instrumentation itself (e.g., altered execution timing or coverage) is not measured or bounded, yet the spectra comparison treats the two kinds of spectra as directly comparable.

    Authors: We acknowledge the concern. The revision adds a paragraph in Method noting that both spectra are collected from the same instrumented executions (ensuring internal comparability) and bounds the timing impact via the separately reported overhead figures. We could not retroactively quantify any differential coverage distortion without new instrumentation experiments.
    Revision: partial
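The tie-breaking rule named in the second response (average rank assigned to tied elements) can be sketched as follows. The function name, element labels, and scores are hypothetical; the point is only to show how the rule changes which elements count as being within a top-n cut-off.

```python
def average_rank_positions(scores):
    """Map each element to its rank (1 = most suspicious), averaging tied positions."""
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        # Extend the group while the next element has the same score.
        while end + 1 < len(ordered) and ordered[end + 1][1] == ordered[start][1]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for element, _ in ordered[start:end + 1]:
            ranks[element] = mean_rank
        start = end + 1
    return ranks


# Hypothetical suspiciousness scores with a three-way tie.
scores = {"line 12": 0.9, "line 30": 0.7, "line 31": 0.7, "line 32": 0.7, "line 45": 0.1}
ranks = average_rank_positions(scores)
print(ranks)
```

With this rule the three tied lines all receive rank 3.0, the mean of positions 2, 3, and 4. A best-case rule would give them rank 2 and a worst-case rule rank 4, which is why the choice affects top-n counts.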

Circularity Check

0 steps flagged

No circularity: direct empirical comparison of measured spectra on fixed subjects

The paper performs an empirical evaluation comparing data-flow (DUA) and control-flow (line) spectra for SFL ranking metrics across 163 faults in 5 open-source programs. No equations, fitted parameters, predictions, or derivations appear in the abstract or described method. Effectiveness claims (e.g., up to 50% more faults in top-15) are reported as direct observations from the experiment, not reduced by construction to any self-defined quantity or prior self-citation. The reader's assessment of score 1.0 is consistent; generalizability concerns exist but are orthogonal to circularity. No load-bearing self-citation chains or ansatzes are present.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper is an empirical evaluation study. It introduces no new mathematical entities or derivations and therefore carries no free parameters or invented entities. It rests on two standard domain assumptions common to software-testing experiments.

axioms (2)
  • Domain assumption: The five selected open-source programs and their 163 faults are representative of real-world software and faults.
    Generalization from the reported results depends on this premise.
  • Domain assumption: Definition-use association spectra can be collected by instrumentation without introducing measurement bias or altering program behavior.
    The comparison of spectra effectiveness assumes accurate collection.
