Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

· 2018 · cs.CL · arXiv 1806.01045

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

Using the 6,638 case descriptions of societal impact submitted for evaluation in the Research Excellence Framework (REF 2014), we replicate the topic model (Latent Dirichlet Allocation or LDA) made in this context and compare the results with factor-analytic results using a traditional word-document matrix (Principal Component Analysis or PCA). Removing a small fraction of documents from the sample, for example, has on average a much larger impact on LDA than on PCA-based models, to the extent that the largest distortion in the case of PCA has less effect than the smallest distortion of LDA-based models. In terms of semantic coherence, however, LDA models outperform PCA-based models. The topic models inform us about the statistical properties of the document sets under study, but the results are statistical and should not be used for a semantic interpretation (for example, in grant selections and micro-decision making, or scholarly work) without follow-up using domain-specific semantic maps.

representative citing papers

Not All Bugs Are the Same: Understanding, Characterizing, and Classifying the Root Cause of Bugs

cs.SE · 2019-07-25 · unverdicted · novelty 5.0

Manual analysis of 1,280 bug reports across three ecosystems produces a nine-category root cause taxonomy; an ML classifier achieves 64% F-Measure and 74% AUC-ROC overall.

citing papers explorer

Showing 1 of 1 citing paper.

Not All Bugs Are the Same: Understanding, Characterizing, and Classifying the Root Cause of Bugs

cs.SE · 2019-07-25 · unverdicted · none · ref 38 · internal anchor

Manual analysis of 1,280 bug reports across three ecosystems produces a nine-category root cause taxonomy; an ML classifier achieves 64% F-Measure and 74% AUC-ROC overall.
