A pipeline that applies SBERT/UMAP/HDBSCAN clustering to 339 repositories identifies 692k recurring Gherkin slices, labels 200 of them, and trains an XGBoost model that achieves an F1 score of 0.891 for extraction-worthiness, outperforming rule-based and LLM baselines. The authors also release prevalence statistics.
IEEE Transactions on Software Engineering
1 Pith paper cites this work.
Citing paper: field cs.SE; year 2026; verdict UNVERDICTED.