In Proceedings of ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE Industry Track) (ESEC/FSE 2020)

Reducing DNN Labelling Cost using Surprise Adequacy: An Industrial Case Study for Autonomous Driving · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs

cs.SE · 2025-09-22 · unverdicted · novelty 7.0

Clotho ranks LLM test inputs by failure likelihood using pre-generation hidden states and GMMs, achieving 0.716 ROC-AUC after labeling 5.4% of inputs on average across eight tasks and three models, with transfer to proprietary models.

citing papers explorer

Showing 1 of 1 citing paper.

Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs cs.SE · 2025-09-22 · unverdicted · none · ref 21
