Using the Distribution of Performance for Studying Statistical NLP Systems and Corpora

Yuval Krymolowski

arxiv: cs/0106043 · v1 · submitted 2001-06-20 · 💻 cs.CL



keywords single, systems, corpora, data, distribution, obtained, performance, split


Statistical NLP systems are frequently evaluated and compared on the basis of their performances on a single split of training and test data. Results obtained using a single split are, however, subject to sampling noise. In this paper we argue in favour of reporting a distribution of performance figures, obtained by resampling the training data, rather than a single number. The additional information from distributions can be used to make statistically quantified statements about differences across parameter settings, systems, and corpora.
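The idea of reporting a distribution rather than a single figure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy threshold classifier, the synthetic one-dimensional data, the use of bootstrap resampling (sampling the training set with replacement), and the hypothetical names `train_threshold`, `accuracy`, `performance_distribution` and `make_data`.

```python
import random
import statistics


def train_threshold(train):
    """Toy 'system' (illustrative only): threshold midway between class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        # Degenerate resample with a single class: arbitrary fallback threshold.
        return 0.5
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(threshold, test):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum((x > threshold) == (y == 1) for x, y in test)
    return correct / len(test)


def performance_distribution(train, test, n_resamples=200, seed=0):
    """Retrain on bootstrap resamples of the training set; return sorted scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        # Assumption: bootstrap, i.e. sample len(train) items with replacement.
        sample = [rng.choice(train) for _ in train]
        scores.append(accuracy(train_threshold(sample), test))
    return sorted(scores)


def make_data(n, rng):
    """Synthetic data: class 1 is centred at 1.0, class 0 at 0.0."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(1.0 if y else 0.0, 0.7), y))
    return data


rng = random.Random(42)
train, test = make_data(100, rng), make_data(200, rng)
scores = performance_distribution(train, test)

# Empirical 95% interval: drop the lowest and highest 2.5% of the scores.
k = len(scores) * 25 // 1000
lo, hi = scores[k], scores[-k - 1]
print(f"mean={statistics.mean(scores):.3f}  95% interval=[{lo:.3f}, {hi:.3f}]")
```

Two systems or parameter settings could then be compared by how much their score distributions overlap, rather than by two single numbers from one split.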
