
Enough is Enough! A Case Study on the Effect of Data Size for Evaluation Using Universal Dependencies

May 16, 2024

When creating a new dataset for evaluation, one of the first considerations is its size. If our evaluation data is too small, we risk making unsupported claims based on the results. If, on the other hand, the data is too large, we waste valuable annotation time and money that could have been used to widen the scope of our evaluation (i.e., to annotate more domains or languages). Hence, we investigate the effect of the size of evaluation data, along with a variety of sampling strategies, in order to optimize annotation efforts, using dependency parsing as a test case. We show that for in-language, in-domain datasets, 5,000 tokens is enough to obtain a reliable ranking of different parsers, especially if the data is distant enough from the training split (otherwise, we recommend 10,000). In cross-domain setups the same amounts are required, but in cross-lingual setups far less (2,000 tokens) suffices.
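The core experimental device described here is subsampling an evaluation treebank to a fixed token budget and checking whether the smaller sample still yields the same parser ranking as the full test set. Below is a minimal sketch of that idea in Python; it is not the authors' exact protocol, and the file path, parser names, and LAS scores are hypothetical placeholders.

```python
import random

def read_conllu_sentences(path):
    """Yield sentences (lists of token lines) from a CoNLL-U file."""
    sent = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if sent:
                    yield sent
                    sent = []
            elif not line.startswith("#"):
                sent.append(line)
    if sent:
        yield sent

def n_tokens(sent):
    # Count only basic tokens: skip multiword ranges ("1-2") and empty nodes ("1.1").
    return sum(1 for line in sent if line.split("\t")[0].isdigit())

def sample_token_budget(sentences, budget, seed=0):
    """Sample whole sentences at random until roughly `budget` tokens are collected."""
    pool = list(sentences)
    random.Random(seed).shuffle(pool)
    sample, total = [], 0
    for sent in pool:
        if total >= budget:
            break
        sample.append(sent)
        total += n_tokens(sent)
    return sample

def ranking(scores):
    """Parser names ordered best-first by score (e.g., LAS)."""
    return sorted(scores, key=scores.get, reverse=True)

# Usage (assuming a UD test file at this hypothetical path):
# sub = sample_token_budget(read_conllu_sentences("en_ewt-ud-test.conllu"), budget=5000)

# Hypothetical LAS scores: full test set vs. a 5,000-token subsample.
full_las = {"parserA": 91.2, "parserB": 90.4, "parserC": 88.7}
sub_las = {"parserA": 90.8, "parserB": 90.1, "parserC": 88.9}
print(ranking(full_las) == ranking(sub_las))  # True: the subsample preserves the ranking
```

Sampling whole sentences rather than individual tokens keeps each dependency tree intact, which matters because attachment scores are computed over complete trees.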