
Enough is Enough! A Case Study on the Effect of Data Size for Evaluation Using Universal Dependencies

May 16, 2024

When creating a new dataset for evaluation, one of the first considerations is its size. If our evaluation data is too small, we risk making unsupported claims based on the results. If, on the other hand, the data is too large, we waste valuable annotation time and money that could have been used to widen the scope of our evaluation (i.e., to annotate more domains or languages). Hence, we investigate the effect of the size of evaluation data, along with a variety of sampling strategies, in order to optimize annotation efforts, using dependency parsing as a test case. We show that for in-language, in-domain datasets, 5,000 tokens is enough to obtain a reliable ranking of different parsers, especially if the data is distant enough from the training split (otherwise, we recommend 10,000). In cross-domain setups the same amounts are required, but in cross-lingual setups far less (2,000 tokens) suffices.
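The core experimental device described here is subsampling an evaluation treebank to a fixed token budget and checking whether the smaller sample still yields the same parser ranking as the full test set. Below is a minimal sketch of that idea in Python; it is not the authors' exact protocol, and the file path, parser names, and LAS scores are hypothetical placeholders.

```python
import random

def read_conllu_sentences(path):
    """Yield sentences (lists of token lines) from a CoNLL-U file."""
    sent = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if sent:
                    yield sent
                    sent = []
            elif not line.startswith("#"):
                sent.append(line)
    if sent:
        yield sent

def n_tokens(sent):
    # Count only basic tokens: skip multiword ranges ("1-2") and empty nodes ("1.1").
    return sum(1 for line in sent if line.split("\t")[0].isdigit())

def sample_token_budget(sentences, budget, seed=0):
    """Sample whole sentences at random until roughly `budget` tokens are collected."""
    pool = list(sentences)
    random.Random(seed).shuffle(pool)
    sample, total = [], 0
    for sent in pool:
        if total >= budget:
            break
        sample.append(sent)
        total += n_tokens(sent)
    return sample

def ranking(scores):
    """Parser names ordered best-first by score (e.g., LAS)."""
    return sorted(scores, key=scores.get, reverse=True)

# Usage (assuming a UD test file at this hypothetical path):
# sub = sample_token_budget(read_conllu_sentences("en_ewt-ud-test.conllu"), budget=5000)

# Hypothetical LAS scores: full test set vs. a 5,000-token subsample.
full_las = {"parserA": 91.2, "parserB": 90.4, "parserC": 88.7}
sub_las = {"parserA": 90.8, "parserB": 90.1, "parserC": 88.9}
print(ranking(full_las) == ranking(sub_las))  # True: the subsample preserves the ranking
```

Sampling whole sentences rather than individual tokens keeps each dependency tree intact, which matters because attachment scores are computed over complete trees.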