This repository hosts the official corpus for SimpleText Task 4 @ CLEF'24, i.e. the SOTA? Tracking the State-of-the-Art in Scholarly Publications task. The full corpus is released in the dataset
repository, organized as follows:
```
[dataset]/
|--- [train]/
|    |--- [article-id-folder]/
|    |    |--- [article-id].tex
|    |    |--- annotations.json
|    |___ ...
|--- [validation]/
|    |--- [article-id-folder]/
|    |    |--- [article-id].tex
|    |    |--- annotations.json
|    |___ ...
|--- [test1-few-shot-papers]/
|    |--- [article-counter-folder]/
|    |    |--- [article-id].tei.xml
|    |___ ...
|--- [test1-few-shot-annotations]/      # hidden during the competition
|    |--- [article-counter-folder]/
|    |    |--- annotations.txt
|    |    |--- code-link.txt            # optional
|    |___ ...
|--- [test2-zero-shot-papers]/
|    |--- [article-counter-folder]/
|    |    |--- [article-id].tei.xml
|    |___ ...
|--- [test2-zero-shot-annotations]/     # hidden during the competition
|    |--- [article-counter-folder]/
|    |    |--- annotations.txt
|    |    |--- code-link.txt            # optional
|    |___ ...
```
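For orientation, here is a minimal Python sketch of walking one of the train/validation splits laid out above. The root path `dataset` and the assumption of a single `.tex` file per article folder are illustrative, not guaranteed by the corpus:

```python
from pathlib import Path

# Assumed local path to the released dump; adjust to where you unpacked it.
DATASET_ROOT = Path("dataset")

def iter_split(split: str, root: Path = DATASET_ROOT):
    """Yield (article_id, tex_path, annotations_path) for each paper in a split."""
    for article_dir in sorted((root / split).iterdir()):
        if not article_dir.is_dir():
            continue
        tex_files = sorted(article_dir.glob("*.tex"))
        annotations = article_dir / "annotations.json"
        if tex_files and annotations.exists():
            yield article_dir.name, tex_files[0], annotations

# Example: show the first few train papers.
for article_id, tex_path, ann_path in list(iter_split("train"))[:3]:
    print(article_id, tex_path.name, ann_path.name)
```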
The dataset dump originates from paperswithcode.com.
Each folder in the respective dump corresponds to a scholarly article originally downloaded in LaTeX format from arXiv.
There are 12,288 papers in the train set and 100 papers in the validation set. Each annotations.json file contains (task, dataset, metric, score) annotations if the paper reports model scores; otherwise it contains the value "unanswerable", marking papers that report no model scores and from which, therefore, no leaderboard can be populated. Models trained on our dataset should, in a first step, distinguish papers with leaderboards from those without, and then, for the former set of papers, extract their leaderboard tuples as annotations. The train set has 7,936 papers with leaderboard annotations and 4,352 papers without, annotated as "unanswerable"; the validation set has 51 papers with and 49 papers without leaderboard annotations.
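The sketch below illustrates this two-way split, assuming annotations.json holds either the string "unanswerable" or a JSON structure of (task, dataset, metric, score) tuples; the exact schema should be checked against the released files:

```python
import json
from pathlib import Path

def load_annotations(ann_path: Path):
    """Return None for 'unanswerable' papers, otherwise the parsed annotations.

    Assumption: annotations.json holds either the string "unanswerable" or a
    JSON structure of (task, dataset, metric, score) tuples.
    """
    data = json.loads(ann_path.read_text(encoding="utf-8"))
    if data == "unanswerable" or data == ["unanswerable"]:
        return None
    return data

# Example: split the train papers into the two groups described above.
with_leaderboard, unanswerable = [], []
for article_dir in sorted(Path("dataset/train").iterdir()):  # "dataset" path is an assumption
    ann_path = article_dir / "annotations.json"
    if not article_dir.is_dir() or not ann_path.exists():
        continue
    target = with_leaderboard if load_annotations(ann_path) is not None else unanswerable
    target.append(article_dir.name)
print(len(with_leaderboard), "papers with leaderboards,", len(unanswerable), "without")
```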
Below are some detailed statistics on the leaderboard annotations in our dataset, offering a glimpse into the corpus.
| Parameter | train+validation (counts) |
|---|---|
| Unique Tasks | 1,372 |
| Unique Datasets | 4,795 |
| Unique Metrics | 2,782 |
| Unique (Task, Dataset, Metric) triples | 11,977 |
| Avg. (Task, Dataset, Metric) triple occurrences per paper | 6.93 |
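For reference, a minimal sketch of how counts of this kind could be recomputed from the annotation files, under the assumption (to be verified against the released files) that each entry in annotations.json is a (task, dataset, metric, score) sequence:

```python
import json
from pathlib import Path

tasks, datasets, metrics, triples = set(), set(), set(), set()
triple_occurrences, leaderboard_papers = 0, 0

for split in ("train", "validation"):
    for article_dir in sorted(Path("dataset", split).iterdir()):  # "dataset" path is an assumption
        ann_path = article_dir / "annotations.json"
        if not article_dir.is_dir() or not ann_path.exists():
            continue
        data = json.loads(ann_path.read_text(encoding="utf-8"))
        if data == "unanswerable" or data == ["unanswerable"]:
            continue  # papers without leaderboards contribute no tuples
        leaderboard_papers += 1
        for task, dataset, metric, _score in data:  # assumed per-entry layout
            tasks.add(task)
            datasets.add(dataset)
            metrics.add(metric)
            triples.add((task, dataset, metric))
            triple_occurrences += 1

print("Unique tasks:", len(tasks))
print("Unique datasets:", len(datasets))
print("Unique metrics:", len(metrics))
print("Unique (task, dataset, metric) triples:", len(triples))
print("Avg. triple occurrences per paper:", triple_occurrences / max(leaderboard_papers, 1))
```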
Ten most common Tasks, Datasets, and Metrics in the Train+Validation set:
| # | Most Common Task | Frequency | Most Common Dataset | Frequency | Most Common Metric | Frequency |
|---|---|---|---|---|---|---|
| 1 | image classification | 2273 | imagenet | 1603 | accuracy | 4383 |
| 2 | atari games | 1448 | coco test-dev | 792 | score | 1515 |
| 3 | node classification | 1113 | human3.6m | 624 | f1 | 1384 |
| 4 | object detection | 1001 | cifar-10 | 585 | psnr | 1144 |
| 5 | video retrieval | 997 | coco minival | 310 | map | 1068 |
| 6 | link prediction | 941 | youtube-vos 2018 | 295 | miou | 862 |
| 7 | semantic segmentation | 901 | cifar-100 | 252 | ssim | 799 |
| 8 | semi-supervised video object segmentation | 890 | msr-vtt-1ka | 247 | top 1 accuracy | 789 |
| 9 | 3d human pose estimation | 889 | fb15k-237 | 244 | 1:1 accuracy | 787 |
| 10 | question answering | 866 | msu super-resolution for video compression | 225 | number of params | 759 |
Ten most common (Task, Dataset, Metric) triples in Train+Validation Set:
| (Task, Dataset, Metric) | Count |
|---|---|
| (image classification, imagenet, top 1 accuracy) | 524 |
| (image classification, imagenet, number of params) | 313 |
| (image classification, imagenet, gflops) | 256 |
| (3d human pose estimation, human3.6m, average mpj...) | 197 |
| (image classification, cifar-10, percentage correct) | 128 |
| (action classification, kinetics-400, acc@1) | 108 |
| (object detection, coco test-dev, box map) | 106 |
| (image classification, cifar-100, percentage correct) | 105 |
| (semantic segmentation, ade20k, validation miou) | 92 |
| (neural architecture search, imagenet, top-1 erro...) | 83 |
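Rankings like the two tables above can be derived with a frequency count over the annotated tuples; the sketch below uses `collections.Counter` on hypothetical toy data, with the per-entry (task, dataset, metric, score) layout assumed as before:

```python
from collections import Counter

def most_common_triples(annotations, k=10):
    """Rank (task, dataset, metric) triples by how often they are annotated."""
    counts = Counter((task, dataset, metric) for task, dataset, metric, _score in annotations)
    return counts.most_common(k)

# Example with toy data (not real corpus entries):
toy = [
    ("image classification", "imagenet", "top 1 accuracy", "81.3"),
    ("image classification", "imagenet", "top 1 accuracy", "84.2"),
    ("object detection", "coco test-dev", "box map", "52.1"),
]
print(most_common_triples(toy, k=2))
```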
Since each paper is accompanied by an annotations file, this section concludes with, for each of the four elements in the tuple, the proportion of annotated labels that can actually be found in the accompanying full-text (a rough matching sketch follows the list below):
- for Tasks, 60.24% of the annotation labels can be found in the accompanying paper full-text.
- for Datasets, 45.48% of the annotation labels can be found in the accompanying paper full-text.
- for Metrics, 42.69% of the annotation labels can be found in the accompanying paper full-text.
- for Scores, 58.86% of the annotations can be found in the accompanying paper full-text.
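As an illustration of how such proportions can be measured, the sketch below performs a case-insensitive substring check of each annotated label against a paper's LaTeX source; this is an assumed approximation, not necessarily the exact matching procedure behind the numbers above:

```python
from pathlib import Path

def label_in_fulltext(label: str, fulltext: str) -> bool:
    """Case-insensitive substring test of an annotated label against the paper text."""
    return label.lower() in fulltext.lower()

def proportion_found(labels, tex_path: Path) -> float:
    """Fraction of the given annotated labels that appear verbatim in a paper's LaTeX source."""
    text = tex_path.read_text(encoding="utf-8", errors="ignore")
    return sum(label_in_fulltext(label, text) for label in labels) / max(len(labels), 1)

# Example usage (paths and labels are placeholders):
# print(proportion_found(["image classification"], Path("dataset/train/1234/1234.tex")))
```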
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.