The `skip_datasets` list in the config could be improved, I think.
The `Multi*` versions should not be excluded; they are still valuable, especially if the language pair is not English-centric.
If `SPC` is not failing anymore, it should be removed from the list. For the two or three language pairs I have looked at in this corpus, it was quite clean. It might be a good resource.
There should definitely be another `skip_datasets` by language pair. For example, Ubuntu and PHP are full of garbage for Chinese but are good for other language pairs.
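
To make the per-language-pair idea concrete, here is a minimal sketch of how such a lookup might work. The names `PAIR_SKIP` and `should_skip`, and the `"src-tgt"` glob keys, are hypothetical and not taken from the project's actual config schema; only the Ubuntu/PHP/Chinese example comes from the observation above.

```python
from fnmatch import fnmatchcase

# Hypothetical per-pair skip table: keys are glob patterns over "src-tgt",
# values are dataset-name patterns to skip for matching pairs.
PAIR_SKIP = {
    "zh-*": ["Ubuntu", "PHP"],
    "*-zh": ["Ubuntu", "PHP"],
}


def should_skip(dataset, src, tgt, global_skip=(), pair_skip=PAIR_SKIP):
    """Return True if `dataset` is skipped globally or for the pair src-tgt."""
    patterns = list(global_skip)
    pair = f"{src}-{tgt}"
    for pair_pattern, names in pair_skip.items():
        if fnmatchcase(pair, pair_pattern):
            patterns.extend(names)
    # Dataset patterns may also use globs, e.g. "Multi*".
    return any(fnmatchcase(dataset, p) for p in patterns)
```

With this layout the existing global list would keep working unchanged (passed as `global_skip`), and noisy corpora could be excluded only where they are actually a problem, e.g. `should_skip("Ubuntu", "zh", "en")` would skip while `should_skip("Ubuntu", "de", "fr")` would not.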