ialfina/revised-id-pud

The Revised UD Indonesian PUD Treebank

(A gold-standard dependency treebank corpus for Indonesian)

We propose revisions to the UD Indonesian PUD Treebank provided by Universal Dependencies (UD) so that its annotation conforms to Indonesian grammar.

Note: We donated this dataset to UD in 2020 and maintain it in the UD repository, so the newest version of the dataset can be found there.

Documentation

The short annotation guidelines for this revision can be found on the UD website.

Changelog

  • 2020-10-27 v2.0

    • added lemma
    • added features (14 features, 27 feature tags)
    • revised MWE words annotation
      • removed compound:prt
      • UPOS correction for MWE words
    • revised word segmentation
      • for words ending with the clitic -nya, especially in predicate nominalisation cases
    • revised annotations of multiword tokens (MWT), especially for words ending with the clitic -nya or the particles lah/kah/tah/pun, including the SpaceAfter=No annotation
    • changed the UPOS:
      • of possessive personal pronouns from DET to PRON
    • added, renamed, and removed subtypes:
      • added nmod:lmod for locative nouns
      • renamed flat:range to plain flat
      • renamed some flat relations to flat:name (for PROPN-PROPN pairs)
  • 2019-08-17 v1.0

    • revised tokenization (major revision, especially for reduplicated words)
    • revised UPOS (major revision)
    • proposed changes to the language-specific dependency relations for Indonesian
    • revised syntactic annotation (major revision)
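
Several of the changes above (multiword tokens for the clitic -nya, the DET to PRON change, added features, SpaceAfter=No) are easiest to see in the CoNLL-U format that UD treebanks use. The following is a minimal sketch of a reader, not the tooling used for this revision. The sample sentence and all of its annotation values are invented for illustration and are not copied from the treebank; the function name `parse_conllu` is also hypothetical.

```python
from collections import Counter

# Hypothetical CoNLL-U fragment written for illustration only.
# It is NOT taken from the treebank; the tags are assumptions.
SAMPLE = """\
# text = Bukunya hilang.
1-2\tBukunya\t_\t_\t_\t_\t_\t_\t_\t_
1\tBuku\tbuku\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_
2\tnya\tnya\tPRON\t_\tPoss=Yes|PronType=Prs\t1\tnmod\t_\t_
3\thilang\thilang\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_conllu(text):
    """Parse CoNLL-U text into sentences with separate word and MWT lists."""
    sentences, words, mwts = [], [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if words:
                sentences.append({"words": words, "mwts": mwts})
                words, mwts = [], []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tok_id = cols[0]
        if "-" in tok_id:
            # Multiword token line: ID is a range such as 1-2.
            start, end = (int(part) for part in tok_id.split("-"))
            mwts.append({"form": cols[1], "span": (start, end)})
        elif "." in tok_id:
            continue  # empty nodes are skipped in this sketch
        else:
            feats = {}
            if cols[5] != "_":
                feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
            words.append({
                "id": int(tok_id), "form": cols[1], "lemma": cols[2],
                "upos": cols[3], "feats": feats,
                "head": int(cols[6]), "deprel": cols[7], "misc": cols[9],
            })
    if words:  # file may end without a trailing blank line
        sentences.append({"words": words, "mwts": mwts})
    return sentences


sentences = parse_conllu(SAMPLE)
upos_counts = Counter(w["upos"] for s in sentences for w in s["words"])
print(upos_counts)
print(sentences[0]["mwts"])
```

Counting `deprel` values the same way gives a quick check on subtypes such as nmod:lmod or flat:name when the real treebank file is read in place of `SAMPLE`.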

Acknowledgments

Contributors of Revision v2.0:

  • Designing Indonesian annotation guidelines: Ika Alfina, Daniel Zeman, and Arawinda Dinakaramani
  • Annotators: Ika Alfina, Arawinda Dinakaramani, Muhammad Yudistira Hanifmuti, Jessica Naraiswari Arwidarasti, Yogi Lesmana Sulestio

Contributors of Revision v1.0:

  • Ika Alfina
  • Arawinda Dinakaramani

Reference

Licence

You can use this dataset for free, and you don't need our permission to do so. If you use our data in a publication, please cite our paper. Please note that you are not allowed to copy this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id
