
Binded Sources

Nicolay Rusnachenko edited this page Aug 6, 2023 · 14 revisions

Includes:

  • A downloader script that fetches all the resources in all of their versions.
  • An API over zip files, used as the default mechanism for accessing the contents of the sources.
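Reading a source directly from its zip archive, without extracting it to disk, can be sketched with Python's standard `zipfile` module. This is a minimal illustration under stated assumptions, not the project's own API: the helper names `read_text_from_zip` and `list_members`, the member path, and the UTF-8 encoding are all hypothetical.

```python
import io
import zipfile


def list_members(zip_source):
    """Return the names of all files stored in the archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.namelist()


def read_text_from_zip(zip_source, member_name, encoding="utf-8"):
    """Return the decoded contents of one archived file without extracting it.

    `zip_source` may be a filesystem path or a binary file-like object.
    The UTF-8 default is an assumption about how the sources are encoded.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as member:
            return member.read().decode(encoding)


if __name__ == "__main__":
    # Build a tiny in-memory archive that stands in for a downloaded source.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("collection/doc_1.txt", "Пример текста")
    print(list_members(buffer))
    print(read_text_from_zip(buffer, "collection/doc_1.txt"))
```

Passing a file-like object keeps it open after each `ZipFile` is closed, so the same in-memory buffer can be queried several times.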

Represents a lexicon that describes the sentiments and connotations conveyed by a predicate in verbal or nominal form.

The RuSentRel corpus [paper], version 1.1, consists of analytical articles from the Internet portal inosmi.ru. These are texts in the domain of international politics, obtained from foreign authoritative sources and translated into Russian. The collected articles contain both the author's opinion on the subject matter of the article and a large number of references mentioned between the participants of the described situations. In total, 73 large analytical texts were labeled with about 2000 relations.

Represents a collection of automatically labeled sentiment attitudes, developed using a distant supervision (DS) approach. It is intended for training machine learning models.

At present, the collection is not publicly available. Please contact Natalia Loukachevitch and Nicolay Rusnachenko to request access.
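The general idea of distant supervision can be sketched as follows: an external set of known (source, target) sentiment pairs is projected onto sentences in which both entities are mentioned, yielding labels without manual annotation. This is a generic, simplified illustration and not how this particular collection was actually built; the function `distant_label`, the seed pairs, and the pre-extracted entity lists are all hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def distant_label(
    sentences: List[Tuple[str, List[str]]],
    seed_pairs: Dict[Pair, str],
) -> List[Tuple[str, Pair, str]]:
    """Label every sentence in which a known (source, target) pair co-occurs.

    Each sentence comes with its entity mentions in order of appearance;
    only pairs where the source is mentioned before the target are checked.
    """
    labeled = []
    for text, entities in sentences:
        for i, source in enumerate(entities):
            for target in entities[i + 1:]:
                label = seed_pairs.get((source, target))
                if label is not None:
                    labeled.append((text, (source, target), label))
    return labeled


if __name__ == "__main__":
    # Hypothetical seed knowledge and pre-extracted entity mentions.
    seeds = {("USA", "Russia"): "neg"}
    corpus = [
        ("USA criticized Russia.", ["USA", "Russia"]),
        ("Russia and France met.", ["Russia", "France"]),
    ]
    print(distant_label(corpus, seeds))
```

Labels produced this way are noisy by construction (co-occurrence does not guarantee the sentence expresses the attitude), which is why such collections are typically used as training material rather than as a gold standard.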

A Russian dataset [paper] with nested named entities, relations, events and linked entities. It is significantly larger than previously existing datasets: to date, it contains 56K annotated named entities and 39K annotated relations. Its key difference from previous datasets is the annotation of nested named entities, as well as of relations within nested entities and at the discourse level. It also contains annotations of events involving named entities and of the roles those entities play in the events.
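To make "nested named entities" concrete: one entity span may lie entirely inside another, as when a city name appears inside an organization name. The sketch below represents entities as character-offset spans and finds (outer, inner) nesting pairs. It is an illustration only; the `Entity` class, the `nested_pairs` helper and the example labels are hypothetical and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # illustrative entity type


def contains(outer: Entity, inner: Entity) -> bool:
    """True if `inner` lies within `outer` and the two spans are not identical."""
    return (
        outer.start <= inner.start
        and inner.end <= outer.end
        and (outer.start, outer.end) != (inner.start, inner.end)
    )


def nested_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Return every (outer, inner) pair where inner is nested inside outer."""
    return [(o, i) for o in entities for i in entities if contains(o, i)]


if __name__ == "__main__":
    text = "Moscow State University"
    org = Entity(0, len(text), "ORGANIZATION")
    city = Entity(0, 6, "CITY")  # "Moscow"
    print(nested_pairs([org, city]))
```

Flat annotation schemes can only keep one of these two spans, which is why nesting matters for relations such as an organization being located in a city.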

NEREL-BIO is an annotation scheme and a corpus of PubMed abstracts in Russian and English. NEREL-BIO extends the general-domain dataset NEREL. Its annotation scheme covers both the general and the biomedical domains, making it suitable for domain transfer experiments.