-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
86 lines (51 loc) · 2.93 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
output: github_document
---
# NorSentLex <img src="man/figures/norsentlex.png" align="right" width="120"/>
<!-- badges: start -->
<!-- [![CRAN Version](http://www.r-pkg.org/badges/version/stortingscrape)](https://cran.r-project.org/package=stortingscrape) -->
[![Github Version](https://img.shields.io/github/r-package/v/martigso/NorSentLex?color=yellowgreen)](https://github.com/martigso/NorSentLex)
<!-- [![Downloads](http://cranlogs.r-pkg.org/badges/stortingscrape)](https://cran.r-project.org/package=stortingscrape) -->
<!-- [![Total Downloads](http://cranlogs.r-pkg.org/badges/grand-total/stortingscrape?color=orange)](https://cran.r-project.org/package=stortingscrape) -->
<!-- [![R-CMD-check](https://github.com/martigso/stortingscrape/actions/workflows/check-standard.yaml/badge.svg)](https://github.com/martigso/stortingscrape/actions/workflows/check-standard.yaml) -->
<!-- badges: end -->
This repository is a R-format version of the [Norwegian sentiment lexicons](https://github.com/ltgoslo/norsentlex) as shown in Barnes et.al (2019).
## Installation
The package can be installed by using the `install_github()` function from the `devtools` package in R:
```{r, eval=FALSE}
devtools::install_github("martigso/NorSentLex")
library(NorSentLex)
?nor_fullform_sent
?nor_lemma_sent
```
## Structure and usage
The package mirrors the structure of the vanilla [NorSentLex](https://github.com/ltgoslo/norsentlex) repository, but in a typical R type format. There are two available datasets: `nor_fullform_sent` and `nor_lemma_sent`. These can be easily loaded in R:
```{r}
data("nor_fullform_sent", package = "NorSentLex")
data("nor_lemma_sent", package = "NorSentLex")
```
The data are structured as follows:
| Token form | Sentiment | POS |
|:---------- |:----------|:----|
| Fullform | Positive <br> Negative |TBD|
| Lemma | Positive <br><br><br><br> Negative | adjective <br> noun <br> participle adjective <br> verb <br> adjective <br> noun <br> participle adjective <br> verb |
### Fullform
The fullform data contains a list with one element ("positive") of 6103 positive fullform tokens and one element ("negative") of 14839 negative fullform tokens. These can be extracted by name after loading the data into R (see above):
```{r}
nor_fullform_sent$positive |>
head()
nor_fullform_sent$negative |>
head()
```
### Lemma
The lemmatized part of the data contain a list element for positive and negative lexicons for each of the following parts-of-speech: adjective, noun, participle adjective, and verb:
```{r}
names(nor_lemma_sent)
```
These lexicons can also be extracted by calling the names within the list:
```{r}
nor_lemma_sent$lemma_noun_positive |>
tail()
```
## References
Barnes et al. (2019) Lexicon information in neural sentiment analysis: a multi-task learning approach. Proceedings of the 22nd Nordic Conference on Computational Linguistics. Turku, Finland [ACL Anthology](https://www.aclweb.org/anthology/W19-6119/)