This repository is an R-format version of the Norwegian sentiment lexicons presented in Barnes et al. (2019).
The package can be installed with the install_github() function from the devtools package in R:
devtools::install_github("martigso/NorSentLex")
library(NorSentLex)
?nor_fullform_sent
?nor_lemma_sent
The package mirrors the structure of the original NorSentLex repository, but in a typical R format. There are two available datasets: nor_fullform_sent and nor_lemma_sent. Both can be loaded in R:
data("nor_fullform_sent", package = "NorSentLex")
data("nor_lemma_sent", package = "NorSentLex")
The data are structured as follows:
Token form | Sentiment | POS |
---|---|---|
Fullform | Positive, Negative | TBD |
Lemma | Positive, Negative | adjective, noun, participle adjective, verb |
The fullform data are a list with one element (“positive”) of 6103 positive fullform tokens and one element (“negative”) of 14839 negative fullform tokens. Each element can be extracted by name after loading the data into R (see above):
nor_fullform_sent$positive |>
head()
## [1] "absolutt" "absolutta" "absolutte" "absoluttene" "absolutter"
## [6] "absoluttet"
nor_fullform_sent$negative |>
head()
## [1] "abnorm" "abnorme" "abnormt" "abort" "aborten" "abortene"
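As a rough illustration of how the fullform lists might be used, the sketch below counts positive and negative tokens in a short text. It is not part of the package: `score_text()` is a hypothetical helper, and `toy_lexicon` is a small stand-in with the same shape as nor_fullform_sent (a named list of character vectors), so that the example runs without the package installed. The tokenizer is deliberately naive and depends on the locale for what counts as a letter.

```r
# Stand-in with the same shape as nor_fullform_sent (assumption: a named
# list of character vectors). Replace with the real object in practice.
toy_lexicon <- list(
  positive = c("absolutt", "absolutte"),
  negative = c("abnorm", "abort", "aborten")
)

# Hypothetical helper: count lexicon hits in a single text.
score_text <- function(text, lexicon) {
  # Lowercase, then split on runs of non-letter characters.
  tokens <- strsplit(tolower(text), "[^[:alpha:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  n_pos <- sum(tokens %in% lexicon$positive)
  n_neg <- sum(tokens %in% lexicon$negative)
  c(positive = n_pos, negative = n_neg, net = n_pos - n_neg)
}

score_text("Aborten var absolutt abnorm.", toy_lexicon)
```

With the package loaded, passing nor_fullform_sent as the `lexicon` argument should work the same way, provided the list keeps the element names shown above.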
The lemmatized part of the data contains one list element for each combination of sentiment (positive or negative) and part of speech (adjective, noun, participle adjective, and verb):
names(nor_lemma_sent)
## [1] "lemma_adj_negative" "lemma_adj_positive" "lemma_noun_negative"
## [4] "lemma_noun_positive" "lemma_padj_negative" "lemma_padj_positive"
## [7] "lemma_verb_negative" "lemma_verb_positive"
These lexicons can also be extracted by calling the names within the list:
nor_lemma_sent$lemma_noun_positive |>
tail()
## [1] "åpenbaring" "ærbødighet" "ære" "ærlighet" "økning"
## [6] "ønske"
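For some workflows a single data frame is easier to handle than eight separate vectors. The sketch below flattens a list with the naming scheme shown above (`lemma_<pos>_<sentiment>`) into one data frame. `lemma_to_df()` is a hypothetical helper, not a function from the package, and the entries in `toy_lemma` are illustrative placeholders rather than values taken from the lemma data.

```r
# Stand-in using the same naming scheme as nor_lemma_sent. The tokens are
# placeholders chosen for illustration only.
toy_lemma <- list(
  lemma_adj_positive  = c("absolutt"),
  lemma_adj_negative  = c("abnorm"),
  lemma_noun_negative = c("abort")
)

# Hypothetical helper: one row per token, with POS and sentiment columns
# parsed from the list names.
lemma_to_df <- function(lemma_list) {
  parts <- strsplit(names(lemma_list), "_", fixed = TRUE)
  rows <- Map(function(tokens, part) {
    data.frame(
      token = tokens,
      pos = rep(part[2], length(tokens)),
      sentiment = rep(part[3], length(tokens)),
      stringsAsFactors = FALSE
    )
  }, lemma_list, parts)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

lemma_df <- lemma_to_df(toy_lemma)
table(lemma_df$pos, lemma_df$sentiment)
```

The helper assumes every list name has exactly three underscore-separated parts, which holds for the eight names printed by names(nor_lemma_sent) above.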
Barnes et al. (2019). Lexicon information in neural sentiment analysis: a multi-task learning approach. Proceedings of the 22nd Nordic Conference on Computational Linguistics. Turku, Finland. ACL Anthology.