-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
327 lines (245 loc) · 20.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
---
output: github_document
link-citations: TRUE
bibliography: 'citation_README.bib'
csl: 'unified-style-sheet-for-linguistics'
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
[![R-CMD-check](https://github.com/gederajeg/happyr/workflows/R-CMD-check/badge.svg)](https://github.com/gederajeg/happyr/actions) [![Coverage status](https://codecov.io/gh/gederajeg/happyr/branch/master/graph/badge.svg)](https://codecov.io/github/gederajeg/happyr?branch=master) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1436330.svg)](https://doi.org/10.5281/zenodo.1436330)
```{r setup, include = FALSE}
knitr::opts_chunk$set(
fig.width = 6,
fig.asp = 0.618,
fig.align = "center",
dpi = 300,
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "90%"
)
```
# happyr <a href='https://gederajeg.github.io/happyr/'><img src='man/figures/happyr-logo.png' align="right" height="139" /></a>
The goal of **happyr** is to document the R codes and the dataset for the quantitative analyses in Rajeg's [-@rajeg_metaphorical_2018] PhD thesis (submitted for examination on 27 September 2018 and passed without amendments for the award of the degree on 1 April 2019). The study focuses on metaphors for <span style='font-variant:small-caps;'>happiness</span> near-synonyms in Indonesian. The corpus data for the study mainly come from the *Indonesian Leipzig Corpora Collection* [@quasthoff_indonesian_2013; @goldhahn_building_2012; @biemann_leipzig_2007]. The Leipzig Corpora are freely available for [download](http://wortschatz.uni-leipzig.de/en/download) and their use is licensed under the Creative Common License [CC-BY](https://creativecommons.org/licenses/by/4.0/) (see the [Terms of Usage](http://wortschatz.uni-leipzig.de/en/usage) page for further details).
The **happyr** package is based on the core packages in the [tidyverse](https://www.tidyverse.org), and is built under R version 4.0.5 (2021-03-31) -- "Shake and Throw" (see the [**Session Info**](#session-info) section at the bottom of the page for further details on the dependencies).
## Acknowledgement
The thesis was supervised by Associate Professor [Alice Gaby](https://bit.ly/2D91Lp9) (main), Dr. [Howard Manns](https://bit.ly/2UelYPY) (associate), and Dr. [Simon Musgrave](https://bit.ly/2Z1tCAV) (associate). The panel members during the author's candidature milestones consisted of Dr. [Anna Margetts](https://research.monash.edu/en/persons/anna-margetts), Dr. [Réka Benczes](https://www.researchgate.net/profile/Reka_Benczes), and Prof. [John Newman](https://www.johnnewm.org). The two external examiners of the thesis were Prof. [Martin Hilpert](http://members.unine.ch/martin.hilpert/) (Université de Neuchâtel, Switzerland) and Dr. [Karen Sullivan](https://languages-cultures.uq.edu.au/profile/1106/kari-sullivan) (The University of Queensland, Australia) The PhD research of the author was fully funded by [Monash University](https://www.monash.edu), Australia through the [International Graduate Research Scholarships](https://www.monash.edu/graduate-research/future-students/scholarships) schemes (i.e. _Monash International Postgraduate Research Scholarships_ (MIPRS, now MITS) and _Monash Graduate Scholarships_ (MGS)). The author also benefited from generous research and travel funding provided by the [Monash Arts Graduate Research](https://arts.monash.edu/graduate-research/) and the [Monash Graduate Research Office](https://www.monash.edu/graduate-research).
## Installation
The **happyr** package can be installed from [GitHub](https://github.com/gederajeg/happyr) with the [remotes](https://cran.r-project.org/web/packages/remotes/index.html) package:
``` r
# Install remotes if needed
if(!require(remotes)) install.packages("remotes")
# Then, install the happyr package from GitHub
remotes::install_github("gederajeg/happyr")
```
## Citing *happyr*
```{r happyr-citation, eval = TRUE, echo = TRUE}
citation("happyr")
```
## Examples
First, load the **happyr** and **tidyverse** packages using the `library()` function.
```{r load-packages}
# load the required packages
library(happyr)
library(tidyverse)
```
### Chapter 3 - Interrater-agreement computation
All codes for the Kappa's calculation in the interrater agreement trial are presented in the *Examples* section of the documentation of the `kappa_tidy()` function. Type `?kappa_tidy()` in the R console to see them or check the [online documentation](https://gederajeg.github.io/happyr/reference/kappa_tidy.html).
The ggplot2 codes for generating Figure 3.1 in Rajeg [-@rajeg_metaphorical_2018, Ch. 3] is wrapped into a function called `plot_cxn_interrater()`. The input data frame is `top_cxn_data`.
```{r cxn-interrater-figure, fig.asp = 0.75, fig.width = 6.5}
# prepare plot title and caption
plot_title <- expression(paste("Distribution of the constructional patterns for the agreed cases (", N["patterns"] >= "5)", sep = ""))
plot_caption <- "The values inside the bars are the token frequency of the patterns"
plot_cxn_interrater(df = top_cxn_data) +
# add plot title and caption
labs(title = plot_title,
caption = plot_caption) +
# adjust the size of the plot title and caption
theme(plot.title = element_text(size = 10),
plot.caption = element_text(size = 7))
```
### Chapter 5 and Chapter 6 - Token frequency, type frequency, and type/token ratio analyses
The main metaphor data for Chapter 5, 6, and 7 is stored as a tibble in `phd_data_metaphor`. The relevant function for the token, type, and type/token ratio analyses in Chapter 5 and 6 is `ttr()`.
```{r ttr, warning = FALSE, message = FALSE}
# calculation for the token, type, and type/token ratio
ttr_metaphor <- ttr(df = phd_data_metaphor,
schema_var = "metaphors", # specify col.name of the metaphor variable
lexunit_var = "lu", # specify col.name of the lexical unit variable
float_digits = 2)
```
The following code retrieves the top-10 metaphors sorted according to their token frequencies [@rajeg_metaphorical_2018, Ch. 5, Table 5-1]. A function for rendering the metaphors strings as small-capital in the MS Word output is available in the package as `scaps()`; keyboard shortcut to produce the so-called "pipe" `%>%` in the code-chunk below is `Ctrl + Shift + M` (on Windows) or `Cmd + Shift + M` (on macOS).
```{r most-frequent-metaphors}
top_n(x = ttr_metaphor, n = 10L, wt = token) %>%
mutate(metaphors = scaps(metaphors)) %>% # render the metaphors into small capitals to be printed in MS Word output
knitr::kable(caption = "Top-10 most frequent metaphors", row.names = TRUE)
```
The column `token` shows the token frequency of a metaphor meanwhile the column `type_lu` represents the number of different lexical-unit types evoking the source domain frames of the metaphor in the metaphorical expressions. The original values of the `type_per_token_lu` are normalised into the number of type per 100 tokens [cf. @oster_emotions_2018, pp. 206-207]. Thus, the closer the TTR of a metaphor to 100, the higher the rate of different lexical-unit type per 100 tokens of the metaphor (see further below) [@stefanowitsch_corpus-based_2016, pp. 118-120; @stefanowitsch_corpus_2017, p. 282; @oster_emotions_2018, p. 206; @oster_using_2010, pp. 748-749].
Use `get_lu_table()` to retrieve the source frame *lexical units* in the metaphorical expressions instantiating a given metaphor. It is illustrated here with the linguistic expressions for <span style="font-variant:small-caps;">happiness is a desired goal</span> metaphor [@rajeg_metaphorical_2018, Ch. 5, Table 5-3]:
```{r lu-desired-goal}
# print the top-10 Lexical Units of the HAPPINESS IS A DESIRED GOAL metaphor
get_lu_table(metaphor = "is a desired goal$",
top_n_only = TRUE,
top_n_limit = 10L,
df = phd_data_metaphor) %>%
knitr::kable(caption = paste("Top-10 most frequent lexical units for ",
scaps("happiness is a desired goal."), sep = ""),
row.names = TRUE)
```
The column `Perc_overall` indicates the percentage of a given LU from the total tokens of the <span style="font-variant:small-caps;">happiness is a desired goal</span> metaphor. More linguistic citations for the metaphorical expressions are presented in the thesis.
From the output of `ttr()` above, which is stored in the `ttr_metaphor` table, we can retrieve the top-10 metaphors with high type frequencies with the following codes [@rajeg_metaphorical_2018, Ch. 6, Table 6-1]; the type frequency of a metaphor indicates the number of different lexical unit types expressing a given metaphor.
```{r top-10-productive-metaphors}
# sort by type frequency
productive_metaphor <-
ttr_metaphor %>%
arrange(desc(type_lu)) %>% # sort in descending order for the type frequency
top_n(10, type_lu) %>% # get the top-10 rows
mutate(metaphors = scaps(metaphors)) # small-caps the metaphors texts
# print as table
productive_metaphor %>%
select(Metaphors = metaphors,
Token = token,
`%Token` = perc_token,
Type = type_lu,
`%Type` = perc_type_lu) %>%
knitr::kable(caption = 'Top-10 metaphors sorted on their type frequency.', row.names = TRUE)
```
The codes below generates Table 6-2 [@rajeg_metaphorical_2018, Ch. 6] that ranks metaphors with high type frequency above according to their type/token ratio.
```{r top-type-ttr}
productive_metaphor %>%
arrange(desc(type_per_token_lu)) %>%
select(Metaphors = metaphors,
Token = token,
Type = type_lu,
`Type/token ratio` = type_per_token_lu) %>%
knitr::kable(caption = 'Metaphors with high type frequency sorted by their Type/Token Ratio (TTR).', row.names = TRUE)
```
It is clear that the first two metaphors in the table above (i.e. `r scaps("happiness is an imperilled entity")` and `r scaps("happiness is light")`) have higher ratio for different types of linguistic instantiations per 100 tokens, despite the vast difference in their token frequencies compared to the remanining metaphors with high token frequencies in the table. This suggests that these two metaphors are expressed with relatively wider range of expressions with respect to their token frequencies, compared to the frequent metaphors.
Next, a helper function called `get_creative_metaphors()` is available to retrieve the top-10 creative metaphors [@rajeg_metaphorical_2018, Ch. 6, Table 6-5]. I filter and discuss the metaphors with high type/token ratio and occurring at least three tokens in the sample, as shown in the codes below
```{r creative-metaphors}
min_freq <- 3L
table_caption <- paste('Top-10 creative metaphors sorted based on the TTR value and occurring at least ', happyr::numbers2words(min_freq), ' tokens.', sep = "")
creative_metahors <-
ttr_metaphor %>%
get_creative_metaphors(min_token = min_freq,
top_n_limit = 10L) %>%
mutate(metaphors = scaps(metaphors))
# print the table
creative_metahors %>%
select(Metaphors = metaphors,
Token = token,
Type = type_lu,
`Type/token ratio` = type_per_token_lu) %>%
knitr::kable(caption = table_caption, row.names = TRUE)
```
One way to interpret the values in the `Type/token ratio` (TTR) column is to conceive them as representing the number of unique lexical-unit types per 100 tokens of a metaphor. The higher the ratio, the more creative a given metaphor is linguistically expressed. For instance, the TTR value of `r happyr::scaps("happiness is an adversary")` (i.e. `r unlist(ttr_metaphor[ttr_metaphor$metaphors=="happiness is an adversary", "type_per_token_lu"])`) indicates that there are about `r unlist(ttr_metaphor[ttr_metaphor$metaphors=="happiness is an adversary", "type_per_token_lu"])` unique types per 100 tokens of the `r happyr::scaps("happiness is an adversary")`, which is much higher than the TTR value of `r scaps("happiness is a possessable object")` (i.e. `r unlist(ttr_metaphor[ttr_metaphor$metaphors=="happiness is a possessable object", "type_per_token_lu"])`). The TTR value of a metaphor is used to represent the *creativity ratio* of a metaphor in its linguistic manifestation [cf. @oster_emotions_2018, p. 206; @oster_using_2010, pp. 748-749].
#### Retrieving the frequency of submappings and semantic source frames of the metaphors
The data for retrieving the information on the submappings and the source frames of metaphors is contained within `phd_data_metaphor`. Among the relevant functions for retrieving these information are `get_submappings()` and `get_frames()`. The illustration is based on data for the `r happyr::scaps("happiness is liquid in a container")` metaphor.
```{r submappings-liquid-in-container}
# get the submappings for the liquid in a container
get_submappings(metaphor = "liquid in a container",
df = phd_data_metaphor) %>%
mutate(submappings = scaps(submappings)) %>%
knitr::kable(caption = paste("Submappings for ", scaps("happiness is liquid in a container."), sep = ""), row.names = TRUE)
```
Column `n` shows the 'token frequency' of the submappings (with `perc` indicates the token's percentage). Meanwhile `type` shows the 'type frequency' of the submappings (i.e., the number of different lexical unit types evoking the corresponding submappings of a given metaphor).
Use `get_frames()` to retrieve frequency profiles of the source frames for a given metaphor:
```{r frames-liquid-in-container}
# get the source frames evoked by the metaphorical expressions for the liquid in a container
get_frames(metaphor = "liquid in a container",
df = phd_data_metaphor) %>%
mutate(frames = scaps(frames)) %>%
knitr::kable(caption = paste("Source frames for ", scaps("happiness is liquid in a container."), sep = ""), row.names = TRUE)
```
Based on the same data, it is also possible to retrieve a frequency table for the lexical units and the submappings they evoke for a given metaphor. Use `get_lu_submappings_table()` for this purpose.
```{r lu-submet-liquid-in-container}
get_lu_submappings_table(metaphor = "liquid in a container",
df = phd_data_metaphor) %>%
mutate(submappings = scaps(submappings), # small-cap the submapping
lu = paste("*", lu, "*", sep = "")) %>% # italicised the printed lexical units
knitr::kable(caption = paste("Evoked submappings for the lexical units of the ",
scaps("happiness is liquid in a container"), " metaphor.", sep = ""),
row.names = TRUE)
```
The column `perc_expr_overall` indicates the percentages of the token frequencies of the lexical units for the given metaphor. Meanwhile `perc_expr_by_submappings` indicates the percentages of the lexical units for each submapping of the given metaphor.
#### Visualising the frequency of occurrences for the body-part terms in the metaphorical expressions
The function for generating Figure 5.1 in Chapter 5 is `plot_body_part()` with `phd_data_metaphor` as the only input argument:
```{r body-part-figure, fig.asp = 0.7, fig.width = 6.5}
plot_body_part(df = phd_data_metaphor)
```
The barplot shows the distribution of the body-part terms that are explicitly mentioned in metaphorical expressions about <span style=font-variant:small-caps;">happiness</span> in the sample.
The following codes are used to generate Table 5-12 in Chapter 5 for the top-10 most frequent co-occurrence of body-part terms and the metaphors:
```{r body-metaphors-frequency-table}
# body-part gloss
bp_gloss <- tibble(gloss = c('chest/bosom', 'self', 'liver', 'eyes', 'face', 'body', 'face', 'face', 'deepest part of the heart', 'lips', 'mouth', 'body; bodily'),
body_part_terms = c('dada', 'diri', 'hati', 'mata', 'muka', 'tubuh', 'wajah', 'paras', 'lubuk kalbu', 'bibir', 'mulut', 'jasmani'))
# generate the table
phd_data_metaphor %>%
filter(body_part_inclusion %in% c('y')) %>%
count(body_part_terms, metaphors) %>%
arrange(desc(n)) %>%
left_join(bp_gloss, by = 'body_part_terms') %>% # join the glossing tibble
select(metaphors, body_part_terms, gloss, n) %>%
mutate(metaphors = scaps(metaphors),
body_part_terms = paste("*", body_part_terms, "* '", gloss, "'", sep = "")) %>%
select(Body_parts = body_part_terms, Metaphors = metaphors, N = n) %>%
top_n(10, N) %>%
knitr::kable(caption = 'The ten most frequent <span style="font-variant:small-caps;">Body-part</span>`*`<span style="font-variant:small-caps;">Metaphors</span> co-occurrence for <span style="font-variant:small-caps;">Happiness</span> in Indonesian.', row.names = TRUE)
```
### Chapter 7 - Distinctive metaphors and collocates for <span style="font-variant:small-caps;">happiness</span> near-synonyms in Indonesian
The distinctiveness of a given metaphor and collocate with each happiness synonym is measured using one-tailed, Binomial Test implemented in the *Multiple Distinctive Collexeme Analysis* (MDCA) [cf., e.g., @hilpert_distinctive_2006; @hoffmann_collostructional_2013, pp. 299-300]. The function to perform MDCA is `mdca()`.
```{r mdca-metaphors}
# MDCA for metaphor * synonyms
mdca_res <- mdca(df = phd_data_metaphor,
cxn_var = "synonyms", # `cxn_var` = constructions column
coll_var = "metaphors") # `coll_var` = collexeme/collocates column
```
The input data frame for performing MDCA for the distinctive collocates are available as `colloc_input_data`. The English gloss/translation for the distinctive collocates are stored in `dist_colloc_gloss`.
```{r mdca-collocates}
# mdca for window-span collocational data
mdca_colloc <- mdca(df = colloc_input_data,
cxn_var = "synonyms",
coll_var = "collocates")
```
The package also provides two related functions to retrieve the *attracted*/*distinctive* and the *repelled* items from the results of MDCA. They are `mdca_attr()` and `mdca_repel()`. The following example shows how to get the distinctive metaphors for *kesenangan* 'pleasure; happiness' having the association strength of equal to, or greater than, two (i.e. *p*~binomial~ < 0.01) [@rajeg_metaphorical_2018, Ch. 7, Table 7-5]:
```{r dist-for-kesenangan}
mdca_res %>%
mdca_attr(filter_by = "cxn",
cxn_type = "kesenangan",
min_assocstr = 2) %>%
mutate(exp = round(exp, 3L), # round the expected co-occurrence frequency
metaphors = scaps(metaphors)) %>%
select(-synonyms) %>%
as.data.frame() %>%
knitr::kable(caption = "Distinctive metaphors for *kesenangan* 'pleasure'", row.names = TRUE)
```
The `p_holm` column provides the Holm's corrected significance level [@gries_statistics_2009, pp. 249, 251] of the Binomial Test *p*-value (`p_binomial`). The Binomial *p*-value is used as the basis for the association strength value (`assocstr`) [cf. @hoffmann_collostructional_2013, p. 305], which is derived via the log-transformed *p*~Binomial~-value to the base of 10. The `dec` column indicates the significane of the association between the metaphor and *kesenangan* 'pleasure' at the corrected level. Column `exp` shows the 'expected' co-occurrence frequency of the metaphor with *kesenangan* while `n` is the 'observed' co-occurrence frequency in the sample.
The following code shows the use of `mdca_repel()` for retrieving metaphors strongly dissociated with *kesenangan* 'pleasure' [@rajeg_metaphorical_2018, Ch. 7, Table 7-6]:
```{r repel-by-kesenangan}
mdca_res %>%
mdca_repel(filter_by = "cxn",
cxn_type = "kesenangan",
min_assocstr = -2) %>%
mutate(exp = round(exp, 3L),
metaphors = scaps(metaphors)) %>%
select(-synonyms) %>%
knitr::kable(caption = "Repelled metaphors for *kesenangan* 'pleasure'", row.names = TRUE)
```
Finally, the codes below show how to retrieve the top-20 most distinctive collocates co-occurring with *kesenangan* 'pleasure' within the span of 4 words to the left and right of *kesenangan* [@rajeg_metaphorical_2018, Ch. 7, Table 7-7].
```{r colloc-res-kesenangan}
# present the result table for collocational analysis of *kesenangan*
mdca_attr(mdca_colloc,
cxn_type = '^kesenangan') %>%
top_n(20, assocstr) %>%
left_join(dist_colloc_gloss,
by = "collocates") %>% # left-join the gloss for the distinctive collocates
select(-synonyms) %>%
select(collocates, gloss, everything()) %>%
mutate(exp = round(exp, 3),
collocates = paste("*", collocates, "*", sep = "")) %>%
knitr::kable(caption="The 20 most distinctive, 4-window span collocates for *kesenangan* 'pleasure' in the whole Indonesian Leipzig Corpora collection.", row.names = TRUE)
```
It appears that *kesenangan* 'pleasure' is strongly associated with negative nuance as it more frequently co-occurs with words, such as *dosa* 'sin', *hawa nafsu* 'lust', *nafsu* 'lust', *seksual* 'sexual', and *duniawi* 'worldly; earthly'.
## Session info {#sessioninfo}
```{r sess-info}
devtools::session_info()
```
## References