---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please don't edit that file -->
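```{r, include = FALSE}
# Do not evaluate chunks by default; the diagram chunks below opt back in
# with `eval=TRUE`.
knitr::opts_chunk$set(
  eval = FALSE,
  collapse = TRUE,
  comment = "#>"
)
# Packages used to build the process diagram and export it as an image.
library(DiagrammeR)
library(DiagrammeRsvg)
library(rsvg)
```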
# QualMiner <a href='https://bedatadriven.github.io/QualMiner/'><img src='https://raw.githubusercontent.com/bedatadriven/QualMiner/master/media/qualminer-logo.png' align="right" width="200"/></a>
Exploring qualitative indicators via text mining methods.
This repository contains the source code. The data used in the analysis can only
be accessed by using [*ActivityInfo*](https://www.activityinfo.org/) with proper
permissions.
### Usage
The website is generated with [bookdown](https://bookdown.org/). The common
steps to generate the analysis are explained below. For more detailed
information about how bookdown works, have a look at the online book:
*Yihui Xie (2019).
bookdown: Authoring Books and Technical Documents with R Markdown.
Chapman and Hall/CRC. ISBN 978-1138700109*
<https://bookdown.org/yihui/bookdown/>
#### 0. Authenticate
The call below authenticates the current user so that the data can be requested
from *ActivityInfo*. See the [**Technical notes**](#technical-notes) section for more details.
```r
activityinfo::activityInfoLogin()
```
#### 1. Retrieve data & ETL
Source the `etl.R` file in the `R/` directory to pull the data from the
*ActivityInfo API* and prepare it for analysis. At the end of the pull, a JSON
file containing the data is saved in the `data/` directory.
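
To confirm that the pull worked, the sketch below finds the JSON file that `etl.R` just wrote. It assumes only that the data file has a `.json` extension, since this README does not name the file; `latest_json()` is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper (not part of the repository): return the path of the
# most recently modified .json file in `dir`, or NA if there is none.
latest_json <- function(dir = "data") {
  files <- list.files(dir, pattern = "\\.json$", full.names = TRUE)
  if (length(files) == 0) {
    return(NA_character_)
  }
  files[which.max(file.mtime(files))]
}
```

After `source("R/etl.R")`, calling `str(jsonlite::fromJSON(latest_json()), max.level = 1)` gives a quick look at the top-level structure of the pulled data (this assumes the [jsonlite](https://cran.r-project.org/package=jsonlite) package is installed).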
#### 2. Add new sections to the notebook
The [R Markdown](https://rmarkdown.rstudio.com/) files that make up the notebook
are placed in the `analysis/` directory. To add a section, create a new file,
e.g. `analysis/new-section.Rmd`.
Then add the new file name to the `rmd_files` array in `analysis/_bookdown.yml`.
The position of the entry matters, because the order of the `rmd_files` array
determines the order of the sections in the notebook.
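
To double-check the registration, a small sketch like the one below reads the configuration with the [yaml](https://cran.r-project.org/package=yaml) package and reports where a file sits in `rmd_files`. It assumes the entries are file names relative to `analysis/` (for example `new-section.Rmd`); `rmd_position()` is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper (not part of the repository): return the 1-based
# position of `rmd` in the `rmd_files` array of a bookdown config file,
# or NA if the file is not listed.
rmd_position <- function(rmd, config = file.path("analysis", "_bookdown.yml")) {
  cfg <- yaml::read_yaml(config)
  # Flatten the YAML sequence into a plain character vector of file names.
  files <- unlist(cfg$rmd_files, use.names = FALSE)
  match(rmd, files)
}
```

For example, `rmd_position("new-section.Rmd")` returns `NA` if the file was not added to the array yet.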
#### 3. Render notebook
Run this command in the *R* console to render the bookdown site:
```r
source("R/render.R"); render_bookdown()
```
Rendering does not require a connection to the *ActivityInfo API*, but the
JSON data file must exist in the `data/` directory.
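
Because of that requirement, it can help to fail early with a clear message. The sketch below is a hypothetical wrapper, not part of the repository; it assumes the data file has a `.json` extension, that it is run from the repository root, and that `R/render.R` defines `render_bookdown()` as in the command above.

```r
# Hypothetical wrapper (not part of the repository): stop early when no
# JSON data file is present, otherwise render the site as usual.
render_if_data <- function(data_dir = "data") {
  json <- list.files(data_dir, pattern = "\\.json$")
  if (length(json) == 0) {
    stop("No JSON data file found in '", data_dir,
         "/'. Run R/etl.R first (this needs ActivityInfo access).",
         call. = FALSE)
  }
  # source() defines render_bookdown() in the global environment.
  source(file.path("R", "render.R"))
  render_bookdown()
}
```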
After rendering, `git status` will list a number of new files inside the
`docs/` folder, because the rendered site is set up to be served from the
`docs/` folder on GitHub
([see why](https://help.github.com/en/articles/configuring-a-publishing-source-for-github-pages)).
#### 4. Publish
Once you are happy with the local changes, commit the rendered files in the
`docs/` folder and push them to the remote, for example with the following
commands in your shell:
```bash
git add docs/
git commit -m "Render site"
git push origin master
```
### Process diagram
<!-- DIAGRAM START -->
```{r graph generation, eval=TRUE, include=FALSE}
script <- 'digraph diagram {
graph [layout=dot];
edge [decorate=true];
rankdir=LR;
// AI color codes:
// #DBF1F1 (light turquoise)
// #242934 (almost black)
// OBJECTS:
ActivityInfoSource [shape=cylinder,
label="ActivityInfo\nSource",
style=filled color="#242934" fillcolor="#DBF1F1"];
MainData [shape=parallelogram, label="Data generated and loaded"];
Analyses [shape=box, label="Analyses"];
InteractiveNotebook [shape=tab, label="Interactive\nNotebook"];
// ACTIONS:
DataPulling [shape=rect, label="Pull data\nActivityInfo API"];
ETL [shape=rect, label="ETL\ndata preparation"];
// FLOW:
{rank=same
ActivityInfoSource->DataPulling;
}
DataPulling->ETL;
ETL->MainData;
MainData->Analyses;
Analyses->InteractiveNotebook;
}'
graph <- DiagrammeR::grViz(script)
```
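```{r svg to file, eval=TRUE, include=FALSE}
# Export the Graphviz widget to SVG markup, then rasterise it to PNG so the
# diagram can be shown as a static image in the GitHub README.
svg <- DiagrammeRsvg::export_svg(graph)
diagram.fname <- file.path("media", "diagram.png")
rsvg::rsvg_png(charToRaw(svg), diagram.fname)
```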
```{r display, eval=TRUE, echo=FALSE, out.width = "95%"}
knitr::include_graphics(diagram.fname)
```
<!-- DIAGRAM END -->
### Technical notes
+ The data can be accessed through the *ActivityInfo* API with user credentials.
The [**ActivityInfo R Language Client**](https://github.com/bedatadriven/activityinfo-R)
provides good documentation on how to do this.
+ The [**dplyr**](https://cran.r-project.org/package=dplyr) package was chosen
because it is convenient for rapid ad-hoc analyses. Towards the end of the
project, selected analysis code can be rewritten in base R, which is generally
regarded as more robust and stable for production environments.
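
As an illustration of such a rewrite, the snippet below computes the same per-group mean twice, once with dplyr and once with base R's `aggregate()`. The `records` data frame and its `partner` and `words` columns are made up for the example and do not come from the ActivityInfo data.

```r
library(dplyr)

# Made-up example data (not from ActivityInfo).
records <- data.frame(
  partner = c("A", "B", "A", "C", "A"),
  words   = c(10, 4, 6, 8, 2)
)

# dplyr version: convenient for interactive, ad-hoc exploration.
dplyr_out <- records %>%
  group_by(partner) %>%
  summarise(mean_words = mean(words)) %>%
  as.data.frame()

# Base R version: the same result without the dplyr dependency.
base_out <- aggregate(words ~ partner, data = records, FUN = mean)
names(base_out)[names(base_out) == "words"] <- "mean_words"

print(dplyr_out)
print(base_out)
```

Both versions return one row per partner, sorted by partner, with the mean of `words` for each group.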