-
Notifications
You must be signed in to change notification settings - Fork 0
/
slides.qmd
129 lines (83 loc) · 4.36 KB
/
slides.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
title: "Farming communities<br>in Dutch media"
title-slide-attributes:
data-background-color: "#00A6D6"
subtitle: "Web Scraping & Topic Modeling using R"
date: "20 March 2024"
author:
- "Claudiu Forgaci"
- "Shuyu Zhang"
format:
revealjs:
slide-number: c/t
# logo: fig/Logo_Rbanism_White.png
css: [style.css, logo.css]
# theme: [default, custom.scss]
chalkboard: true
footer: "Web Scraping & Topic Modeling with R"
---
# Workshop outline {.smaller background-color="#00A6D6"}
1. Tutorial: Scraping data from social media (Shuyu) - 30 min
2. Tutorial: Finding topics in Dutch news (Claudiu) - 1 hour
3. Exercises: Visualising scraped data & topic modeling with your own data - 1 hour
# Setup {background-color="#00A6D6"}
1. Log in to RStudio Server at **[edu.nl/ph3a6](https://edu.nl/ph3a6){target="_blank"}** with
- the username from your handout
- the password <b>Rban1smGuest</b>
2. Open **[edu.nl/qh34g](https://edu.nl/qh34g){target="_blank"}** in a new tab
3. When you are ready put a green sticky note on your laptop and follow the instructions on the presenter's screen
# Part 1: Scraping data from social media {.center background-color="#009B77"}
# The question {.center background-color="#009B77"}
What are the main farming-related topics discussed in social media?
# The data {.center background-image="fig/x.png"}
Social media data<br>from Facebook or X
## Why social media data?
In the digital media era, social media platforms and websites serve as valuable sources for collecting user-generated content, which in turn represents the voices and opinions of citizens. Twitter (X)'s real-time nature and vast array of tweets cover diverse topics, providing insights into current events, public reactions, and emerging trends. Similarly, Facebook's extensive user base and features like status updates and comments offer rich data on user experiences and interactions.
# The approach {.center background-color="#009B77"}
Web scraping using the Chrome extension<br><b>[Web Data Research Assistant](https://southampton.ac.uk/~lac/WebDataResearchAssistant/)</b>
## What is web scraping?
Web scraping is the process of extracting data from websites. It involves automated techniques to collect information from web pages or social media website, typically in formats such as HTML, XML, or JSON. Web scraping allows you to retrieve specific data elements, such as text, images, or tables, from web pages and store them for analysis or other purposes.
# Application in RStudio {background-color="#009B77"}
Open **`script_socialmedia.qmd`**<br>
and follow the instructions there
# Part 2: Finding topics in the Dutch news {.center background-color="#0076C2"}
# The question {.center background-color="#0076C2"}
What are the main farming-related topics discussed in Dutch news since 2022?
# The data {.center background-image="fig/news.jpg"}
51 newspaper articles<br>in Dutch
# The approach {.center background-color="#0076C2"}
Quantitative analysis:<br>Topic modeling
## Quantitative analysis
<b>Why?</b>
- Reproducible - re-run the analysis with the same results
- Automated - run the analysis on other data
- Scalable - run the analysis on (much) more data
<b>Good to know:</b>
- Requires (some) knowledge of statistics
- Results depend on the amount and quality of data available
## The method
Topic modeling using Latent Dirichlet Allocation (LDA) is a method used to reveal latent topics in unstructured text data.
::: {.columns}
::: {.column width="65%"}
![](fig/lda.png)
:::
::: {.column width="35%"}
In an LDA model:
- documents are a mixture of topics
- topics are a mixture of words
:::
:::
# Application in RStudio {background-color="#0076C2"}
Open **`script.qmd`**<br>
and follow the instructions there
## Found this workshop useful? {.smaller}
**Reach out to us with questions**:
- Claudiu Forgaci: C.Forgaci@tudelft.nl
- Shuyu Zhang: S-Zhang-19@tudelft.nl
**Follow Rbanism**:
- Sign up for our newsletter: [edu.nl/wxx98](https://edu.nl/wxx98)
- Follow us on [instagram.com/rbanism_](https://instagram.com/rbanism_)
- See our website: [rbanism.org](https://rbanism.org)
**Citation**:
- Zhang, S., & Forgaci, C. (2024). *What do Facebook or X users say about farming in the Netherlands? - Word cloud analysis using R* TU Delft.
- Forgaci, C. (2024). *What do the Dutch news say about farming communities? - A Topic modeling approach using R* TU Delft.