Scripts for processing and mining (classic) literature and PDF files
The first script covers
- downloading and processing public domain works from the Project Gutenberg collection with the gutenbergr package
- transforming works into a tidy format
- mining works by
  - calculating and plotting word frequencies
  - plotting word and comparison clouds
  - conducting sentiment analyses (NRC lexicon)
using the example of Bram Stoker's Dracula.
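The tidy-format and word-frequency steps listed above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not a port of the repository's R code; the tiny stop-word list and the two sample sentences are invented (an R workflow would typically use a much larger stop-word table, such as tidytext's `stop_words`).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "was", "it", "that", "i", "my"}


def tidy_tokens(lines):
    """Return one (line_number, word) row per token: a one-token-per-row table."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        # Lowercase and keep runs of letters/apostrophes, dropping punctuation.
        for word in re.findall(r"[a-z']+", line.lower()):
            rows.append((line_no, word))
    return rows


def word_frequencies(rows, stop_words=STOP_WORDS):
    """Count words that are not stop words, most frequent first."""
    counts = Counter(word for _, word in rows if word not in stop_words)
    return counts.most_common()


if __name__ == "__main__":
    # Invented sample text, not a quotation from Dracula.
    sample = [
        "The Count was in the castle.",
        "The castle was dark, and the Count was waiting.",
    ]
    rows = tidy_tokens(sample)
    print(rows[:3])                # [(1, 'the'), (1, 'count'), (1, 'was')]
    print(word_frequencies(rows))  # [('count', 2), ('castle', 2), ('dark', 1), ('waiting', 1)]
```

Keeping the line number on every token row is what makes the table "tidy": later steps (frequency counts, lexicon joins, clouds) can all work on the same one-word-per-row structure.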
Corresponding blog post: https://lhehnke.github.io/notes/2018/01/25/text_mining_the_room
The second script covers
- downloading, importing and processing PDF files in R
- transforming PDF files into a tidy format
- mining PDF files by
  - calculating and plotting word frequencies
  - conducting sentiment analyses (NRC and Bing lexicons)
  - plotting word and comparison clouds
  - visualizing the most frequent positive and negative words (Bing sentiments)
using the script of The Room, often called the worst film ever made (written, directed and produced by Tommy Wiseau, who also stars in it).
Source: https://theroomscriptblog.files.wordpress.com/2016/04/the-room-original-script-by-tommy-wiseau.pdf
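The Bing-based step (keeping only words found in a positive/negative lexicon and ranking the most frequent words within each sentiment) can be sketched the same way, again in Python and only as an illustration. `BING_SAMPLE` is a hypothetical six-word stand-in for the real Bing lexicon, and the token list is invented rather than taken from the script of The Room.

```python
from collections import Counter

# Hypothetical stand-in for the Bing lexicon: word -> "positive" / "negative".
BING_SAMPLE = {
    "great": "positive", "love": "positive", "happy": "positive",
    "crazy": "negative", "hate": "negative", "cheating": "negative",
}


def top_words_by_sentiment(words, lexicon=BING_SAMPLE, n=2):
    """Keep tokens found in the lexicon (an inner join) and return the n most
    frequent words for each sentiment label."""
    counts = Counter(w for w in words if w in lexicon)
    top = {}
    for word, count in counts.most_common():
        label = lexicon[word]
        top.setdefault(label, [])
        if len(top[label]) < n:
            top[label].append((word, count))
    return top


if __name__ == "__main__":
    # Invented tokens; "script" is not in the lexicon and is dropped.
    tokens = ["love", "great", "love", "hate", "crazy", "love", "happy", "crazy", "script"]
    print(top_words_by_sentiment(tokens))
    # {'positive': [('love', 3), ('great', 1)], 'negative': [('crazy', 2), ('hate', 1)]}
```

Each per-sentiment list could then feed one bar chart per sentiment, which is the kind of "most frequent positive and negative words" visualization described above.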
Example plot: