About to move to a static analysis version. Big changes coming!
I started this purely out of curiosity to see if Theresa May was really repeating herself a lot. But by way of comparison I quickly extended it to other speakers. I also wondered if I could capture "the essence" of a piece without actually having to read it.
It was originally called Theresaurus - which looks cool - but Trumpasaurus is much more enjoyable to say.
The analysis is client-side JavaScript, but the speeches are loaded on demand by AJAX, so the page must be served by a web server rather than opened directly from disk.
Like this one: https://deanturpin.github.io/trumpasaurus/
During development you can run one locally with Python (from one level above your repo):
python -m SimpleHTTPServer
That is the Python 2 module; on Python 3 the equivalent is python3 -m http.server. Then connect with your web browser: http://localhost:8000/trumpasaurus/
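The load-then-analyse flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `countWords`, `topWords` and `analyseSpeech` are hypothetical, and it assumes `fetch` is available (the real page may well use `XMLHttpRequest`).

```javascript
// Hypothetical helper: count how often each word occurs in a speech.
function countWords(text) {
  const counts = new Map();
  // Lower-case the text and pull out runs of letters/apostrophes as words.
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Return the n most frequent words as [word, count] pairs,
// most frequent first, with ties broken alphabetically.
function topWords(text, n) {
  return [...countWords(text).entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// Load a speech on demand and analyse it. This only works when the page
// is served over HTTP, which is why a web server is needed.
async function analyseSpeech(url, n) {
  const response = await fetch(url);
  if (!response.ok) throw new Error("Failed to load " + url);
  return topWords(await response.text(), n);
}
```

In the browser you would call something like `analyseSpeech("speeches/example.txt", 10)`, where the file name is a placeholder for whichever speech is selected.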
There's also a Greasemonkey script.
Tested on recent Firefox, Safari and Chrome on the desktop, and on Chrome and Safari on iPhone.
Adding a new speech can be done entirely within GitHub.
- Fork this repo
- Add a new text file in speeches
- In index.html: add a new option in the select tag with the new file
- In your repo settings: select "master" branch as the source in the GitHub Pages section
- View it on your github.io
Conversations need preprocessing to split them into separate files. I used the tools/split-speech.sh script. I've left the speeches largely untouched except where anomalies jumped out of the results. The Lib Dem manifesto, for example, was littered with ●● - an artifact of the PDF to text conversion - so I removed them by hand (in vim).
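The clean-up above was done by hand, but the same fix could be scripted. The sketch below is only an illustration: `stripBullets` is a hypothetical name, and it assumes the only debris is runs of the ● character.

```javascript
// Hypothetical clean-up: remove "●" artifacts left by PDF to text conversion.
function stripBullets(text) {
  return text
    .replace(/●+[ \t]*/g, "")   // drop runs of bullets and any spaces after them
    .replace(/[ \t]+$/gm, "");  // trim trailing whitespace left at line ends
}
```

Anything else in the text, including line breaks, is left as it was.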
Keyword counts were generated by running the keywords.sh script in the speeches folder. Pass it a list of files to compare.
$ cd speeches
$ ../tools/keywords.sh
Conserv SinnFei DUP Labour Green UKIP Libdem
1 abortion
1 badger
6 blair
8 7 1 25 2 42 23 brexit
1 1 cameron
clegg
14 6 5 12 climate
28 4 6 66 11 21 68 community|communities
41 68 17 32 conservative
1 1 3 corbyn
13 1 2 23 25 15 crime
8 6 4 1 2 cyber
2 2 7 1 5 4 debt
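The idea behind a table like this can be sketched in JavaScript, to match the rest of the project. This is not how keywords.sh is implemented (the script itself is not shown here); `keywordTable` is a hypothetical function, and whole-word, case-insensitive matching is an assumption. Each keyword is treated as a regular expression, so alternations such as community|communities count either form.

```javascript
// Hypothetical sketch: count matches of each keyword pattern per document.
// `documents` maps a short name (e.g. a party) to the full text.
function keywordTable(documents, keywords) {
  const names = Object.keys(documents);
  return keywords.map((keyword) => {
    // Assumption: whole-word, case-insensitive matching.
    const pattern = new RegExp("\\b(?:" + keyword + ")\\b", "gi");
    const counts = names.map(
      (name) => (documents[name].match(pattern) || []).length
    );
    return { keyword, counts };
  });
}
```

Each returned row holds the keyword and one count per document, in the same order as the keys of `documents`.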
Manifesto PDFs were converted to text with pdftotext; the shell's brace expansion supplies both the input and the output file name:
$ pdftotext DUP_Wminster_Manifesto_2017_v5.{pdf,txt}
I find reloading the page automatically really useful during development.
// Periodically reload the page if there's a "reload" token in the URL
setInterval(function() {
  if (window.location.href.split("?").pop() === "reload")
    window.location.reload();
}, 2000);
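The check above expects the URL to end in exactly ?reload. If you also use a hash fragment or other query parameters, a slightly more tolerant variant might look like this. It is a sketch only: `wantsReload` is a hypothetical name, and it relies on the standard `URL` and `URLSearchParams` APIs.

```javascript
// Hypothetical helper: true if the URL's query string has a "reload" parameter.
function wantsReload(href) {
  return new URL(href).searchParams.has("reload");
}

// Only start the timer in a browser, where `window` exists.
if (typeof window !== "undefined") {
  setInterval(function () {
    if (wantsReload(window.location.href)) window.location.reload();
  }, 2000);
}
```

With this version, http://localhost:8000/trumpasaurus/?reload#top would also trigger the reload.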