Text mining movie scripts to perform the Bechdel test using spaCy

The Bechdel test is a measure of the representation of women in fiction. It asks whether a work features at least two female characters who talk to each other about something other than a man.
In this project I applied text mining techniques to 1500+ movie scripts, which I downloaded and parsed from the internet. I performed the test on each script to explore the long-term trend of female representation in movies.
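
The sketch below is a rough, plain-Python illustration of what "performing the test" on a parsed script might look like. It is not the project's spaCy pipeline and makes several assumptions:

  • Each script has already been parsed into a list of (speaker, line) pairs.
  • Every character has been assigned a gender.
  • Two consecutive lines spoken by two different women count as a conversation.
  • A small, hypothetical keyword list plus the male character names stands in for "talking about a man".

A second hypothetical helper turns per-movie results into a pass rate per decade. This is one simple way to look at a long-term trend.

```python
from collections import defaultdict

# Hypothetical, deliberately small word list standing in for "talks about a man".
MALE_WORDS = {"he", "him", "his", "himself", "man", "men",
              "boyfriend", "husband", "father", "son", "brother"}


def _tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {w.strip(".,!?;:'\"").lower() for w in text.split()}


def passes_bechdel(dialogue, genders):
    """dialogue: list of (speaker, line) pairs in script order.
    genders: dict mapping character name -> "F" or "M" (assumed given).
    Returns True if the (heuristic) Bechdel criteria are met."""
    female = {name for name, g in genders.items() if g == "F"}
    # Criterion 1: at least two female characters.
    if len(female) < 2:
        return False
    # Male character names also count as "talking about a man".
    banned = MALE_WORDS | {name.lower() for name, g in genders.items() if g == "M"}
    # Criterion 2: two different women speak in consecutive turns.
    for (s1, t1), (s2, t2) in zip(dialogue, dialogue[1:]):
        if s1 in female and s2 in female and s1 != s2:
            # Criterion 3: neither line mentions a man.
            if not (_tokens(t1) | _tokens(t2)) & banned:
                return True
    return False


def pass_rate_by_decade(results):
    """results: iterable of (release_year, passed) pairs.
    Returns {decade: share of movies passing}, sorted by decade."""
    counts = defaultdict(lambda: [0, 0])  # decade -> [passed, total]
    for year, passed in results:
        decade = year // 10 * 10
        counts[decade][0] += int(passed)
        counts[decade][1] += 1
    return {d: p / n for d, (p, n) in sorted(counts.items())}


if __name__ == "__main__":
    genders = {"ALICE": "F", "CAROL": "F", "BOB": "M"}
    script = [
        ("ALICE", "Did you see Bob today?"),
        ("CAROL", "No, I was at the lab."),
        ("ALICE", "Did the results come back?"),
    ]
    print(passes_bechdel(script, genders))  # True: second and third lines
    print(pass_rate_by_decade([(1985, True), (1987, False), (2012, True)]))
    # {1980: 0.5, 2010: 1.0}
```

The keyword heuristic is intentionally crude. It misses indirect references such as "my date" and rejects lines that mention a man only in passing. A real pipeline would need richer linguistic information, such as spaCy's named-entity recognition, plus manual checking of the results.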

Data files can be downloaded from:
https://drive.google.com/drive/folders/1konx-AYGYk2zGTdHR97vgQAl_IB2r9Q2

Data files not included:

  • Raw, unprocessed movie scripts (2071 txt/pdf/rtf/doc files, ~1.1 GB); these can be downloaded using the project's .py files
  • Evaluation of the results by 3 judges
  • Exported CSV files

The project report can be accessed at: https://www.dropbox.com/s/96czpl7e5xerhtp/IRTM_project_report.pdf?dl=0

Inspiration for the data structures was taken from: https://www.youtube.com/watch?v=jRKKPYDs44o