Using data on TV schedules and metadata on TV programs from Wikipedia, we explore the popularity of different genres of shows over time, and the race and gender composition of directors, producers, creators, presenters, cast members, music composers, etc.
A small preview of what we find:
The percentage of crime shows has risen from 0 to about 10% over the last 70 years.
The percentage of black cast members, presenters, directors, and producers has remained below 5%, falling to about 0 in some years. Trends for gender are slightly more hopeful. The percentage of female cast members has risen from around 30% to about 40% in recent years. The percentage of female directors has risen at a steady clip from 0 to about 10%. The most encouraging trend is in the percentage of female producers, which has risen from 0 to about 35%.
The rest of the document is arranged as follows:
- Scripts --- how we got the data and ran the analyses
- Data --- final data used for the analyses
- Results --- tables and figures
To run the Python scripts on your own machine, see the readme.
- Get the data
- Parse and Augment the data
- Analyze the data
Data are from TV Schedules Data on Wikipedia and from the gray boxes (informational boxes on the right-hand side of Wikipedia pages of TV programs).
-
- fields: no (id), year (years show was on the air), period (prime time, daytime), day (day of the week), season (Fall, Winter, ...), channel (channel name), channel_optional (misc.), program (program name), begin (start time), end (end time)
- US TV Schedules plus info. from the gray box of the program
- additional fields: meta_link (link), genre (Crime, Game show, etc.), running_time, meta (JSON of all the meta fields), audio_format, picture_format, created_by, directed_by, starring, presented_by, executive_producers, producers, composers
- (Augmented) Names of people on TV with crosswalk to TV schedules
- name (name of the person), field (role on the TV program), index (index number in data/us_tv_schedules_meta.csv), gender, race (imputed via the Python demographics package and the R packages gender and ethnicolor)
- Further Augmented Names File with Imputations of Race and Gender from R
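
As a rough illustration of how the `year`, `program`, and `genre` fields listed above could feed a figure such as the share of crime shows over time, here is a minimal sketch in plain Python. It is not the project's actual script: the function name `genre_share_by_year`, the toy rows, the substring match on `genre`, and the choice to count each program once per year are all assumptions made for illustration.

```python
from collections import defaultdict


def genre_share_by_year(rows, genre="crime"):
    """Return {year: share of distinct programs whose genre mentions `genre`}.

    `rows` is any iterable of dicts with "year", "program", and "genre" keys,
    for example the rows produced by csv.DictReader on a schedule file.
    """
    programs = defaultdict(dict)  # year -> {program name: genre matched?}
    for row in rows:
        is_match = genre.lower() in (row.get("genre") or "").lower()
        seen = programs[row["year"]]
        # A program can occupy many schedule slots; count it once per year.
        seen[row["program"]] = seen.get(row["program"], False) or is_match
    return {
        year: sum(flags.values()) / len(flags)
        for year, flags in sorted(programs.items())
    }


# Hypothetical rows, not taken from the real data files.
toy_rows = [
    {"year": "1955", "program": "Show A", "genre": "Sitcom"},
    {"year": "1955", "program": "Show B", "genre": "Western"},
    {"year": "1955", "program": "Show A", "genre": "Sitcom"},
    {"year": "2005", "program": "Show C", "genre": "Crime drama"},
    {"year": "2005", "program": "Show D", "genre": "Game show"},
    {"year": "2005", "program": "Show E", "genre": ""},
    {"year": "2005", "program": "Show F", "genre": "Sitcom"},
    {"year": "2005", "program": "Show C", "genre": "Crime drama"},
]

shares = genre_share_by_year(toy_rows)
print(shares)
```

Counting distinct programs rather than schedule slots is one possible choice; counting slots instead would weight shows by how often they air.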
- Tables
- Gender and Race of Producers, Presenters, etc. over time
- Python pkg. imputation
- R packages imputation + Python imputation
- year, prop1_female_producers (proportion of shows with at least one female producer = total number of shows with at least one female producer/total number of shows), prop1_female_directors (proportion of shows with at least one female director), prop1_female_creators (proportion of shows with at least one female creator), prop_female_producers (total number of female producers divided by the total number of producers), prop_female_directors (total number of female directors divided by the total number of directors), prop_female_creators (total number of female creators divided by the total number of creators), prop_female_cast (total number of female cast members/total number of cast members), prop_female_presenters (total number of female presenters divided by the total number of presenters), prop1_black_producers, ...
- Gender and Race of Producers, Presenters, etc. over time
- A Few Graphs:
- Percentage of Crime Shows Over time
- Percentage of Black Producers over time
- Percentage of Female Producers over time
- Percentage of Black Presenters over time
- Percentage of Female Presenters over time
- Percentage of Black Directors over time
- Percentage of Female Directors over time
- Percentage of Black Cast Members over time
- Percentage of Female Cast Members over time
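
The tables distinguish two kinds of columns: `prop1_*` (the share of shows with at least one person of a given group in a role) and `prop_*` (the share of people in a role who belong to that group). The sketch below shows the difference on made-up records. It is only an illustration under stated assumptions: the function name `female_proportions`, the record keys `year`, `show`, `field`, and `gender`, the role label `"producers"`, and the gender label `"female"` are hypothetical and may not match the values in the actual data files.

```python
def female_proportions(people, shows_by_year, role):
    """Compute prop1 and prop for one role, per year.

    people        -- list of dicts with "year", "show", "field", "gender"
    shows_by_year -- {year: set of all show ids on air that year}
    role          -- role label to filter on (hypothetical, e.g. "producers")
    """
    result = {}
    for year, shows in sorted(shows_by_year.items()):
        in_role = [p for p in people if p["year"] == year and p["field"] == role]
        female = [p for p in in_role if p["gender"] == "female"]
        shows_with_female = {p["show"] for p in female}
        result[year] = {
            # Shows with at least one female in the role / all shows that year.
            "prop1": len(shows_with_female) / len(shows) if shows else None,
            # Females in the role / all people in the role.
            "prop": len(female) / len(in_role) if in_role else None,
        }
    return result


# Hypothetical records: four shows in 1990, one of them with no listed producer.
toy_people = [
    {"year": 1990, "show": 1, "field": "producers", "gender": "female"},
    {"year": 1990, "show": 1, "field": "producers", "gender": "male"},
    {"year": 1990, "show": 2, "field": "producers", "gender": "male"},
    {"year": 1990, "show": 2, "field": "producers", "gender": "male"},
    {"year": 1990, "show": 3, "field": "producers", "gender": "female"},
    {"year": 1990, "show": 3, "field": "directors", "gender": "female"},
]
toy_shows = {1990: {1, 2, 3, 4}}

props = female_proportions(toy_people, toy_shows, "producers")
print(props)
```

In the toy data, two of the four shows have at least one female producer, while two of the five listed producers are female, so the two measures differ even though they describe the same year.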
Gaurav Sood and Suriyan Laohaprapanon
Released under CC BY 2.0.
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.