A Bayesian analysis of polling data and historical voting records to try to predict the outcome of the 2016 U.S. Presidential race.
Carl Ehrett, November 7, 2016
In this project I present a Bayesian model for predicting the outcome of the 2016 presidential election based on a combination of historical election results (from the previous 10 elections) and polling data (from Google Consumer Surveys).
The model views the election in each state
This project was completed in early November 2016, prior to the election. The forecast gives Hillary Rodham Clinton a 54% chance of becoming President, and Donald Trump a 46% chance.
For full details of the model, the data sources, the methodology, and the results of the analysis, please consult the report here. The report includes a state-by-state breakdown giving the estimated probability of a Clinton victory in each state. It also indicates which states should be considered swing states.
The contents of this GitHub repository are as follows:
main.r: Main R script, which performs the Bayesian analysis.
all_polls.csv: Collection of 13 Google Consumer Survey polls from Aug. 10 to Nov. 1, 2016.
modern_results_by_state.csv: Spreadsheet of presidential electoral results by state and election year, from 1976 to 2012.
In addition to the files included in the repository, the coda package must be installed in R to perform the analysis.
To run the analysis for the 2016 election with the included Google Consumer Survey data, run the R script main.r in a working directory that contains the spreadsheets all_polls.csv and modern_results_by_state.csv.
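As a convenience, the setup described above can be checked before launching the script. The following is a minimal sketch, not part of the repository: check_prerequisites and run_analysis are hypothetical helper names, and the sketch assumes main.r reads the two spreadsheets by relative path from its own directory.

```r
# Hypothetical helper (not part of the repository): reports anything that is
# missing before main.r is run.
check_prerequisites <- function(dir = ".") {
  required_files <- c("main.r", "all_polls.csv", "modern_results_by_state.csv")
  missing_files <- required_files[!file.exists(file.path(dir, required_files))]
  problems <- character(0)
  if (length(missing_files) > 0) {
    problems <- c(problems, paste("missing file:", missing_files))
  }
  # The analysis needs the coda package; this only checks, it does not install.
  if (!requireNamespace("coda", quietly = TRUE)) {
    problems <- c(problems,
                  "missing package: coda (install with install.packages(\"coda\"))")
  }
  problems
}

# Hypothetical wrapper: runs main.r only if nothing is missing.
run_analysis <- function(dir = ".") {
  problems <- check_prerequisites(dir)
  if (length(problems) > 0) {
    stop(paste(problems, collapse = "\n"))
  }
  # chdir = TRUE makes the script's directory the working directory while it
  # runs, so relative paths to the spreadsheets resolve (an assumption about
  # how main.r loads its data).
  source(file.path(dir, "main.r"), chdir = TRUE)
}
```

Calling run_analysis() from the repository directory would then either run the forecast or stop with a list of what is missing.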
To apply the software to another presidential election, replace the two spreadsheets with polling data and historical data for that election. The historical data should follow the layout of modern_results_by_state.csv: each row corresponds to a state (in alphabetical order), each column corresponds to an election year (in chronological order), and each entry is 1 for a Democratic victory in that state and year or 0 for a Republican victory. (No third party has won electoral votes in recent history; if that changes, the model will need substantive revision before it can be used.) The polling data should follow the format of the Google Consumer Survey polls, which is described in the full report and can also be seen by examining all_polls.csv directly. Alternatively, polling data in a different format can be used if main.r is modified to accommodate it.
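A replacement historical spreadsheet can be checked against this layout before it is used. The sketch below is illustrative only: validate_history is a hypothetical function, and it assumes the data have already been loaded into a numeric matrix whose row names are state names and whose column names are election years. The exact header and column layout of modern_results_by_state.csv is not specified here, so the loading step may need adjusting.

```r
# Hypothetical validator (not part of the repository). Returns a character
# vector of problems; an empty vector means the layout looks right.
validate_history <- function(results) {
  stopifnot(is.matrix(results), is.numeric(results))
  problems <- character(0)

  # Every entry must be 1 (Democratic victory) or 0 (Republican victory).
  if (!all(results %in% c(0, 1))) {
    problems <- c(problems, "entries must be 0 (Republican) or 1 (Democratic)")
  }

  # Rows: states in alphabetical order.
  states <- rownames(results)
  if (is.null(states) || is.unsorted(states)) {
    problems <- c(problems, "rows must be states in alphabetical order")
  }

  # Columns: election years in strictly increasing order.
  years <- suppressWarnings(as.integer(colnames(results)))
  if (length(years) == 0 || anyNA(years) || is.unsorted(years, strictly = TRUE)) {
    problems <- c(problems, "columns must be election years in chronological order")
  }

  problems
}

# Toy values for illustration only, not real election results.
example <- matrix(
  c(1, 0, 0,
    0, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Alabama", "Alaska"), c("2004", "2008", "2012"))
)
validate_history(example)  # character(0): no problems found
```

The function returns messages rather than stopping at the first failure so that every layout problem in a new spreadsheet can be seen in one pass.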