Visual data exploration with Virginia's public COVID-19 cases dataset

jammy-bot/va-covid-eda


Plotting COVID-19 in Virginia Localities

Histogram rug of hospitalizations per 1,000 population, by locality (excerpt)

Overview and Motivation

Effective July 1, 2020, the state of Virginia entered the third phase of the “Forward Virginia” plan to gradually ease restrictions in place for COVID-19. On July 28, additional restrictions were imposed on restaurants and bars in the Hampton Roads area of southeastern Virginia (Schneider, Gregory S. (July 28, 2020). "Virginia governor adds restrictions in Hampton Roads region after surge in Coronavirus cases". The Washington Post. Retrieved July 28, 2020.). This project grew out of an interest in comparing the severity of later outbreaks in southeastern Virginia with the number and proportion of cases in other areas of the state. In other words, in areas where cases, hospitalizations, or deaths were decreasing, were they higher or lower than in the more recently restricted areas?

This project is not a predictive analysis; instead, it serves as a comparative analysis of a limited subset of relevant data over particular time frames.

Objectives

To gauge how the Hampton Roads numbers compare to other areas of Virginia, such as the state's capital city of Richmond, this study primarily investigates the data using interactive plotting with Plotly Express. This approach enables visualization of data for multiple localities on a single figure, with the option to hover or drill down for greater detail (see published notebooks).

The Dataset

Virginia highlighted on a map of the United States (Virginia in red, all other states in light gray with white borders).

Virginia's public COVID-19 cases dataset

Data is sourced from the Virginia Department of Health (VDH). The particular copy of the dataset used in this repository was last updated July 30, 2020. VDH is, itself, a robust source of data and visualizations related to this health crisis. Their dataset continues to be updated regularly.

Each row in the dataset represents the overall count of COVID-19 cases, hospitalizations, and deaths for each locality in Virginia, by report date, since reporting began for this dataset.

| Column Name | Description | Type |
| --- | --- | --- |
| Report Date | Date when the case, hospitalization, or death is published | Date & Time |
| FIPS | 5-digit code (51XXX) for the locality | Plain Text |
| Locality | Independent city or county in Virginia | Plain Text |
| VDH Health District | Health district name assigned by the Virginia Department of Health. There are 35 health districts in Virginia. | Plain Text |
| Total Cases | Total number of COVID-19 cases | Number |
| Hospitalizations | Total number of COVID-19 hospitalizations | Number |
| Deaths | Total number of COVID-19 deaths | Number |

State population data has been integrated, for additional context and insight.

Population estimate data for Virginia localities

Data was sourced from the University of Virginia's Weldon Cooper Center for Public Service Demographics Research Group, published on January 27, 2020.

| Column Name | Description |
| --- | --- |
| FIPS Code | 3-digit code (XXX) for the locality |
| Locality | Independent city or county in Virginia |
| April 1, 2010 Census | Official population count from the 2010 Census |
| July 1, 2019 Estimate | Population approximation "based on a variety of observed administrative record data, such as births, deaths, school enrollment, and residential housing construction" |

Data Preparation

Since the data sources used are actively employed in the public presentation of state health information, they are usable as obtained. Data is read into Pandas dataframes from CSV files, with a few adjustments required:

  • Converting and cleaning column names to
  • Changing a few data types
  • Dropping an unneeded VDH Health District feature
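The adjustments listed above might look roughly like the following. This is a minimal sketch rather than the notebooks' actual code: the snake_case naming scheme, the `%m/%d/%Y` date format, the `load_cases` helper, and the inline sample rows (placeholder values, not real VDH figures) are all assumptions made for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for the VDH CSV export. The column names follow the table
# above; the numeric values are placeholders, not real data.
SAMPLE_CSV = """Report Date,FIPS,Locality,VDH Health District,Total Cases,Hospitalizations,Deaths
07/29/2020,51760,Richmond City,Richmond,100,10,1
07/30/2020,51760,Richmond City,Richmond,110,11,1
07/30/2020,51810,Virginia Beach,Virginia Beach,120,9,2
"""


def load_cases(source):
    """Read the cases CSV and apply the three adjustments (hypothetical helper)."""
    # Keep FIPS as text so the code is not silently reinterpreted as a number.
    df = pd.read_csv(source, dtype={"FIPS": str})
    # Assumed naming scheme: lowercase with underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Assumed date format; adjust to match the actual export.
    df["report_date"] = pd.to_datetime(df["report_date"], format="%m/%d/%Y")
    # Drop the health district feature, which the analysis does not use.
    return df.drop(columns=["vdh_health_district"])


cases = load_cases(io.StringIO(SAMPLE_CSV))
print(cases.dtypes)
```

In practice, `load_cases` would be pointed at the downloaded CSV file path instead of the in-memory sample.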

Visualization / Analysis

Cases by Locality Static capture, from interactive Plotly Express line plot

Thanks to interactive plotting, the first visualization was capable of answering many of the raised questions (link to interactive version in the Featured Notebooks section, below). It is easy to see (in the interactive plot) which localities lead the case count and a simple matter to view hover statistics for each line on the plot. Of course, since there are 133 localities, things do get a bit dense in places.

Hospitalizations by Locality Static bar plot of 10 highest locality hospitalization counts

Static plots in matplotlib (no interactivity necessary) show the localities with the highest total cases, hospitalizations, and deaths related to COVID-19 for the period under analysis. Fairfax far exceeds other localities in each case. Of the Hampton Roads localities, only Virginia Beach is listed in the top 10 for each of the three plots (Chesapeake is also in the top 10 for hospitalizations).
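One way to pick out the top localities for such a bar plot is sketched below. It assumes the counts are running totals, so the latest report date carries the figure for the whole period; `top_localities` is a hypothetical helper, and the rows are placeholders rather than real data.

```python
import pandas as pd


def top_localities(df, column, n=10):
    """Return the n localities with the highest value of `column` on the latest report date."""
    latest = df[df["report_date"] == df["report_date"].max()]
    return latest.nlargest(n, column).set_index("locality")[column]


# Placeholder data: three localities over two report dates.
cases = pd.DataFrame(
    {
        "report_date": pd.to_datetime(["2020-07-29", "2020-07-30"] * 3),
        "locality": ["Locality A"] * 2 + ["Locality B"] * 2 + ["Locality C"] * 2,
        "hospitalizations": [5, 9, 7, 8, 1, 2],
    }
)

top = top_localities(cases, "hospitalizations", n=2)
print(top)
# A static bar chart could then be drawn with, for example, top.plot.barh()
# (requires matplotlib).
```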

Total cases over time, by Locality Static image of animated bar plot, cases by locality

An animated plot of total cases over time, by locality, shows Richmond cases seeming most likely to have resulted in death through mid-July (link to interactive version in the Featured Notebooks section, below). Richmond was then surpassed by Virginia Beach in both the number of deaths and the total number of cases.

Select locality deaths, July 30 From animated bar plot, deaths by locality, July 30

A plot of deaths over time, by locality, indicates that COVID cases were less likely to receive hospital treatment in Norfolk than in Richmond (link to interactive version in the Featured Notebooks section, below). As the rate of death appears to slow toward the end of July for Richmond, it appears to pick up pace in Virginia Beach. Meanwhile, the number of Virginia Beach hospitalizations appears well below that of Richmond.

Select locality deaths, July 01 July 01 deaths by locality, from animated scatter plot

Animated scatter plots show cases growing more rapidly in Richmond at the start of our timeline, with Virginia Beach later overtaking the capital in daily deaths and total cases (link to interactive version in the Featured Notebooks section, below). While Virginia Beach leads in the number of hospitalizations at the beginning of our timeline, it is far surpassed by Richmond from the second week of May through July.

Adding Population Data

Select locality deaths, July 01 July 30 cases & deaths against hospitalizations

Feature engineering for the project involved merging dataframes on the Federal Information Processing Standard (FIPS) codes present in both source datasets. This required additional preparation of the population data. A data subset was read into Pandas, and county FIPS codes were padded with leading zeros and prefixed with the two-digit Virginia state code. In addition, the codes were converted from floats to integers, unneeded columns and rows were dropped, and column names were cleaned. Finally, the dataframes were merged, the duplicate FIPS code column was removed, and new features were created to express each of the statistical columns per 1,000 population.
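The FIPS preparation and merge described above can be sketched as follows. The `full_fips` helper, the snake_case column names, the `_per_1k` suffix, and the exact order of the conversion steps are assumptions made for illustration; the case counts and population figures are placeholders, not real data.

```python
import pandas as pd

VA_STATE_FIPS = "51"  # two-digit state code for Virginia


def full_fips(county_code):
    """Turn a county code read as a float (e.g. 1.0) into a 5-digit code (51001)."""
    padded = str(int(county_code)).zfill(3)  # float -> int -> zero-padded text
    return int(VA_STATE_FIPS + padded)


# Placeholder population rows; the codes arrive as floats, as in the source description.
population = pd.DataFrame(
    {
        "fips_code": [760.0, 810.0, 1.0],
        "july_1_2019_estimate": [200_000, 400_000, 30_000],
    }
)
population["fips"] = population["fips_code"].map(full_fips)

# Placeholder case rows keyed by the full 5-digit code.
cases = pd.DataFrame(
    {
        "fips": [51760, 51810, 51001],
        "total_cases": [100, 200, 30],
        "hospitalizations": [10, 8, 3],
        "deaths": [2, 4, 0],
    }
)

# Merging on a single shared key avoids carrying a duplicate FIPS column.
merged = cases.merge(
    population[["fips", "july_1_2019_estimate"]],
    on="fips",
    how="left",
    validate="many_to_one",  # each case row should match at most one population row
)

# New features: each statistic per 1,000 population.
for col in ["total_cases", "hospitalizations", "deaths"]:
    merged[f"{col}_per_1k"] = merged[col] / merged["july_1_2019_estimate"] * 1000

print(merged)
```

Since the VDH dataset lists FIPS as plain text, the key columns on both sides need to share a type before merging; integers are used here only to follow the float-to-integer conversion mentioned above.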

Additional visualizations include interactive histogram plots (link to interactive version in the Featured Notebooks section, below). These show long right tails, due both to localities with relatively low populations (making each of their cases more significant as a fraction of population) and to the number of daily records for each of those localities.

To our original question, regarding how Hampton Roads community pandemic statistics compare to those of the capital city: visualizations show a significantly greater number of cases in Norfolk and Virginia Beach by the end of our timeline; however, hospitalizations and deaths in Richmond exceed those of each of the southeastern Virginia localities, both in raw numbers and per 1,000 of their respective population estimates.

Featured Notebooks

| Filename | Description | Repository Location | Interactive Notebook (external) |
| --- | --- | --- | --- |
| virginia-covid-OSE.ipynb | Obtain, scrub, and explore with static plots. | notebooks: virginia-covid-OSE.ipynb | On Deepnote |
| virginia-covid-limited-explore.ipynb | Explore locality statistics and interactively plot for localities of interest. | notebooks: virginia-covid-limited-explore.ipynb | On Deepnote |
| va-covid-merge-explore.ipynb | Explore against population data. | notebooks: virginia-covid-merge-explore.ipynb | On Deepnote |
| va_covid_plots.ipynb | Single notebook, complete project (pre-notebook segmentation) | va_covid_plots.ipynb | not externally published |

Technologies

  • Python
    • Datapane
    • Matplotlib
    • Numpy
    • OS
    • Pandas
    • Pickle
    • Plotly Express
  • Deepnote
  • Jupyter Notebooks
