updated readme to reflect file organization
savannahmhunter authored May 27, 2021
1 parent dc4d50e commit cf4cd74
Showing 1 changed file with 0 additions and 6 deletions.
6 changes: 0 additions & 6 deletions README.md
@@ -36,10 +36,6 @@ It is the third component that has likely led you here. We are managing our work
## Organization of files
Each script represents a stage in the data cleaning or analysis process. This section details the order in which to run the files. The OSHA_Data.Rproj is the project file.

The files are organized into two main folders.
* First, the 1_Data_Creation_Cleaning folder contains files used to create the final dataset.
* Second, the 2_Data_Exploration_and_Analyis folder contains files for exploratory data analysis that are in progress.

Within the 1_Data_Creation_Cleaning folder there are two additional folders, Cleaned_Data and Raw_Data. The Raw_Data are converted to Cleaned_Data using the numbered R scripts in this folder. [You can learn about how to obtain OSHA data in our wiki](https://github.com/Carceral-Ecologies/Caceral-OSHA-Data/wiki/How-to-obtain-OSHA-data). Here is the order in which to run the files and a brief explanation of what each file does.
* First, 1_OSHA_inspections loads in the OSHA inspection datasets and filters the data to include only prisons using NAICS codes, SIC codes, and establishment name information.
* Second, 2_OSHA_violations loads in the violations data and then filters that data to only include prisons identified in the 1_OSHA_inspections script.
@@ -49,8 +45,6 @@ Within the 1_Data_Creation_Cleaning folder there are two additional folders Clea
* Sixth, 6_open_refine_JSON_cleaning_code is a text document that contains JSON code designed to be run in OpenRefine to clean the city, address, and establishment name information in the final dataset. While OpenRefine is very helpful, this step ultimately could not accomplish the level of cleaning required. Thus, step 8 includes an additional file to run for cleaning address and facility name information. Future projects may want to skip the OpenRefine step and add additional cleaning steps to step 8, making the process slightly more parsimonious.
* Seventh, at this point the dataset is for prisons in the US. This file subsets the data to California prisons only. We detail some of the issues comparing data across states in the [Wiki](https://github.com/Carceral-Ecologies/Caceral-OSHA-Data/wiki/Data-Quality).
* Eighth, this file systematically cleans address and facility name information so identical facilities can be identified (the names of facilities do not appear to be standardized by OSHA) and produces the final dataset in both long and wide formats.

The 2_Data_Exploration_and_Analysis folder contains several files for preliminary data exploration. These files are incomplete and in progress.

## Code of Conduct

