Data Quality
OSHA coverage differs by state. Some states are covered by federal OSHA (which does not cover public employees), some have state plans that add coverage for state or local employees, and some have state plans covering both public and private employees. You can see a map of OSHA coverage by state here.
Likely because of these differences in coverage, some states are not included in our dataset. Our dataset is also limited to prison facilities from 2010 to 2019. Some prison facilities in these states may have been inspected before 2010; even so, our dataset contains no inspection records for prisons in the federal OSHA states below for that ten-year period.
Federal OSHA States missing from the dataset: DE, KS, MT, NE, ND, RI, SD
Additionally, some federal OSHA states in the dataset appear to have inspection records only from federally run prison facilities, not private ones. We should examine whether these states have private prison facilities, and how many, to get a sense of whether the missingness simply reflects an absence of private facilities in that state.
Federal OSHA states with inspection records only from federal prisons: PA, MS, FL, WV, AL, AR, LA, MA, OK
Overall, there also appears to be a lack of inspected private prisons both in states with plans covering public employees and in states with their own plans covering both private and public employees.
States with plans covering only public/local employees that also do not have any inspection records for private prisons: IL, NY, CT, VI
States with plans covering both public and private employees that also do not have any inspection records for private prisons: AK, AZ, HI, IA, KY, MN, SC, UT, WY
How might this affect data analysis?
- We need to check how many private prisons are in the above states to get a sense of just how much missingness there is. Perhaps there are no private prisons in our dataset because there are no private prisons in those states (but this seems unlikely).
- We could also check if there are inspection records for private prisons from these states prior to 2010.
- Comparison between states is challenging. If we do want to do any comparison we should compare like with like. Meaning we should compare states that have similar types of OSHA coverage.
- It might be interesting to visualize this lack of coverage in some way. Perhaps showing how many facilities there actually are compared to what we have in our dataset.
- We know that the Federal Bureau of Prisons also has their own OSH program. Perhaps prisons are inspected under this umbrella rather than OSHA. We should send in a FOIA request. See #issue9.
Date: October 7, 2020 Contributor: Savannah Hunter
The OSHA inspections data is organized with each activity number as a row. The activity number is a unique number assigned by OSHA for a specific investigation of a facility. A facility could be inspected several times in its history and it will be assigned a unique activity number each time it is inspected. This is important to understand because the analysis currently cannot be done at the facility level but at the activity number level.
The OSHA violations data is organized in long format with each row being a violation. Some facilities have one violation and some have many. It is important to remember that each row is not necessarily a unique facility or activity number but rather a unique violation.
Each violation has an activity number attached to it, and we use that number to link the violations data to the inspections data, creating the combined dataset. This dataset is also in long format, with each row being a unique violation. The descriptive analysis file converts this dataset to wide format so that each row is an activity number, making it easier to interpret.
Once we get the facility names cleaned we can organize the data by facility and check for multiple activity numbers.
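The linking and reshaping described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the rows are made-up sample data, and every field name other than `activity_nr` (`estab_name`, `citation_id`, `n_violations`) is an assumption.

```python
# Hypothetical sample rows; only activity_nr is a field name taken from the notes.
inspections = [
    {"activity_nr": 1001, "estab_name": "FACILITY A"},
    {"activity_nr": 1002, "estab_name": "FACILITY A"},
]
violations = [
    {"activity_nr": 1001, "citation_id": "01001"},
    {"activity_nr": 1001, "citation_id": "01002"},
    {"activity_nr": 1002, "citation_id": "01001"},
]

def combine_long(inspections, violations):
    """Join each violation to its inspection: one row per violation (long format)."""
    by_activity = {row["activity_nr"]: row for row in inspections}
    return [
        {**by_activity[v["activity_nr"]], **v}
        for v in violations
        if v["activity_nr"] in by_activity
    ]

def to_wide(long_rows):
    """Collapse to one row per activity number, counting its violations."""
    wide = {}
    for row in long_rows:
        entry = wide.setdefault(
            row["activity_nr"],
            {
                "activity_nr": row["activity_nr"],
                "estab_name": row["estab_name"],
                "n_violations": 0,
            },
        )
        entry["n_violations"] += 1
    return list(wide.values())

combined = combine_long(inspections, violations)  # 3 rows, one per violation
wide = to_wide(combined)                          # 2 rows, one per activity number
```

Note that the same facility name still appears on two rows of `wide`, which is why the analysis stays at the activity number level until the facility names are cleaned.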
Date: October 13, 2020 Contributor: Savannah Hunter
Some facilities have a number attached to their name. How are these numbers determined? Is there a pattern to the numbers? Do we need to keep the numbers attached?
Indiana has two numbers attached. Each number seems to be a unique facility because they have different addresses.
- 343496048 108957 - INDIANA DEPARTMENT OF CORRECTIONS 4490 REFORMATORY RD
- 343499539 108918 - INDIANA DEPARTMENT OF CORRECTION 711 GREEN ROAD
MN uses numbers (sometimes) too. Here the numbers differ, but the entries appear to be the same facility under different inspections.
- 97197 - MN DEPARTMENT OF CORRECTIONS - LINO LAKES - 7525 4TH AVE (Activity_nr = 343069472)
- 98216 - MN DEPARTMENT OF CORRECTIONS - LINO LAKES - 7525 4TH AVE (Activity_nr = 343484531)
NC uses numbers (sometimes) too. Here the numbers differ, but the entries appear to be the same facility under different inspections.
- 342988581 137238 - COUNTY OF STOKES - JAIL 1013 MAIN STREET
- 343747986 140951 - COUNTY OF STOKES - JAIL 1013 MAIN STREET
OR uses numbers (sometimes) too. For the most part it seems like the numbers are like activity_nrs. Each number is unique to the investigation. But there are some inconsistencies (this isn't all of them). For Oregon the number before the name seems unnecessary because we already have an activity_nr showing the inspection is unique.
- 344454335 317724881 - MULTNOMAH COUNTY 1120 SW 3RD AVE - This facility has a number
- 317339257 MULTNOMAH COUNTY 1120 SW 3RD AVENE STE 400 - This one doesn't.
- 342366937 317716993 - STATE OF OREGON DEPARTMENT OF CORRECTIONS 24499 SW GRAHAMS FERRY RD - Has a number
- 313682213 STATE OF OREGON DEPARTMENT OF CORRECTIONS 24499 SW GRAHAMS FERRY RD - Doesn't have a number
- 340954213 317709782 - STATE OF OREGON DEPARTMENT OF CORRECTIONS 2575 CENTER ST NE - Has a number
- 316639483 STATE OF OREGON DEPARTMENT OF CORRECTIONS 2575 CENTER ST NE - Doesn't have a number
WA also uses numbers (sometimes). Like above the number is unique to the activity_nr, not the facility.
- 340715705 WA317936834 - STATE OF WASHINGTON DEPT OF CORRECTIONS 1403 COMMERCIAL STREET - Here the number is the same because the activity_nr is the same
- 340715705 WA317936834 - STATE OF WASHINGTON DEPT OF CORRECTIONS 1403 COMMERCIAL STREET
- 340926385 WA317937963 - STATE OF WASHINGTON DEPT OF CORRECTIONS MCNEIL ISLAND35 SETLERS RD - Here the numbers are different because different facilities
- 340926419 WA317937896 - STATE OF WASHINGTON DEPT OF CORRECTIONS MCNEIL ISLANDPO BOX 881460
Overall it seems like these numbers listed before facility names are like activity_nrs. They are unique to the inspection, not the facility. So we can probably remove them. However, in the case of Indiana we would have to make sure we knew it was a different facility by looking at the address. Otherwise by the name alone they appear to be the same facility.
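A minimal sketch of how the leading numbers could be removed, assuming the facility name and street address are stored in separate fields and that the prefix always looks like an optional two-letter state code, a run of digits, and a dash. The function names `strip_number_prefix` and `facility_key` are hypothetical. Keeping the address in the key addresses the Indiana case, where the name alone would merge two different facilities.

```python
import re

# Assumed prefix shape: optional two-letter state code, digits, then " - ".
PREFIX = re.compile(r"^(?:[A-Z]{2})?\d+\s*-\s*")

def strip_number_prefix(name):
    """Drop a leading inspection-style number from a facility name, if present."""
    return PREFIX.sub("", name).strip()

def facility_key(name, address):
    """Identify a facility by cleaned name plus whitespace-normalized address."""
    return (strip_number_prefix(name), " ".join(address.upper().split()))

# Same cleaned name, different addresses: treated as two facilities.
key_a = facility_key("108957 - INDIANA DEPARTMENT OF CORRECTIONS", "4490 REFORMATORY RD")
key_b = facility_key("108918 - INDIANA DEPARTMENT OF CORRECTIONS", "711 GREEN ROAD")
```

Names without a prefix pass through unchanged, so the function can be applied to every row regardless of state.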
Something else to think about is the idea of facility versus entity. Many of these are overseen by the same state office like Indiana Dept of Corrections. So the question is if we are interested in the state's record or if we are interested in the facility's record.
For this analysis I am attempting to understand what facilities are in our data compared to the population of all prisons in CA. I am comparing our dataset to the HIFLD. However, we need to clean the facility names in the OSHA data in order to be able to make comparisons to HIFLD. These are data quality/methodology issues encountered in this analysis.
Method for Cleaning Facility Names
I converted the OSHA dataset to wide format by facility name. Then I sorted by zipcode. I then compared facility names with identical zipcodes. If they appeared to be similar but had slightly different names I renamed facilities to the titles that appeared in the HIFLD data. I typically re-named "smaller" facilities or departments to the name of the larger facility in line with how the HIFLD data appears to consider these facilities.
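The comparison was done by hand. As a possible aid, the sketch below flags pairs of distinct names that share a zip code and look alike, so a person can review them. It is an illustration only: the string-similarity approach (`difflib.SequenceMatcher`), the 0.6 threshold, the field names `zip` and `name`, and the placeholder zip codes are all assumptions, not part of the method actually used.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def similar_names_by_zip(records, threshold=0.6):
    """Return (zip, name_a, name_b) for distinct, similar names sharing a zip code."""
    names_by_zip = defaultdict(set)
    for rec in records:
        names_by_zip[rec["zip"]].add(rec["name"])

    pairs = []
    for zip_code, names in sorted(names_by_zip.items()):
        for a, b in combinations(sorted(names), 2):
            # ratio() is a 0..1 similarity score; the threshold is a guess to tune.
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((zip_code, a, b))
    return pairs

# Placeholder zip codes; names are taken from examples in these notes.
records = [
    {"zip": "00001", "name": "RJ DONOVAN CORRECTIONAL FACILITY PIA SHOE FACTORY"},
    {"zip": "00001", "name": "R J DONOVAN CORRECTIONAL FACILITY"},
    {"zip": "00002", "name": "CALIPATRIA STATE PRISON"},
]
candidates = similar_names_by_zip(records)
```

The output is only a list of candidates. Deciding whether two names are really the same facility, and which HIFLD title to use, still requires looking at the addresses.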
Facilities in the HIFLD likely have multiple departments or serve multiple functions. Examples:
- For example, "PITCHESS DETENTION CENTER-EAST" "provides correctional programs, disaster services, environmental services, holiday assistance, law enforcement services, substance abuse services and youth services for the unincorporated areas of Los Angeles County and contracting cities." However, in HIFLD it is listed as one facility. For our purposes I also treat it as one facility (the OSHA dataset has it listed multiple times under similar but not identical names).
- Stockton has three youth correctional facilities according to the HIFLD data. However, it is impossible in the OSHA data to determine which is which. I am naming them all N.A. CHADERJIAN YOUTH CORRECTIONAL FACILITY because this is what Google shows at the address and the majority of the Stockton facilities are called this already.
- Folsom prison in Represa has multiple facilities. I used the address to determine whether they were part of the old or the new Folsom prison. I also renamed CA CORRECTIONS AND REHAB DIVISION OF FACILITY PLANNING CONSTRUCTION AND MANAGEMENT and CA CORRECTIONAL HEALTH CARE SERVICES to CALIFORNIA STATE PRISON, SACRAMENTO (SAC) (AKA New Folsom State Prison).
Exceptions:
- HIFLD treats CENTRAL JAIL COMPLEX in Santa Ana as one facility. But looking at the OSHA data it is too different to justify doing that. We have a coroner's office and an inmate health care facility and a jail.
How might this method be limiting?
Making facility names identical can be limiting because the original names may reveal something about the institution.
- For example, "RJ DONOVAN CORRECTIONAL FACILITY PIA SHOE FACTORY" seems to indicate that prison labor occurs here. But in changing the name to the HIFLD "R J DONOVAN CORRECTIONAL FACILITY (RJD)" that bit of information is lost.
- For example, US DEPT OF JUSTICE BUREAU OF PRISONS VICTORVILLE has multiple facilities with different security levels. Only some of the entries in OSHA indicate this, so I labeled them all as "USP VICTORVILLE". We lose specificity. Part of this is an issue with the OSHA data. Part is an issue with my coding, because I chose to make them all uniform for parsimony.
- For example, changing various activity numbers to SALINAS VALLEY STATE PRISON (SVSP) means we lose that the incident was isolated at one of the mental health facilities.
- Generalizing the facility name may cause problems when entries under the same facility name have different values for attributes that you would expect to be the same, such as whether the facility is private, public, or federally run.
Other Data Quality Issues Discovered During Facility Renaming
- Under Calipatria State Prison we have two separate addresses. 7018 BLAIR RD Calipatria AND HWY 8 AND DUNBAR RD Seeley. I cannot find another facility at HWY 8 and Dunbar RD. In fact I cannot find Dunbar RD. While these could be two separate facilities I cannot determine that. For the time being they will remain under the same name.
HIFLD Data Issues
- There appears to be an error in the HIFLD data. It says "PITCHESS DETENTION CENTER-EAST" is closed. However it appears to still be open.
Date: December 1, 2020 Contributor: Savannah Hunter
Evaluation of the dataset reveals that facilities with the same name sometimes have different owner types listed. This kind of inconsistency is problematic because it has to be corrected manually. Like the fact that the facility names are not already uniform, it could indicate that OSHA does not have a uniform naming or record system, although the error could also have been introduced during the facility name cleaning step. Below I list the facilities with two or more owner types and include a note where the issue could have arisen because challenges with OSHA's facility name recording led me to combine facilities that serve multiple functions.
The following facilities had two or more owner types listed:
- BARRY J NIDORF
- CA DEPT OF STATE HOSPITALS
- CIM
- LAC
- CALIPATRIA - potential facility naming issue
- CVSP
- DVI
- ISP
- N.A. CHADERJIAN YOUTH CORRECTIONAL FACILITY - potential facility naming issue
- PITCHESS DETENTION CENTER - potential facility naming issue
- VENTURA YOUTH CORRECTIONAL FACILITY
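A check like the one that produced this list can be sketched as below. The field names `facility` and `owner_type`, the owner-type labels, and the sample rows are hypothetical; the idea is simply to collect the distinct owner types recorded under each cleaned facility name and report any name with more than one.

```python
from collections import defaultdict

def owner_type_conflicts(rows):
    """Map each facility name to its owner types, keeping only names with 2+ types."""
    owners = defaultdict(set)
    for row in rows:
        owners[row["facility"]].add(row["owner_type"])
    return {name: sorted(types) for name, types in owners.items() if len(types) > 1}

# Made-up rows for illustration only.
rows = [
    {"facility": "FACILITY A", "owner_type": "State"},
    {"facility": "FACILITY A", "owner_type": "Private"},
    {"facility": "FACILITY B", "owner_type": "State"},
    {"facility": "FACILITY B", "owner_type": "State"},
]
conflicts = owner_type_conflicts(rows)
```

Running the same check before and after the name cleaning step would help show whether a conflict comes from the OSHA records themselves or from combining facilities under one name.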
Date: June 25, 2021 Contributor: Savannah Hunter