Update: osha.gov is blocking traffic from scrapers.
TODO: Use the Splash headless browser and/or implement a proxy to work around the block.
A scraper API for Fatality and Catastrophe Investigation Summaries
To install, you can clone the project:
git clone git@github.com:jwc20/fcis-api.git
cd fcis-api
pip install -r requirement.txt
You can search the reports by description, abstract, and keyword.
Note that "abstract" serves the same purpose as "description" for older reports.
import fcis
desc = ["employee"]
abst = ["employee"]
keyw = ["fire"]
client = fcis.FCIS(descriptions=desc, keywords=keyw)
To search workplace accident reports, run:
reports = client.get_accidents(p_show=100)
print(reports)
This returns:
[{'accident_id': '141245.015',
'summary_url': 'https://www.osha.gov/pls/imis/accidentsearch.accident_detail?id=141245.015',
'summary_nr': '141245.01',
'event_date': '11/17/2021',
'report_id': '0213900',
'fatality': None,
'sic_url': None,
'sic_number': None,
'event_description': 'Employee Is Killed After Falling Through Elevator Shaft',
'fatility': 'X'},
...
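The search result is a plain list of dictionaries, so it can be post-processed with standard Python. The following is a minimal sketch, assuming the field names and the MM/DD/YYYY date format shown in the sample output above, and assuming that an 'X' under either 'fatality' or the misspelled 'fatility' key marks a fatal accident. The helpers `is_fatal` and `parse_event_date` and the `sample_reports` list are illustrative and not part of fcis.

```python
from datetime import datetime

# Sample record copied from the output above; in practice, use the list
# returned by client.get_accidents().
sample_reports = [
    {
        "accident_id": "141245.015",
        "summary_url": "https://www.osha.gov/pls/imis/accidentsearch.accident_detail?id=141245.015",
        "summary_nr": "141245.01",
        "event_date": "11/17/2021",
        "report_id": "0213900",
        "fatality": None,
        "sic_url": None,
        "sic_number": None,
        "event_description": "Employee Is Killed After Falling Through Elevator Shaft",
        "fatility": "X",
    },
]


def is_fatal(report):
    """Return True if either spelling of the fatality flag is set to 'X' (assumed meaning)."""
    return "X" in (report.get("fatality"), report.get("fatility"))


def parse_event_date(report):
    """Parse the MM/DD/YYYY event date string into a datetime.date."""
    return datetime.strptime(report["event_date"], "%m/%d/%Y").date()


# Keep only the fatal accidents and print them with ISO-formatted dates.
fatal = [r for r in sample_reports if is_fatal(r)]
for r in fatal:
    print(parse_event_date(r).isoformat(), r["event_description"])
```

Checking both key spellings keeps the filter working whichever key carries the flag in a given record.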
To get the details of an individual report, run:
import fcis
client = fcis.FCIS()
Use the ID of an accident (found in the search results) to fetch its details.
details = client.get_accident_details(ids=["570341"])
print(details)
This returns:
{'accident_number': '570341',
'report_id': '0522300',
'event_date': '08/15/1984',
'inspection_url': 'establishment.inspection_detail?id=1667450',
'inspection_number': '1667450',
'open_date': '08/16/1984',
'sic_number': '4741',
'establishment_name': 'Mobile Tank Car Services',
'detail_description': 'THREE EMPLOYEES WERE CLEANING A RAILROAD TANK CAR CONTAINING RESIDUES OF COAL TAR LIGHT OIL, A FLAMMABLE LIQUID. ONE WAS ON TOP OF THE CAR, THE OTHER TWO WERE INSIDE. THEY WERE USING STEEL SHOVELS AND A NON EXPLOSION-PROOF LIGHT INSIDE THE CAR. THE VAPORS IGNITED, KILLING THE TWO EMPLOYEES INSIDE AND BURNING THE ONE ON TOP. THE OTHER EMPLOYEES WERE INJURED IN THE RESCUE ATTEMPT.',
'keywords': ['burn',
' coal tar light oil',
' flammable vapors',
' railroad tank car',
' cleaning',
' explosion'],
'Employee': [{'Employee #': '1',
'Inspection': '1667450',
'Age': '',
'Sex': '',
'Degree': 'Fatality',
'Nature': 'Asphyxia',
'Occupation': 'Occupation not reported',
'': ''},
{'Employee #': '2',
'Inspection': '1667450',
'Age': '',
'Sex': '',
'Degree': 'Fatality',
'Nature': 'Asphyxia',
'Occupation': 'Occupation not reported',
'': ''},
...
}
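Some keywords in the sample output above carry a leading space, and the per-employee rows are plain dictionaries. A minimal sketch of tidying the keywords and tallying employees by injury degree follows, assuming the details come back as a single dictionary shaped like that sample. `sample_details` is a trimmed copy holding only the two employee rows shown (the original output is elided), and `clean_keywords` and `count_by_degree` are hypothetical helpers, not part of fcis.

```python
from collections import Counter

# Trimmed copy of the sample output above; in practice, use the value
# returned by client.get_accident_details().
sample_details = {
    "accident_number": "570341",
    "keywords": [
        "burn",
        " coal tar light oil",
        " flammable vapors",
        " railroad tank car",
        " cleaning",
        " explosion",
    ],
    "Employee": [
        {"Employee #": "1", "Degree": "Fatality", "Nature": "Asphyxia"},
        {"Employee #": "2", "Degree": "Fatality", "Nature": "Asphyxia"},
    ],
}


def clean_keywords(details):
    """Strip the stray whitespace around each keyword."""
    return [k.strip() for k in details.get("keywords", [])]


def count_by_degree(details):
    """Count employee rows per 'Degree' value (for example 'Fatality')."""
    return Counter(e.get("Degree", "") for e in details.get("Employee", []))


print(clean_keywords(sample_details))
print(count_by_degree(sample_details))
```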
- beautifulsoup4
- lxml
- requests