- The aim of the project is to analyze correlations between the threat status of species tracked on the IUCN Red List and the threats and stresses they face.
- This repository is dedicated to scraping the data fields from the IUCN Red List that are needed to analyze such correlations.
- This project is a collaboration with Uttara Mendiratta and Anand M Osuri from the Nature Conservation Foundation, India.
- The `birds.csv` and `mammals.csv` files contain the species for which data has to be scraped.
- The permissions of `start.sh` have to be changed before the first run of the code: `user@computer:~/Red$ chmod +x start.sh`
- The pipeline is triggered using the `start.sh` script, which in turn triggers the `scraper.py` code: `user@computer:~/Red$ ./start.sh`
- The scraped data is stored to disk in the form of an `X_WORKING.csv` file, a copy of the original `.csv`, ensuring the originals are not tampered with.
- The model is made of two components: 1. `interface.py` and 2. `scraper.py`.

Figure 2.1: Model to scrape data from the IUCN Red List
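
The working-copy step can be illustrated with a minimal sketch. It assumes that the `X` in `X_WORKING.csv` stands for the name of the original file (for example `birds_WORKING.csv`) and that an existing working copy is left in place so progress from earlier runs is kept. `make_working_copy` is a hypothetical helper, not a function from this repository.

```python
import shutil
from pathlib import Path


def make_working_copy(original: Path) -> Path:
    """Copy e.g. birds.csv to birds_WORKING.csv unless the copy already exists."""
    # Assumed naming scheme: <original stem>_WORKING<original suffix>.
    working = original.with_name(f"{original.stem}_WORKING{original.suffix}")
    if not working.exists():
        # Copy only once, so the original file is never written to and
        # data already stored in the working copy is not overwritten.
        shutil.copyfile(original, working)
    return working
```

Calling `make_working_copy(Path("birds.csv"))` would return the path `birds_WORKING.csv` next to the original, and all later reads and writes would go to that file.
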
- Disk read/write operations are handled by the `interface.py` code.
- The `pandas` dataframe is saved to disk by the `interface.py` code after each run.
- The `scraper.py` code interacts with the webpage using Selenium, a browser automation framework.
- The `HTML` tags contained in the `page_source` gathered by the `Selenium` middleware code are made searchable using `BeautifulSoup`.
- The `scraper.py` pipeline collects the prescribed `HTML` tags from the queried website and updates a `pandas` dataframe with the information.
- The `speciesCounter()` function of the `scraper.py` script returns the `sno` of the last species that's missing the `stable`, `unknown` or `decline` population trend tags, which all scraped species must have.
- While writing elements to the `pandas` dataframe, an element may be shifted right by one or more columns. This error may lead to a `pandas` memory warning, since entities of multiple datatypes then occupy the same column.
- Some species are not indexed by the IUCN Red List. This may cause the `start.sh` script to loop while trying to collect the species `URL` from the search page.
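
One way to avoid looping on a species that is not indexed is to cap the number of search attempts. The sketch below is an assumption about how such a guard could look, not the repository's code: `find_species_url`, `lookup` and `max_attempts` are hypothetical names, and `lookup` stands in for whatever search step returns the species URL, or `None` when nothing is found.

```python
from typing import Callable, Optional


def find_species_url(
    lookup: Callable[[str], Optional[str]],
    species: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Try the search a fixed number of times, then give up and return None."""
    for _ in range(max_attempts):
        url = lookup(species)
        if url is not None:
            return url
    # Nothing found after max_attempts tries: the caller can record the
    # species as unindexed and move on to the next row instead of looping.
    return None
```

With a guard like this, a species missing from the Red List produces a `None` result that can be logged and skipped, rather than stalling the run.
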
If you decide to use our client, scraper or cleaner for your project, or as a means to interface with the IUCN database, please cite our 2021 Conservation Letters paper!
@article{mendiratta2021mammal,
title={Mammal and bird species ranges overlap with armed conflicts and associated conservation threats},
author={Mendiratta, Uttara and Osuri, Anand M and Shetty, Sarthak J and Harihar, Abishek},
journal={Conservation Letters},
volume={14},
number={5},
pages={e12815},
year={2021},
publisher={Wiley Online Library}
}