Serverless web scraper for real-time COVID-19 data in Nigeria.
- Data sourced from NCDC's official website
- Web scraper built with Python using the BeautifulSoup and requests modules
- Scheduled scraping with AWS Lambda and AWS CloudWatch
You can install the requirements with the pip package manager by running
pip3 install beautifulsoup4 requests lxml
(datetime is part of the Python standard library and does not need to be installed separately.)
To manually scrape the data from NCDC, run
python3 naijacovidscraper.py
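For orientation, here is a minimal sketch of the kind of table parsing a BeautifulSoup-based scraper performs. It is not the code in naijacovidscraper.py: the function name `parse_table`, the sample markup, and the column names are assumptions made up for illustration, and the real NCDC page layout may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with placeholder values (not real NCDC data).
SAMPLE_HTML = """
<table>
  <thead><tr><th>State</th><th>Confirmed</th></tr></thead>
  <tbody>
    <tr><td>State A</td><td>10</td></tr>
    <tr><td>State B</td><td>20</td></tr>
  </tbody>
</table>
"""


def parse_table(html):
    """Return the first HTML table as a list of {header: cell} dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip the header row (no <td> cells) and any malformed rows.
        if cells and len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows


if __name__ == "__main__":
    print(parse_table(SAMPLE_HTML))
```

In live use the HTML string would typically come from something like `requests.get(url, timeout=30).text`, where `url` is the page being scraped.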
- Create an S3 bucket.
- Create an AWS LambdaExecute policy to access the S3 bucket.
- Create a new AWS Lambda layer and upload the zipped Python script (with dependencies).
- Create a Lambda function (see lambda_function.py) and add the layer from the previous step.
- Create a new event rule using AWS CloudWatch.
Cron expression for a 12-hourly schedule (minute 0 of every 12th hour, in UTC):
0 */12 * * ? *
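The deployment steps above end with a Lambda function that runs on a schedule and writes to S3. As a rough sketch of what such a handler might look like (this is not the repository's lambda_function.py), the example below assumes a hypothetical bucket name, a placeholder `scrape` function, and a made-up key layout under `data/`. The `s3_client` and `scrape_fn` parameters are illustrative additions so the handler can be exercised locally without AWS credentials.

```python
import json
from datetime import datetime, timezone

BUCKET_NAME = "my-covid-data-bucket"  # hypothetical; use your own bucket name


def scrape():
    """Placeholder for the real scraper; returns an empty list of records."""
    return []


def build_key(now):
    """Build a timestamped object key such as data/2020-05-01T12-00-00Z.json."""
    return "data/" + now.strftime("%Y-%m-%dT%H-%M-%SZ") + ".json"


def lambda_handler(event, context, s3_client=None, scrape_fn=scrape):
    """Entry point: scrape once and store the result as JSON in S3."""
    if s3_client is None:
        # boto3 is provided by the AWS Lambda Python runtime.
        import boto3
        s3_client = boto3.client("s3")
    key = build_key(datetime.now(timezone.utc))
    body = json.dumps(scrape_fn())
    s3_client.put_object(
        Bucket=BUCKET_NAME,
        Key=key,
        Body=body,
        ContentType="application/json",
    )
    return {"statusCode": 200, "key": key}
```

Lambda itself only passes `event` and `context`, so the two extra keyword arguments fall back to their defaults when the function is deployed.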
See jsontodf.py to convert JSON files to pandas DataFrames.
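As a hedged illustration of that conversion (not the contents of jsontodf.py), the sketch below assumes each JSON file holds a list of flat records. The function name `json_to_dataframe`, the field names, and the sample values are placeholders invented for this example.

```python
import json

import pandas as pd


def json_to_dataframe(json_text):
    """Parse a JSON array of flat records into a pandas DataFrame."""
    records = json.loads(json_text)
    return pd.DataFrame(records)


# Placeholder records, not real case counts.
sample = '[{"state": "State A", "confirmed": 10}, {"state": "State B", "confirmed": 20}]'
df = json_to_dataframe(sample)

if __name__ == "__main__":
    print(df)
```

For a file on disk, reading its text with `open(path).read()` and passing it to the same function would work under the same assumption about the record layout.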