This repository contains four main folders:
scripts:
- Contains the Python scripts used to clean the data we scraped from websites as part of the assignment.
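The cleaning step can be sketched roughly as follows. This is a minimal illustration assuming the scripts use pandas; the column names and sample rows are hypothetical, not the actual assignment fields:

```python
import pandas as pd

# Hypothetical sample standing in for rows scraped from a website;
# the real scripts would load the raw extract from the data_files folder.
raw = pd.DataFrame({
    "title": ["  Widget A ", "Widget B", "Widget B", None],
    "price": ["19.99", " 24.50", " 24.50", "5.00"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["title"] = df["title"].str.strip()                 # remove stray whitespace
    df["price"] = df["price"].str.strip().astype(float)   # normalize numeric strings
    df = df.dropna(subset=["title"])                      # drop incomplete rows
    df = df.drop_duplicates()                             # drop repeated records
    return df.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

The actual cleaning rules applied depend on the site being scraped; see the scripts themselves for the full logic.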
output:
- Contains the results of the web scraping process.
- The Excel sheet generated by running the code is saved here, so this is where you can access the final scraped data.
data_files:
- Contains the source files we extracted via web scraping using the Postman API; the cleaning code uses these files as its input.
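The raw files in data_files were captured with Postman; an equivalent programmatic save might look like the sketch below. The payload shape and file name here are placeholders, not the actual API response:

```python
import json
from pathlib import Path

# In practice the response body was captured with Postman; a hypothetical
# sample payload stands in for it here.
# response = requests.get("https://api.example.com/items")  # real fetch (hypothetical URL)
payload = {"items": [{"id": 1, "name": "Widget A"}, {"id": 2, "name": "Widget B"}]}

# Save the raw response so the cleaning scripts can read it later.
out_dir = Path("data_files")
out_dir.mkdir(exist_ok=True)
out_path = out_dir / "items_raw.json"
out_path.write_text(json.dumps(payload, indent=2))
print(f"saved {out_path}")
```

Keeping the raw capture on disk separates extraction from cleaning, so the scripts can be rerun without hitting the API again.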
Standard Operating Procedure (SOP):
- The "SOP" folder provides detailed instructions and guidelines on how to use the code in the "scripts" folder to clean the assignment data.
- It also explains how to perform web scraping to obtain data via the Postman API.
- The document outlines the step-by-step procedure for executing the web scraping process with the various tools involved.
To begin using this project, follow these steps:
Code Execution:
- Refer to the SOP (Standard Operating Procedure) provided in the "SOP" folder for detailed instructions on how to execute the web scraping code located in the "scripts" folder.
Review Output:
- Once the code has been executed, the resulting data is stored as an Excel sheet in the "output" folder.
- You can review this data to ensure the web scraping process was successful.
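A quick sanity check of the generated sheet can be done with pandas. This is a sketch only; the file name and expected columns are assumptions, not the actual output of the scripts:

```python
import pandas as pd

# df = pd.read_excel("output/scraped_data.xlsx")  # load the generated sheet
# (hypothetical file name; requires openpyxl). A small sample frame
# stands in for the loaded sheet here.
df = pd.DataFrame({"title": ["Widget A", "Widget B"], "price": [19.99, 24.5]})

# Basic sanity checks: the sheet is non-empty and has no missing values.
assert not df.empty, "output sheet is empty"
assert df.notna().all().all(), "output contains missing values"
print(f"{len(df)} rows, columns: {list(df.columns)}")
```

If any check fails, rerun the scraping and cleaning steps per the SOP before using the data.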
- Clone the repository to your local machine:
  git clone <repository-url>
- Navigate to the project directory:
  cd <repository-directory>
- Follow the instructions in the SOP document to execute the code and perform web scraping.
Prerequisites:
- Python 3.x
- Required Python libraries (listed in the SOP document)