A simple tool to scrape NUSWhispers's website and save the contents into a file.
This scrapper scraps the NUSWhispers's website by iteratively scrapping the posts by their post index. Therefore, we will need to specify a start_idx
that denotes the the index of the first post that will be scrapped and end_idx
that denotes the index of the last post that will be scrapped.
pip install -r requirements.txt
First make sure that you are in the root folder. Then,
python src/main.py <start_idx> <end_idx>
Example: To scrape the post with indices from 10 to 15 (inclusive).
python src/main.py 10 15
After running the commands shown in the "Usage" section, you should be able to see a Mircrosoft Excel file in /src folder. Each post will have 7 columns of data consisting of:
- Index of the post
- Categoies given by NUSWhispers
- Textual content of the post
- Number of likes
- Number of comments
- Age of the post
- Number of favourites
*Note:
- The column names were not included in the Excel file.
- Sample outputs can be found in the /output folder.*
First make sure that you are in the root folder. Then,
cd test
python -m unittest