Skip to content
This repository has been archived by the owner on Sep 23, 2021. It is now read-only.

yaofeng-wang/NUSWhispers_scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NUSWhispers_scraper

A simple tool to scrape NUSWhispers's website and save the contents into a file.

This scrapper scraps the NUSWhispers's website by iteratively scrapping the posts by their post index. Therefore, we will need to specify a start_idx that denotes the the index of the first post that will be scrapped and end_idx that denotes the index of the last post that will be scrapped.

Installation

pip install -r requirements.txt

Usage

First make sure that you are in the root folder. Then,

python src/main.py <start_idx> <end_idx>

Example: To scrape the post with indices from 10 to 15 (inclusive).

python src/main.py 10 15

Output

After running the commands shown in the "Usage" section, you should be able to see a Mircrosoft Excel file in /src folder. Each post will have 7 columns of data consisting of:

  • Index of the post
  • Categoies given by NUSWhispers
  • Textual content of the post
  • Number of likes
  • Number of comments
  • Age of the post
  • Number of favourites

*Note:

  1. The column names were not included in the Excel file.
  2. Sample outputs can be found in the /output folder.*

Testing

First make sure that you are in the root folder. Then,

cd test
python -m unittest

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages