Scrapes watched episodes from your s.to account

Kamiikaze/sto-scraper

Series Scraper for s.to

Description

This script is a web scraper that extracts your watched movies and TV shows from the website s.to. For each watched episode, the extracted data includes the series title, season number, episode number, episode title, and the date when the episode was watched.

The script was mainly generated with ChatGPT, an OpenAI chatbot that can generate code from natural-language prompts. It was then modified to work with the s.to and aniworld.to websites.

Requirements

  • Node.js and npm
  • An s.to / aniworld.to account

Usage

To use this script, you need to have Node.js and npm installed on your computer.

  1. Clone the repository and navigate to the project directory in your terminal.

    git clone https://github.com/Kamiikaze/sto-scraper
    cd sto-scraper
  2. Install the required dependencies using npm.

    npm install
    
  3. Create a `.env` file in the root directory of the project, and add the following variables with your own values:

    PAGE_USERNAME=your_username
    PAGE_PASSWORD=your_password
    
    HEADLESS=true
    DONT_LOAD_STYLES=true
    DO_SCREENSHOTS=false

    `PAGE_USERNAME`: your username on the s.to website.

    `PAGE_PASSWORD`: your password on the s.to website.

    `HEADLESS`: whether to run the browser in headless mode.

    `DONT_LOAD_STYLES`: whether to block unnecessary resources such as stylesheets and fonts to speed up page loading.

    `DO_SCREENSHOTS`: whether to take a screenshot of the logged-in page.

  4. Run the script using the following command:

    npm run start
  5. The script writes the scraped data to a file in the `./public/data` directory. If `DO_SCREENSHOTS` is set to `true`, screenshots of the login page and each scraped page are also saved in the `dist` directory.

    Note: The script stops after scraping the first X pages, where X is the value of the `firstXPages` variable in the script. To scrape all pages, set this variable to 0.
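
Values in a `.env` file arrive as strings, so flags such as `HEADLESS=true` have to be converted before use. The following is only a sketch of how that conversion and the page limit could look, not the script's actual code: the helper names `parseBool`, `readConfig`, and `reachedPageLimit` are hypothetical, and the fallback values are assumptions taken from the example `.env` above.

```javascript
// Hypothetical helper: convert a string from .env into a boolean.
// Anything other than "true" (case-insensitive) counts as false.
function parseBool(value, fallback) {
  if (value === undefined || value === "") return fallback;
  return String(value).trim().toLowerCase() === "true";
}

// Hypothetical helper: build a typed settings object from an env-like object.
// The fallbacks mirror the example .env values and are an assumption.
function readConfig(env) {
  const missing = ["PAGE_USERNAME", "PAGE_PASSWORD"].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required variables: ${missing.join(", ")}`);
  }
  return {
    username: env.PAGE_USERNAME,
    password: env.PAGE_PASSWORD,
    headless: parseBool(env.HEADLESS, true),
    dontLoadStyles: parseBool(env.DONT_LOAD_STYLES, true),
    doScreenshots: parseBool(env.DO_SCREENSHOTS, false),
  };
}

// Hypothetical helper: a limit of 0 means "scrape all pages".
function reachedPageLimit(pagesScraped, firstXPages) {
  return firstXPages > 0 && pagesScraped >= firstXPages;
}
```

In a Node.js script, `readConfig(process.env)` would be called once the `.env` file has been loaded into the environment.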

Example Output

{
   "totalMovies": 1,
   "movieTitles": {
      "Peripherie": {
         "seasonCount": 1,
         "totalEpisodesCount": 6,
         "seasons": {
            "1": {
               "episodeCount": 6,
               "episodes": {
                  "1": {
                     "title": "Alternative Realität",
                     "seenAt": "21.02.2023 15:08:06 Uhr vor einem Tag"
                  },
                  "2": {
                     "title": "Empathiebonus",
                     "seenAt": "21.02.2023 16:15:57 Uhr vor einem Tag"
                  },
                  "3": {
                     "title": "Haptischer Nebel",
                     "seenAt": "21.02.2023 17:16:05 Uhr vor einem Tag"
                  },
                  "4": {
                     "title": "Jackpot",
                     "seenAt": "22.02.2023 15:36:28 Uhr vor 5 Stunden"
                  },
                  "5": {
                     "title": "Was ist mit Bob?",
                     "seenAt": "22.02.2023 16:39:28 Uhr vor 4 Stunden"
                  },
                  "6": {
                     "title": "Fick dich und friss Scheiße!",
                     "seenAt": "22.02.2023 17:38:39 Uhr vor 3 Stunden"
                  }
               }
            }
         }
      }
   }
}
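
The `seenAt` value combines an absolute timestamp (`DD.MM.YYYY HH:MM:SS Uhr`) with a relative phrase such as `vor einem Tag`. To sort or filter the output, it can help to flatten the nested structure and parse the absolute part into a `Date`. The sketch below assumes the format shown in the example above and interprets the timestamp in the local time zone; `parseSeenAt` and `listEpisodes` are hypothetical names, not part of the project.

```javascript
// Matches the leading "DD.MM.YYYY HH:MM:SS" part of a seenAt string.
const SEEN_AT_PATTERN = /^(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2})/;

// Hypothetical helper: returns a Date in local time, or null if the
// string does not start with the expected timestamp format.
function parseSeenAt(seenAt) {
  const match = SEEN_AT_PATTERN.exec(seenAt);
  if (!match) return null;
  const [day, month, year, hour, minute, second] = match.slice(1).map(Number);
  return new Date(year, month - 1, day, hour, minute, second);
}

// Hypothetical helper: flattens the nested output into one row per episode.
function listEpisodes(data) {
  const rows = [];
  for (const [series, info] of Object.entries(data.movieTitles)) {
    for (const [season, seasonInfo] of Object.entries(info.seasons)) {
      for (const [episode, details] of Object.entries(seasonInfo.episodes)) {
        rows.push({
          series,
          season: Number(season),
          episode: Number(episode),
          title: details.title,
          seenAt: parseSeenAt(details.seenAt),
        });
      }
    }
  }
  return rows;
}
```

After loading the output file with `JSON.parse`, `listEpisodes(data)` yields a flat array that can be sorted by `seenAt` or grouped by `series`.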

Web View

I also created a web view for the scraped data. Just run the following command to start the web server:

npm run web-view

License

This project is licensed under the MIT License. See the LICENSE file for details.
