This script is a web scraper that extracts information about movies and TV shows from a website called "s.to". The extracted data includes the title, season number, episode number, and the date when the episode was watched.
The script was mainly generated with ChatGPT, OpenAI's large-language-model chatbot, which can generate code from natural-language prompts. It was then modified by hand to work with the s.to and aniworld.to websites.
- Node.js and npm
- A s.to / aniworld.to account
To use this script, you need to have Node.js and npm installed on your computer.
- Clone the repository and navigate to the project directory in your terminal.
git clone https://github.com/Kamiikaze/sto-scraper
cd sto-scraper
- Install the required dependencies using npm.
npm install
- Create a `.env` file in the root directory of the project, and add the following variables with your own values:
PAGE_USERNAME=your_username
PAGE_PASSWORD=your_password
HEADLESS=true
DONT_LOAD_STYLES=true
DO_SCREENSHOTS=false
`PAGE_USERNAME`: your username on the s.to website.
`PAGE_PASSWORD`: your password on the s.to website.
`HEADLESS`: whether to run the browser in headless mode.
`DONT_LOAD_STYLES`: whether to block unnecessary resources such as stylesheets and fonts to speed up page loading.
`DO_SCREENSHOTS`: whether to take a screenshot of the logged-in page.
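Values in a `.env` file always arrive as strings, so a flag such as `HEADLESS=false` has to be converted before it can be used as a boolean. The following is a minimal sketch of one way to do that, not the script's actual code: the helper names `parseBool` and `readConfig`, the returned property names, and the default values are all assumptions made for illustration.

```javascript
// Hypothetical helpers; the real script may read its settings differently.

// Convert an environment string to a boolean, using a fallback when unset.
function parseBool(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return String(value).trim().toLowerCase() === 'true';
}

// Build a settings object from an environment-like object (e.g. process.env).
function readConfig(env) {
  if (!env.PAGE_USERNAME || !env.PAGE_PASSWORD) {
    throw new Error('PAGE_USERNAME and PAGE_PASSWORD must be set in .env');
  }
  return {
    username: env.PAGE_USERNAME,
    password: env.PAGE_PASSWORD,
    // The defaults below are assumptions, mirroring the example values above.
    headless: parseBool(env.HEADLESS, true),
    dontLoadStyles: parseBool(env.DONT_LOAD_STYLES, true),
    doScreenshots: parseBool(env.DO_SCREENSHOTS, false),
  };
}

// Example with literal values instead of a real .env file.
const config = readConfig({
  PAGE_USERNAME: 'your_username',
  PAGE_PASSWORD: 'your_password',
  HEADLESS: 'false',
});
console.log(config);
```

Passing a plain object instead of `process.env` keeps the sketch easy to try without creating a `.env` file first.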
- Run the script using the following command:
npm run start
- The script will start running and write the data to a file in the `./public/data` directory. If the `DO_SCREENSHOTS` variable is set to `true`, screenshots of the login page and each scraped page will also be saved in the `dist` directory.

Note: The script stops after scraping the first X pages, where X is the value of the `firstXPages` variable in the script. To scrape all pages, set this variable to 0.
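The page limit can be pictured with a small sketch. Only the variable name `firstXPages` comes from the script; the function `pagesToScrape` and its `totalPages` parameter are hypothetical and may not match how the script actually counts pages.

```javascript
// Hypothetical helper: returns the page numbers that would be visited.
// A limit of 0 means "no limit", as described in the note above.
function pagesToScrape(totalPages, firstXPages) {
  const limit = firstXPages > 0 ? Math.min(firstXPages, totalPages) : totalPages;
  return Array.from({ length: limit }, (_, i) => i + 1);
}

console.log(pagesToScrape(5, 2)); // pages 1 and 2 only
console.log(pagesToScrape(5, 0)); // all five pages
```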
{
  "totalMovies": 1,
  "movieTitles": {
    "Peripherie": {
      "seasonCount": 1,
      "totalEpisodesCount": 6,
      "seasons": {
        "1": {
          "episodeCount": 6,
          "episodes": {
            "1": {
              "title": "Alternative Realität",
              "seenAt": "21.02.2023 15:08:06 Uhr vor einem Tag"
            },
            "2": {
              "title": "Empathiebonus",
              "seenAt": "21.02.2023 16:15:57 Uhr vor einem Tag"
            },
            "3": {
              "title": "Haptischer Nebel",
              "seenAt": "21.02.2023 17:16:05 Uhr vor einem Tag"
            },
            "4": {
              "title": "Jackpot",
              "seenAt": "22.02.2023 15:36:28 Uhr vor 5 Stunden"
            },
            "5": {
              "title": "Was ist mit Bob?",
              "seenAt": "22.02.2023 16:39:28 Uhr vor 4 Stunden"
            },
            "6": {
              "title": "Fick dich und friss Scheiße!",
              "seenAt": "22.02.2023 17:38:39 Uhr vor 3 Stunden"
            }
          }
        }
      }
    }
  }
}
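The nested output can be flattened into one row per episode for further processing. The sketch below assumes the structure shown above and that every `seenAt` value starts with a `DD.MM.YYYY HH:MM:SS` timestamp; the helper names `parseSeenAt` and `flattenEpisodes` are hypothetical, and the sample object is a trimmed copy of the example output rather than a real data file.

```javascript
// Parse the leading "DD.MM.YYYY HH:MM:SS" part of seenAt into a Date
// (interpreted in local time). Returns null if the format does not match.
function parseSeenAt(seenAt) {
  const m = /^(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2})/.exec(seenAt);
  if (!m) return null;
  const [, day, month, year, hh, mm, ss] = m.map(Number);
  return new Date(year, month - 1, day, hh, mm, ss);
}

// Turn the nested show -> season -> episode structure into a flat array.
function flattenEpisodes(data) {
  const rows = [];
  for (const [show, info] of Object.entries(data.movieTitles)) {
    for (const [season, seasonInfo] of Object.entries(info.seasons)) {
      for (const [episode, ep] of Object.entries(seasonInfo.episodes)) {
        rows.push({
          show,
          season: Number(season),
          episode: Number(episode),
          title: ep.title,
          seenAt: parseSeenAt(ep.seenAt),
        });
      }
    }
  }
  return rows;
}

// Trimmed copy of the example output above.
const sample = {
  totalMovies: 1,
  movieTitles: {
    Peripherie: {
      seasonCount: 1,
      totalEpisodesCount: 2,
      seasons: {
        1: {
          episodeCount: 2,
          episodes: {
            1: { title: 'Alternative Realität', seenAt: '21.02.2023 15:08:06 Uhr vor einem Tag' },
            2: { title: 'Empathiebonus', seenAt: '21.02.2023 16:15:57 Uhr vor einem Tag' },
          },
        },
      },
    },
  },
};

const rows = flattenEpisodes(sample);
console.log(rows);
```

The relative part of `seenAt` (for example "vor einem Tag") is ignored here, since it depends on when the page was scraped.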
I also created a web view for the scraped data. Just run the following command to start the web server:
npm run web-view
This project is licensed under the MIT License. See the LICENSE file for details.