Books to Scrape Web Scraper

This project contains a web scraper that extracts data from the website Books to Scrape. The scraper gathers information about books, including titles, prices, availability, ratings, and thumbnails, and saves the data in a CSV file. Thumbnails are also downloaded and saved locally.

Features

Scrapes book details including title, price, availability, rating, and thumbnail URL.
Downloads and saves thumbnail images locally.
Saves extracted data to a CSV file in a structured format.
Processes the first 10 pages of the website.
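
As a rough illustration of the 10-page limit, the listing pages can be enumerated up front. This is a minimal sketch, not the script's actual code: the `BASE_URL` pattern and the `page_urls` helper are assumptions based on how the Books to Scrape catalogue is paginated.

```python
# Assumed URL pattern for the paginated catalogue; adjust it if the site layout differs.
BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def page_urls(first=1, last=10):
    """Return the listing-page URLs for pages first..last (inclusive)."""
    return [BASE_URL.format(number) for number in range(first, last + 1)]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```

Changing the `last` argument is one way such a limit could be raised or lowered.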

Requirements

Python 3.8+
BeautifulSoup 4.9.3+
pandas 1.2.0+
requests 2.25.1+

Installation

Clone the repository:

git clone https://github.com/your-username/books-to-scrape-web-scraper.git
cd books-to-scrape-web-scraper

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Run the scraper script:
```
python scrape_books.py
```
The script will extract data from the first 10 pages of the website, save the data to a CSV file located in the data_sheet directory, and download thumbnails to the images directory.
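
The extraction step might look roughly like the sketch below, which parses one listing page that has already been downloaded. The function name `parse_books`, the dictionary keys, and the CSS selectors are assumptions about the site's markup, not a copy of `scrape_books.py`.

```python
from bs4 import BeautifulSoup

# The site encodes ratings as a class name such as "star-rating Three" (assumed markup).
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def parse_books(html):
    """Extract one dict per book from the HTML of a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("article.product_pod"):
        rating_tag = card.select_one("p.star-rating")
        rating_classes = rating_tag.get("class", []) if rating_tag else []
        # Pick the first class that names a rating word; None if there is none.
        rating = next(
            (RATING_WORDS[name] for name in rating_classes if name in RATING_WORDS),
            None,
        )
        books.append({
            "title": card.h3.a["title"],
            "price": card.select_one(".price_color").get_text(strip=True),
            "availability": card.select_one(".availability").get_text(strip=True),
            "rating": rating,
            "thumbnail_url": card.img["src"],  # usually relative to the page URL
        })
    return books
```

In a full run, the HTML would come from something like `requests.get(page_url).text`, called once per listing page.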

Output

data_sheet/books_data.csv: Contains the scraped book details.
images/: Contains the downloaded thumbnail images.
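
To show how these two outputs could be produced, here is a hedged sketch of the path handling and the CSV write. It uses the standard library `csv` module rather than pandas so that it has no third-party dependencies; the project itself lists pandas. The helper names `thumbnail_target` and `save_books_csv` and the column names in `FIELDS` are assumptions.

```python
import csv
import os
from urllib.parse import urljoin, urlparse

# Assumed column names, based on the fields listed under Features.
FIELDS = ["title", "price", "availability", "rating", "thumbnail_url"]


def thumbnail_target(page_url, src, images_dir="images"):
    """Resolve a possibly relative image src and choose a local path for it."""
    absolute_url = urljoin(page_url, src)
    filename = os.path.basename(urlparse(absolute_url).path)
    return absolute_url, os.path.join(images_dir, filename)


def save_books_csv(books, csv_path=os.path.join("data_sheet", "books_data.csv")):
    """Write book dicts to csv_path, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(books)
```

The image bytes themselves would then be fetched from `absolute_url` (for example with `requests.get(absolute_url).content`) and written to the returned local path.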

Video

For a detailed tutorial on how to use this script, please refer to the Books to Scrape 📚 video.

Directory Structure

To help organize your project, here's a suggested directory structure:

books-to-scrape-web-scraper/
├── data_sheet/
│   └── books_data.csv
├── images/
│   └── (thumbnails)
├── scrape_books.py
├── requirements.txt
└── README.md

flowchart TD
    A([Start]) --> B[Initialize base URLs and create directories]
    B --> C{More pages in 1 to 10?}
    C -->|Yes| D[Request page content]
    D --> E[Parse HTML content]
    E --> F[Extract book details]
    F --> G[Save book thumbnail]
    G --> H[Append details to the list]
    H --> C
    C -->|No| I[Save data to CSV file]
    I --> J([End])
    style A fill:#f96,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#ff9,stroke:#333,stroke-width:2px
    style D fill:#bbf,stroke:#333,stroke-width:2px
    style E fill:#ff9,stroke:#333,stroke-width:2px
    style F fill:#bbf,stroke:#333,stroke-width:2px
    style G fill:#ff9,stroke:#333,stroke-width:2px
    style H fill:#bbf,stroke:#333,stroke-width:2px
    style I fill:#f96,stroke:#333,stroke-width:2px
    style J fill:#f96,stroke:#333,stroke-width:2px
