This program is a Python script that uses Selenium WebDriver and the PRAW (Python Reddit API Wrapper) library to scrape a specified number of tweets containing a particular keyword from Twitter, and a specified number of posts from a subreddit on Reddit. The scraped data is stored in an Excel or CSV file that can be easily accessed and analyzed.
To use this program, you need to have the following installed on your system:
- Python
- Selenium
- pandas (writing the Excel output also requires an engine such as openpyxl)
- PRAW
- Firefox and a matching geckodriver executable
You can install the Python dependencies using pip by running the following commands:

```
pip install selenium
pip install pandas
pip install praw
```
To scrape Twitter:
- Clone this repository or download the script to your local machine.
- Open the script in a Python IDE or text editor.
- Update the paths to your Firefox binary and the geckodriver executable on lines 17 and 18 of the script (see the sketch after this list).
- Run the script and enter the required inputs in the command prompt when prompted:
- Your Twitter username
- Your Twitter password
- The number of tweets you want to scrape
- The keyword you want to search for
- The name of the Excel file in which to store the results
- The script will start scraping tweets and store them in an Excel file with the specified name.
- The Excel file will be automatically opened after the script has finished running.
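For reference, here is a minimal sketch of the browser configuration from step 3, written against the Selenium 4 API; the paths are placeholders, and the variable names may not match the script exactly:

```python
# Minimal sketch of the Firefox/geckodriver setup edited on lines 17-18 of the
# script. The two paths are placeholders; substitute your own.
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service

options = Options()
options.binary_location = "/usr/bin/firefox"  # line 17: path to your Firefox binary

# line 18: path to your geckodriver executable
service = Service(executable_path="/usr/local/bin/geckodriver")

driver = webdriver.Firefox(service=service, options=options)
driver.get("https://twitter.com/login")
# ... the script then logs in, searches for the keyword, collects the requested
# number of tweets, and writes them to the named Excel file with pandas ...
driver.quit()
```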
To scrape Reddit:
- Clone this repository or download the script to your local machine.
- Create a Reddit script app and obtain your client ID and client secret (these are used as shown in the sketch after this list).
- Open the script in a Python IDE or text editor.
- Run the script and enter the required inputs in the command prompt when prompted:
- The subreddit name you want to scrape
- The number of posts you want to scrape
- The name of the CSV file in which to store the results
- The script will start scraping posts and store them in a CSV file with the specified name.
- The CSV file will be automatically opened after the script has finished running.
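For reference, a minimal sketch of the PRAW flow, assuming the script pulls the subreddit's newest posts and collects the fields shown here; the actual script may gather different fields:

```python
# Minimal sketch of the Reddit side, assuming the newest posts are fetched.
# Replace the credential placeholders with the values from your Reddit script app.
import praw
import pandas as pd

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="scraper by u/your_username",
)

subreddit_name = input("Subreddit name: ")
limit = int(input("Number of posts: "))
filename = input("CSV file name: ")

# Collect a few common fields from each submission; the real script may store more.
rows = [
    {"title": post.title, "score": post.score, "url": post.url}
    for post in reddit.subreddit(subreddit_name).new(limit=limit)
]
pd.DataFrame(rows).to_csv(filename, index=False)
```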
- The program scrapes a specified number of tweets containing a particular keyword from Twitter and a specified number of posts from a subreddit on Reddit.
- The scraped tweets/posts are stored in an Excel file (Twitter) or a CSV file (Reddit).
- The program uses Selenium WebDriver to automate the process of logging in to Twitter (if required) and searching for tweets.
- The program prompts the user to enter their Twitter login credentials (if required) and the number of tweets they want to scrape (if they want to scrape Twitter).
- The program allows the user to specify the keyword they want to search for (if they want to scrape Twitter), the subreddit they want to scrape (if they want to scrape Reddit), and the name of the Excel or CSV file to be stored.
Contributions to this project are welcome! In addition to improving the existing Twitter and Reddit scrapers, there are opportunities to develop similar scripts for other social media websites such as Facebook, Instagram, and more.
If you're interested in contributing, here are some ideas:
- Develop a scraper for a different social media website
- Improve the existing Twitter/Reddit scraper by adding new features or optimizing performance
- Create a user-friendly UI for the scraper ✅
- Add support for scraping multimedia content such as images and videos
- Implement natural language processing techniques to analyze the scraped content
To contribute, you can fork the repository, make your changes, and submit a pull request. Before making any major changes, please create an issue to discuss your proposed changes with the project maintainers.
We appreciate any contributions to this project and look forward to seeing what the community can create!
If you would like to contribute, please follow these steps:
- Fork the repository
- Clone the repository to your local machine
- Create a new branch for your feature or bug fix
- Make your changes and commit them with descriptive commit messages
- Push your changes to your fork
- Create a pull request to the main repository
This project is licensed under the MIT License - see the LICENSE file for details. By contributing to this project, you agree that your contributions will be licensed under its MIT License.