This tool is designed to scrape and combine the text from all files in a GitHub repository into a single text file. It supports both cloning a repository directly from GitHub (Online Version) and processing a repository that has already been downloaded to your local machine (Offline Version).
- Git must be installed on your system.
- Python 🐍 must be installed on your system.
- Ensure you have internet access and the necessary permissions to clone the target repository.
- For the Online Version, open online-scraper.py in your Python development software (such as PyCharm).
- Replace https://github.com/GithubName/RepoName.git with the URL of the GitHub repository you want to scrape.
- Run the script: python online-scraper.py
- The script will clone the repository and combine the contents of all files into scraped.txt.
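The cloning step can be sketched roughly as follows. This is a minimal illustration, not the actual contents of online-scraper.py: the function names `build_clone_command` and `clone_repo` are hypothetical, and the shallow clone (`--depth 1`) is an assumption made here to keep the download small.

```python
import shutil
import subprocess
from pathlib import Path


def build_clone_command(repo_url, dest):
    """Return the git command (as a list) that clones repo_url into dest."""
    # Assumption: a shallow clone is enough, since only the latest
    # version of each file is needed for scraping.
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def clone_repo(repo_url, dest):
    """Clone repo_url into dest and return dest as a Path.

    Requires Git on PATH, internet access, and permission to clone.
    """
    if shutil.which("git") is None:
        raise RuntimeError("Git was not found on PATH; install it first.")
    # check=True raises CalledProcessError if the clone fails.
    subprocess.run(build_clone_command(repo_url, dest), check=True)
    return Path(dest)


# Example usage (performs a real network clone, so it is left commented out):
# clone_repo("https://github.com/GithubName/RepoName.git", "cloned_repo")
```

Passing the command as a list rather than a single string avoids shell quoting problems when the URL or destination path contains spaces.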
- For the Offline Version, download the repository you want to scrape.
- Open offline-scraper.py in your Python development software (such as PyCharm).
- Replace C:\Users\SomeRandomAssFolder\Downloads\YourDownloadedRepoFolder with the path to the repository you want to scrape.
- Run the script: python offline-scraper.py
- The contents of all files will be combined into a file called scraped.txt, located in the same directory as the Python script.
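The combining step, which both versions share, might look something like the sketch below. It is an illustration under stated assumptions rather than the scripts' real code: the function name `combine_repo_text` is hypothetical, the `=====` separator line is an invented format, and skipping the `.git` folder and any file that is not valid UTF-8 is a choice made here for the example.

```python
from pathlib import Path


def combine_repo_text(repo_dir, output_file="scraped.txt"):
    """Write the text of every readable file under repo_dir into output_file.

    Returns the number of files that were included.
    """
    repo_dir = Path(repo_dir)
    output_path = Path(output_file)
    count = 0
    with output_path.open("w", encoding="utf-8") as out:
        for path in sorted(repo_dir.rglob("*")):
            relative = path.relative_to(repo_dir)
            # Skip directories and Git's internal metadata folder.
            if not path.is_file() or ".git" in relative.parts:
                continue
            # Skip the output file itself if it lives inside the repo.
            if path.resolve() == output_path.resolve():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                # Assumption: binary or unreadable files are left out.
                continue
            # Hypothetical separator so each file's origin stays visible.
            out.write(f"===== {relative.as_posix()} =====\n")
            out.write(text)
            out.write("\n")
            count += 1
    return count


# Example usage:
# combine_repo_text(r"C:\Users\SomeRandomAssFolder\Downloads\YourDownloadedRepoFolder")
```

Writing a header line before each file keeps the combined output navigable, since the original folder structure is otherwise lost in a single text file.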