Need some proxies but don't want to scrape them manually? Just give this script the domain!
Give the project a star!
Report Bug
·
Request Feature
Hi there! The purpose of this script is to demonstrate the power and the functions that will be implemented in the future Universal Proxy Scraper module. For now it will be developed as a standalone script until v1.0.0 comes out; that will be the first version deployed as a module :)
Just built-in modules! (Python >= 3.0)
Let's get to it!
To set up the websites you want to get the proxies from, put every URL you want to scrape into a file, one per line, just like this:
http://free-proxy.cz/es/
https://free-proxy-list.net/
http://www.freeproxylists.net/
https://hidemy.name/es/proxy-list/#list
(For reference, see the test_urls.txt file included in this repository.)
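If you're curious about how a file like that can be consumed, here's a minimal sketch that just reads it into a Python list, one URL per line (this is only an illustration, not necessarily how ProxyScraper loads the file internally):

# Minimal, illustrative loader for a URL file such as test_urls.txt
def load_urls(path):
    with open(path, encoding='utf-8') as handle:
        # One URL per line; skip blank lines and strip whitespace
        return [line.strip() for line in handle if line.strip()]

urls = load_urls('test_urls.txt')
print(urls)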
Using it from the command line is pretty simple :D For example:
path/to/the/script: python main.py -h
██╗ ██╗ ██████╗ ███████╗ █████╗ ██████╗
██║ ██║ ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║ ██║ ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║ ██║ ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗ ███████║╚═╝╚█████╔╝╚██████╔╝
╚═════╝ ╚═╝╚═╝╚═╝ ╚══════╝ ╚════╝ ╚═════╝
Proxy
Universal Scraper | Your ideal proxy scraper ;)
by: @freshSauce
0.1.6
usage: main.py [-h] -f FILE [-o] [-q QUANTITY] [-v] [-p]
Command-line option for the Universal Scraper
optional arguments:
-h, --help show this help message and exit
-f FILE, --file FILE name of the file with the sites
-o, --output if used, stores the scraped proxies
-q QUANTITY, --quantity QUANTITY
quantity of proxies to be scraped (10 by default)
-v, --verify if used, verify every single proxy and returns the live ones
-p, --print if used, prints out the obtained list of proxies
As you can see, there are quite a few options you can use :) (a rough sketch of how these flags might be parsed follows the list below)
- file (required, value needed): path or name of the file that contains all the websites you want to scrape.
- output (optional, no value needed): if used, writes a file named "output.txt" with every scraped proxy.
- quantity (optional, value needed, 10 by default): sets the number of proxies to be scraped.
- verify (optional, no value needed): if used, verifies every scraped proxy and returns only those that are alive.
- print (optional, no value needed): if used, prints out the list that contains all the proxies.
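Just for reference, the flags above could be wired up with argparse roughly like this (a hedged sketch based on the help text, not the actual code of main.py):

# Hypothetical argparse setup mirroring the options above (illustrative only)
import argparse

parser = argparse.ArgumentParser(description='Command-line option for the Universal Scraper')
parser.add_argument('-f', '--file', required=True, help='name of the file with the sites')
parser.add_argument('-o', '--output', action='store_true', help='if used, stores the scraped proxies')
parser.add_argument('-q', '--quantity', type=int, default=10, help='quantity of proxies to be scraped (10 by default)')
parser.add_argument('-v', '--verify', action='store_true', help='if used, verify every single proxy and returns the live ones')
parser.add_argument('-p', '--print', action='store_true', help='if used, prints out the obtained list of proxies')
args = parser.parse_args()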
path/to/the/script: python main.py -f test_urls.txt -p -o -v -q 5
██╗ ██╗ ██████╗ ███████╗ █████╗ ██████╗
██║ ██║ ██╔══██╗██╔════╝██╗██╔══██╗██╔═████╗
██║ ██║ ██████╔╝███████╗╚═╝╚█████╔╝██║██╔██║
██║ ██║ ██╔═══╝ ╚════██║██╗██╔══██╗████╔╝██║
╚██████╔╝██╗██║██╗ ███████║╚═╝╚█████╔╝╚██████╔╝
╚═════╝ ╚═╝╚═╝╚═╝ ╚══════╝ ╚════╝ ╚═════╝
Proxy
Universal Scraper | Your ideal proxy scraper ;)
by: @freshSauce
0.1.6
Connection to http://free-proxy.cz/es/ timed out
Proxies obtained !!!
['172.67.181.214:80', '172.67.80.190:80', '45.82.139.34:4443', '188.168.56.82:55443', '150.129.54.111:6667']
Everything is done !!! Wanna get more proxies? (Y[es]/N[o]): n
Have a nice day !!!
To use it from your own code, import it as a module, like this:
from main import ProxyScraper
There's no need to import it from 'main'; you can rename the script and import it using whatever name you gave it. Once you've done that, you can use it as you please.
# Storing it on a variable
proxy_scraper = ProxyScraper('test_urls.txt')
proxy_list = proxy_scraper.Proxies()
# Iterating through each proxy
for proxy in ProxyScraper('test_urls.txt').Proxies():
    ...
# Saving the proxies to a file
proxy_scraper = ProxyScraper('test_urls.txt', output=True)
proxy_list = proxy_scraper.Proxies() # This will give you the scraped proxies and save them into a file.
It's pretty easy to use! Just make sure to pass the URLs correctly and you're ready to go!
from main import ProxyScraper
proxy_list = ProxyScraper('test_urls.txt').Proxies() # Will store the proxy list in a variable
ProxyScraper('test_urls.txt', output=True).Proxies() # Will save the output into an output file
proxy_list = ProxyScraper('test_urls.txt').Proxies(quantity=15) # Will save 15 of the scraped proxies into a variable (10 by default)
proxy_list = ProxyScraper('test_urls.txt', check=True).Proxies(quantity=15) # Will save 15 of the scraped proxies and will check each one of them
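In case you're wondering what checking a proxy can look like under the hood, here's a minimal sketch using only built-in modules; is_alive is a hypothetical helper written for illustration, not the module's actual checker:

# Hypothetical proxy check: try a quick request through the proxy and see if it answers
import urllib.request

def is_alive(proxy, timeout=5):
    handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        # Any lightweight HTTP page works as a probe target
        opener.open('http://example.com', timeout=timeout)
        return True
    except Exception:
        return False

live_proxies = [proxy for proxy in ['172.67.181.214:80'] if is_alive(proxy)]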
Hope it is useful for you!
Wanna contribute to the project? Great! Please follow the steps below to submit any feature or bug-fix :) You can also send me your ideas on my Telegram; any submission is greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the AGPL-3.0 License. See LICENSE for more information.
Telegram: @freshSauce
Project Link: https://github.com/freshSauce/UniversalProxyScraper
- Added custom exceptions plus minor changes.
- Added command-line support (yeah, no 0.1.3 nor 0.1.4, heh)
- Added support to the first specific site: spys.one.
Now, I want to mention that, if needed, I will create specific scripts for specific sites. This doesn't mean I won't keep looking for a 'universal' solution; it's just that sites like that one are quite different from the others.
Module created for that site.
- Added support for sites that write their proxy lists via JS, such as 'document.write'.
- Added handlers for some exceptions.
- Added proxy checker function
- Fixed some typos on the script documentation.