This is a rewrite of my public proxy farm. It uses Redis to record reliability statistics for publicly available proxy servers.
Once enough data has been collected, it builds a database of proxy servers and can choose the most reliable proxy for crawling a given website.
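To illustrate the idea (this is a minimal sketch, not the project's actual code), per-proxy success and failure counts could be kept in Redis hashes and the proxy with the best success ratio picked from them. The key names and fields below are hypothetical:

```python
import redis

# Hypothetical sketch: track per-proxy success/failure counts in Redis hashes
# and pick the proxy with the highest observed success ratio. The key schema
# here is made up for illustration and is not the project's actual schema.
r = redis.Redis(host="localhost", port=6379, db=0)

def record_result(proxy, succeeded):
    """Increment the success or failure counter for a proxy."""
    field = "successes" if succeeded else "failures"
    r.hincrby(f"proxy_stats:{proxy}", field, 1)

def most_reliable_proxy():
    """Return the proxy with the highest observed success ratio."""
    best_proxy, best_ratio = None, -1.0
    for key in r.scan_iter("proxy_stats:*"):
        stats = r.hgetall(key)
        successes = int(stats.get(b"successes", 0))
        failures = int(stats.get(b"failures", 0))
        total = successes + failures
        if total == 0:
            continue
        ratio = successes / total
        if ratio > best_ratio:
            best_proxy, best_ratio = key.decode().split(":", 1)[1], ratio
    return best_proxy

record_result("1.2.3.4:8080", succeeded=True)
print(most_reliable_proxy())
```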
To run it you will need:

- docker
- docker-compose
When web crawling, proxies are essential for maintaining anonymity and circumventing bot detection. Many free public proxy servers are available across the Internet, but their performance is inconsistent. This project uses a Redis store to cache proxy server information for use by a Scrapy middleware, and that cache is periodically synced to a Postgres database, which serves as a more permanent and practical store for proxy statistics.
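For context, a Scrapy downloader middleware that assigns a cached proxy to each outgoing request might look roughly like the sketch below. The Redis key (`best_proxy`) and the `REDIS_URL` setting are assumptions for illustration; they are not the project's actual middleware:

```python
import redis

class RedisProxyMiddleware:
    """Hypothetical sketch of a Scrapy downloader middleware that assigns a
    proxy pulled from a Redis cache to each outgoing request. The Redis key
    name and settings key are assumptions, not autoproxy's real schema."""

    def __init__(self, redis_url):
        self.client = redis.Redis.from_url(redis_url)

    @classmethod
    def from_crawler(cls, crawler):
        # Read the Redis location from Scrapy settings (assumed setting name).
        return cls(crawler.settings.get("REDIS_URL", "redis://localhost:6379/0"))

    def process_request(self, request, spider):
        proxy = self.client.get("best_proxy")
        if proxy:
            # Scrapy's built-in HttpProxyMiddleware honors request.meta['proxy'].
            request.meta["proxy"] = f"http://{proxy.decode()}"
```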
I'm still working on this, but here's how to run it:
```
git clone https://github.com/dchrostowski/autoproxy.git
cd autoproxy
docker-compose build scrapyd
docker-compose build spider_scheduler
docker-compose up scrapyd spider_scheduler
```
There are several spiders (see autoproxy/autoproxy/spiders) scheduled to crawl proxy listing sites, continually pulling in new proxies and then testing those proxies against the sites they've scraped.
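As a rough illustration of what such a spider does, a minimal proxy-list spider could look like this. The URL, CSS selectors, and item fields are placeholders and do not correspond to the spiders shipped in autoproxy/autoproxy/spiders:

```python
import scrapy

class ExampleProxyListSpider(scrapy.Spider):
    """Hypothetical sketch of a spider that scrapes IP:port pairs from a
    proxy listing page. The URL and selectors are placeholders only."""
    name = "example_proxy_list"
    start_urls = ["https://example.com/proxy-list"]

    def parse(self, response):
        for row in response.css("table tr"):
            ip = row.css("td:nth-child(1)::text").get()
            port = row.css("td:nth-child(2)::text").get()
            if ip and port:
                yield {"address": ip.strip(), "port": port.strip()}
```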
To access the Postgres database, you can run the following:
```
docker exec -it autoproxy_db psql -U postgres proxies
```
The default password is `somepassword`.
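If you'd rather poke at the data from Python, something like the sketch below should work, assuming the Postgres container's port 5432 is published to the host (adjust host/port to your docker-compose setup). The credentials match the defaults above; the query just lists whatever tables exist rather than assuming a schema:

```python
import psycopg2

# Hypothetical sketch: connect with the default credentials shown above and
# list the tables in the 'proxies' database. Host/port assume Postgres is
# reachable on localhost:5432; adjust for your docker-compose setup.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="proxies",
    user="postgres",
    password="somepassword",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public'"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```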
I'm planning on publishing the autoproxy_package/ contents as a module/package eventually.