Top 1,000,000 Alexa Website Titles Crawler - The crawler reads a list of URLs from a .txt file and saves the results to a .txt file, using multiple Python threads.

tieutantan/1m-Alexa-Website-Titles-Crawler

1m Alexa Website Titles Crawler

This package is compatible with Python 3.8.2. You choose the number of threads on the console when the crawler runs, and you configure the user agent and crawl timeout in the config.py file.
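As a rough illustration of what the user agent and timeout settings control, the sketch below fetches one page and pulls out its title. The names `USER_AGENT`, `TIMEOUT_SECONDS`, `extract_title`, and `fetch_title` are hypothetical and may not match what config.py or the crawler actually define, and the sketch uses only the standard library rather than whatever requirements.txt installs.

```python
import re
import urllib.request
from html import unescape

# Hypothetical settings; the real config.py may use different names and values.
USER_AGENT = "Mozilla/5.0 (compatible; TitleCrawler/0.1)"
TIMEOUT_SECONDS = 10

_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or "" if none is found."""
    match = _TITLE_RE.search(html)
    if match is None:
        return ""
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(unescape(match.group(1)).split())


def fetch_title(url, user_agent=USER_AGENT, timeout=TIMEOUT_SECONDS):
    """Download a page and return its title; network errors propagate to the caller."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Titles sit near the top of the page, so cap how much is read.
        body = response.read(200_000)
    return extract_title(body.decode("utf-8", errors="replace"))
```

A shorter timeout lets threads give up on slow sites sooner, at the cost of missing some titles; the right value depends on how many threads are running and how patient the run can be.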

  1. Download top-1m.csv.zip and extract top-1m.csv into the repository's root folder.
  2. Install modules.
  3. Run it!

Source of the 1m websites: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
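The download-and-unzip step can also be scripted. The following is a minimal sketch using only the standard library; the helper names `extract_csv` and `download_list` are hypothetical and are not part of this repository.

```python
import io
import urllib.request
import zipfile

SOURCE_URL = "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"


def extract_csv(zip_bytes, dest_dir=".", member="top-1m.csv"):
    """Extract one member from an in-memory zip archive and return its path on disk."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.extract(member, path=dest_dir)


def download_list(url=SOURCE_URL, dest_dir="."):
    """Download the zipped list and unpack top-1m.csv into dest_dir."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return extract_csv(response.read(), dest_dir)
```

Calling `download_list()` from the repository root would leave top-1m.csv next to run.py, which is where step 1 asks for it.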

Install modules: pip install -r requirements.txt

Run: python run.py
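To give a sense of how a multi-threaded run over the list might be organised, here is a minimal sketch. It is not the code in run.py: the function names `read_domains` and `crawl` are hypothetical, the input is assumed to hold one `rank,domain` row per line, and the tab-separated output format is an illustrative choice.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def read_domains(csv_path, limit=None):
    """Read domains from a file with one 'rank,domain' row per line (assumed layout)."""
    domains = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) >= 2:
                domains.append(row[1].strip())
            if limit is not None and len(domains) >= limit:
                break
    return domains


def crawl(domains, fetch, out_path, threads=8):
    """Call fetch(domain) on a thread pool and write one 'domain<TAB>title' line each."""

    def safe_fetch(domain):
        try:
            return fetch(domain)
        except Exception:
            # One unreachable site should not stop the whole run.
            return ""

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in the same order as the input.
        titles = list(pool.map(safe_fetch, domains))
    with open(out_path, "w", encoding="utf-8") as out:
        for domain, title in zip(domains, titles):
            out.write(f"{domain}\t{title}\n")
    return len(titles)
```

Here `fetch` is any callable that takes a domain and returns its page title, and `threads` plays the role of the thread count chosen on the console. Fetching pages is I/O-bound, so threads spend most of their time waiting on the network and a pool of them gives a real speed-up despite Python's global interpreter lock.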

Thank you for reading!
