Arachpyd - Your friendly neighborhood spider and web crawler.
denisgomes/arachpyd
**arackpy**
===========

**arackpy** is a simple but powerful web crawler and scraper. Although it is
good natured and respectful at heart, it can be used to do evil. Remember:
with great power comes great responsibility.

Requirements
------------

**arackpy** currently supports Python 2.7 to 3.6+ out of the box. Depending on
how you want to extract data, some of the dependencies below must be installed
to support the various backends:

* lxml - for html parsing and url extraction
* requests - for downloading html pages
* pysocks - for making tor based connections
* fake_useragent - for browser spoofing
* selenium (coming soon!)

Installation
------------

For a vanilla **arackpy** install with no other dependencies::

    pip install arackpy

For proxy and tor support, install the optional dependencies::

    pip install lxml requests fake_useragent pysocks

Quickstart
----------

Open your favorite Python editor and type the following::

    # hello_spider.py

    from __future__ import print_function    # python 2 support

    from arackpy.spider import Spider


    class HelloSpider(Spider):
        """A simple spider in just ten lines of working code"""

        start_urls = ["https://www.python.org"]

        def parse(self, url, html):
            """Extract data from the raw html"""
            print("Crawling url, %s" % url)


    if __name__ == "__main__":
        print("Press Ctrl-c to stop crawling")
        spider = HelloSpider()
        spider.crawl()

Run the program using::

    $ python hello_spider.py

.. note::

    Press Ctrl-c to terminate crawling.

To use proxies or tor, change the backend accordingly.

Documentation
-------------

To learn more, go read the docs at https://arackpy.readthedocs.org.

The **arackpy** logo was taken from https://clipart-library.com.
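
Example: extracting links
-------------------------

The Quickstart spider only prints each url it visits. As a rough illustration of what "extract data from the raw html" could look like, the sketch below collects the link targets found on a page. It is not part of **arackpy**: ``LinkCollector`` and ``extract_links`` are hypothetical names, the code uses Python 3's standard library instead of the lxml backend, and it assumes ``html`` arrives as a text string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(url, html):
    """Return the links found in html as absolute urls.

    Hypothetical helper (not an arackpy API). Relative links are
    resolved against the url of the page they were found on.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(url, href) for href in collector.hrefs]


if __name__ == "__main__":
    sample = (
        '<p><a href="/about/">About</a> '
        '<a href="https://docs.python.org/3/">Docs</a></p>'
    )
    for link in extract_links("https://www.python.org", sample):
        print(link)
```

A ``parse(self, url, html)`` method like the one in the Quickstart could call ``extract_links(url, html)`` and print or store the result. Whether the spider already follows links on its own is not covered here; see the project documentation for that.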