A simple Python module for scraping images from the web, created for AI development.
- Scrapes images from google.com and duckduckgo.com.
- Finds duplicate images and removes them.
- Builds complex image databases from the engines' top search results for each supplied keyword.
- Optionally uses the Tor network with Firefox for scraping.
>> imageCrawler.py -k cats dogs
>> Select by number the queries to ignore:
>> ( 0 ) cats
>> ( 1 ) cats with hats
>> 1
>> Start with cats download 4000 at engines\cats
>> 100%
>> Select by number the queries to ignore:
>> ( 0 ) dogs
>> ( 1 ) dogs with hats
>> 1
>> Start with dogs download 4000 at engines\dogs
>> 100%
>> Searching duplicated...
>> END
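The "Searching duplicated..." step above could be approximated as follows. This is only a minimal sketch, not the module's actual method: it assumes duplicates are byte-identical files and detects them by SHA-256 hash. The function names and the list of image suffixes are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the downloads use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def find_duplicates(folder):
    """Return image paths whose bytes match an earlier image under `folder`."""
    seen = {}        # digest -> first path seen with that content
    duplicates = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip directories and non-image files such as keys.json
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates


def remove_duplicates(folder):
    """Delete every duplicate found and return how many files were removed."""
    duplicates = find_duplicates(folder)
    for path in duplicates:
        path.unlink()
    return len(duplicates)
```

Exact hashing only catches identical files. Resized or re-encoded copies of the same picture would need a perceptual hash instead.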
\engines
    \cats
        keys.json
        +4000 image files
    \dogs
        keys.json
        +4000 image files
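To check what a run produced, the layout above can be walked with a few lines of Python. This is a rough sketch that assumes one sub-folder per keyword and treats every file other than keys.json as an image. The function name `count_images` is hypothetical and not part of the module.

```python
from pathlib import Path


def count_images(root):
    """Map each keyword folder under `root` to its number of downloaded files."""
    summary = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Everything except the keys.json metadata file is counted as an image.
        summary[folder.name] = sum(
            1 for f in folder.iterdir() if f.is_file() and f.name != "keys.json"
        )
    return summary
```

For example, `count_images("engines")` would return a dictionary with one entry per keyword folder, such as `cats` and `dogs`.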
- Firefox
- Tor Browser (optional).
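For the optional Tor setup, Firefox has to send its traffic through Tor's local SOCKS proxy. The sketch below shows the standard Firefox proxy preferences for that. It is not taken from this module's source: the helper name is hypothetical, and it assumes Tor Browser is running with its default SOCKS port 9150 (a standalone tor daemon normally listens on 9050).

```python
def tor_proxy_preferences(host="127.0.0.1", port=9150):
    """Return Firefox preferences that route traffic through a Tor SOCKS proxy."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve DNS through the proxy too
    }


# Applying the preferences with Selenium (assumes `pip install selenium`;
# left as a comment so this sketch runs without it):
#   from selenium import webdriver
#   options = webdriver.FirefoxOptions()
#   for name, value in tor_proxy_preferences().items():
#       options.set_preference(name, value)
#   driver = webdriver.Firefox(options=options)
```

Routing DNS through the proxy matters here, because otherwise host lookups would bypass Tor.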
geckodriver compatibility check
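Before comparing versions, it helps to see which Firefox and geckodriver builds are actually installed. The following is a minimal sketch that assumes both executables are on PATH and accept `--version`. The helper name `tool_version` is hypothetical.

```python
import shutil
import subprocess


def tool_version(executable):
    """Return the first line of `<executable> --version`, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""  # found on PATH, but the version could not be read
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

Calling `tool_version("geckodriver")` and `tool_version("firefox")` gives the two version strings to check against each other. On some platforms Firefox may not print its version to standard output, in which case the result is an empty string.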
pip install engineCrawler