scansio_download

Download files listed on the Internet-Wide Scan Data Repository, https://scans.io/

##Description Downloads files from the scan repository and tracks successfully downloaded files in a JSON file or with Elastic Search. The JSON file mirrors the structure of the scan repository.

Optionally, sends the downloaded file names to a parsing function.

##Examples

Basic download and add to JSON catalog

Create instance of JSON catalog class
Create instance of Download class
Download the latest study using the study's unique ID
1. Use optional filter for studies that release multiple study types
The returned value is the filename that was successfully downloaded, or False
1. If successfully downloaded, the study and file metadata is stored in the JSON catalog

catalog = scansio.JSONCatalog()
dl = scansio.Download()
newest_study = dl.download_latest_study('scott-top-one-million', catalog, 'umbrella')

Download latest 5 studies and add to Elastic Search catalog

Create instance of Elastic Search catalog class
1. Include a parsing function to index the file contents in Elastic Search (study_parser in example below)
  1. Takes the filename as the only parameter
  2. Must return True to add downloaded files to the catalog index
Create instance of Download class
Download the most recent 5 studies using the study's unique ID
1. Use optional filter for studies that release multiple study types
The returned value is a list of the filenames that were successfully downloaded, or False
1. If successfully downloaded, the study and file metadata is indexed in the Elastic Search catalog

import scansio.esindex
from elasticsearch import ElasticSearch
import gzip

def study_parser(filename):
    with gzip.open(filename, 'rb') as f:
        file_content = f.read()
        return True
    return False
    
esconnection = Elasticsearch([{'host': '127.0.0.1', 'port': 443}], timeout=20)
catalog = scansio.esindex.ESCatalog(esconnection, study_parser)
dl = scansio.Download()
newest_study_list = dl.download_study_files('scott-top-one-million', catalog, -5, 'umbrella')

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
scansio		scansio
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scansio_download

About

Releases

Packages

Languages

License

malvidin/scansio_download

Folders and files

Latest commit

History

Repository files navigation

scansio_download

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages