SOCH Download CLI lets you do multithreaded batch downloads of Swedish Open Cultural Heritage (K-samsök) records for offline processing and analytics.
- Python >=3.4 and pip
pip install soch-download
Heads up: This program might use all of the system's available CPUs.
Download records based on a SOCH search query (text, CQL, indexes, etc.):
soch-download --action=query --query=thumbnailExists=j --outdir=path/to/target/directory
Download records from a specific institution:
soch-download --action=institution --institution=raa --outdir=path/to/target/directory
Download records using a predefined action/query:
soch-download --action=all --outdir=path/to/target/directory
soch-download --action=geodata-exists --outdir=path/to/target/directory
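The download commands above can also be assembled from a script, for example when several downloads run in sequence. The following is a minimal sketch, assuming soch-download is on the PATH. The helper names build_command and run_download are hypothetical and not part of the tool, and only flags that appear in this README are used. It also assumes the CLI signals failure with a non-zero exit code, which this README does not confirm.

```python
import subprocess


def build_command(action, outdir, query=None, institution=None, endpoint=None):
    """Assemble a soch-download argument list from flags shown in this README."""
    # The query and institution actions each need their own extra flag.
    if action == "query" and not query:
        raise ValueError("--action=query needs a query string")
    if action == "institution" and not institution:
        raise ValueError("--action=institution needs an institution abbreviation")

    argv = ["soch-download", "--action=" + action]
    if query:
        argv.append("--query=" + query)
    if institution:
        argv.append("--institution=" + institution)
    argv.append("--outdir=" + outdir)
    if endpoint:
        argv.append("--endpoint=" + endpoint)
    return argv


def run_download(**kwargs):
    """Run soch-download; raises CalledProcessError on a non-zero exit code."""
    return subprocess.check_call(build_command(**kwargs))
```

For example, run_download(action="institution", institution="raa", outdir="path/to/target/directory") would run the institution command shown above. Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or special characters.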
Unpacking
By default, the download actions save large XML files containing up to 1000 RDF records each. After such a download, you can use the --unpack
argument to convert all those files into individual RDF files:
soch-download --unpack=path/to/xml/files --outdir=path/to/target/directory
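To illustrate what unpacking amounts to, here is a rough sketch of splitting one batch file into per-record files. It is not the tool's own implementation. It assumes that each record in a downloaded file is an rdf:RDF element somewhere in the XML tree, and the function name split_records and the output file naming are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_records(xml_path, outdir):
    """Write every rdf:RDF element found in xml_path to its own file in outdir."""
    # Keep the conventional "rdf" prefix when the elements are serialized again.
    ET.register_namespace("rdf", RDF_NS)
    os.makedirs(outdir, exist_ok=True)

    base = os.path.splitext(os.path.basename(xml_path))[0]
    root = ET.parse(xml_path).getroot()
    written = []
    for index, record in enumerate(root.iter("{%s}RDF" % RDF_NS)):
        # Hypothetical naming scheme: <batch name>-<running number>.rdf
        target = os.path.join(outdir, "%s-%04d.rdf" % (base, index))
        ET.ElementTree(record).write(target, encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written
```

This sketch loads the whole batch file into memory, which is acceptable for files of around 1000 records. For the actual conversion, use the --unpack argument shown above.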
Misc
List all available parameters and actions:
soch-download --help
Target a custom SOCH API endpoint:
soch-download --action=query --query=itemKeyWord=hus --outdir=path/to/target/directory --endpoint=http://lx-ra-ksam2.raa.se:8080/