Pre-generated data sets

Streaming decompression

To download and decompress a data set on the fly, make sure curl and zstd are installed, then run:

export DATASET_URL=...
curl --silent --fail "${DATASET_URL}" | tar -xv --use-compress-program=unzstd

For multi-file data sets, first download all chunks. Then, to recombine and decompress them, run:

cat <data-set-filename>.tar.zst* | tar -xv --use-compress-program=unzstd

This command works on both standalone files (.tar.zst) and chunked ones (.tar.zst.XXX).
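The reason one command covers both cases is that the shell expands the glob in sorted order, so `cat` emits the chunks in sequence and `tar` sees a single byte stream. A minimal sketch of that behavior follows, using only coreutils. The file names, the 1 KiB chunk size, and the alphabetic suffixes produced by `split` are stand-ins chosen for illustration, not the layout of the real data sets.

```bash
# Sketch: why `cat name.tar.zst*` restores a chunked archive.
# All file names below are hypothetical stand-ins.
workdir=$(mktemp -d)

# Stand-in for a real archive: any byte stream behaves the same way.
head -c 5000 /dev/urandom > "$workdir/original.bin"

# Split into 1 KiB chunks named example.tar.zst.aaa, example.tar.zst.aab, ...
split -b 1024 -a 3 "$workdir/original.bin" "$workdir/example.tar.zst."

# The glob expands in sorted order, so the chunks are concatenated in sequence.
cat "$workdir"/example.tar.zst* > "$workdir/recombined.bin"

# cmp exits with status 0 only if the two files are byte-identical.
cmp "$workdir/original.bin" "$workdir/recombined.bin" \
  && echo "chunks recombine to the original byte stream"
```

With a standalone file the glob matches exactly one name, so `cat` simply passes that file through unchanged.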

Data set links

SURF

The data sets are available in the SURF/CWI repository. We provide direct download links and a download script (which stages the data sets from tape storage if they are not immediately available).

Cloudflare R2

Substitution parameters

Validation parameters

Validation parameters for SF10, generated with Umbra:

Compressed CSVs in the composite-merged-fk format

Checksums: https://pub-383410a98aef4cb686f0c7601eddd25f.r2.dev/bi-pre-audit/bi-composite-merged-fk-md5sums.tar.zst

Compressed CSVs in the composite-projected-fk format

Checksums: https://pub-383410a98aef4cb686f0c7601eddd25f.r2.dev/bi-pre-audit/bi-composite-projected-fk-md5sums.tar.zst

Compressed CSVs in the composite-projected-fk format with quotes and without headers

Checksums: https://pub-383410a98aef4cb686f0c7601eddd25f.r2.dev/bi-pre-audit/bi-composite-projected-fk-with-quotes-without-headers-md5sums.tar.zst

Raw (up to SF30)

Checksums: https://pub-383410a98aef4cb686f0c7601eddd25f.r2.dev/bi-pre-audit/bi-raw-md5sums.tar.zst

Factor tables

Checksums: https://pub-383410a98aef4cb686f0c7601eddd25f.r2.dev/bi-pre-audit/bi-factors-md5sums.tar.zst
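The checksum archives listed above can be used to confirm that a download is intact. The following is a minimal sketch, assuming that each archive unpacks to one or more lists in the standard `md5sum` format (`<md5>  <filename>` per line) and that GNU `md5sum` is installed. The helper name `verify_md5` and its arguments are hypothetical, and the actual layout of the archives may differ.

```bash
# Hypothetical helper: check downloaded files against an md5sum-format list.
# Assumes each line of the list reads "<md5>  <filename>", as written by `md5sum`.
verify_md5() {
  checksum_file=$1   # path to the checksum list
  data_dir=$2        # directory holding the downloaded files
  # The redirect is resolved before the cd, so a relative list path still works.
  # md5sum reads the list from standard input when no file is given.
  ( cd "$data_dir" && md5sum --check --quiet ) < "$checksum_file"
}

# Example usage (placeholders left unresolved on purpose):
#   curl --silent --fail <checksum-archive-url> | tar -xv --use-compress-program=unzstd
#   verify_md5 <extracted-checksum-list> <download-directory>
```

The function returns a non-zero exit status if any listed file is missing or does not match its checksum, so it can be chained with `&&` before a decompression step.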