This is a Ruby script to generate CSV manifests for use in the guardian workflow for transferring archives to Glacier in batches.
- Ruby 2.2.5 or higher
- A YAML file populated with shared manifest data and a list of the directive names of each archive to be transferred to Glacier in this batch with guardian
To generate a guardian-compliant CSV, issue the following command from the terminal:
ruby guardian_manifest.rb inventory.yml
Where inventory.yml
is the path to the YAML file you are using to create this manifest.
This will output a CSV manifest called guardian_manifest.csv
in the directory where the command was run. If you wish to override the default, you can provide a custom filename at which the manifest will be saved locally, as an optional third argument like so:
ruby guardian_manifest.rb inventory.yml output_file.csv
The YAML inventory used to populate the guardian manifest should be in the following format:
source: x
workspace: x
compressed_destination: x
compressed_extension: x
verification_destination: x
verification_sample_size: x
vault: x
application: x
method: x
todo_prefix: x
description_values:
owner: x
location: x
directive_names:
- dir_name_1
- dir_name_2
- dir_name_3
The values should correspond to:
-
source
- The source on guardian for the content that will populate the archive- Valid term(s) for use with the Docker deployment:
bulwark_gitannex_remote
- NFS path to the directory containing the directories listed in
directortive_names
- Valid term(s) for use with the Docker deployment:
-
workspace
- The path on guardian where the archive will be pulled from the source and assembled into its compressed for- Valid term(s) for use with the Docker deployment:
zip_workspace
- Valid term(s) for use with the Docker deployment:
-
compressed_destination
- The path on guardian where the compressed archive will be staged for transfer to Glacier- Valid terms for use with the Docker deployment:
zip_workspace
- Valid terms for use with the Docker deployment:
-
compressed_extension
- The file extension of the compressed archive- Valid term(s) for use with the guardian codebase:
zip
- Valid term(s) for use with the guardian codebase:
-
verification_sample_size
- Optional; the proportion of archives for which for to perform post-compression contents verification- guardian_manifest will set
verify_compressed_archive
totrue
for a random selection of directive names in the specified proportion - Valid values for use with guardian_manifest:
ALL
- a fractional proportion like
7/10
or33 / 100
- guardian_manifest will set
-
verification_destination
- Required ifverification_sample_size
is specified; the path on guardian where the compressed archive will be decompressed to verify the integrity of compressed archive contents.- Valid terms for use with the Docker deployment:
zip_workspace
- Valid terms for use with the Docker deployment:
-
vault
- The name of the Glacier vault where the archive will be transferred -
application
- The name of the application stewarding the source data for the archives in this batch- Supported application(s)
bulwark
openn
- Supported application(s)
-
method
- The retrieval method required by the application to pull source content to the workspace so that it can be compressed and transferred- Supported method(s)
gitannex
rsync
- Supported method(s)
-
todo_prefix
- an optional value to prepend to each directive name in thetodo_base
in the output CSV; e.g., iftodo_prefix
is0030_
,todo_base
values will be0030_dir_name_1
,0030_dir_name_2
, etc. -
description_values
- The key/value store of data apart from the directive name to add to the JSON blob of metadata for each archive transferred to Glacier. The valuesowner
andlocation
are prepopulated in the example above, and any other key/value pairings can be added to the description metadata in the same way to your YAML file. -
directive_names
- A multi-valued list of the directive names of each object to have an archive generated and transferred to Glacier.
Bug reports and pull requests are welcome on GitHub at https://github.com/upenn-libraries/guardian_manifest.
This code is available as open source under the terms of the Apache 2.0 License.