Transformation Adapters

Minh Pham edited this page Jun 2, 2020 · 13 revisions

1. Topoflow4ClimateWriteFunc:

Transforms GPM weather netCDF4 files into rts/rti files that can be used as input for Topoflow. Performs cropping, regridding, and resampling using GDAL.

List of parameters:

* input_dir: path to input directory (GPM netCDF files)
* temp_dir: path to temporary directory
* output_file: path to output file
* var_name: variable name
* DEM_bounds: bounding box
* DEM_xres_arcsecs: x-resolution
* DEM_yres_arcsecs: y-resolution

A sample configuration file can be found here. A sample GPM netCDF file can be found here.
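
The parameters above can be pictured as a plain mapping; every value in the sketch below is a hypothetical placeholder, not taken from the linked sample. The one derived quantity worth noting is that the arcsecond resolutions translate to degrees by dividing by 3600:

```python
# Hypothetical inputs for Topoflow4ClimateWriteFunc; every value below
# is a placeholder, not taken from the linked sample configuration.
inputs = {
    "input_dir": "/data/gpm_netcdf",
    "temp_dir": "/tmp/topoflow_climate",
    "output_file": "/data/output/climate.rts",
    "var_name": "HQprecipitation",  # hypothetical GPM variable name
    "DEM_bounds": "24.079583, 6.565416, 27.379583, 10.132916",
    "DEM_xres_arcsecs": "30",
    "DEM_yres_arcsecs": "30",
}

# One arcsecond is 1/3600 of a degree, so a 30-arcsecond grid has a
# cell size of 30/3600 degrees.
xres_deg = float(inputs["DEM_xres_arcsecs"]) / 3600.0
```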

2. Topoflow4SoilWriteFunc:

Transforms ISRIC soil files into rts/rti files that can be used as input for Topoflow. Performs cropping, regridding, and resampling using GDAL.

List of parameters:

* input_dir: path to input directory (ISRIC soil files)
* output_dir: path to output directory
* layer: soil layer
* DEM_bounds: bounding box
* DEM_xres_arcsecs: x-resolution
* DEM_yres_arcsecs: y-resolution

A sample configuration file can be found here. A sample ISRIC soil folder can be found here.
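
Since the soil transformation takes a single layer parameter, covering several layers means one configuration per layer. A minimal sketch, assuming hypothetical layer identifiers (`sl1`, `sl2`, ...) and placeholder paths:

```python
# Placeholder inputs for Topoflow4SoilWriteFunc; the paths, bounds,
# and "sl<i>" layer names are assumptions for illustration only.
base_inputs = {
    "input_dir": "/data/isric_soil",
    "output_dir": "/data/output/soil",
    "DEM_bounds": "24.079583, 6.565416, 27.379583, 10.132916",
    "DEM_xres_arcsecs": "30",
    "DEM_yres_arcsecs": "30",
}
# One configuration per soil layer.
configs = [dict(base_inputs, layer=f"sl{i}") for i in range(1, 4)]
```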

3. GLDAS2CyclesFunc:

Transforms GLDAS data into the input format needed for Cycles.

List of parameters:

* start_date: start date of GLDAS data
* end_date: end date of GLDAS data
* gldas_path: path to GLDAS data directory
* output_prefix: prefix for output files
* latitude: latitude (either lat/long or coord_file must be provided)
* longitude: longitude
* coord_file: path to coordinate file (contains a list of lat/long for many locations)

Sample configuration and input files can be found here. You can download the complete GLDAS dataset (2000-2019) here to run the full transformation pipeline.
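
The coord_file option lets one run cover many locations at once. Its exact format is documented in the linked samples; the sketch below assumes a simple two-column latitude/longitude CSV, which may differ from the real layout:

```python
import csv
import io

# Assumed coord_file layout: one "latitude,longitude" pair per line.
# The real format may differ; see the linked sample input files.
coord_file = io.StringIO("7.5,26.1\n8.25,27.0\n")
locations = [(float(lat), float(lon)) for lat, lon in csv.reader(coord_file)]
```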

4. DcatReadFunc:

Fetches a dataset's resources and metadata from the MINT Data-Catalog. Serves as an entry point in the pipeline.

List of parameters:

* dataset_id: record_id of the dataset
* start_time: start date to filter resources by temporal coverage
* end_time: end date to filter resources by temporal coverage

Sample configuration files can be found here and here.
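
The start_time/end_time filter keeps any resource whose temporal coverage intersects the requested window. A sketch of that interval test (assumed semantics, not the adapter's actual code):

```python
from datetime import datetime

# Assumed filter semantics: a resource is kept when its coverage
# [res_start, res_end] intersects the requested [start, end] window.
def overlaps(res_start, res_end, start, end):
    return res_start <= end and res_end >= start

resources = [
    ("r1", datetime(2018, 1, 1), datetime(2018, 1, 31)),
    ("r2", datetime(2018, 3, 1), datetime(2018, 3, 31)),
]
start, end = datetime(2018, 1, 15), datetime(2018, 2, 15)
selected = [rid for rid, s, e in resources if overlaps(s, e, start, end)]
```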

5. CroppingTransFunc:

Crops a dataset by a shapefile or bounding box, and filters it by a Standard Variable.

List of parameters:

* dataset: input dataset (a wired input)
* variable_name: Standard Variable name
* shape: input dataset for shapefile (a wired input; either shape or the bounding box parameters must be provided)
* xmin: bounding box xmin coordinate
* ymin: bounding box ymin coordinate
* xmax: bounding box xmax coordinate
* ymax: bounding box ymax coordinate
* region_label: bounding box region label
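
When the bounding box parameters are used instead of a shapefile, the crop reduces to a point-in-rectangle test. A minimal sketch with made-up coordinates (assumed semantics; the real adapter operates on datasets, not bare point lists):

```python
# Assumed crop semantics: keep records whose coordinates fall inside
# [xmin, xmax] x [ymin, ymax]. Coordinates here are made up.
xmin, ymin, xmax, ymax = 24.0, 6.5, 27.4, 10.1
points = [(25.0, 7.0), (30.0, 7.0), (26.5, 9.9)]
inside = [(x, y) for x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]
```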

6. CroppingTransWrapper:

A wrapper around CroppingTransFunc that handles streaming inputs.

List of parameters:

* dataset: same as CroppingTransFunc (a wired input stream)
* variable_name: same as CroppingTransFunc (can be a wired input stream)
* shape: same as CroppingTransFunc (a wired input stream; either shape or the bounding box parameters must be provided)
* xmin: same as CroppingTransFunc
* ymin: same as CroppingTransFunc
* xmax: same as CroppingTransFunc
* ymax: same as CroppingTransFunc
* region_label: same as CroppingTransFunc

Sample configuration files can be found here and here.
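
The wrapper pattern itself is simple: apply the wrapped function to each item of the input stream and yield the results. A sketch with illustrative names (`crop` is a stand-in, not CroppingTransFunc's real interface):

```python
# Illustrative stand-in for the wrapped function; not the real
# CroppingTransFunc interface.
def crop(dataset, bbox):
    xmin, ymin, xmax, ymax = bbox
    return [(x, y) for x, y in dataset if xmin <= x <= xmax and ymin <= y <= ymax]

# The wrapper applies the wrapped function to every dataset in the
# incoming stream, yielding one result per input.
def cropping_wrapper(dataset_stream, bbox):
    for dataset in dataset_stream:
        yield crop(dataset, bbox)

stream = iter([[(1.0, 1.0), (9.0, 9.0)], [(2.0, 2.0)]])
results = list(cropping_wrapper(stream, (0.0, 0.0, 5.0, 5.0)))
```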

7. DcatRangeStream:

A streaming adapter that produces a stream of (start_time, end_time) pairs for a dataset from the MINT Data-Catalog. Serves as an input to DcatReadFunc.

List of parameters:

* dataset_id: record_id of the dataset
* start_time: start date to loop from (start_time, end_time and step_time are optional)
* end_time: end date to loop to
* step_time: ISO 8601 duration string representing the step (or timedelta) to loop from start_time to end_time

Sample configuration files can be found here and here.
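
The stream steps from start_time to end_time in step_time increments, emitting one (start, end) window per iteration. The sketch below assumes those semantics and substitutes a plain timedelta for the ISO 8601 duration string (parsing "P1D" itself would need a duration parser):

```python
from datetime import datetime, timedelta

# Assumed semantics: emit consecutive (start, end) windows, advancing
# by step_time until end_time is reached. timedelta(days=1) stands in
# for the ISO 8601 duration "P1D".
def range_stream(start_time, end_time, step_time):
    current = start_time
    while current < end_time:
        yield current, min(current + step_time, end_time)
        current += step_time

windows = list(range_stream(datetime(2018, 1, 1), datetime(2018, 1, 4), timedelta(days=1)))
```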

8. DcatVariableStream:

A streaming adapter that produces Standard Variables for a dataset from the MINT Data-Catalog. Serves as an input to CroppingTransWrapper.

List of parameters:

* dataset_id: record_id of the dataset

A sample configuration file can be found here.
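
Conceptually the adapter yields one Standard Variable at a time for downstream consumption. A toy sketch (the catalog lookup, dataset id, and variable name are all hypothetical):

```python
# Toy stand-in for the Data-Catalog lookup; the dataset id and the
# Standard Variable name are hypothetical.
def variable_stream(dataset_id):
    catalog = {"ds-123": ["atmosphere_water__precipitation_leq_volume_flux"]}
    yield from catalog.get(dataset_id, [])

variables = list(variable_stream("ds-123"))
```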

9. VariableAggregationFunc:

Aggregates a dataset using Group By with functions.

List of parameters:

* dataset: input dataset (a wired input)
* group_by: list of properties to group by. Each property is a dictionary of the form { "prop": <property of the class>, "value": <function that extracts the group key from each value of the property; possible values: exact, or, for time: minute, hour, date, month, year> }
* function: aggregation function (sum, count, or average)

Sample configuration files can be found here and here.
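
The group-by-then-aggregate behaviour can be sketched with plain dictionaries, assuming the time-based "hour" group key and the average function described above, over made-up records:

```python
from collections import defaultdict
from datetime import datetime

# Made-up (timestamp, value) records; grouping truncates each
# timestamp to the hour, mirroring the "hour" group key above.
records = [
    (datetime(2018, 1, 1, 0, 10), 2.0),
    (datetime(2018, 1, 1, 0, 50), 4.0),
    (datetime(2018, 1, 1, 1, 5), 6.0),
]
groups = defaultdict(list)
for ts, value in records:
    groups[ts.replace(minute=0, second=0, microsecond=0)].append(value)

# Apply the "average" aggregation function to each group.
averaged = {key: sum(vals) / len(vals) for key, vals in groups.items()}
```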

10. CSVWriteFunc:

A writer adapter. Generates a CSV/JSON file for an input dataset.

List of parameters:

* data: input dataset (a wired input)
* output_file: path to output file

Sample configuration files can be found here and here.
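
For the CSV case, the writing step amounts to serialising tabular records. A minimal sketch with illustrative field names, writing to an in-memory buffer rather than an output_file path:

```python
import csv
import io

# Illustrative records; the field names are made up. An in-memory
# buffer stands in for the output_file path.
rows = [{"time": "2018-01-01", "value": 3.0}, {"time": "2018-01-02", "value": 6.0}]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["time", "value"])
writer.writeheader()
writer.writerows(rows)
output = buf.getvalue()
```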