
During training, obtaining a Resolution of None Against an Anemoi Data Processed ERA5 Zarr #68

Open · CSyl opened this issue Oct 1, 2024 · 4 comments
Labels: bug (Something isn't working)


CSyl commented Oct 1, 2024

What happened?

I have a zarr that is a subset of the ERA5 zarr in GCP storage, https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr. When running anemoi-training train and anemoi-training train --config-name=debug.yaml, I encountered the following error:

AttributeError: 'NoneType' object has no attribute 'lower', raised at line 101 of anemoi/training/data/datamodule.py in _check_resolution.

If the resolution must be set in the configuration file, is there a way to determine the resolution of https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr in terms of the "o" notation, or whatever prefix the anemoi-training module accepts as shown in anemoi/training/config/data/zarr.yaml? Then I could set the resolution in the training configuration, which might remove the NoneType error I am receiving.
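For reference, the crash can be reproduced in isolation. The function below is a hypothetical sketch of a check like `_check_resolution` (not the actual anemoi-training code): comparing the two strings case-insensitively raises exactly this AttributeError when the dataset reports no resolution.

```python
from typing import Optional

def check_resolution(dataset_resolution: Optional[str], config_resolution: Optional[str]) -> None:
    # Hypothetical sketch of a check like datamodule.py's _check_resolution:
    # calling .lower() on a None resolution is what produces the AttributeError.
    if config_resolution.lower() != dataset_resolution.lower():
        raise ValueError(
            f"Dataset resolution {dataset_resolution} does not match config resolution {config_resolution}"
        )

try:
    check_resolution(None, "o384")  # dataset resolution missing, as in this issue
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'lower'
```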

What are the steps to reproduce the bug?

Data & graph used: an anemoi-formatted ERA5 zarr subset extracted from https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr and preprocessed with the anemoi-datasets module, plus a graph built against that subset with the anemoi-graphs module.

  1. Training configuration file (config.yaml):
defaults:
- data: zarr
- dataloader: native_grid
- diagnostics: eval_rollout
- hardware: example
- graph: multi_scale
- model: gnntransformer
- training: default
- _self_
  2. Data configuration file (anemoi/training/config/data/zarr.yaml):
format: zarr
resolution: o384 #o96
# Time frequency requested from dataset
frequency: 1h #6h
# Time step of model (must be multiple of frequency)
timestep: 1h #6h

# features that are not part of the forecast state
# but are used as forcing to generate the forecast state
forcing:
- "cos_latitude"
- "cos_longitude"
- "sin_latitude"
- "sin_longitude"
- "cos_julian_day"
- "cos_local_time"
- "sin_julian_day"
- "sin_local_time"
- "insolation"
- "lsm"
- "sdor"
- "slor"
- "z"
# features that are only part of the forecast state
# but are not used as the input to the model
diagnostic:
- tp
- cp
remapped:

normalizer:
  default: "mean-std"
  min-max:
  max:
  - "sdor"
  - "slor"
  - "z"
  none:
  - "cos_latitude"
  - "cos_longitude"
  - "sin_latitude"
  - "sin_longitude"
  - "cos_julian_day"
  - "cos_local_time"
  - "sin_julian_day"
  - "sin_local_time"
  - "insolation"
  - "lsm"

imputer:
  default: "none"
remapper:
  default: "none"

# processors including imputers and normalizers are applied in order of definition
processors:
  # example_imputer:
  #   _target_: anemoi.models.preprocessing.imputer.InputImputer
  #   _convert_: all
  #   config: ${data.imputer}
  normalizer:
    _target_: anemoi.models.preprocessing.normalizer.InputNormalizer
    _convert_: all
    config: ${data.normalizer}
  # remapper:
  #   _target_: anemoi.models.preprocessing.remapper.Remapper
  #   _convert_: all
  #   config: ${data.remapper}

# Values set in the code
num_features: null # number of features in the forecast state

  3. Dataloader configuration file (anemoi/training/config/dataloader/native_grid.yaml):
prefetch_factor: 2

num_workers:
  training: 8
  validation: 8
  test: 8
  predict: 8
batch_size:
  training: 2
  validation: 4
  test: 4
  predict: 4

# ============
# Default effective batch_size for training is 16
# For the o96 resolution, default per-gpu batch_size is 2 (8 gpus required)
# The global lr is calculated as:
# global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
# Assuming a constant effective batch_size, any change in the per_gpu batch_size
# should come with a rescaling of the local_lr to keep a constant global_lr
# ============

# runs only N training batches [N = integer | null]
# if null then we run through all the batches
limit_batches:
  training: null
  validation: null
  test: 20
  predict: 20

# ============
# Dataloader definitions
# These follow the anemoi-datasets patterns
# You can make these as complicated for merging as you like
# See https://anemoi-datasets.readthedocs.io
# ============

dataset: ${hardware.paths.data}/${hardware.files.dataset}

training:
  dataset: ${dataloader.dataset}
  start: 2020-12-31 00:00:00 #null
  end: 2021-01-20 23:00:00 #2021
  frequency: ${data.frequency}
  drop:  []

validation:
  dataset: ${dataloader.dataset}
  start: 2021-01-21 00:00:00 #2021
  end: 2021-01-24 23:00:00 #2021
  frequency: ${data.frequency}
  drop:  []

test:
  dataset: ${dataloader.dataset}
  start: 2021-01-25 00:00:00 #2021
  end: 2021-02-01 23:00:00 #null
  frequency: ${data.frequency}

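The learning-rate scaling rule quoted in the comments of the dataloader config above can be checked with a small worked example (the numbers below are hypothetical, not taken from my setup):

```python
# global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
local_lr = 6.25e-5        # hypothetical per-GPU learning rate
num_gpus_per_node = 4
num_nodes = 2
gpus_per_model = 1

global_lr = local_lr * num_gpus_per_node * num_nodes / gpus_per_model
print(global_lr)  # 0.0005
```

Keeping global_lr constant means any change to the per-GPU batch size should come with a proportional rescaling of local_lr.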
  4. Hardware paths configuration:
data: /path-where-the-anemoi-formatted-zarr-is-saved
grids: ???
output: /path-where-the-training-output-is-saved
logs:
  base: ${hardware.paths.output}logs/
  wandb: ${hardware.paths.logs.base}
  mlflow: ${hardware.paths.logs.base}mlflow/
  tensorboard: ${hardware.paths.logs.base}tensorboard/
checkpoints: ${hardware.paths.output}checkpoint/
plots: ${hardware.paths.output}plots/
profiler: ${hardware.paths.output}profiler/
graph: ${hardware.paths.output}graphs/
  5. Configuration file for the anemoi-graphs module:
# Encoder-Processor-Decoder graph
# Note: Resulting graph will only work with a Transformer processor because there are no connections between the hidden nodes.
nodes:
  data:
    node_builder: # how to generate data node
      _target_: anemoi.graphs.nodes.ZarrDatasetNodes
      dataset: anemoi-local-gcp-sample-zarr.zarr
  hidden:
    node_builder: # how to generate hidden node
      _target_: anemoi.graphs.nodes.ZarrDatasetNodes
      dataset: anemoi-local-gcp-sample-zarr.zarr
edges:
  # A) Encoder connections/edges: encodes input data into latent space by connecting data nodes with hidden nodes.
  - source_name: data
    target_name: hidden
    edge_builder:
      _target_: anemoi.graphs.edges.CutOffEdges # method to build edges 
      cutoff_factor: 0.7
  # B) Processor connections/edges: connects hidden nodes with each other
  - source_name: hidden
    target_name: hidden
    edge_builder:
      _target_: anemoi.graphs.edges.KNNEdges # method to build edges via KNN
      num_nearest_neighbours: 3
  # C) Decoder connections/edges: decodes latent space into the output data by connecting hidden nodes with data nodes
  - source_name: hidden
    target_name: data
    edge_builder:
      _target_: anemoi.graphs.edges.KNNEdges  # method to build edges via KNN
      num_nearest_neighbours: 3
  6. Executed anemoi-training train --config-name=config.yaml and obtained the error:

AttributeError: 'NoneType' object has no attribute 'lower', from anemoi/training/data/datamodule.py line 101 in _check_resolution.

Version

0.1.0

Platform (OS and architecture)

Linux

Relevant log output

No response

Accompanying data

No response

Organisation

No response

(cc'ing @mchantry)

@CSyl CSyl added the bug Something isn't working label Oct 1, 2024
@CSyl CSyl changed the title During training, obtaining a resolution of None. During training, obtaining a Resolution of None Against an Anemoi Data Processed ERA5 Zarr Oct 9, 2024
mchantry (Member) commented:
Hi CSyl,
Sorry for the slow reply.
Could you provide access to a small anemoi-dataset-style zarr so we can understand how the resolution has been described in the dataset? Alternatively, share the anemoi-datasets config you used to build the zarr.
Thanks

CSyl commented Nov 5, 2024

Hi @mchantry,
No worries, and thank you for the response. The configuration file mentioned in the steps below was used to transform the zarr data into an anemoi dataset.

Steps Taken when the Zarr was converted to an anemoi dataset:

  1. Save a subset of the zarr from GCP storage locally:
# Generate an ERA5 sample from GCP's GS storage
import xarray as xr
import gcsfs
gs_url = "gs://gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr"
chunk_sz = 48
gcp_ar_era5_subset = xr.open_zarr(gs_url, 
                    chunks={'time': chunk_sz},
                    consolidated=True)
start_date = '2020-12-31'
end_date = '2021-02-01'
gcp_ar_era5_subset = gcp_ar_era5_subset.sel(time=slice(start_date, end_date))

# Save ERA5 data subset to local 
gcp_ar_era5_subset.to_zarr('gcp_era5_subset.zarr')
  2. Set the YAML configuration file (recipe.yaml) for converting the zarr to an anemoi dataset:
dates:
  start: 2020-12-31T00:00
  end: 2021-02-01T23:00
  frequency: 6h

input:
  xarray-zarr:
    url: "./gcp_era5_subset.zarr"
    param:
      - 2m_temperature
      - 10m_u_component_of_wind
      - geopotential
      - 10m_v_component_of_wind
      - surface_pressure
  3. Execute anemoi-datasets create recipe.yaml gcp_era5_subset.zarr
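One way to see what resolution (if any) ended up in the created dataset is to read the store's root attributes directly. This is a sketch assuming a zarr v2 directory store, where root attributes live in a `.zattrs` JSON file; whether anemoi-datasets records a `resolution` key there is an assumption to verify against your own store.

```python
import json
from pathlib import Path

def read_zarr_root_attrs(store: str) -> dict:
    """Read root attributes of a zarr v2 directory store without extra dependencies."""
    zattrs = Path(store) / ".zattrs"
    return json.loads(zattrs.read_text()) if zattrs.exists() else {}

attrs = read_zarr_root_attrs("gcp_era5_subset.zarr")  # path from the steps above
print(attrs.get("resolution"))  # None when no named resolution was recorded
```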

HCookie commented Nov 15, 2024

Hi, @CSyl
Thank you for providing your dataset script.
I will take a look next week and attempt to reproduce the error.

HCookie commented Nov 21, 2024

Since the data you are building an anemoi-dataset from is a source we cannot establish a resolution for, it is expected behaviour that the resolution is None.
The field is primarily used for metadata tracking, cataloguing, and inspection.
In #120, we are removing the resolution check: except for the line on which you crash, the value is not used.
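The idea behind that change can be illustrated with a guarded comparison; this is a sketch of the concept, not the actual #120 patch:

```python
from typing import Optional

def check_resolution_safe(config_resolution: Optional[str], dataset_resolution: Optional[str]) -> None:
    # Skip the comparison entirely when either side has no named resolution,
    # since the field is metadata only.
    if config_resolution is None or dataset_resolution is None:
        return
    if config_resolution.lower() != dataset_resolution.lower():
        raise ValueError(
            f"Resolution mismatch: config={config_resolution!r} dataset={dataset_resolution!r}"
        )

check_resolution_safe("o384", None)  # no longer raises on a None resolution
```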
