📚[DOC]: corrdiff- what is the meaning of cwb dataset? #697

MyGitHub-G · 2024-10-23T03:22:02Z

How would you describe the priority of this documentation request

Critical (currently preventing usage)

Is this for new documentation, or an update to existing docs?

New

Describe the incorrect/future/missing documentation

What is the full name of CWB? I know the meaning of CWA from reading the paper, but what does cwb.py in the code refer to? Is it the code that processes the CorrDiff CWA dataset?

MyGitHub-G · 2024-10-23T03:25:23Z

I downloaded the dataset from https://catalog.ngc.nvidia.com/orgs/nvidia/teams/modulus/resources/modulus_datasets_cwa

Do I need to unzip it to a cwa_dataset.zarr folder to use it? Also, why are the files inside cwa_dataset.zarr named .0 / .1 instead of being .nc files?

zomosky · 2024-10-24T08:27:30Z

cwa_dataset is a .zarr store, and it is opened by the dataloader in dataset.py and cwb.py (line 68). The mini dataset (hrrr_mini) is a .nc file because it is small (~2 GB); .zarr is a better choice for large datasets.
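
To illustrate why the folder contains numbered files rather than .nc files, here is a minimal sketch that inspects a Zarr directory using only the standard library. It assumes the Zarr v2 directory layout, where each array is a folder holding a `.zarray` metadata file plus one file per chunk, named by chunk index with "." as the separator (for example `0.0`, `0.1`). The function name `summarize_zarr_dir` is hypothetical, and the actual layout of cwa_dataset.zarr has not been verified here.

```python
import json
from pathlib import Path

# Metadata file names defined by the Zarr v2 storage layout.
ZARR_V2_METADATA = {".zarray", ".zattrs", ".zgroup", ".zmetadata"}


def summarize_zarr_dir(store_path):
    """Summarize every array found in a Zarr v2 directory store.

    Returns a dict mapping each array's path (relative to the store)
    to its shape, chunk shape, and the number of chunk files on disk.
    Assumes chunk files sit directly in the array folder ("." separator).
    """
    store = Path(store_path)
    summary = {}
    for meta_file in sorted(store.rglob(".zarray")):
        array_dir = meta_file.parent
        meta = json.loads(meta_file.read_text())
        # Everything in the array folder that is not metadata is a chunk.
        n_chunk_files = sum(
            1
            for p in array_dir.iterdir()
            if p.is_file() and p.name not in ZARR_V2_METADATA
        )
        summary[array_dir.relative_to(store).as_posix()] = {
            "shape": meta.get("shape"),
            "chunks": meta.get("chunks"),
            "n_chunk_files": n_chunk_files,
        }
    return summary
```

Calling `summarize_zarr_dir("cwa_dataset.zarr")` on the extracted folder should list each variable with its shape and chunk count, if the store follows the layout assumed above.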

mnabian · 2024-10-25T18:00:06Z

Yes, cwb.py processes the CWA dataset.
Make sure you use the NGC CLI to download the dataset: https://github.com/NVIDIA/modulus/tree/main/examples/generative/corrdiff#dataset--datapipe
Downloading with wget will corrupt the files.

MyGitHub-G · 2024-10-30T00:39:08Z

Thanks for your work.
I have another question: do I have to unzip cwa_dataset.zip to cwa_dataset.zarr to use it?
How large is the dataset? I tried to extract the data, but the disk filled up. I would like to know how big the entire dataset is after extraction so I can reserve disk space accordingly.

zomosky · 2024-10-30T02:55:14Z

I checked the extracted files: they take about 1.1 TB. If you are reserving disk space, you also need to add the size of the original compressed file, so plan for roughly 2 TB free before unzipping. By the way, the mini training set (hrrr_mini) is more convenient for training and testing: it is only about 2 GB and runs fairly quickly on a GPU such as the 4090.
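
As a rough illustration of checking space before extracting, the sketch below sums the uncompressed sizes recorded in a zip archive and compares the total with the free space on the target disk. It assumes the download really is a standard zip file; the function names and the 5% safety margin are hypothetical choices, not part of the CorrDiff code.

```python
import shutil
import zipfile


def extracted_size_bytes(zip_path):
    """Total uncompressed size of all members, read from the zip index."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def has_room_to_extract(zip_path, target_dir, margin=0.05):
    """Check whether target_dir's disk can hold the extracted archive.

    Returns (ok, needed_bytes, free_bytes). The margin adds a safety
    buffer on top of the raw extracted size (an arbitrary 5% here).
    """
    needed = extracted_size_bytes(zip_path)
    free = shutil.disk_usage(target_dir).free
    return free >= needed * (1 + margin), needed, free
```

For example, `has_room_to_extract("cwa_dataset.zip", ".")` reports whether the current disk can hold the extracted data. The archive itself still occupies space until it is deleted, which is why the 2 TB figure above is larger than the 1.1 TB extracted size.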

MyGitHub-G · 2024-10-30T03:17:17Z

Thank you very much!

MyGitHub-G added the "? - Needs Triage" label (Need team to review and classify) on Oct 23, 2024

MyGitHub-G closed this as completed Oct 30, 2024
