📚[DOC]: corrdiff- what is the meaning of cwb dataset? #697

Closed
MyGitHub-G opened this issue Oct 23, 2024 · 6 comments
Labels
? - Needs Triage Need team to review and classify

Comments

@MyGitHub-G

How would you describe the priority of this documentation request

Critical (currently preventing usage)

Is this for new documentation, or an update to existing docs?

New

Describe the incorrect/future/missing documentation

What is the full name of cwb? I know what CWA means from reading the paper, but what does cwb mean in cwb.py in the code? Is that the code that processes the CorrDiff CWA dataset?

MyGitHub-G added the "? - Needs Triage" label (Need team to review and classify) on Oct 23, 2024
@MyGitHub-G
Author

I downloaded the dataset from https://catalog.ngc.nvidia.com/orgs/nvidia/teams/modulus/resources/modulus_datasets_cwa

I need to unzip it into a cwa_dataset.zarr folder to use it, right? Also, why are the files inside cwa_dataset.zarr named .0 / .1 rather than being .nc files?

@zomosky

zomosky commented Oct 24, 2024

cwa_dataset is a .zarr store. It is opened by the dataloader in dataset.py and cwb.py (line 68). The mini dataset (hrrr_mini) is a .nc file because it is small (~2 GB); .zarr is a better choice for large datasets.
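
To illustrate why the extracted folder contains numbered files instead of a single .nc file, here is a minimal sketch that inspects a Zarr directory store using only the Python standard library. It assumes the store uses the Zarr v2 on-disk layout (a `.zarray` JSON metadata file per array, with chunk files stored next to it and named by chunk index); the thread does not confirm which Zarr version or chunk layout the CWA dataset uses. The function name `summarize_zarr_v2_store` is hypothetical and is not part of Modulus or of the zarr library.

```python
import json
import sys
from pathlib import Path


def summarize_zarr_v2_store(store_dir):
    """Return {array_name: info} for every array found in a Zarr v2 directory store.

    Assumption: chunks use the flat layout, where each chunk is a file named by
    its index (for example "0", "1" or "0.0.0") inside the array directory.
    """
    root = Path(store_dir)
    summary = {}
    # Each Zarr v2 array is a directory that holds a `.zarray` JSON metadata file.
    for meta_path in sorted(root.rglob(".zarray")):
        array_dir = meta_path.parent
        meta = json.loads(meta_path.read_text())
        # Count the non-hidden regular files stored next to the metadata file;
        # in the flat layout these are the chunk files.
        chunk_files = [
            p for p in array_dir.iterdir()
            if p.is_file() and not p.name.startswith(".")
        ]
        name = array_dir.relative_to(root).as_posix()
        summary[name] = {
            "shape": tuple(meta["shape"]),
            "chunks": tuple(meta["chunks"]),
            "dtype": meta["dtype"],
            "chunk_files_on_disk": len(chunk_files),
        }
    return summary


if __name__ == "__main__":
    # Usage: python inspect_store.py /path/to/cwa_dataset.zarr
    for array_name, info in summarize_zarr_v2_store(sys.argv[1]).items():
        print(array_name, info)
```

Each array is split into many small chunk files so that a dataloader can read only the pieces it needs, which is why this layout suits a dataset far larger than memory. For actual training, the store is read through the dataloader mentioned above rather than by hand.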

@mnabian
Collaborator

mnabian commented Oct 25, 2024

Yes, cwb.py processes the CWA dataset.
Make sure you use the NGC CLI to download the dataset: https://github.com/NVIDIA/modulus/tree/main/examples/generative/corrdiff#dataset--datapipe
Downloading with wget will corrupt the files.

@MyGitHub-G
Author

MyGitHub-G commented Oct 30, 2024

Thanks for your work.
I have another question: I have to unzip cwa_dataset.zip into cwa_dataset.zarr to use it, right?
What is the size of the dataset? I tried to extract the data, but the disk filled up. How big is the entire dataset after extraction, so that I can reserve disk space accordingly?

@zomosky

zomosky commented Oct 30, 2024

> Thanks for your work. I have another question: I have to unzip cwa_dataset.zip into cwa_dataset.zarr to use it, right? What is the size of the dataset? I tried to extract the data, but the disk filled up. How big is the entire dataset after extraction, so that I can reserve disk space accordingly?

I checked the extracted files: they come to roughly 1.1 TB. If you need to reserve disk space, you also have to account for the original compressed archive, so plan for about 2 TB of free space before unzipping. By the way, the mini version of the training set (hrrr_mini) is more convenient for training and testing: it is only about 2 GB, and it runs reasonably fast on a GPU such as the 4090.
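
A disk-space check before extraction could be sketched as follows. This assumes the download is a single zip archive named cwa_dataset.zip, as described in this thread; the function name `extraction_space_report` and the `safety_factor` parameter are hypothetical and only serve the illustration. It sums the uncompressed sizes recorded in the archive and compares the total with the free space at the destination.

```python
import shutil
import zipfile


def extraction_space_report(zip_path, dest_dir, safety_factor=1.1):
    """Compare the uncompressed size of a zip archive with the free space at dest_dir.

    safety_factor is an assumed margin (10% by default) for filesystem overhead.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size recorded for each archive member.
        needed = sum(info.file_size for info in archive.infolist())
    # dest_dir must already exist; free space is reported for its filesystem.
    free = shutil.disk_usage(dest_dir).free
    return {
        "needed_bytes": needed,
        "free_bytes": free,
        "fits": free >= needed * safety_factor,
    }


if __name__ == "__main__":
    # Paths are placeholders; adjust them to where the archive was downloaded.
    report = extraction_space_report("cwa_dataset.zip", ".")
    print(f"needed: {report['needed_bytes'] / 1e12:.2f} TB")
    print(f"free:   {report['free_bytes'] / 1e12:.2f} TB")
    print("enough space" if report["fits"] else "not enough space")
```

If the archive sits on the same disk as the extraction target, the space it occupies is already excluded from the free figure, so the check only needs to cover the extracted data.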

@MyGitHub-G
Author

> I checked the extracted files: they come to roughly 1.1 TB. If you need to reserve disk space, you also have to account for the original compressed archive, so plan for about 2 TB of free space before unzipping.

Thank you very much!
