This repo contains a mechanism to run multiple GaNDLF experiments on the UPenn CUBIC cluster.
- This repo allows you to submit multiple GPU jobs on the cluster at once, which gives you great power; you know what comes with that. If Mark's wrath falls upon you, you are on your own.
- Be intimately familiar with the data you are going to use.
- Be familiar with GaNDLF's usage, and try to do a single epoch training on the toy dataset.
- You have installed GaNDLF in your home directory or comp_space.
- You have run a single epoch of the GaNDLF training loop (training and validation) using your own data somewhere (the CUBIC cluster or your own machine, it doesn't matter), so that you know how to customize the configuration.
All configuration options can be changed depending on the experiment at hand.
- The file `config_generator.py` contains examples showing how the various hyper-parameters can be altered to create different configurations.
- Maximum flexibility is given to the user to decide the folder and configuration file structure.
- It is suggested that the user alter only a few hyper-parameters while keeping the rest consistent; this allows meaningful comparisons between different experiments.
- This repo allows the creation of such an extensive experimental design.
- The only requirement is that the configurations are generated under a single folder structure. An example of such a structure, exploring multiple architectures with learning rates of `[0.1, 0.01]` and optimizers of `[adam, sgd]`, is shown below:
```
experiment_template_folder
│   README.md
│
└───unet
│   │   lr_0.1_adam.yaml
│   │   lr_0.01_adam.yaml
│   │   lr_0.1_sgd.yaml
│   │   lr_0.01_sgd.yaml
│
└───transunet
│   │   lr_0.1_adam.yaml
│   │   lr_0.01_adam.yaml
│   │   lr_0.1_sgd.yaml
│   │   lr_0.01_sgd.yaml
│
│   ...
│
└───unetr
    │   ...
```
- Once the experimental design has been established, the configurations can be generated using the `config_generator.py` script.
- The user can edit this file to create the desired configurations, and then run:

```
python config_generator.py
```
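For illustration, a minimal sketch of the kind of nested loop such a generator can use is shown below. The `base_config.yaml` file and the GaNDLF keys used here (`model.architecture`, `learning_rate`, `optimizer.type`) are assumptions; adapt them to the configuration you validated during your single-epoch run.

```python
# A minimal sketch of the kind of nested loop a configuration generator can use.
# ASSUMPTIONS: "base_config.yaml" (a common GaNDLF config you have already
# validated) and the keys model.architecture / learning_rate / optimizer.type;
# adapt both to your actual configuration.
import copy
from pathlib import Path

import yaml

base_config = yaml.safe_load(Path("base_config.yaml").read_text())

for arch in ["unet", "transunet", "unetr"]:
    for lr in [0.1, 0.01]:
        for opt in ["adam", "sgd"]:
            cfg = copy.deepcopy(base_config)
            cfg.setdefault("model", {})["architecture"] = arch
            cfg["learning_rate"] = lr
            cfg["optimizer"] = {"type": opt}
            # Writes e.g. experiment_template_folder/unet/lr_0.1_adam.yaml
            out_dir = Path("experiment_template_folder") / arch
            out_dir.mkdir(parents=True, exist_ok=True)
            (out_dir / f"lr_{lr}_{opt}.yaml").write_text(yaml.safe_dump(cfg))
```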
```
python submitter.py -h

usage: GANDLF_Experiment_Submitter [-h] [-i] [-g] [-d] [-f] [-r] [-e] [-gpu] [-gpur]

Submit GaNDLF experiments on CUBIC Cluster.
Contact: software@cbica.upenn.edu
This program is NOT FDA/CE approved and NOT intended for clinical use.
Copyright (c) 2023 University of Pennsylvania. All rights reserved.

optional arguments:
  -h, --help           show this help message and exit
  -i , --interpreter   Full path of python interpreter to be called.
  -g , --gandlfrun     Full path of 'gandlf_run' script to be called.
  -d , --datafile      Full path to 'data.csv'.
  -f , --foldertocopy  Full path to the data folder to copy into the location in '$CBICA_TMP'.
  -r , --runnerscript  'runner.sh' script to be called.
  -e , --email         Email address to be used for notifications.
  -gpu , --gputype     The parameter to pass after '-l' to the submit command.
  -gpur , --gpuratio   The number of jobs (starting from '0') to send to 'gpu' vs 'A40', since 'gpu' is more prevalent - ignores parameter '--gputype'.
```
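For illustration only, a submission using a few of these options might look like the following; all paths and the email address are placeholders and should point to your own interpreter, GaNDLF checkout, and data file:

```
python submitter.py \
  -i /path/to/venv/bin/python \
  -g /path/to/GaNDLF/gandlf_run \
  -d /path/to/data.csv \
  -e your_email@example.com
```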
The following command will collect the training and validation logs from all experiments and provide the best loss values, along with the specified metrics, for each experiment:

```
python config_generator.py -c False
```

This will generate a file named `best_info.csv` in the current directory. This file can be used to generate a table of the best results for each experiment.
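For example, the resulting file can be inspected with pandas; the column name referenced below is illustrative, so check the header of your generated `best_info.csv` before adapting this sketch:

```python
# A minimal sketch for browsing best_info.csv with pandas.
# ASSUMPTION: the column name "best_val_loss" is illustrative -- replace it
# with an actual column from your generated file.
import pandas as pd

df = pd.read_csv("best_info.csv")
print(df.head())

# Rank experiments by their best validation loss and save a compact table.
df.sort_values("best_val_loss").to_csv("best_results_ranked.csv", index=False)
```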
- All parameters have some defaults, and should be changed based on the experiment at hand.
- Use this repo as a template to create a new PRIVATE repo.
- Update common config properties as needed.
- Edit the `data.csv` file to fill in the updated data list (the channel list should not matter as long as it is consistent); a hypothetical example of the CSV layout is shown after this list. Ensure you have read access to the data. This can be changed to separate `train.csv` and `val.csv` files if needed, which can then be passed as a comma-separated pair.
- Run `python ./submitter.py` with the correct options (or change the defaults, whichever is easier) to submit the experiments.
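For reference, a hypothetical `data.csv` for a two-channel segmentation task might look like the snippet below. The header names follow GaNDLF's usual CSV convention as an assumption; the safest source of truth is the CSV you already used for your single-epoch run.

```
SubjectID,Channel_0,Channel_1,Label
001,/path/to/001_t1.nii.gz,/path/to/001_t2.nii.gz,/path/to/001_seg.nii.gz
002,/path/to/002_t1.nii.gz,/path/to/002_t2.nii.gz,/path/to/002_seg.nii.gz
```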