Train only discs landmarks #56

Merged: 48 commits from train-only-discs-landmarks into main, Oct 2, 2024
Conversation

@yw7 (Collaborator) commented Sep 13, 2024

To fix errors in the vertebrae segmentation, we train the model using only disc landmarks:

[image]

We also update the model description with this illustration:

[Figure 1: illustration for the updated model description]

Related to #50

Revamped the training script to better accommodate dataset-specific requirements:
- Removed custom trainer initialization as it's no longer necessary.
- Updated nnUNet configurations for different training setups.
- Added conditional logic to adjust nnUNetPlans for dataset 103.
- Ensured dataset fingerprint extraction if missing.
- Streamlined preprocessing steps and ensured compatibility with plan configurations.
- Implemented checks to avoid redundant decompression of already processed datasets.

These changes make the script more flexible and efficient, reducing unnecessary steps and focusing on the specific needs of different datasets.
Introduce parameter overrides for nnUNetTrainer, nnUNetPlanner, and nnUNetPlans to provide more flexibility in script configuration. Improved echo statements for clarity and corrected preprocessing checks to accurately prevent redundant operations.
Pinning nnunetv2 to version 2.4.2 due to compatibility issues with nnUNetTrainer_DASegOrd0_NoMirroring in version 2.5.1. Disabled `--verify_dataset_integrity` in training script as it is not functioning correctly in version 2.4.2.
Adjusted vertebrae numbering and label mappings to ensure consistency across the dataset and facilitate better alignment during the segmentation process. This change also simplifies the regions_class_order, removing unused labels. These updates should improve the accuracy of segmentation in the pipeline.
@yw7 added the 'enhancement' (New feature or request) and 'invalid' (This doesn't seem right) labels on Sep 13, 2024
@yw7 self-assigned this on Sep 13, 2024
Introduce a '--tries' argument to specify the number of retry attempts when copying files, addressing potential issues with unreliable file system operations. The default is set to 10 attempts. This change enhances robustness by managing transient errors and ensuring more reliable file transfers.
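A minimal sketch of what such a retry wrapper around shutil.copy2 could look like (the function name, back-off delay, and error handling here are assumptions, not the project's actual implementation):

import shutil
import time

def copy_with_retries(src_path, dst_path, num_tries=10, wait=1.0):
    # Copy src_path to dst_path, retrying on transient file-system errors.
    for attempt in range(num_tries):
        try:
            shutil.copy2(src_path, dst_path)
            return
        except OSError:
            if attempt == num_tries - 1:
                raise  # give up after the last attempt
            time.sleep(wait)  # brief pause before retrying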
Corrected several typos in argument help descriptions to improve clarity and professionalism. Enhanced file copy logic by checking if the destination file exists before accessing its size, preventing potential errors and improving reliability.
…0921

Aligned dataset paths and version numbers in pyproject.toml and README.md to new release. Refactored scripts and JSON mappings for label consistency and clarity. Removed obsolete nnUNetTrainer additions to streamline inference. Updated augmentation examples and usage in scripts to reflect new labeling scheme. These adjustments enhance maintainability and ensure correct data processing workflows.
It should be done manually for nnUNetPlannerResEncL
Introduced a new argument for providing a mapping of input labels to
output segmentation labels. This facilitates custom label conversions,
such as mapping the input C1 label to a specified output label.

Updated related functions and modules to handle and apply this mapping
throughout the segmentation process. This enhancement increases
flexibility and accuracy in label assignments.
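As a rough illustration of what applying such a mapping amounts to (the function name and label values below are assumptions, not the project's actual API):

def map_labels(seg, label_map):
    # seg: NumPy integer array; return a copy with input labels replaced by output labels.
    out = seg.copy()
    for src, dst in label_map.items():
        out[seg == src] = dst
    return out

# e.g. remap a hypothetical input C1 label 11 to an output label 41:
# seg_mapped = map_labels(seg, {11: 41})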
Enhanced landmark mapping logic to ensure landmarks are only mapped if they cover the majority of the associated voxels. This adjustment prevents improper landmark assignments and improves segmentation accuracy.
Added a filter to verify landmarks in segmentation data, ensuring accurate landmark mapping. This improves the reliability of landmark detection, addressing an issue where incorrect mapping could occur if landmarks were not properly isolated.
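A simplified sketch of one way to express this majority-coverage rule on boolean masks (the exact criterion and threshold used in the project may differ):

import numpy as np

def covers_majority(landmark_mask, structure_mask):
    # Map the landmark only if it overlaps more than half of the structure's voxels.
    overlap = np.count_nonzero(landmark_mask & structure_mask)
    total = np.count_nonzero(structure_mask)
    return total > 0 and overlap / total > 0.5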
Enhanced the `_get_canal_centerline_indices` function to raise a `ValueError` when no canal labels are found in the segmentation data. This ensures that subsequent operations do not proceed with invalid or incomplete data, improving overall robustness and reliability.
Enhanced the segmentation to image transformation utility by introducing an interpolation option. This allows using 'nearest', 'linear', or 'label' interpolation methods, with the chosen method influencing the resampling process. Specifically, the 'label' option includes padding and center of mass calculations to preserve label integrity during transformation. This flexibility improves accuracy and application suitability for different types of segmentation data processing.
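For illustration, the interpolation choice could drive the resampling order roughly as follows; the 'label' mode with its padding and center-of-mass handling is omitted here, and the function name is an assumption:

from scipy.ndimage import zoom

def resample(data, factors, interpolation="nearest"):
    if interpolation == "nearest":
        return zoom(data, factors, order=0)  # preserves integer label values
    if interpolation == "linear":
        return zoom(data, factors, order=1)  # smooth interpolation for continuous data
    raise ValueError(f"Unsupported interpolation: {interpolation}")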
@yw7 removed the 'invalid' (This doesn't seem right) label on Sep 22, 2024
Refactor padding and dilation calculations to better account for voxel size discrepancies between image and segmentation. This ensures labels at the edge are appropriately managed and reduces interpolation artifacts during the transformation process. Fixes potential misalignment issues and improves accuracy.
Simplified and unified calculation of `pad_width` and `dilation_size` by converting them directly to integers. This reduces type casting and potential errors. Addresses readability and maintains functional integrity.
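A speculative sketch of deriving integer pad and dilation sizes from the image-to-segmentation voxel-size ratio (the project's actual formula may differ):

import numpy as np

def pad_and_dilation_sizes(seg_voxel_size, img_voxel_size):
    ratio = np.asarray(seg_voxel_size, dtype=float) / np.asarray(img_voxel_size, dtype=float)
    size = int(np.ceil(ratio.max()))  # cover at least one coarse voxel along every axis
    return size, size  # (pad_width, dilation_size)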
@yw7 mentioned this pull request on Sep 24, 2024
Adjusted description of segmentation steps to clarify class breakdown and process. Added details on manual label generation and usage of additional public datasets for sacrum segmentation. Updated illustration and dataset references for accuracy.
Updated logic to correctly handle labels near padding boundaries in segmentation images. Previously, labels were derived from the original segmentation data, missing those added during padding. Now, labels are correctly identified from the padded segmentation data, and additional steps ensure labels extending beyond padding are mapped correctly. This resolves inaccuracies in center of mass calculations for boundary labels.
Simplified the logic for setting labels at center of mass by:
- Rounding and clipping indices directly within the loop.
- Eliminating redundant steps for handling labels outside of padding.
- Streamlining padding removal.

This improves readability and maintainability while ensuring label indices stay within valid bounds.
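A sketch of the center-of-mass placement just described, with indices rounded and clipped inside the loop (the helper name is an assumption):

import numpy as np
from scipy.ndimage import center_of_mass

def labels_at_center_of_mass(seg):
    out = np.zeros_like(seg)
    for label in np.unique(seg):
        if label == 0:
            continue  # skip background
        com = center_of_mass(seg == label)
        # Round and clip the index so it stays within valid bounds.
        idx = tuple(int(np.clip(round(c), 0, s - 1)) for c, s in zip(com, seg.shape))
        out[idx] = label
    return out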
Simplified comprehensions for flattening nested lists used in label extraction across multiple scripts. Addressed minor typos and improved documentation for `iterative_label.py` to clarify the matching process of vertebrae and discs using localizer segmentations.

This increases code readability and maintainability.
Introduced a new argument `--region-default-sizes` to specify default disc/vertebrae counts for various spine regions. This addition enhances configurability, allowing users to define both maximum and default sizes separately. Updated relevant function signatures and documentation to incorporate this new parameter. This change addresses the need for finer control over segmentation parameters.
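A hedged sketch of how `--region-default-sizes` might sit alongside a maximum-sizes argument in argparse (the argument names follow the description above; the default values are assumptions):

import argparse

parser = argparse.ArgumentParser(description="Spine region sizing options (illustrative).")
parser.add_argument(
    "--region-max-sizes", type=int, nargs="+", default=[7, 12, 5, 1],
    help="Maximum number of discs/vertebrae per spine region.",
)
parser.add_argument(
    "--region-default-sizes", type=int, nargs="+", default=[7, 12, 5, 1],
    help="Default number of discs/vertebrae per spine region when not otherwise determined.",
)
args = parser.parse_args()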
@NathanMolinier (Collaborator):

I just moved from my old version to this PR to run the code and I got really bad results.
[image: res_step2_output]

The most probable reason is that my old models (step1 and step2) were used to run this PR. Is there a way to make sure that we are using the appropriate weights? Maybe we could attach the release name to the weights:

  • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
  • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX

@yw7 (Collaborator, Author) commented Sep 29, 2024

> I just moved from my old version to this PR to run the code and I got really bad results.
>
> [image: res_step2_output]
>
> The most probable reason is that my old models (step1 and step2) were used to run this PR. Is there a way to make sure that we are using the appropriate weights? Maybe we could attach the release name to the weights:
>
>   • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
>   • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX

Yes!! We definitely should; we will do it in the next release.
Also, I wanted to discuss the best way to make the script update the model. Should we remove and update it if it is out of date? Prompt the user? Add another argument to overwrite the existing model? Or simply add to the README to run `rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}`?

@NathanMolinier (Collaborator) commented Sep 29, 2024

> Also, I wanted to discuss the best way to make the script update the model. Should we remove and update it if it is out of date? Prompt the user? Add another argument to overwrite the existing model? Or simply add to the README to run `rm -rf $TOTALSPINESEG_DATA/*`?

I personally think that we should not expect the user to know whether they are using the right models. But these recent changes are also quite particular, because we are changing the outputs of the networks, which should not happen too frequently.

However, I still think that we should point to a specific release even inside the main code. This is already done when we download the weights, so let's do what I suggested:

  • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
  • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX

In the pyproject.toml:

[project.urls]
homepage = "https://github.com/neuropoly/totalspineseg"
repository = "https://github.com/neuropoly/totalspineseg"
Dataset101_TotalSpineSeg_step1 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset101_TotalSpineSeg_step1.zip"
Dataset102_TotalSpineSeg_step2 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset102_TotalSpineSeg_step2.zip"

Concerning the old weights I think that we can keep them.

@yw7 (Collaborator, Author) commented Sep 29, 2024

> > Also, I wanted to discuss the best way to make the script update the model. Should we remove and update it if it is out of date? Prompt the user? Add another argument to overwrite the existing model? Or simply add to the README to run `rm -rf $TOTALSPINESEG_DATA/*`?
>
> I personally think that we should not expect the user to know whether they are using the right models. But these recent changes are also quite particular, because we are changing the outputs of the networks, which should not happen too frequently.
>
> However, I still think that we should point to a specific release even inside the main code. This is already done when we download the weights, so let's do what I suggested:
>
>   • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
>   • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX
>
> In the pyproject.toml:
>
> [project.urls]
> homepage = "https://github.com/neuropoly/totalspineseg"
> repository = "https://github.com/neuropoly/totalspineseg"
> Dataset101_TotalSpineSeg_step1 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset101_TotalSpineSeg_step1.zip"
> Dataset102_TotalSpineSeg_step2 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset102_TotalSpineSeg_step2.zip"
>
> Concerning the old weights, I think that we can keep them.

Yes!! You're definitely right! We should put the release name in the file name.
The problem is that after extracting the model to the results folder it is more complicated to know whether it is the right version; also, if it is not, should we remove that release folder without prompting the user?
I think the easiest way for now is to instruct the user to run `rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}` after an update.
But we can discuss other strategies.

@NathanMolinier (Collaborator):

Why can't we keep the release inside the name of the folder after extraction? Also, we should not remove anything and should keep all the previous weights.

@NathanMolinier (Collaborator):

Update --> I removed the weights from the folder and got the same bad results

@yw7 (Collaborator, Author) commented Sep 29, 2024

> rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}

Did you run `rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}`? You should remove both folders.

@NathanMolinier (Collaborator) commented Sep 29, 2024

I removed the full data folder.

[screenshot: Screenshot 2024-09-29 at 12 03 46]

Updated the dataset URLs to include version-specific filenames for step1 and step2 datasets. This change enhances clarity and ensures that users download the correct dataset versions, reducing potential confusion with future releases.
Refactored logic to download pretrained model by fetching the download URL from package metadata in `pyproject.toml` instead of relying on local zip files. This ensures more reliable and up-to-date access to pretrained models, improving the robustness of the model installation process. If the zip file is not present locally, it is now downloaded directly using the metadata URL. Resolves potential issues where pretrained models might be missing.
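A minimal sketch of the fallback just described: download the release zip when it is not present locally (the function name and how the URL is obtained are assumptions):

import urllib.request
from pathlib import Path

def ensure_weights_zip(zip_path: Path, url: str) -> Path:
    if not zip_path.is_file():
        zip_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, zip_path)  # fetch the model zip from the release URL
    return zip_path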
@NathanMolinier (Collaborator):

Solution: I needed to rerun `pip install -e .` to deal with the updated pyproject.toml.

[image: res_step2_output_tags]

Enhanced the retrieval of dataset URLs by parsing `pyproject.toml` directly instead of using the `importlib.metadata` package, improving flexibility.

Simplified installation steps for pretrained models by combining URL and dataset handling to reduce redundancy. These changes aim to make the code more maintainable and user-friendly, particularly when managing multiple datasets or custom URLs.
Corrected the directory check logic to properly validate the presence of both step1 and step2 models. This ensures that the inference will correctly switch to the release subfolder if either model is missing. This addresses initialization issues where the inference might wrongly assume models are available.
Replaced manual parsing of 'pyproject.toml' with 'importlib.metadata' to streamline and improve URL extraction for datasets. This reduces complexity and eliminates the need for regular expression parsing, leading to more maintainable code.
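For illustration, reading the dataset URLs from the installed package metadata could look roughly like this, assuming the package is installed as totalspineseg and the URLs are declared under [project.urls] as shown earlier:

from importlib.metadata import metadata

meta = metadata("totalspineseg")
# Project-URL entries are stored as "label, url" strings.
urls = dict(entry.split(", ", 1) for entry in meta.get_all("Project-URL", []))
step1_url = urls.get("Dataset101_TotalSpineSeg_step1")
step2_url = urls.get("Dataset102_TotalSpineSeg_step2")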
Cleaned up the `inference.py` module by removing an unused import to improve code readability and maintainability. Reduces potential confusion and slight overhead associated with unnecessary imports. No functional changes introduced.
Added a note in the README advising users to reinstall the package after pulling new versions from GitHub. This ensures that the latest updates are applied correctly. Helps avoid potential issues from outdated dependencies.

Fixes issues related to applying updates.
shutil.copy2(src_path, dst_path)

# Try copying up to num_tries times
for attempt in range(num_tries):

Collaborator:

If it doesn't work the first time, is it possible that it will work afterwards?

Collaborator Author:

Yes, on a network drive, like on Compute Canada.

yw7 and others added 2 commits September 30, 2024 17:24
Co-authored-by: Nathan Molinier <nathan.molinier@gmail.com>
Co-authored-by: Nathan Molinier <nathan.molinier@gmail.com>
@NathanMolinier (Collaborator) commented Sep 30, 2024

I noticed an issue with the labeling algorithm: here the vertebra C6 does not appear and C5 is used twice.

[images: step1 raw | step2 raw | output with tags]

Removed the merging of vertebrae labels when no disc is found in between and implemented new logic to correctly handle cases where two discs without a vertebra in between are found or where the disc of the last vertebra is missing. This ensures more accurate label assignments in these scenarios, particularly where only spinous processes are segmented.
@yw7 (Collaborator, Author) commented Oct 1, 2024

> I noticed an issue with the labeling algorithm: here the vertebra C6 does not appear and C5 is used twice.

The problem is related to the way the algorithm sorts the labels: it projects the center of mass of each label onto the canal centerline and then sorts by the z index. While this strategy usually works, sometimes, as in this case, it does not. The ultimate way to solve this is to straighten the segmentation along the canal, as SCT does for the cord, and then project the centers of mass and sort along the z axis.
To address this in the meantime, I've pushed a fix that looks for two discs without a vertebra in between followed by two vertebrae without a disc in between and switches them.
I hope this will make the algorithm more robust.
Anyway, our main mission here is labeling the cord, which is done with the discs in step one.
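For illustration only, the current sorting strategy could be sketched as follows (a simplified nearest-point projection; the names and data layout are assumptions):

import numpy as np
from scipy.ndimage import center_of_mass

def sort_labels_along_canal(seg, centerline, labels):
    # centerline: (N, 3) voxel coordinates ordered from bottom to top of the canal.
    keyed = []
    for label in labels:
        com = np.array(center_of_mass(seg == label))
        # Project the center of mass onto the centerline (nearest point) and keep its index.
        nearest = int(np.argmin(np.linalg.norm(centerline - com, axis=1)))
        keyed.append((nearest, label))
    return [label for _, label in sorted(keyed)]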

@NathanMolinier (Collaborator) left a comment:

Seems to be working fine for the rest. The labeling issue has to be fixed; see #67 (comment).

Enhanced logic for correcting labeling errors where discs and vertebrae
are misaligned. Added checks and swaps for scenarios where discs are
at the top or bottom of the label sequence with adjacent vertebrae. This
improves segmentation accuracy, especially in edge cases with partially
segmented vertebrae.
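As context, a very rough sketch of the basic swap described in the comment above (two discs without a vertebra in between followed by two vertebrae without a disc in between); the top/bottom edge cases this commit adds are not covered here, and the sequence representation is an assumption:

def fix_disc_vertebra_order(sequence):
    # sequence: ordered (kind, label) pairs along the spine, kind in {'disc', 'vertebra'}.
    fixed = list(sequence)
    for i in range(len(fixed) - 3):
        kinds = [kind for kind, _ in fixed[i:i + 4]]
        if kinds == ["disc", "disc", "vertebra", "vertebra"]:
            # Two discs with no vertebra in between, followed by two vertebrae with no
            # disc in between: swap the middle pair to restore alternation.
            fixed[i + 1], fixed[i + 2] = fixed[i + 2], fixed[i + 1]
    return fixed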
Refreshed URLs for the 'Localizer' and 'Model Output Preview' images in the README to provide clearer visual examples. This ensures that users see up-to-date and accurate representations, improving the documentation's utility.
@yw7 merged commit 6c98d77 into main on Oct 2, 2024
@yw7 deleted the train-only-discs-landmarks branch on October 2, 2024, 15:17
Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close these issues:

  • Replace override with overwrite
  • Suggestions for label improvements to make the model smaller

2 participants