Train only discs landmarks #56

Merged: 48 commits from train-only-discs-landmarks into main, Oct 2, 2024
Conversation

@yw7 (Collaborator) commented Sep 13, 2024

To fix errors in the vertebrae segmentation, we train the model using only disc landmarks:

[image]

We also update the model description with this illustration:

[Figure 1: illustration for the updated model description]

Related to #50

Revamped the training script to better accommodate dataset-specific requirements:
- Removed custom trainer initialization as it's no longer necessary.
- Updated nnUNet configurations for different training setups.
- Added conditional logic to adjust nnUNetPlans for dataset 103.
- Ensured dataset fingerprint extraction if missing.
- Streamlined preprocessing steps and ensured compatibility with plan configurations.
- Implemented checks to avoid redundant decompression of already processed datasets.

These changes make the script more flexible and efficient, reducing unnecessary steps and focusing on the specific needs of different datasets.
Introduce parameter overrides for nnUNetTrainer, nnUNetPlanner, and nnUNetPlans to provide more flexibility in script configuration. Improved echo statements for clarity and corrected preprocessing checks to accurately prevent redundant operations.
Pinning nnunetv2 to version 2.4.2 due to compatibility issues with nnUNetTrainer_DASegOrd0_NoMirroring in version 2.5.1. Disabled `--verify_dataset_integrity` in training script as it is not functioning correctly in version 2.4.2.
Adjusted vertebrae numbering and label mappings to ensure consistency across the dataset and facilitate better alignment during the segmentation process. This change also simplifies the regions_class_order, removing unused labels. These updates should improve the accuracy of segmentation in the pipeline.
@yw7 added the 'enhancement' (New feature or request) and 'invalid' (This doesn't seem right) labels on Sep 13, 2024
@yw7 self-assigned this on Sep 13, 2024
Introduce a '--tries' argument to specify the number of retry attempts when copying files, addressing potential issues with unreliable file system operations. The default is set to 10 attempts. This change enhances robustness by managing transient errors and ensuring more reliable file transfers.
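A minimal sketch of what such a retry wrapper around shutil.copy2 could look like (the function name, back-off delay, and error handling here are assumptions, not the project's actual implementation):

import shutil
import time

def copy_with_retries(src_path, dst_path, num_tries=10, wait=1.0):
    # Copy src_path to dst_path, retrying on transient file-system errors.
    for attempt in range(num_tries):
        try:
            shutil.copy2(src_path, dst_path)
            return
        except OSError:
            if attempt == num_tries - 1:
                raise  # give up after the last attempt
            time.sleep(wait)  # brief pause before retrying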
Corrected several typos in argument help descriptions to improve clarity and professionalism. Enhanced file copy logic by checking if the destination file exists before accessing its size, preventing potential errors and improving reliability.
…0921

Aligned dataset paths and version numbers in pyproject.toml and README.md to new release. Refactored scripts and JSON mappings for label consistency and clarity. Removed obsolete nnUNetTrainer additions to streamline inference. Updated augmentation examples and usage in scripts to reflect new labeling scheme. These adjustments enhance maintainability and ensure correct data processing workflows.
It should be done manually for nnUNetPlannerResEncL
Introduced a new argument for providing a mapping of input labels to
output segmentation labels. This facilitates custom label conversions,
such as mapping the input C1 label to a specified output label.

Updated related functions and modules to handle and apply this mapping
throughout the segmentation process. This enhancement increases
flexibility and accuracy in label assignments.
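As a rough illustration of what applying such a mapping amounts to (the function name and label values below are assumptions, not the project's actual API):

def map_labels(seg, label_map):
    # seg: NumPy integer array; return a copy with input labels replaced by output labels.
    out = seg.copy()
    for src, dst in label_map.items():
        out[seg == src] = dst
    return out

# e.g. remap a hypothetical input C1 label 11 to an output label 41:
# seg_mapped = map_labels(seg, {11: 41})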
Enhanced landmark mapping logic to ensure landmarks are only mapped if they cover the majority of the associated voxels. This adjustment prevents improper landmark assignments and improves segmentation accuracy.
Added a filter to verify landmarks in segmentation data, ensuring accurate landmark mapping. This improves the reliability of landmark detection, addressing an issue where incorrect mapping could occur if landmarks were not properly isolated.
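A simplified sketch of one way to express this majority-coverage rule on boolean masks (the exact criterion and threshold used in the project may differ):

import numpy as np

def covers_majority(landmark_mask, structure_mask):
    # Map the landmark only if it overlaps more than half of the structure's voxels.
    overlap = np.count_nonzero(landmark_mask & structure_mask)
    total = np.count_nonzero(structure_mask)
    return total > 0 and overlap / total > 0.5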
Enhanced the `_get_canal_centerline_indices` function to raise a `ValueError` when no canal labels are found in the segmentation data. This ensures that subsequent operations do not proceed with invalid or incomplete data, improving overall robustness and reliability.
Enhanced the segmentation to image transformation utility by introducing an interpolation option. This allows using 'nearest', 'linear', or 'label' interpolation methods, with the chosen method influencing the resampling process. Specifically, the 'label' option includes padding and center of mass calculations to preserve label integrity during transformation. This flexibility improves accuracy and application suitability for different types of segmentation data processing.
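For illustration, the interpolation choice could drive the resampling order roughly as follows; the 'label' mode with its padding and center-of-mass handling is omitted here, and the function name is an assumption:

from scipy.ndimage import zoom

def resample(data, factors, interpolation="nearest"):
    if interpolation == "nearest":
        return zoom(data, factors, order=0)  # preserves integer label values
    if interpolation == "linear":
        return zoom(data, factors, order=1)  # smooth interpolation for continuous data
    raise ValueError(f"Unsupported interpolation: {interpolation}")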
@yw7 removed the 'invalid' (This doesn't seem right) label on Sep 22, 2024
Refactor padding and dilation calculations to better account for voxel size discrepancies between image and segmentation. This ensures labels at the edge are appropriately managed and reduces interpolation artifacts during the transformation process. Fixes potential misalignment issues and improves accuracy.
Simplified and unified calculation of `pad_width` and `dilation_size` by converting them directly to integers. This reduces type casting and potential errors. Addresses readability and maintains functional integrity.
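A speculative sketch of deriving integer pad and dilation sizes from the image-to-segmentation voxel-size ratio (the project's actual formula may differ):

import numpy as np

def pad_and_dilation_sizes(seg_voxel_size, img_voxel_size):
    ratio = np.asarray(seg_voxel_size, dtype=float) / np.asarray(img_voxel_size, dtype=float)
    size = int(np.ceil(ratio.max()))  # cover at least one coarse voxel along every axis
    return size, size  # (pad_width, dilation_size)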
@yw7 mentioned this pull request on Sep 24, 2024
Adjusted description of segmentation steps to clarify class breakdown and process. Added details on manual label generation and usage of additional public datasets for sacrum segmentation. Updated illustration and dataset references for accuracy.
Updated logic to correctly handle labels near padding boundaries in segmentation images. Previously, labels were derived from the original segmentation data, missing those added during padding. Now, labels are correctly identified from the padded segmentation data, and additional steps ensure labels extending beyond padding are mapped correctly. This resolves inaccuracies in center of mass calculations for boundary labels.
Simplified the logic for setting labels at center of mass by:
- Rounding and clipping indices directly within the loop.
- Eliminating redundant steps for handling labels outside of padding.
- Streamlining padding removal.

This improves readability and maintainability while ensuring label indices stay within valid bounds.
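A sketch of the center-of-mass placement just described, with indices rounded and clipped inside the loop (the helper name is an assumption):

import numpy as np
from scipy.ndimage import center_of_mass

def labels_at_center_of_mass(seg):
    out = np.zeros_like(seg)
    for label in np.unique(seg):
        if label == 0:
            continue  # skip background
        com = center_of_mass(seg == label)
        # Round and clip the index so it stays within valid bounds.
        idx = tuple(int(np.clip(round(c), 0, s - 1)) for c, s in zip(com, seg.shape))
        out[idx] = label
    return out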
Simplified comprehensions for flattening nested lists used in label extraction across multiple scripts. Addressed minor typos and improved documentation for `iterative_label.py` to clarify the matching process of vertebrae and discs using localizer segmentations.

This increases code readability and maintainability.
Introduced a new argument `--region-default-sizes` to specify default disc/vertebrae counts for various spine regions. This addition enhances configurability, allowing users to define both maximum and default sizes separately. Updated relevant function signatures and documentation to incorporate this new parameter. This change addresses the need for finer control over segmentation parameters.
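A hedged sketch of how `--region-default-sizes` might sit alongside a maximum-sizes argument in argparse (the argument names follow the description above; the default values are assumptions):

import argparse

parser = argparse.ArgumentParser(description="Spine region sizing options (illustrative).")
parser.add_argument(
    "--region-max-sizes", type=int, nargs="+", default=[7, 12, 5, 1],
    help="Maximum number of discs/vertebrae per spine region.",
)
parser.add_argument(
    "--region-default-sizes", type=int, nargs="+", default=[7, 12, 5, 1],
    help="Default number of discs/vertebrae per spine region when not otherwise determined.",
)
args = parser.parse_args()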
@NathanMolinier (Collaborator):

I just moved from my old version to this PR to run the code and I got really bad results.
[image: res_step2_output]

The most probable reason is that my old models (step1 and step2) were used to run this PR. Is there a way to make sure that we are using the appropriate weights? Maybe we could attach the release name to the weights:

  • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
  • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX

@yw7 (Collaborator, Author) commented Sep 29, 2024

> I just moved from my old version to this PR to run the code and I got really bad results.
>
> [image: res_step2_output]
>
> The most probable reason is that my old models (step1 and step2) were used to run this PR. Is there a way to make sure that we are using the appropriate weights? Maybe we could attach the release name to the weights:
>
>   • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
>   • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX

Yes!! We definitely should; we will do it in the next release.
Also, I wanted to discuss the best way to make the script update the model. Should we remove and update it if it is out of date? Prompt the user? Add another argument to overwrite the existing model? Or simply add to the README to run `rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}`?

@NathanMolinier (Collaborator) commented Sep 29, 2024

> Also, I wanted to discuss the best way to make the script update the model. Should we remove and update it if it is out of date? Prompt the user? Add another argument to overwrite the existing model? Or simply add to the README to run `rm -rf $TOTALSPINESEG_DATA/*`?

I personally think that we should not expect the user to know whether they are using the right models. But these recent changes are also quite particular, because we are changing the outputs of the networks, which should not happen too frequently.

However, I still think that we should point to a specific release even inside the main code. This is already done when we download the weights, so let's do what I suggested:

  • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
  • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX

In the pyproject.toml:

[project.urls]
homepage = "https://github.com/neuropoly/totalspineseg"
repository = "https://github.com/neuropoly/totalspineseg"
Dataset101_TotalSpineSeg_step1 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset101_TotalSpineSeg_step1.zip"
Dataset102_TotalSpineSeg_step2 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset102_TotalSpineSeg_step2.zip"

Concerning the old weights I think that we can keep them.

@yw7 (Collaborator, Author) commented Sep 29, 2024

> > Also, I wanted to discuss the best way to make the script update the model. Should we remove and update it if it is out of date? Prompt the user? Add another argument to overwrite the existing model? Or simply add to the README to run `rm -rf $TOTALSPINESEG_DATA/*`?
>
> I personally think that we should not expect the user to know whether they are using the right models. But these recent changes are also quite particular, because we are changing the outputs of the networks, which should not happen too frequently.
>
> However, I still think that we should point to a specific release even inside the main code. This is already done when we download the weights, so let's do what I suggested:
>
>   • Dataset101_TotalSpineSeg_step1 --> Dataset101_TotalSpineSeg_step1_rXXXXX
>   • Dataset102_TotalSpineSeg_step2 --> Dataset102_TotalSpineSeg_step2_rXXXXX
>
> In the pyproject.toml:
>
> [project.urls]
> homepage = "https://github.com/neuropoly/totalspineseg"
> repository = "https://github.com/neuropoly/totalspineseg"
> Dataset101_TotalSpineSeg_step1 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset101_TotalSpineSeg_step1.zip"
> Dataset102_TotalSpineSeg_step2 = "https://github.com/neuropoly/totalspineseg/releases/download/r20240921/Dataset102_TotalSpineSeg_step2.zip"
>
> Concerning the old weights, I think that we can keep them.

Yes!! You're definitely right! We should put the release name in the file name.
The problem is that after extracting the model to the results folder it is more complicated to know whether it is the right version; also, if it is not, should we remove that release folder without prompting the user?
I think the easiest way for now is to instruct the user to run `rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}` after an update.
But we can discuss other strategies.

@NathanMolinier (Collaborator):

Why can't we keep the release inside the name of the folder after extraction? Also, we should not remove anything and should keep all the previous weights.

@NathanMolinier (Collaborator):

Update --> I removed the weights from the folder and got the same bad results

@yw7 (Collaborator, Author) commented Sep 29, 2024

> rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}

Did you run `rm -rf $TOTALSPINESEG_DATA/nnUNet/{exports,results}`? You should remove both folders.

@NathanMolinier (Collaborator) commented Sep 29, 2024

I removed the full data folder.

[screenshot: Screenshot 2024-09-29 at 12 03 46]

Updated the dataset URLs to include version-specific filenames for step1 and step2 datasets. This change enhances clarity and ensures that users download the correct dataset versions, reducing potential confusion with future releases.
Refactored logic to download pretrained model by fetching the download URL from package metadata in `pyproject.toml` instead of relying on local zip files. This ensures more reliable and up-to-date access to pretrained models, improving the robustness of the model installation process. If the zip file is not present locally, it is now downloaded directly using the metadata URL. Resolves potential issues where pretrained models might be missing.
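A minimal sketch of the fallback just described: download the release zip when it is not present locally (the function name and how the URL is obtained are assumptions):

import urllib.request
from pathlib import Path

def ensure_weights_zip(zip_path: Path, url: str) -> Path:
    if not zip_path.is_file():
        zip_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, zip_path)  # fetch the model zip from the release URL
    return zip_path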
@NathanMolinier (Collaborator):

Solution: I needed to rerun `pip install -e .` to deal with the updated pyproject.toml.

[image: res_step2_output_tags]

Enhanced the retrieval of dataset URLs by parsing `pyproject.toml` directly instead of using the `importlib.metadata` package, improving flexibility.

Simplified installation steps for pretrained models by combining URL and dataset handling to reduce redundancy. These changes aim to make the code more maintainable and user-friendly, particularly when managing multiple datasets or custom URLs.
Corrected the directory check logic to properly validate the presence of both step1 and step2 models. This ensures that the inference will correctly switch to the release subfolder if either model is missing. This addresses initialization issues where the inference might wrongly assume models are available.
Replaced manual parsing of 'pyproject.toml' with 'importlib.metadata' to streamline and improve URL extraction for datasets. This reduces complexity and eliminates the need for regular expression parsing, leading to more maintainable code.
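For illustration, reading the dataset URLs from the installed package metadata could look roughly like this, assuming the package is installed as totalspineseg and the URLs are declared under [project.urls] as shown earlier:

from importlib.metadata import metadata

meta = metadata("totalspineseg")
# Project-URL entries are stored as "label, url" strings.
urls = dict(entry.split(", ", 1) for entry in meta.get_all("Project-URL", []))
step1_url = urls.get("Dataset101_TotalSpineSeg_step1")
step2_url = urls.get("Dataset102_TotalSpineSeg_step2")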
Cleaned up the `inference.py` module by removing an unused import to improve code readability and maintainability. Reduces potential confusion and slight overhead associated with unnecessary imports. No functional changes introduced.
Added a note in the README advising users to reinstall the package after pulling new versions from GitHub. This ensures that the latest updates are applied correctly. Helps avoid potential issues from outdated dependencies.

Fixes issues related to applying updates.
shutil.copy2(src_path, dst_path)

# Try copying up to num_tries times
for attempt in range(num_tries):

Collaborator:

If it doesn't work the first time, is it possible that it will work afterwards?

Collaborator Author:

Yes, on a network drive, like on Compute Canada.

yw7 and others added 2 commits September 30, 2024 17:24
Co-authored-by: Nathan Molinier <nathan.molinier@gmail.com>
Co-authored-by: Nathan Molinier <nathan.molinier@gmail.com>
@NathanMolinier (Collaborator) commented Sep 30, 2024

I noticed an issue with the labeling algorithm: here the vertebra C6 does not appear and C5 is used twice.

[images: step1 raw | step2 raw | output with tags]

Removed the merging of vertebrae labels when no disc is found in between and implemented new logic to correctly handle cases where two discs without a vertebra in between are found or where the disc of the last vertebra is missing. This ensures more accurate label assignments in these scenarios, particularly where only spinous processes are segmented.
@yw7 (Collaborator, Author) commented Oct 1, 2024

> I noticed an issue with the labeling algorithm: here the vertebra C6 does not appear and C5 is used twice.

The problem is related to the way the algorithm sorts the labels: it projects the center of mass of each label onto the canal centerline and then sorts by the z index. While this strategy usually works, sometimes, as in this case, it does not. The ultimate way to solve this is to straighten the segmentation along the canal, as SCT does for the cord, and then project the centers of mass and sort along the z axis.
To address this in the meantime, I've pushed a fix that looks for two discs without a vertebra in between followed by two vertebrae without a disc in between and switches them.
I hope this will make the algorithm more robust.
Anyway, our main mission here is labeling the cord, which is done with the discs in step one.
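For illustration only, the current sorting strategy could be sketched as follows (a simplified nearest-point projection; the names and data layout are assumptions):

import numpy as np
from scipy.ndimage import center_of_mass

def sort_labels_along_canal(seg, centerline, labels):
    # centerline: (N, 3) voxel coordinates ordered from bottom to top of the canal.
    keyed = []
    for label in labels:
        com = np.array(center_of_mass(seg == label))
        # Project the center of mass onto the centerline (nearest point) and keep its index.
        nearest = int(np.argmin(np.linalg.norm(centerline - com, axis=1)))
        keyed.append((nearest, label))
    return [label for _, label in sorted(keyed)]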

@NathanMolinier (Collaborator) left a comment:

Seems to be working fine for the rest. The labeling issue has to be fixed; see #67 (comment).

Enhanced logic for correcting labeling errors where discs and vertebrae
are misaligned. Added checks and swaps for scenarios where discs are
at the top or bottom of the label sequence with adjacent vertebrae. This
improves segmentation accuracy, especially in edge cases with partially
segmented vertebrae.
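As context, a very rough sketch of the basic swap described in the comment above (two discs without a vertebra in between followed by two vertebrae without a disc in between); the top/bottom edge cases this commit adds are not covered here, and the sequence representation is an assumption:

def fix_disc_vertebra_order(sequence):
    # sequence: ordered (kind, label) pairs along the spine, kind in {'disc', 'vertebra'}.
    fixed = list(sequence)
    for i in range(len(fixed) - 3):
        kinds = [kind for kind, _ in fixed[i:i + 4]]
        if kinds == ["disc", "disc", "vertebra", "vertebra"]:
            # Two discs with no vertebra in between, followed by two vertebrae with no
            # disc in between: swap the middle pair to restore alternation.
            fixed[i + 1], fixed[i + 2] = fixed[i + 2], fixed[i + 1]
    return fixed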
Refreshed URLs for the 'Localizer' and 'Model Output Preview' images in the README to provide clearer visual examples. This ensures that users see up-to-date and accurate representations, improving the documentation's utility.
@yw7 merged commit 6c98d77 into main on Oct 2, 2024
@yw7 deleted the train-only-discs-landmarks branch on October 2, 2024, 15:17
Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close these issues:

  • Replace override with overwrite
  • Suggestions for label improvements to make the model smaller

2 participants