-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major PR to update some inconsistencies in the specification. #726
Conversation
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Code Suggestions ✨Explore these optional code suggestions:
|
Co-authored-by: codiumai-pr-agent-pro[bot] <151058649+codiumai-pr-agent-pro[bot]@users.noreply.github.com>
WalkthroughThe changes involve updates to multiple documentation files within the SDRF-Proteomics project. The Changes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
sdrf-proteomics/README.adoc (2)
118-118
: Consider enhancing the MONDO entry with a brief description and adjusting formatting.The addition of the Mondo Disease Ontology (MONDO) aligns with the PR objectives to use MONDO for disease definitions. However, to improve clarity and consistency:
- Consider adding a brief description of MONDO, as suggested in a previous review.
- Adjust the formatting to match the existing list style (add a tab at the beginning of the line).
Here's a suggested revision:
- - Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources + - Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resourcesThis change will provide more context about MONDO and maintain consistency with the existing list style.
Line range hint
1-118
: Overall document review and next stepsThe addition of the Mondo Disease Ontology (MONDO) to the list of supported ontologies is a positive step towards improving the SDRF-Proteomics specification. However, the current changes don't fully address all the objectives outlined in the PR summary. To complete this update:
Implement the remaining changes mentioned in the PR objectives, including:
- Updating references to EFO throughout the document
- Making the 'technology type' field required and positioning it after the 'assay name'
- Updating all relevant templates and referencing these changes in the document
Review the entire document for consistency with the new changes, especially sections that discuss disease ontologies or data acquisition methods.
Update the version number and date in the "Status of this document" section to reflect these significant changes.
Consider adding a brief section or note explaining the transition from EFO to MONDO for disease definitions, to help users understand the rationale behind this change.
These additional updates will ensure that the SDRF-Proteomics specification fully incorporates the intended improvements and maintains internal consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
templates/README.adoc (1)
Line range hint
20-38
: Remove duplicate "technology type" rowThe "technology type" row appears twice in the table (lines 20 and 38). This duplication seems unintentional and may cause confusion.
To fix this, please remove one of the duplicate rows. Since the PR objectives state that "technology type" should be positioned after "assay name", I suggest keeping the row on line 38 and removing the one on line 20.
Apply this diff to fix the issue:
|characteristics[cell type] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: -|technology type | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |characteristics[individual] | |:zero: |:zero: |:zero: |:zero: |:zero: |characteristics[cell line] | | | | | |:white_check_mark: |characteristics[biological replicate] |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
sdrf-proteomics/README.adoc (1)
249-249
: Typographical correction: Remove extra backtick in 'assay name'There's an extra backtick in the phrase "the `assay name`` column". Please remove the extra backtick for correct formatting.
Apply this diff to fix the typo:
- We RECOMMEND including the technology type column immediately after the `assay name`` column + We RECOMMEND including the technology type column immediately after the `assay name` column
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (7)
templates/sdrf-cell-line.sdrf.tsv
is excluded by!**/*.tsv
templates/sdrf-default.sdrf.tsv
is excluded by!**/*.tsv
templates/sdrf-human.sdrf.tsv
is excluded by!**/*.tsv
templates/sdrf-invertebrates.sdrf.tsv
is excluded by!**/*.tsv
templates/sdrf-nonvertebrates.sdrf.tsv
is excluded by!**/*.tsv
templates/sdrf-plants.sdrf.tsv
is excluded by!**/*.tsv
templates/sdrf-vertebrates.sdrf.tsv
is excluded by!**/*.tsv
📒 Files selected for processing (3)
- .gitignore (1 hunks)
- sdrf-proteomics/README.adoc (2 hunks)
- templates/README.adoc (2 hunks)
🧰 Additional context used
🔇 Additional comments (4)
.gitignore (2)
60-60
: Approval: Addition of.vscode/
to .gitignoreThe addition of
.vscode/
to the .gitignore file is a good practice. This prevents Visual Studio Code workspace settings from being shared in the repository, which is generally desirable as these settings are often user-specific.
58-59
: Clarify the purpose ofnew_path/
and/new_path/
entries.The newly added entries
new_path/
and/new_path/
are quite generic and could potentially lead to unintended ignoring of directories.
new_path/
will ignore any directory named "new_path" in any subdirectory of the project./new_path/
will ignore a directory named "new_path" only in the root of the project.Could you please clarify the specific purpose of these entries? If they are meant to ignore specific directories, it would be better to use more descriptive names to avoid accidentally ignoring important files or directories.
To help verify the impact of these new entries, you can run the following command:
This will show any existing directories that would be ignored by these new entries, helping to ensure no important directories are unintentionally ignored.
✅ Verification successful
The
.gitignore
entriesnew_path/
and/new_path/
do not currently exclude any directories.No existing directories match these patterns, so these entries have no effect. If they were added for a specific purpose, please provide clarification. Otherwise, consider removing them to keep the
.gitignore
file clean.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any existing directories that match the new ignore patterns echo "Directories matching 'new_path/' pattern:" fd -t d -H "new_path" echo "Directories matching '/new_path/' pattern (only in root):" fd -t d -H -d 1 "new_path"Length of output: 278
templates/README.adoc (1)
38-38
: Correct placement of "technology type" fieldThe placement of the "technology type" field after "assay name" and its requirement for all experiment types is correct and aligns with the PR objectives.
sdrf-proteomics/README.adoc (1)
246-258
: Excellent addition of the 'Technology type' sectionThe inclusion of the 'Technology type' section effectively clarifies the requirement of specifying the technology applied in the study. This aligns with the PR objectives and enhances the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main question is about the MONDO ontology. It's not immediately clear to me, does it include all of the EFO terms about disease? Can there be conflicts because of this change? What do we need to do in the validator to support it?
The other comment is about technology type terms, I suggest to directly list the ones we should use in proteomics.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
sdrf-proteomics/README.adoc (1)
118-118
: Approve addition of MONDO with suggestions for improvementThe addition of the Mondo Disease Ontology (MONDO) to the list of supported ontologies is a positive change that aligns with the PR objectives. However, there are two minor improvements that could be made:
- Consider adding a brief description of MONDO to provide context, as suggested in a previous review.
- Adjust the formatting to maintain consistency with the existing list style.
Here's a suggested revision:
-- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources. +- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- sdrf-proteomics/README.adoc (3 hunks)
🧰 Additional context used
📓 Learnings (1)
sdrf-proteomics/README.adoc (1)
Learnt from: ypriverol PR: bigbio/proteomics-sample-metadata#726 File: sdrf-proteomics/README.adoc:118-118 Timestamp: 2024-10-11T12:41:07.359Z Learning: For disease annotations, use the MONDO ontology instead of EFO in SDRF-Proteomics documents, as MONDO provides a broader range of definitions.
🔇 Additional comments (2)
sdrf-proteomics/README.adoc (2)
194-195
: LGTM: Updated disease annotation recommendationThe change to recommend using MONDO for diseases instead of EFO is well-explained and aligns with the PR objectives. This update provides clearer guidance for users and reflects the broader coverage of MONDO for disease terms.
246-263
: Approve new Technology type section with minor suggestionsThe addition of the Technology type section is a valuable improvement to the documentation. It clearly explains the concept, its importance, and provides useful examples. This aligns well with the PR objectives.
A couple of minor points:
- There's a small typo in the example table. An extra backtick should be removed:
-|sample 1| run 1 | proteomic profiling by mass spectrometry +|sample 1| run 1 | proteomic profiling by mass spectrometry
- The URL for technology types (line 258) is mentioned as currently unavailable. It would be helpful to verify if this URL is now accessible or if there's an estimated time for when it will be available.
To check the availability of the technology types URL, you can run:
If the URL is available, it should return a 200 status code.
✅ Verification successful
Approve new Technology type section with minor fixes
The Technology type section is a valuable addition to the documentation, providing clear explanations and useful examples.
A minor fix:
- Remove the extra backtick in the example table.
Additionally, the technology types URL is accessible.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash curl -I https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000663
Length of output: 729
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good except for a small typo. Also about the term: I found it in the PRIDE ontology here. but in the current annotations there are also occurrences of "proteomic profiling by mass spectrometry", which sounds even more natural. I don't see any synonyms in the ontology, so I guess "mass spectrometry" is now technically incorrect? Do you want to add these synonyms to the ontology? (maybe also "proteomics profiling", that would be 4 options though)
Co-authored-by: Lev Levitsky <lev.levitsky@phystech.edu>
Which is also in EFO and other ontologies but now we are controlling in PRIDE. I will update the term in my side to |
Should we also change it to "mass spectrometry" in all annotations then? |
I will after we agree on this PR ;) |
The following PR updates some major inconsistencies in the specification:
technology type
is now defined as REQUIRED in the specification and is recommended to be written afterassay name
. This is related to the following issues: We should make clear what is technology type #727 and The column technology typecannot be before the assay name -- ERROR sdrf-pipelines#177The current PR PRIDE-Archive/pride-ontology#108 in PRIDE ontology should also be merged before.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Documentation