Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major PR to update some inconsistencies in the specification. #726

Merged
merged 12 commits into from
Oct 11, 2024

Conversation

ypriverol
Copy link
Member

@ypriverol ypriverol commented Oct 1, 2024

The following PR updates some major inconsistencies in the specification:

The current PR PRIDE-Archive/pride-ontology#108 in PRIDE ontology should also be merged before.

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Added support for the Mondo Disease Ontology (MONDO) in the documentation, enhancing disease classification.
    • Introduced a new section on "Technology type" to specify the technology applied in proteomics studies.
  • Documentation

    • Updated the README to include MONDO in the list of supported ontologies/controlled vocabularies, improving comprehensiveness.
    • Expanded the sample metadata templates section for clarity and usability, including links to specific template files for various experiment types.
    • Added a note regarding the necessity for regular updates to accommodate new use cases in the README documentation.
    • Enhanced the organization of the table outlining minimum sample attributes for different experiment types, including clearer indications of required and optional attributes.

Copy link
Contributor

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Documentation Update
The PR adds Mondo Disease Ontology (MONDO) to the list of supported ontologies/controlled vocabularies. Reviewer should verify if this addition is correct and if any related documentation or code changes are needed.

Copy link
Contributor

codiumai-pr-agent-pro bot commented Oct 1, 2024

PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Enhancement
✅ Add a brief description of the newly added ontology to provide context

Consider adding a brief description or explanation of the Mondo Disease Ontology
(MONDO) to provide context for users who may not be familiar with it. This can help
readers understand its relevance and importance in the list of supported ontologies.

sdrf-proteomics/README.adoc [118]

-+- Mondo Disease Ontology (MONDO)
++- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources

[Suggestion has been applied]

Suggestion importance[1-10]: 7

Why:

7
Best practice
✅ Adjust the formatting to maintain consistency with the existing list style

To maintain consistency with the existing list format, consider adding a hyphen (-)
at the beginning of the new line for the Mondo Disease Ontology. This will ensure
that the new entry matches the style of the other items in the list.

sdrf-proteomics/README.adoc [118]

-+- Mondo Disease Ontology (MONDO)
++	- Mondo Disease Ontology (MONDO)

[Suggestion has been applied]

Suggestion importance[1-10]: 7

Why:

7

💡 Need additional feedback ? start a PR chat

@ypriverol ypriverol changed the title Update README.adoc Major PR to update some inconsistencies in the specification. Oct 11, 2024
Co-authored-by: codiumai-pr-agent-pro[bot] <151058649+codiumai-pr-agent-pro[bot]@users.noreply.github.com>
Copy link

coderabbitai bot commented Oct 11, 2024

Walkthrough

The changes involve updates to multiple documentation files within the SDRF-Proteomics project. The README.adoc file has been enhanced by adding the Mondo Disease Ontology (MONDO) to the supported ontologies section and introducing a new section on technology type. Additionally, the README.adoc in the templates directory has been modified to clarify sample metadata templates and experiment types. The .gitignore file has also been updated to include new patterns for ignored files and directories.

Changes

File Path Change Summary
sdrf-proteomics/README.adoc Added Mondo Disease Ontology (MONDO) and new "Technology type" section.
.gitignore Added .vscode/ and new_path/ to ignored files and directories.
templates/README.adoc Expanded sample metadata templates section and clarified experiment types.

Poem

In the garden of science, we grow,
With MONDO now added, our knowledge will flow.
Ontologies flourish, like flowers in bloom,
Enhancing our research, dispelling the gloom.
With each little change, our data takes flight,
A hop towards clarity, in the soft moonlight. 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between cfa8be3 and 5b5f5e2.

📒 Files selected for processing (1)
  • sdrf-proteomics/README.adoc (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdrf-proteomics/README.adoc

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
sdrf-proteomics/README.adoc (2)

118-118: Consider enhancing the MONDO entry with a brief description and adjusting formatting.

The addition of the Mondo Disease Ontology (MONDO) aligns with the PR objectives to use MONDO for disease definitions. However, to improve clarity and consistency:

  1. Consider adding a brief description of MONDO, as suggested in a previous review.
  2. Adjust the formatting to match the existing list style (add a tab at the beginning of the line).

Here's a suggested revision:

-	- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources
+	- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources

This change will provide more context about MONDO and maintain consistency with the existing list style.


Line range hint 1-118: Overall document review and next steps

The addition of the Mondo Disease Ontology (MONDO) to the list of supported ontologies is a positive step towards improving the SDRF-Proteomics specification. However, the current changes don't fully address all the objectives outlined in the PR summary. To complete this update:

  1. Implement the remaining changes mentioned in the PR objectives, including:

    • Updating references to EFO throughout the document
    • Making the 'technology type' field required and positioning it after the 'assay name'
    • Updating all relevant templates and referencing these changes in the document
  2. Review the entire document for consistency with the new changes, especially sections that discuss disease ontologies or data acquisition methods.

  3. Update the version number and date in the "Status of this document" section to reflect these significant changes.

  4. Consider adding a brief section or note explaining the transition from EFO to MONDO for disease definitions, to help users understand the rationale behind this change.

These additional updates will ensure that the SDRF-Proteomics specification fully incorporates the intended improvements and maintains internal consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ad76680 and 9370bb6.

📒 Files selected for processing (1)
  • sdrf-proteomics/README.adoc (1 hunks)
🧰 Additional context used

sdrf-proteomics/README.adoc Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
templates/README.adoc (1)

Line range hint 20-38: Remove duplicate "technology type" row

The "technology type" row appears twice in the table (lines 20 and 38). This duplication seems unintentional and may cause confusion.

To fix this, please remove one of the duplicate rows. Since the PR objectives state that "technology type" should be positioned after "assay name", I suggest keeping the row on line 38 and removing the one on line 20.

Apply this diff to fix the issue:

 |characteristics[cell type]             | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
-|technology type                        | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
 |characteristics[individual]            |                    |:zero:             |:zero:             |:zero:             |:zero:             |:zero:
 |characteristics[cell line]             |                    |                   |                   |                   |                   |:white_check_mark:
 |characteristics[biological replicate]  |:white_check_mark:  |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark:
sdrf-proteomics/README.adoc (1)

249-249: Typographical correction: Remove extra backtick in 'assay name'

There's an extra backtick in the phrase "the `assay name`` column". Please remove the extra backtick for correct formatting.

Apply this diff to fix the typo:

- We RECOMMEND including the technology type column immediately after the `assay name`` column
+ We RECOMMEND including the technology type column immediately after the `assay name` column
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 9370bb6 and b671ea4.

⛔ Files ignored due to path filters (7)
  • templates/sdrf-cell-line.sdrf.tsv is excluded by !**/*.tsv
  • templates/sdrf-default.sdrf.tsv is excluded by !**/*.tsv
  • templates/sdrf-human.sdrf.tsv is excluded by !**/*.tsv
  • templates/sdrf-invertebrates.sdrf.tsv is excluded by !**/*.tsv
  • templates/sdrf-nonvertebrates.sdrf.tsv is excluded by !**/*.tsv
  • templates/sdrf-plants.sdrf.tsv is excluded by !**/*.tsv
  • templates/sdrf-vertebrates.sdrf.tsv is excluded by !**/*.tsv
📒 Files selected for processing (3)
  • .gitignore (1 hunks)
  • sdrf-proteomics/README.adoc (2 hunks)
  • templates/README.adoc (2 hunks)
🧰 Additional context used
🔇 Additional comments (4)
.gitignore (2)

60-60: Approval: Addition of .vscode/ to .gitignore

The addition of .vscode/ to the .gitignore file is a good practice. This prevents Visual Studio Code workspace settings from being shared in the repository, which is generally desirable as these settings are often user-specific.


58-59: Clarify the purpose of new_path/ and /new_path/ entries.

The newly added entries new_path/ and /new_path/ are quite generic and could potentially lead to unintended ignoring of directories.

  • new_path/ will ignore any directory named "new_path" in any subdirectory of the project.
  • /new_path/ will ignore a directory named "new_path" only in the root of the project.

Could you please clarify the specific purpose of these entries? If they are meant to ignore specific directories, it would be better to use more descriptive names to avoid accidentally ignoring important files or directories.

To help verify the impact of these new entries, you can run the following command:

This will show any existing directories that would be ignored by these new entries, helping to ensure no important directories are unintentionally ignored.

✅ Verification successful

The .gitignore entries new_path/ and /new_path/ do not currently exclude any directories.

No existing directories match these patterns, so these entries have no effect. If they were added for a specific purpose, please provide clarification. Otherwise, consider removing them to keep the .gitignore file clean.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any existing directories that match the new ignore patterns

echo "Directories matching 'new_path/' pattern:"
fd -t d -H "new_path"

echo "Directories matching '/new_path/' pattern (only in root):"
fd -t d -H -d 1 "new_path"

Length of output: 278

templates/README.adoc (1)

38-38: Correct placement of "technology type" field

The placement of the "technology type" field after "assay name" and its requirement for all experiment types is correct and aligns with the PR objectives.

sdrf-proteomics/README.adoc (1)

246-258: Excellent addition of the 'Technology type' section

The inclusion of the 'Technology type' section effectively clarifies the requirement of specifying the technology applied in the study. This aligns with the PR objectives and enhances the documentation.

sdrf-proteomics/README.adoc Show resolved Hide resolved
Copy link
Collaborator

@levitsky levitsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main question is about the MONDO ontology. It's not immediately clear to me, does it include all of the EFO terms about disease? Can there be conflicts because of this change? What do we need to do in the validator to support it?

The other comment is about technology type terms, I suggest to directly list the ones we should use in proteomics.

sdrf-proteomics/README.adoc Outdated Show resolved Hide resolved
@ypriverol
Copy link
Member Author

My main question is about the MONDO ontology. It's not immediately clear to me, does it include all of the EFO terms about disease? Can there be conflicts because of this change? What do we need to do in the validator to support it?

The other comment is about technology type terms, I suggest to directly list the ones we should use in proteomics.

  • For the validator, this only means that we will look for diseases now in MONGO and EFO.
  • The URL of the technology type is not available yet because it needs to be indexed in OLS. I will also update the terms as you suggested in the specification because it will not be a long list.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
sdrf-proteomics/README.adoc (1)

118-118: Approve addition of MONDO with suggestions for improvement

The addition of the Mondo Disease Ontology (MONDO) to the list of supported ontologies is a positive change that aligns with the PR objectives. However, there are two minor improvements that could be made:

  1. Consider adding a brief description of MONDO to provide context, as suggested in a previous review.
  2. Adjust the formatting to maintain consistency with the existing list style.

Here's a suggested revision:

-- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources.
+- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4129802 and 428d16a.

📒 Files selected for processing (1)
  • sdrf-proteomics/README.adoc (3 hunks)
🧰 Additional context used
📓 Learnings (1)
sdrf-proteomics/README.adoc (1)
Learnt from: ypriverol
PR: bigbio/proteomics-sample-metadata#726
File: sdrf-proteomics/README.adoc:118-118
Timestamp: 2024-10-11T12:41:07.359Z
Learning: For disease annotations, use the MONDO ontology instead of EFO in SDRF-Proteomics documents, as MONDO provides a broader range of definitions.
🔇 Additional comments (2)
sdrf-proteomics/README.adoc (2)

194-195: LGTM: Updated disease annotation recommendation

The change to recommend using MONDO for diseases instead of EFO is well-explained and aligns with the PR objectives. This update provides clearer guidance for users and reflects the broader coverage of MONDO for disease terms.


246-263: Approve new Technology type section with minor suggestions

The addition of the Technology type section is a valuable improvement to the documentation. It clearly explains the concept, its importance, and provides useful examples. This aligns well with the PR objectives.

A couple of minor points:

  1. There's a small typo in the example table. An extra backtick should be removed:
-|sample 1|  run 1           | proteomic profiling by mass spectrometry                   
+|sample 1|  run 1           | proteomic profiling by mass spectrometry
  1. The URL for technology types (line 258) is mentioned as currently unavailable. It would be helpful to verify if this URL is now accessible or if there's an estimated time for when it will be available.

To check the availability of the technology types URL, you can run:

If the URL is available, it should return a 200 status code.

✅ Verification successful

Approve new Technology type section with minor fixes

The Technology type section is a valuable addition to the documentation, providing clear explanations and useful examples.

A minor fix:

  1. Remove the extra backtick in the example table.

Additionally, the technology types URL is accessible.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
curl -I https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000663

Length of output: 729

Copy link
Collaborator

@levitsky levitsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good except for a small typo. Also about the term: I found it in the PRIDE ontology here. but in the current annotations there are also occurrences of "proteomic profiling by mass spectrometry", which sounds even more natural. I don't see any synonyms in the ontology, so I guess "mass spectrometry" is now technically incorrect? Do you want to add these synonyms to the ontology? (maybe also "proteomics profiling", that would be 4 options though)

sdrf-proteomics/README.adoc Outdated Show resolved Hide resolved
Co-authored-by: Lev Levitsky <lev.levitsky@phystech.edu>
@ypriverol
Copy link
Member Author

ypriverol commented Oct 11, 2024

This looks good except for a small typo. Also about the term: I found it in the PRIDE ontology here. but in the current annotations there are also occurrences of "proteomic profiling by mass spectrometry", which sounds even more natural. I don't see any synonyms in the ontology, so I guess "mass spectrometry" is now technically incorrect? Do you want to add these synonyms to the ontology? (maybe also "proteomics profiling", that would be 4 options though)

  • The terms were added last night, but still need to be indexed in OLS.
  • The term added is:
[Term]
id: PRIDE:0000664
name: proteomic profiling by mass spectrometry
def: "An assay where proteins in a sample are detected, quantified or otherwise analysed using mass spectrometry" [EFO:EFO_0002766]
is_a: PRIDE:0000663 ! technology type

Which is also in EFO and other ontologies but now we are controlling in PRIDE. I will update the term in my side to proteomic profiling by mass spectrometry

@levitsky
Copy link
Collaborator

Should we also change it to "mass spectrometry" in all annotations then?

@ypriverol
Copy link
Member Author

Should we also change it to "mass spectrometry" in all annotations then?

I will after we agree on this PR ;)

@ypriverol ypriverol merged commit 93d7199 into master Oct 11, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants