From 863c01f341a4af9620795e621cd8e836fb95d768 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Tue, 1 Oct 2024 14:34:20 +0100 Subject: [PATCH 01/10] Update README.adoc --- sdrf-proteomics/README.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index 346ed82c..c540e568 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -115,6 +115,7 @@ The list of ontologies/controlled vocabularies (CV) supported are: - NCBI organismal classification - PATO - the Phenotype and Trait Ontology - PRIDE Controlled Vocabulary (CV) +- Mondo Disease Ontology (MONDO) [[sdrf-file-format]] == SDRF-Proteomics file format From 8b725457398bf017ef8207c95299ba7f15de82ce Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Thu, 10 Oct 2024 19:08:11 +0100 Subject: [PATCH 02/10] ignore .vscode --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 9d36652b..5b10a738 100644 --- a/.gitignore +++ b/.gitignore @@ -57,3 +57,4 @@ release.properties new_path/ /new_path/ +.vscode/ From 287788d979e5181f4958a2caafcb224883a4fbb4 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 09:14:42 +0100 Subject: [PATCH 03/10] major changes related with technology type --- sdrf-proteomics/README.adoc | 14 ++++++++++++++ templates/sdrf-cell-line.sdrf.tsv | 2 +- templates/sdrf-default.sdrf.tsv | 2 +- templates/sdrf-human.sdrf.tsv | 2 +- templates/sdrf-invertebrates.sdrf.tsv | 2 +- templates/sdrf-nonvertebrates.sdrf.tsv | 1 - templates/sdrf-plants.sdrf.tsv | 2 +- templates/sdrf-vertebrates.sdrf.tsv | 2 +- 8 files changed, 20 insertions(+), 7 deletions(-) delete mode 100644 templates/sdrf-nonvertebrates.sdrf.tsv diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index c540e568..f7902ecd 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -243,6 +243,20 @@ The model of the mass spectrometer SHOULD be specified as 
_comment[instrument]_. Additionally, it is strongly RECOMMENDED to include comment[MS2 analyzer type]. This is important, e.g., for Orbitrap models where MS2 scans can be acquired either in the Orbitrap or in the ion trap. Setting this value allows differentiating high-resolution MS/MS data. Possible values of _comment[MS2 analyzer type]_ are mass analyzer types. +[[technology-type]] +=== Technology type + +Technology type is used in SDRF and MAGE-TAB formats to specify the technology applied in the study to capture the data. For transcriptomics, common values include technologies such as microarray, RNA-seq, and ChIP-seq (as seen in https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-13567[ArrayExpress Example]). In SDRF-Proteomics, the technology type field is REQUIRED to describe the experimental approach used to generate the data. We RECOMMEND including the technology type column immediately after the `assay name` column in the SDRF file, clearly indicating which technology was used to produce the data files. + +|=== +| | assay name | technology type +|sample 1| run 1 | proteomic profiling by mass spectrometry +|=== + +NOTE: While we RECOMMEND positioning the technology type column after the assay name, in some original templates, this column was placed before the assay name. We will allow the technology type column to appear either directly before or after the assay name column but RECOMMEND placing it after the assay name for consistency. + +For proteomics experiments the possible values for technology types can be obtained from PRIDE Ontology term https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000663[technology type]. 
+ [[additional-data-files]] === Additional Data files technical properties diff --git a/templates/sdrf-cell-line.sdrf.tsv b/templates/sdrf-cell-line.sdrf.tsv index 37219fe2..cbf0ca47 100644 --- a/templates/sdrf-cell-line.sdrf.tsv +++ b/templates/sdrf-cell-line.sdrf.tsv @@ -1 +1 @@ -source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[cell line] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument] +source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[cell line] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument] diff --git a/templates/sdrf-default.sdrf.tsv b/templates/sdrf-default.sdrf.tsv index 9241fede..63e378ff 100644 --- a/templates/sdrf-default.sdrf.tsv +++ b/templates/sdrf-default.sdrf.tsv @@ -1,2 +1,2 @@ -source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument] +source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument] diff --git a/templates/sdrf-human.sdrf.tsv b/templates/sdrf-human.sdrf.tsv index cf299b71..c617fdd4 100644 --- a/templates/sdrf-human.sdrf.tsv +++ b/templates/sdrf-human.sdrf.tsv @@ -1 +1 @@ 
-source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[ancestry category] characteristics[age] characteristics[sex] characteristics[disease] characteristics[individual] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] +source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[ancestry category] characteristics[age] characteristics[sex] characteristics[disease] characteristics[individual] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] diff --git a/templates/sdrf-invertebrates.sdrf.tsv b/templates/sdrf-invertebrates.sdrf.tsv index d38fb310..e1565a36 100644 --- a/templates/sdrf-invertebrates.sdrf.tsv +++ b/templates/sdrf-invertebrates.sdrf.tsv @@ -1 +1 @@ -source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[cell type] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] +source name characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[cell type] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] diff --git a/templates/sdrf-nonvertebrates.sdrf.tsv b/templates/sdrf-nonvertebrates.sdrf.tsv deleted file mode 100644 index d38fb310..00000000 --- a/templates/sdrf-nonvertebrates.sdrf.tsv +++ /dev/null @@ -1 +0,0 @@ -source name 
characteristics[organism] characteristics[organism part] characteristics[disease] characteristics[cell type] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] diff --git a/templates/sdrf-plants.sdrf.tsv b/templates/sdrf-plants.sdrf.tsv index ae777b73..5a7fec2b 100644 --- a/templates/sdrf-plants.sdrf.tsv +++ b/templates/sdrf-plants.sdrf.tsv @@ -1 +1 @@ -source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] +source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[disease] characteristics[biological replicate] assay name technology type comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[instrument] comment[cleavage agent details] diff --git a/templates/sdrf-vertebrates.sdrf.tsv b/templates/sdrf-vertebrates.sdrf.tsv index 8b5c7ff0..42a8e33d 100644 --- a/templates/sdrf-vertebrates.sdrf.tsv +++ b/templates/sdrf-vertebrates.sdrf.tsv @@ -1 +1 @@ -source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[developmental stage] characteristics[disease] characteristics[biological replicate] technology type assay name comment[technical replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument] +source name characteristics[organism] characteristics[organism part] characteristics[cell type] characteristics[developmental stage] characteristics[disease] characteristics[biological replicate] assay name technology type comment[technical 
replicate] comment[data file] comment[fraction identifier] comment[label] comment[cleavage agent details] comment[instrument] From de80e6b51e5f0f940c92db21089a3ac741679943 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 09:17:56 +0100 Subject: [PATCH 04/10] technology type added in all templates --- templates/README.adoc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/templates/README.adoc b/templates/README.adoc index 9593befa..c6acf22d 100644 --- a/templates/README.adoc +++ b/templates/README.adoc @@ -17,7 +17,7 @@ NOTE: Each of the templates is a tsv file with the minimum columns to describe t *Sample attributes*: Minimum sample attributes for primary cells from different species and cell lines |=== -| | Default |Human | Vertebrates | Invertebrates | Plants | Cell lines +| | Default |Human | Vertebrates | Invertebrates | Plants | Cell lines |source name | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |characteristics[organism] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |characteristics[strain/breed] | | | |:zero: | |:zero: @@ -35,6 +35,7 @@ NOTE: Each of the templates is a tsv file with the minimum columns to describe t |characteristics[biological replicate] |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: | | | | | | | |assay name | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: +|technology type | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |comment[data file] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |comment[technical replicate] | :white_check_mark: |:white_check_mark: 
|:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |comment[fraction identifier] | :white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: |:white_check_mark: From 9370bb6e4724a619a82790f654e9f232464ee6d3 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 09:22:07 +0100 Subject: [PATCH 05/10] Update sdrf-proteomics/README.adoc Co-authored-by: codiumai-pr-agent-pro[bot] <151058649+codiumai-pr-agent-pro[bot]@users.noreply.github.com> --- sdrf-proteomics/README.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index c540e568..14d05a8d 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -115,7 +115,7 @@ The list of ontologies/controlled vocabularies (CV) supported are: - NCBI organismal classification - PATO - the Phenotype and Trait Ontology - PRIDE Controlled Vocabulary (CV) -- Mondo Disease Ontology (MONDO) +- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources [[sdrf-file-format]] == SDRF-Proteomics file format From b671ea48798c61803206d7d904a450649e726a26 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 09:23:17 +0100 Subject: [PATCH 06/10] minor change --- sdrf-proteomics/README.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index 67350560..ab1de447 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -115,7 +115,7 @@ The list of ontologies/controlled vocabularies (CV) supported are: - NCBI organismal classification - PATO - the Phenotype and Trait Ontology - PRIDE Controlled Vocabulary (CV) -- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources +- Mondo Disease Ontology (MONDO): A unified disease ontology integrating multiple disease resources. 
[[sdrf-file-format]] == SDRF-Proteomics file format From 4129802f54a2c271582f0eda973c0bbfca5fcd59 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 13:18:26 +0100 Subject: [PATCH 07/10] update list of technology type --- sdrf-proteomics/README.adoc | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index ab1de447..277ed7a0 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -255,7 +255,11 @@ Technology type is used in SDRF and MAGE-TAB formats to specify the technology a NOTE: While we RECOMMEND positioning the technology type column after the assay name, in some original templates, this column was placed before the assay name. We will allow the technology type column to appear either directly before or after the assay name column but RECOMMEND placing it after the assay name for consistency. -For proteomics experiments the possible values for technology types can be obtained from PRIDE Ontology term https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000663[technology type]. +For proteomics experiments the possible values for technology types can be obtained from PRIDE Ontology term https://www.ebi.ac.uk/ols4/ontologies/pride/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPRIDE_0000663[technology type]. 
+ +Here, the list of valid values: + +- proteomic profiling by mass spectrometer [[additional-data-files]] === Additional Data files technical properties From 428d16aae8b58a1d7683120f9decda9625ec14bb Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 13:52:28 +0100 Subject: [PATCH 08/10] EFO and mondo diseases --- sdrf-proteomics/README.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index 277ed7a0..4043ca72 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -191,7 +191,7 @@ NOTE: Additional characteristics can be added depending on the type of the exper Some important notes: -- Each characteristic name in the column header SHOULD be a CV term from the EFO ontology. For example, the header _characteristics[organism]_ corresponds to the ontology term Organism. +- Each characteristic name in the column header SHOULD be a CV term from the EFO ontology. For example, the header _characteristics[organism]_ corresponds to the ontology term Organism. However the values could be from EFO or other ontologies. For example, for diseases we RECOMMEND to use MONDO for diseases because it has better coverage than EFO. - Multiple values (columns) for the same characteristics term are allowed in SDRF-Proteomics. However, it is RECOMMENDED not to use the same column in the same file. If you have multiple phenotypes, you can specify what it refers to or use another more specific term, e.g., "immunophenotype". 
From cfa8be3673d3f63dd42e018e4199fecac8296909 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 14:39:06 +0100 Subject: [PATCH 09/10] Update sdrf-proteomics/README.adoc Co-authored-by: Lev Levitsky --- sdrf-proteomics/README.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index 4043ca72..b6d0dafa 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -191,7 +191,7 @@ NOTE: Additional characteristics can be added depending on the type of the exper Some important notes: -- Each characteristic name in the column header SHOULD be a CV term from the EFO ontology. For example, the header _characteristics[organism]_ corresponds to the ontology term Organism. However the values could be from EFO or other ontologies. For example, for diseases we RECOMMEND to use MONDO for diseases because it has better coverage than EFO. +- Each characteristic name in the column header SHOULD be a CV term from the EFO ontology. For example, the header _characteristics[organism]_ corresponds to the ontology term Organism. However the values could be from EFO or other ontologies. For example, we RECOMMEND to use MONDO for diseases because it has better coverage than EFO. - Multiple values (columns) for the same characteristics term are allowed in SDRF-Proteomics. However, it is RECOMMENDED not to use the same column in the same file. If you have multiple phenotypes, you can specify what it refers to or use another more specific term, e.g., "immunophenotype". 
From 4a70deb182ab9428f5c8993d87ad740d88a22506 Mon Sep 17 00:00:00 2001 From: Yasset Perez-Riverol Date: Fri, 11 Oct 2024 15:40:51 +0100 Subject: [PATCH 10/10] change --- sdrf-proteomics/README.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sdrf-proteomics/README.adoc b/sdrf-proteomics/README.adoc index 4043ca72..790c8c7c 100644 --- a/sdrf-proteomics/README.adoc +++ b/sdrf-proteomics/README.adoc @@ -259,7 +259,7 @@ For proteomics experiments the possible values for technology types can be obtai Here, the list of valid values: -- proteomic profiling by mass spectrometer +- proteomic profiling by mass spectrometry [[additional-data-files]] === Additional Data files technical properties