diff --git a/README.md b/README.md
index c6106f6..ed4c8ff 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ DIMet supports the analysis of full metabolite abundances and isotopologue contr
_Note_: DIMet is intended for downstream analysis of tracer metabolomics data that has been corrected for the presence of natural isotopologues.
-_Formatting and normalisation helper_: xxx
+_Formatting and normalisation helper_: scripts for formatting and normalisation are provided in [Tracegroomer](experimentaltracegroomerxxxxx).
# Installing DIMet
@@ -44,6 +44,111 @@ Or if you are a developer working in a local cloned version, you can install:
* with pytest, by running `pytest` from `DIMet`
* Place yourself in `DIMet/tests` and execute `python -m unittest`
+-----------------------------------------------------------------------------------------------
+
+# Using DIMet
+
+DIMet runs in the command line environment.
+
+To **test the use of DIMet**, we provide datasets, configuration files and bash scripts corresponding to the results presented in the manuscript "DIMet: An open-source tool for Differential analysis of Isotope-labeled targeted Metabolomics data" by J. Galvis *et al*.
+They are available at [Zenodo (manuscript_data)](https://sandbox.zenodo.org/record/todo:updatexxx):
+
+ - Download and uncompress the file `datasets_manuscript_DIMet.zip`.
+
+
+
+ Whole structure of the downloaded folder (click to show/hide)
+
+
+ ```
+ datasets_manuscript_DIMet
+ ├── config
+ │ ├── analysis
+ │ │ ├── abundance_plot_Cycloserine.yaml
+ │ │ ├── abundance_plot_LDHAB-Control.yaml
+ │ │ ├── dataset
+ │ │ │ ├── Cycloserine_data.yaml
+ │ │ │ ├── LDHAB-Control_data_integrate.yaml
+ │ │ │ └── LDHAB-Control_data.yaml
+ │ │ ├── differential_analysis_pairwise_LDHAB-Control.yaml
+ │ │ ├── enrichment_lineplot_Cycloserine.yaml
+ │ │ ├── isotopologues_plot_Cycloserine.yaml
+ │ │ ├── isotopologues_plot_LDHAB-Control.yaml
+ │ │ ├── metabologram_abundance_LDHAB-Control.yaml
+ │ │ ├── metabologram_enrichment_LDHAB-Control.yaml
+ │ │ ├── pca_plot_LDHAB-Control.yaml
+ │ │ ├── pca_tables_Cycloserine.yaml
+ │ │ └── timecourse_analysis_Cycloserine.yaml
+ │ ├── general_config_abundance_plot_Cycloserine.yaml
+ │ ├── general_config_abundance_plot_LDHAB-Control.yaml
+ │ ├── general_config_differential_analysis_LDHAB-Control.yaml
+ │ ├── general_config_enrichment_lineplot_Cycloserine.yaml
+ │ ├── general_config_isotopologues_plot_Cycloserine.yaml
+ │ ├── general_config_isotopologues_plot_LDHAB-Control.yaml
+ │ ├── general_config_metabologram_abundance_LDHAB-Control.yaml
+ │ ├── general_config_metabologram_enrichment_LDHAB-Control.yaml
+ │ ├── general_config_pca_plot_LDHAB-Control.yaml
+ │ ├── general_config_pca_tables_Cycloserine.yaml
+ │ └── general_config_timecourse_analysis_Cycloserine.yaml
+ ├── data
+ │ ├── Cycloserine_data
+ │ │ └── raw
+ │ │ ├── CorrectedIsotopologues.csv
+ │ │ ├── FracContribution_C.csv
+ │ │ ├── metadata_cycloser.csv
+ │ │ └── rawAbundances.csv
+ │ └── LDHAB-Control_data
+ │ ├── integration_files
+ │ │ ├── DEG_Control_LDHAB.csv
+ │ │ ├── pathways_kegg_metabolites.csv
+ │ │ ├── pathways_kegg_transcripts.csv
+ │ │ └── readme.txt
+ │ └── raw
+ │ ├── AbundanceCorrected.csv
+ │ ├── IsotopologuesAbs.csv
+ │ ├── IsotopologuesProp.csv
+ │ ├── MeanEnrichment13C.csv
+ │ └── metadata_endo_ldh.csv
+ ├── run_Cycloserine_timeseries.sh
+ └── run_LDHAB-Control.sh
+ ```
+
+
+
+* Make sure you have activated your virtual environment. The `datasets_manuscript_DIMet` folder contains two `.sh` files: `run_LDHAB-Control.sh` and `run_Cycloserine_timeseries.sh`. Make them executable:
+ ```
+ chmod a+x *.sh
+ ```
+* Finally, run the tests:
+ ```
+ ./run_LDHAB-Control.sh
+ ./run_Cycloserine_timeseries.sh
+ ```
+
+
+
+## Available analyses
+
+- _pca_analysis_ computes the PCA and outputs tables with principal components and explained variances
+- _pca_plot_ generates classical PCA plots
+- _abundance_plot_ generates bar plots of total metabolite abundances
+- _mean_enrichment_line_plot_ generates lineplots of mean enrichment
+- _isotopologue_proportions_plot_ generates stacked bars of isotopologue proportions
+- _differential_analysis_ runs differential analysis and computes the corresponding statistics
+- _multi_group_comparison_ same as _differential_analysis_, but for more than two groups
+- _time_course_analysis_ runs differential analysis for time-course experiments in pairwise fashion for consecutive time points
+- _metabologram_integration_ pathway-based integration between *labeled targeted metabolomic* and *transcriptomic* data, resulting in metabologram plots
+
+To run an analysis, you must provide your data and configuration files, structured as explained in the section [Organising your data for the analysis](#organising-your-data-for-the-analysis).
+
+Once the files are organised, the generic command for running one analysis is:
+
+```commandline
+python -m dimet -cd config -cn GENERAL_CONFIGURATION_FILENAME
+```
+
+
-----------------------------------------------------------------------------------------------------
# Organising your data for the analysis
@@ -212,16 +317,22 @@ DIMet offers the possitibilty of pathway-based integration of the metabolome and
```
- * Files for differentially expressed genes (DEGs) must be provided in the tab delimited .csv format.
+ * Files for differentially expressed genes (DEGs)
-Formatting example of differentially expressed genes files
+Files for differentially expressed genes (DEGs) must be provided in the tab-delimited .csv format. For each file:
+1. The rows represent the genes (except the first one, which is the header with the column names)
+2. The columns provide the information to be integrated; two columns are compulsory:
+ 1. the gene names, given as strings
+ 2. the Fold-Changes (or the log2 Fold-Changes) in numeric format (no letters or symbols, only numbers)
+
+Formatting example of differentially expressed genes files:
-| ensembl | name | FC | log2FoldChange | padj | gene_symbol |
-|------------------|--------------------------------|-----------|-------------------|----------|-------------|
-| ENSG00000105220 | glucose-6-phosphate isomerase | 0.0000136 | -16.1660338229612 | 1.00E-10 | GPI |
-| ENSG00000156515 | hexokinase 1 | 10 | 3.32192809488736 | 1.00E-03 | HK1 |
-| ENSG00000153574 | ribose 5-phosphate isomerase A | 5 | 2.32192809488736 | 0.001 | RPIA |
-| ENSG00000141959 | phosphofructokinase, liver | 1.75 | 0.807354922057604 | 0.05 | PFKL |
+| log2FoldChange | gene_symbol |
+|-------------------|--------------|
+| -16.1660338229612 | GPI |
+| 3.32192809488736 | HK1 |
+| 2.32192809488736 | RPIA |
+| 0.807354922057604 | PFKL |
* The *metabolites per pathway* and *genes or transcripts per pathway* files
@@ -248,7 +359,8 @@ _Example_ for genes per pathway:
| PKFL | RBKS | ... |
| ... | ... | ... |
-
+All these files must be provided in the tab-delimited .csv format.
+
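The DEG file requirements described above (tab-delimited, one header row, a gene-name column and a numeric fold-change column) can be checked before running the integration. The following is only a sketch: `check_deg_table` is a hypothetical helper, not part of DIMet, and the column names are taken from the formatting example.

```python
import csv
import io


def check_deg_table(handle, gene_column, value_column):
    """Check a tab-delimited DEG table and return its gene names.

    Hypothetical helper for illustration only; it is not part of DIMet.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {gene_column, value_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing compulsory column(s): {sorted(missing)}")

    genes = []
    # The header is row 1, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        gene = (row[gene_column] or "").strip()
        if not gene:
            raise ValueError(f"row {row_number}: empty gene name")
        try:
            float(row[value_column])
        except (TypeError, ValueError):
            raise ValueError(
                f"row {row_number}: {row[value_column]!r} is not numeric"
            ) from None
        genes.append(gene)
    return genes


# Two rows taken from the formatting example, written tab-delimited.
sample = (
    "log2FoldChange\tgene_symbol\n"
    "-16.1660338229612\tGPI\n"
    "3.32192809488736\tHK1\n"
)
print(check_deg_table(io.StringIO(sample), "gene_symbol", "log2FoldChange"))
```

For a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("DEG_Control_LDHAB.csv", newline="")`.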
@@ -401,7 +513,7 @@ we used in our data).
defaults:
- dataset: # <- name of the dataset cofiguration file
- - method: differential_analysis
+ - method: differential_analysis # <- see 'Available analyses'
comparisons :
- [[cond2, T24], [cond1, T24]] # <-
@@ -451,7 +563,7 @@ we used in our data).
columns_transcripts:
ID: # <- the gene symbols column name, fill after the colon
- values: # <- the numeric column name, fillafter the colon
+ values: # <- the numeric column name, fill after the colon
compartment:
en
@@ -494,246 +606,19 @@ we used in our data).
Note that across all the types of configuration files, any referenced file name must be written without the extension.
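
As a small illustration of this rule, the name to write in a configuration file can be derived from a data file name with Python's `pathlib`. The helper name `config_reference` is hypothetical and not part of DIMet.

```python
from pathlib import Path


def config_reference(file_name: str) -> str:
    """Return the file name without its final extension.

    Hypothetical helper for illustration only; it is not part of DIMet.
    """
    return Path(file_name).stem


print(config_reference("DEG_Control_LDHAB.csv"))  # DEG_Control_LDHAB
```

Note that `Path.stem` removes only the last suffix, so a name such as `table.csv.gz` would become `table.csv`.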
-
+
Examples of configuration files, with their respective datasets are provided
-in [Zenodo (manuscript_data)](https://sandbox.zenodo.org/record/todo:updaate).
+in [Zenodo (manuscript_data)](https://sandbox.zenodo.org/record/todo:updaatexx).
Also complementary minimal datasets with their configuration files are provided in
-[Zenodo (minimal_examples)](https://sandbox.zenodo.org/record/todo:update). **todo:update**
-
-## Provided datasets
-
-Datasets, configuration and bash scripts corresponding to the results presented in the manuscript "DIMet: An open-source tool for Differential analysis of Isotope-labeled targeted Metabolomics data" by J. Galvis *et al*.
-are available at [Zenodo (manuscript_data)](https://sandbox.zenodo.org/record/todo:update).
-
-Download and uncompress the file `datasets_manuscript_DIMet.zip`.
-
-
-
-Whole structure of the downloaded folder (click to show/hide)
-
-
-```
-datasets_manuscript_DIMet
-├── config
-│ ├── analysis
-│ │ ├── abundance_plot_Cycloserine.yaml
-│ │ ├── abundance_plot_LDHAB-Control.yaml
-│ │ ├── dataset
-│ │ │ ├── Cycloserine_data.yaml
-│ │ │ ├── LDHAB-Control_data_integrate.yaml
-│ │ │ └── LDHAB-Control_data.yaml
-│ │ ├── differential_analysis_pairwise_LDHAB-Control.yaml
-│ │ ├── enrichment_lineplot_Cycloserine.yaml
-│ │ ├── isotopologues_plot_Cycloserine.yaml
-│ │ ├── isotopologues_plot_LDHAB-Control.yaml
-│ │ ├── metabologram_abundance_LDHAB-Control.yaml
-│ │ ├── metabologram_enrichment_LDHAB-Control.yaml
-│ │ ├── pca_plot_LDHAB-Control.yaml
-│ │ ├── pca_tables_Cycloserine.yaml
-│ │ └── timecourse_analysis_Cycloserine.yaml
-│ ├── general_config_abundance_plot_Cycloserine.yaml
-│ ├── general_config_abundance_plot_LDHAB-Control.yaml
-│ ├── general_config_differential_analysis_LDHAB-Control.yaml
-│ ├── general_config_enrichment_lineplot_Cycloserine.yaml
-│ ├── general_config_isotopologues_plot_Cycloserine.yaml
-│ ├── general_config_isotopologues_plot_LDHAB-Control.yaml
-│ ├── general_config_metabologram_abundance_LDHAB-Control.yaml
-│ ├── general_config_metabologram_enrichment_LDHAB-Control.yaml
-│ ├── general_config_pca_plot_LDHAB-Control.yaml
-│ ├── general_config_pca_tables_Cycloserine.yaml
-│ └── general_config_timecourse_analysis_Cycloserine.yaml
-├── data
-│ ├── Cycloserine_data
-│ │ └── raw
-│ │ ├── CorrectedIsotopologues.csv
-│ │ ├── FracContribution_C.csv
-│ │ ├── metadata_cycloser.csv
-│ │ └── rawAbundances.csv
-│ └── LDHAB-Control_data
-│ ├── integration_files
-│ │ ├── DEG_Control_LDHAB.csv
-│ │ ├── pathways_kegg_metabolites.csv
-│ │ ├── pathways_kegg_transcripts.csv
-│ │ └── readme.txt
-│ └── raw
-│ ├── AbundanceCorrected.csv
-│ ├── IsotopologuesAbs.csv
-│ ├── IsotopologuesProp.csv
-│ ├── MeanEnrichment13C.csv
-│ └── metadata_endo_ldh.csv
-├── run_Cycloserine_timeseries.sh
-└── run_LDHAB-Control.sh
-```
-
-
+[Zenodo (minimal_examples)](https://sandbox.zenodo.org/record/todo:updatexxxx). **todo:update xxxx**
-------------------------------------------------------------
-
-
-
-1. The data folder (click to show/hide)
-
-
- The structure of the data folder is: (click to show/hide)
-
-```
-data
-├── Cycloserine_data
-│ └── raw
-│ ├── CorrectedIsotopologues.csv
-│ ├── FracContribution_C.csv
-│ ├── metadata_cycloser.csv
-│ └── rawAbundances.csv
-└── LDHAB-Control_data
- ├── integration_files
- │ ├── DEG_Control_LDHAB.csv
- │ ├── pathways_kegg_metabolites.csv
- │ ├── pathways_kegg_transcripts.csv
- │ └── readme.txt
- └── raw
- ├── AbundanceCorrected.csv
- ├── IsotopologuesAbs.csv
- ├── IsotopologuesProp.csv
- ├── MeanEnrichment13C.csv
- └── metadata_endo_ldh.csv
-```
-
-
-By zooming into the content of any of the **`raw`** subfolders we can understand the file formatting that is required for using DIMet, both for **quantification** file(s) and the **metadata** file:
-
-
- 1.1. Quantification files
-
-
-
- See here the first lines of the quantification files (click to show/hide)
-
- The first lines of the Isotopologue absolute values file (`IsotopologuesAbs.csv`) which is inside the `raw` subfolder of `LDHAB-Control_data`:
-
-| ID | T48_AB_1 | T48_AB_2 | T48_AB_3 | T48_Cont_1 | T48_Cont_2 | T48_Cont_3 |
-|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
-| 2_3-PG_m+0 | 703151.9167 | 856725.4533 | 961394.0385 | 42043.98974 | 56438.37354 | 37427.49772 |
-| 2_3-PG_m+1 | 9099.30813 | 0 | 0 | 0 | 0 | 0 |
-| 2_3-PG_m+2 | 35196.39397 | 34163.9901 | 37498.28763 | 20998.75488 | 22388.47005 | 21257.21399 |
-| 2_3-PG_m+3 | 1808396.988 | 2237113.191 | 2446548.943 | 1641241.102 | 1488116.365 | 1673205.23 |
-| 2-OHGLu_m+0 | 2464867.606 | 2190608.337 | 2650274.946 | 7496654.147 | 6077978.087 | 6881666.103 |
-
- The first lines of the total metabolite abundances (`AbundanceCorrected.csv`) in the same subfolder:
-
-
-| ID | T48_AB_1 | T48_AB_2 | T48_AB_3 | T48_Cont_1 | T48_Cont_2 | T48_Cont_3 |
-|---------|----------------|----------------|----------------|----------------|----------------|----------------|
-| 2_3-PG | 2555844.6068 | 3128002.6344 | 3445441.26913 | 1704283.84662 | 1566943.20859 | 1731889.94171 |
-| 2-OHGLu | 3373345.61683 | 3426388.69792 | 3988439.5147 | 26362483.1589 | 19664735.89344 | 22660528.8544 |
-| 6-PG | 1272239.434813 | 1390994.801623 | 1477360.701829 | 4835294.623232 | 2975614.11154 | 4462008.850759 |
-| a-KG | 15141020.4483 | 20989621.8864 | 24554966.1982 | 1280021.24849 | 1087730.89083 | 1605672.06536 |
-
-
- The first lines of the MeanEnrichment13C (`MeanEnrichment13C.csv`) in the same subfolder:
-
-| ID | T48_AB_1 | T48_AB_2 | T48_AB_3 | T48_Cont_1 | T48_Cont_2 | T48_Cont_3 |
-|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
-| 2-OHGLu | 0.206825196 | 0.276522548 | 0.262837217 | 0.49518099 | 0.477725447 | 0.481746398 |
-| 2_3-PG | 0.717920936 | 0.722470358 | 0.717338538 | 0.971223353 | 0.9592192 | 0.974297884 |
-| 6-PG | 0.86699873 | 0.852246374 | 0.86035646 | 0.944487954 | 0.957595706 | 0.957011737 |
-| ADP | 0.389205628 | 0.392401693 | 0.39551035 | 0.234449886 | 0.232809563 | 0.240030529 |
-
-
- The first lines of the Isotopologue proportions file (`IsotopologuesAbs.csv`) in the same subfolder:
-
-| ID | T48_AB_1 | T48_AB_2 | T48_AB_3 | T48_Cont_1 | T48_Cont_2 | T48_Cont_3 |
-|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
-| 2_3-PG_m+0 | 0.275115285 | 0.273888981 | 0.279033646 | 0.024669594 | 0.036018136 | 0.021610783 |
-| 2_3-PG_m+1 | 0.003560196 | 0 | 0 | 0 | 0 | 0 |
-| 2_3-PG_m+2 | 0.013770944 | 0.010921983 | 0.01088345 | 0.012321161 | 0.014287991 | 0.012273998 |
-| 2_3-PG_m+3 | 0.707553575 | 0.715189037 | 0.710082904 | 0.963009246 | 0.949693873 | 0.966115219 |
-| 2-OHGLu_m+0 | 0.730689317 | 0.639334451 | 0.664489191 | 0.284368286 | 0.309080077 | 0.30368515 |
-
-
-
-
-
-
-
-
-
- 1.2. The metadata file
-
-
-
- This is the content of the file metadata_endo_ldh.csv file:
-
-
-
- | name_to_plot | condition | timepoint | timenum | short_comp | original_name |
- |---------------|-----------|-----------|---------|------------|---------------|
- | sgLDHAB_T48-1 | sgLDHAB | T48 | 48 | en | T48_AB_1 |
- | sgLDHAB_T48-2 | sgLDHAB | T48 | 48 | en | T48_AB_2 |
- | sgLDHAB_T48-3 | sgLDHAB | T48 | 48 | en | T48_AB_3 |
- | Cont_T48-1 | Control | T48 | 48 | en | T48_Cont_1 |
- | Cont_T48-2 | Control | T48 | 48 | en | T48_Cont_2 |
- | Cont_T48-3 | Control | T48 | 48 | en | T48_Cont_3 |
-
-
-
-
-
-
-
-
-
- 1.3. The files for performing the omics integration (click to show/hide)
-
-
-By zooming into the content of the `integration_files` subfolder, we see the files for performing the pathway-based **omics integration** of the labeled targeted metabolomics data and transcriptomics data:
-
- ```
- ├── data
- │ └── LDHAB-Control_data
- │ ├── integration_files # <--- this is the subfolder with the integration files
- │ │ ├── DEG_Control_LDHAB.csv
- │ │ ├── pathways_kegg_metabolites.csv
- │ │ └── pathways_kegg_transcripts.csv
- ```
-
-
-
-
-
-
-----------------------------------------------
-# Using DIMet
-
-DIMet runs in the command line environment.
-
-## Running analyses on the provided datasets
-
-Make sure you have activated your virtual environment. In the `datasets_manuscript_DIMet` folder you have two `.sh` files: `run_LDHAB-Control.sh` and `run_Cycloserine_timeseries.sh`; Make them executable:
-```
-chmod a+x *.sh
-```
-and finally run:
-```
-./run_LDHAB-Control.sh
-./run_Cycloserine_timeseries.sh
-```
-
-## Available analyses
-- _pca_analysis_ computes the PCA and outputs tables with principal components and explained variances
-- _pca_plot_ generated classical PCA plots
-- _abundance_plot_ plots with bars of total metabolite abundances
-- _enrichment_plot_ generates lineplots of mean enrichment
-- _isotopologues_plot_ generates stacked bars of isotopologue proportions
-- _differential_analysis_ runs differential analysis and computes the corresponding statistics
-- _differential_multigroup_analysis_ same as before, but for > 2 groups
-- _timecourse_analysis_ runs differential analysis for time-course experiments in pairwise fashion for consecutive time points
-- _metabologram_ network integration between SIRM and trascriptomic data, resulting in metabologram plots
# Getting help