Commit

Update documentation
gaow committed Aug 10, 2023
1 parent 5fb2f23 commit 110df91
Showing 151 changed files with 8,513 additions and 5,261 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 4cad58e9ef15fe40a224fa6a47ca11eb
config: d0888c4227c66edf5bcfe37f8698285b
tags: 645f666f9bcd5a90fca523b33c5a78b7
114 changes: 69 additions & 45 deletions README.html

Large diffs are not rendered by default.

850 changes: 850 additions & 0 deletions _images/Fine-mapping results plotting.ipynb

Large diffs are not rendered by default.

67 changes: 41 additions & 26 deletions _sources/README.md
@@ -24,37 +24,42 @@ We provide this [toy example for running SoS pipeline on a typical HPC cluster e
### Source code

- Source code of pipelines and containers implemented in this repository are available at https://github.com/cumc/xqtl-pipeline/tree/main/code.
- Container configurations (both Docker and Singularity) for required software environments are available at https://github.com/cumc/xqtl-pipeline/tree/main/container.
- Container specifications for required software environments are available at https://github.com/cumc/xqtl-pipeline/tree/main/container.

### Organization of the resource

The website https://cumc.github.io/xqtl-pipeline is generated from files under the `code` folder of the source code repository. The `pipeline` folder contains symbolic links automatically generated for pipeline files under `code`. The logic of the entire xQTL analysis workflow is roughly reflected in the **left sidebar**:

- The **GETTING STARTED** section contains the link to the xQTL protocol demonstration notebook for users interested in testing all the pipelines in this protocol using the example data-set we prepare (so that they can also adapt the protocol to their own data-sets).
- The **COMMAND GENERATOR** section is reserved for "push button" commands that generate the entire QTL analysis pipeline workflow script from a simple configuration file. Notebooks under these sections are meant to be **executed as command line software** to generate data analysis commands. The generated commands can be executed as is to complete all available analyses or can be used to help customize specific analysis tasks by making modifications to them. The configuration file itself helps centralized control and bookkeeping of workflows executed.
- Other sections in bold contain various types of analysis available, roughly showing in order from upstream to downstream analysis. They are consisted of ***mini-protocols*** as various non-bold, clickable text under each analysis group linking to some notebooks. These notebooks illustrate commands to perform analysis implemented in each mini-protocol. Most of them are "tutorials" in nature and are meant to be **executed interactively in Jupyter or in the command terminal** to run the SoS pipelines line by line. A few are the actual ***pipeline modules*** implementing pipelines in SoS, as will be discussed next.
- *Mini-protocols* can be expanded by clicking on the down arrows to access the SoS workflows implementation of ***pipeline modules***. These are the core pipeline implementations to be **executed as command line software**and are meant to be **self-contained** --- they may be used in other contexts not specific to the xQTL data analysis.
- The **GETTING STARTED** section serves as the main landing page or index of the xQTL protocol, guiding users through the various pipelines implemented in this repository. It's structured to mirror the logic of the xQTL analysis we've crafted. Because this page provides pointers to other sections, users can focus primarily here without having to sift through the rest of the pages in this repository.
- The **COMMAND GENERATOR** section is designed as a one-stop hub for "push button" commands, enabling users to generate the full set of QTL analysis workflow scripts from a straightforward configuration file. Notebooks within this section are intended to be **executed as command line software** to generate data analysis commands (see the sketch after this list). Users can then execute these generated commands directly to conduct all preset analyses, or tweak them to cater to particular analysis requirements. The configuration file serves a dual purpose: streamlining control and maintaining a record of executed workflows.
- Other sections in bold fonts provide an array of available analyses, presented roughly in order from upstream to downstream processes. Most of these sections feature ***mini-protocols***, represented as clickable, non-bold text under each analysis category, leading to specific notebooks. These notebooks detail the commands necessary for the analyses defined in the respective mini-protocols. Predominantly tutorial-based, they are designed to be **executed interactively in Jupyter or via the command terminal**, allowing users to navigate through the SoS pipelines step by step. A few of these notebooks are actual ***pipeline modules***, discussed next.
- *Mini-protocols*, as mentioned earlier, can be expanded by clicking the downward arrows, revealing the SoS implementations of ***pipeline modules***. These represent the crux of the pipeline implementations and are intended to be **executed as command line software**. They're also **self-contained**, allowing for reusability beyond the specific context of xQTL data analysis.
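
To make the COMMAND GENERATOR workflow concrete, a session might look like the sketch below. This is illustrative only: the generator notebook, configuration file, and output script names are hypothetical placeholders, not the repository's exact interface.

```
# Hypothetical sketch: notebook and file names are placeholders.
# 1. Run a command-generator notebook as command line software,
#    pointing it at a study-specific configuration file.
sos run code/commands_generator/<generator_notebook>.ipynb \
    --config my_study_config.yaml

# 2. The generated commands can be executed as-is to run all preset
#    analyses, or edited first to customize specific analysis tasks.
bash my_study_analysis_commands.sh
```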

### Getting started
### Setting up

- In order to run the protocol on your computer (or a High Performance Computing cluster), please install Script of Scripts [(see here for a tutorial to set it up)](https://wanggroup.org/orientation/jupyter-setup).
- For Linux desktop users you can either install the container [Singularity](https://sylabs.io/singularity/) or [Docker](https://www.docker.com/).
- For Linux-based HPC users, your system may already have Singularity installed. If not please communicate with the IT support for the HPC. Typically Docker is not allowed on HPC.
- For Mac desktop users, it is best to install and use [Docker](https://www.docker.com/).
- For Windows users, you will need to install [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) (we have tested it on WSL2 and not on WSL1) and then install singularity within it as instructed in this [post](https://www.blopig.com/blog/2021/09/using-singularity-on-windows-with-wsl2/).
- We have published example data-sets and singularity containers images to [this Synapse folder](https://www.synapse.org/#!Synapse:syn36416559/files/). The instruction for downloading the data programmatically can be found [here](https://help.synapse.org/docs/Upload-and-Download-Data-in-Bulk.2003796248.html). To setup a synapse client, please follow [this post](https://help.synapse.org/docs/Installing-Synapse-API-Clients.1985249668.html).
- In the `test_data` folder, you can find the data, prefixed with **MWE**, used to perform unit testing for each module (i.e., whether there is anything wrong within the code).
- In the `protocol_data` folder, you can find a more sophisticated collection of data, which are used to demonstrate the complete usage of our protocole in [this notebook](https://cumc.github.io/xqtl-pipeline/code/xqtl_protocol_demo.html) with [source code](https://github.com/cumc/xqtl-pipeline/blob/main/code/xqtl_protocol_demo.ipynb).
- In the `container` folder above, you can find the Singularity images released for the software environment. If you use Docker (eg on a Linux or Mac Desktop) you **do not** need to download this folder.
- Please clone this repository https://github.com/cumc/xqtl-pipeline to your computer. This is the source code for this resource. All pipelines are symbolic links under `pipeline` folder to various notebooks under `code` folder. You can follow our mini-protocols to run the pipelines under `pipeline` folder.
- In order to run the xQTL protocol on your computer (or a High Performance Computing cluster), please install Script of Scripts [(see here for a tutorial to set it up with `micromamba`)](https://wanggroup.org/orientation/jupyter-setup).
- For Linux and Mac desktop users, you can install either [`Singularity`](https://docs.sylabs.io/guides/3.2/user-guide/installation.html#) or [`Docker`](https://www.docker.com/). In the xQTL project we primarily use `Singularity`.
- For Linux-based HPC users, your system may already have `Singularity` installed; if not, please contact your HPC's IT support. Docker is typically not allowed on HPC systems.
- For Windows users, you will need to install [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) (we have tested it on WSL2 and not on WSL1) and then install `Singularity` within WSL as instructed in this [post](https://www.blopig.com/blog/2021/09/using-singularity-on-windows-with-wsl2/).
- We have provided example datasets and `Singularity` container images in [this Synapse folder](https://www.synapse.org/#!Synapse:syn36416559/files/); a download sketch follows this list. For guidance on downloading the data programmatically, refer to [this documentation](https://help.synapse.org/docs/Upload-and-Download-Data-in-Bulk.2003796248.html). If you need to set up a Synapse client, consult [this guide](https://help.synapse.org/docs/Installing-Synapse-API-Clients.1985249668.html).
- Within the `test_data` folder, datasets prefixed with **MWE** (Minimal Working Example) are provided. These are used for unit testing each module, ensuring the integrity of the code.
- The `protocol_data` folder houses a comprehensive set of data, illustrating the full extent of our protocol. This is showcased in [this notebook](https://cumc.github.io/xqtl-pipeline/code/xqtl_protocol_demo.html), with the [source code](https://github.com/cumc/xqtl-pipeline/blob/main/code/xqtl_protocol_demo.ipynb) available for reference.
- The `container` folder contains the released Singularity images for the software environment. For Docker users (e.g., on Linux or Mac Desktops), downloading this folder is **not** necessary.
- Please clone this repository https://github.com/cumc/xqtl-pipeline onto your computer. This is the source code for this resource. All pipelines are symbolic links in the `pipeline` folder. Users are encouraged to execute pipelines from the root of the repository by typing

```
sos run pipeline/<pipeline_file>.ipynb
```

that is, executing the symbolic links directly to perform the analysis.
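
Putting these steps together, a minimal session might look like the following sketch. The Synapse command assumes a client already configured with your credentials, and the pipeline, workflow, and image names are illustrative placeholders:

```
# Clone the source repository and work from its root
git clone https://github.com/cumc/xqtl-pipeline.git
cd xqtl-pipeline

# Download the example data and Singularity images from the Synapse
# folder referenced above (requires a configured Synapse client)
synapse get -r syn36416559 --downloadLocation synapse_data

# Run a pipeline module through its symbolic link inside a Singularity
# image; <pipeline_file>, <workflow> and <image> are placeholders
sos run pipeline/<pipeline_file>.ipynb <workflow> \
    --cwd output \
    --container synapse_data/container/<image>.sif
```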

### See Also

- Some analysis from FunGen-xQTL project using our protocol can be found in the [`cumc/brain-xqtl-analysis` github repo](https://github.com/cumc/brain-xqtl-analysis)
- Analyses from the FunGen-xQTL consortium using this protocol can be found at https://github.com/cumc/brain-xqtl-analysis

## Our team

This repository is developed by the NIA FunGen-xQTL consortium.
This repository is developed by the Analysis Working Group of the NIA FunGen-xQTL consortium.

### Developers

@@ -63,24 +68,34 @@ Lead developers
- Hao Sun, Department of Neurology, Columbia University
- Gao Wang, Department of Neurology, Columbia University

Contributors
Main contributors (largely based on GitHub Pull Requests)

- Xuanhe Chen, Department of Biostatistics, Columbia University
- Wenhao Gou, Department of Biostatistics, Columbia University
- Yuqi Miao, Department of Biostatistics, Columbia University
- Liucheng Shi, Department of Biostatistics, Columbia University
- Amanda Tsai, Department of Biostatistics, Columbia University
- Wenhao Gou & Liucheng Shi, Department of Biostatistics, Columbia University
- Haochen Sun, Department of Biostatistics, Columbia University
- Zining Qi, Department of Biostatistics, Columbia University
- Ru Feng, Department of Neurology, Columbia University
- Alexandre Pelletier, Department of Medicine, Boston University
- Travyse Edwards, Mount Sinai & University of Pennsylvania
- Daniel Nachun, Department of Pathology, Stanford University
- Jiacheng Li, Department of Neurology, Columbia University
- Mintao Lin, Department of Medicine, Boston University

### Leadership

FunGen leadership
FunGen-AD

- Philip De Jager, Department of Neurology, Columbia University
- Carlos Cruchaga, Department of Psychiatry, Neurology and Genetics, Washington University in St. Louis

FunGen-xQTL methods and data integration working group
FunGen-xQTL Analysis Working Group

- Gao Wang (work group leader), Department of Neurology, Columbia University
- Gao Wang, Department of Neurology, Columbia University
- Xiaoling Zhang, Departments of Medicine and Biostatistics, Boston University
- Edoardo Marcora, Departments of Neuroscience, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
- Fanny Leung (also leads data standardization WG), Department of Pathology and Laboratory Medicine, University of Pennsylvania
- Julia TCW, Department of Medicine, Boston University
- Kushal K. Dey, Memorial Sloan Kettering
- Alan Renton, Departments of Neuroscience, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
- Stephen Montgomery, Department of Pathology, Stanford University
- Xiaoquan Wen, Department of Biostatistics, University of Michigan
13 changes: 7 additions & 6 deletions _sources/code/association_scan/APEX/APEX.ipynb
@@ -206,6 +206,7 @@
"parameter: cwd = path('output')\n",
"# Container option for software to run the analysis: docker or singularity\n",
"parameter: container = ''\n",
"parameter: entrypoint={('micromamba run -n' + ' ' + container.split('/')[-1][:-4]) if container.endswith('.sif') else f''}\n",
"# Prefix for the analysis output\n",
"parameter: name = f\"{phenotype_list:bn}_{covariate_file:bn}\"\n",
"# Specify the scanning window for the up and downstream radius to analyze around the region of interest, in units of bp\n",
@@ -253,7 +254,7 @@
"input: file_inv, group_by = len(file_inv[0]) ,group_with = \"chr_inv\"\n",
"output: f'{cwd:a}/{name}.theta.gz'\n",
"task: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'\n",
"bash: expand= \"$[ ]\", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout', container = container\n",
"bash: expand= \"$[ ]\", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout', container = container, entrypoint = entrypoint\n",
" apex lmm $[\"--rankNormal\" if rankNormal else \"\"] --vcf $[_input[1]] \\\n",
" --bed $[_input[0]] \\\n",
" --cov $[covariate_file] \\\n",
@@ -291,7 +292,7 @@
" f'{cwd:a}/{name}.{_chr_inv}{\".LMM\" if LMM else \".OLS\"}.cis_gene_table.txt.gz',\n",
" f'{cwd:a}/{name}.{_chr_inv}{\".LMM\" if LMM else \".OLS\"}.cis_sumstats.txt.gz'\n",
"task: trunk_workers = 1, trunk_size=job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'\n",
"bash: expand= \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout', container = container\n",
"bash: expand= \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout', container = container, entrypoint = entrypoint\n",
" apex cis $[\"--rankNormal\" if rankNormal else \"\"] --vcf $[_input[1]] \\\n",
" --bed $[_input[0]] \\\n",
" --cov $[covariate_file] \\\n",
@@ -319,7 +320,7 @@
"output: f'{cwd:a}/{name}.{_chr_inv}{\".LMM\" if LMM else \".OLS\"}.trans_long_table.txt.gz',\n",
" f'{cwd:a}/{name}.{_chr_inv}{\".LMM\" if LMM else \".OLS\"}.trans_gene_table.txt.gz'\n",
"task: trunk_workers = 1, trunk_size=job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'\n",
"bash: expand= \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout', container = container\n",
"bash: expand= \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout', container = container, entrypoint = entrypoint\n",
" apex trans $[\"--rankNormal\" if rankNormal else \"\"] --vcf $[_input[1]] \\\n",
" --bed $[_input[0]] \\\n",
" --cov $[covariate_file] \\\n",
@@ -340,7 +341,7 @@
"[*s_2]\n",
"output: f'{_input[0]:nn}.reformated.txt'\n",
"task: trunk_workers = 1, trunk_size=job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'\n",
"R: expand = \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout',container = container\n",
"R: expand = \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout',container = container, entrypoint = entrypoint\n",
" library(\"dplyr\")\n",
" library(\"tibble\")\n",
" library(\"readr\")\n",
@@ -361,7 +362,7 @@
"[*s_3]\n",
"input: group_by = \"all\"\n",
"output: f'{cwd}/APEX_QTL_recipe.tsv',f'{cwd:a}/APEX_column_info.txt'\n",
"python: expand = \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout'\n",
"python: expand = \"$[ ]\", stderr = f'{_output[0]}.stderr', stdout = f'{_output[0]}.stdout',container = container, entrypoint = entrypoint\n",
" import pandas as pd \n",
" data_tempt = pd.DataFrame({\n",
" \"#chr\" : [int(x.split(\".\")[-5].replace(\"chr\",\"\")) for x in [$[_input:br,]]],\n",
@@ -413,7 +414,7 @@
"sos"
]
],
"version": "0.22.6"
"version": "0.24.1"
}
},
"nbformat": 4,
