sstar-analysis

Introduction

This repo contains Snakemake pipelines for replicating the analysis in our sstar paper. These pipelines were tested on Linux (CentOS 8, Oracle Linux 8, and Ubuntu 20.04).

To replicate our analysis, users should first install Anaconda or Miniconda, then create a virtual environment with the following commands.

conda install mamba -n base -c conda-forge
mamba env create -f environment.yml
conda activate sstar-analysis

Tools that cannot be installed through conda can be set up manually with the commands below.

mkdir ext && cd ext

# Download SPrime and its pipeline
mkdir SPrime && cd SPrime
wget https://faculty.washington.edu/browning/sprime.jar
git clone https://github.com/YingZhou001/sprimepipeline
chmod a+x sprimepipeline/pub.pipeline.pbs/tools/map_arch_genome/map_arch
sed 's/out<-c()/out<-data.frame()/' sprimepipeline/pub.pipeline.pbs/tools/score_summary.r > tmp
mv tmp sprimepipeline/pub.pipeline.pbs/tools/score_summary.r
cd ..

# Download SkovHMM
git clone https://github.com/LauritsSkov/Introgression-detection SkovHMM
cd SkovHMM
git checkout 3d1865a56b8fdecc2768448e6cb982a157c37c50
cd ..

# Download ArchaicSeeker2.0
git clone https://github.com/Shuhua-Group/ArchaicSeeker2.0
cd ArchaicSeeker2.0
cd libnlopt.so.0_free
chmod a+x ArchaicSeeker2
cd ../..

Users also need to download ms.tar.gz from the Hudson Lab website and decompress it under the ext folder.
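For example (the exact URL below is an assumption; check the Hudson Lab website for the current link):

# assumed URL for Hudson's ms; verify on the Hudson Lab website
wget http://home.uchicago.edu/~rhudson1/source/mksamples/ms.tar.gz
tar -xzf ms.tar.gz

Then compile ms with the following commands.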

cd msdir
${CONDA_PREFIX}/bin/gcc -o ms ms.c streec.c rand1.c -lm
cd ../..

Running the pipelines

After installing the tools above, users can test the pipelines locally with a dry run, which prints the jobs that would be executed without actually running them.

snakemake -np

To run all the pipelines at once, users can use the following command.

snakemake -c 1 --use-conda

-c specifies the number of CPU cores; Snakemake runs as many jobs in parallel as the given number of cores allows.
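For example, to let Snakemake schedule up to eight concurrent single-core jobs (assuming eight cores are available):

snakemake -c 8 --use-conda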

However, we recommend running each pipeline individually. The two simulation pipelines must be run first.

snakemake -s workflows/1src/simulation.snake -c 1
snakemake -s workflows/2src/simulation.snake -c 1

The other pipelines, including workflows/1src/sstar.snake, workflows/1src/sprime.snake, workflows/1src/skovhmm.snake, workflows/2src/sstar.snake, workflows/2src/sprime.snake, and workflows/2src/archaicseeker2.snake, can be executed in any order after the simulations finish.
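For example, to run the sstar pipeline on the one-source simulations:

snakemake -s workflows/1src/sstar.snake -c 1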

For the SkovHMM pipeline, users need to add the --use-conda argument, because SkovHMM requires Python 2, unlike the main sstar-analysis environment; Snakemake then creates a dedicated conda environment for it.

snakemake -s workflows/1src/skovhmm.snake -c 1 --use-conda

Finally, users can plot the results.

snakemake -s workflows/plot.snake -c 1

The plots and tables are written to results/plots. They may differ slightly from those in our manuscript because of randomness in the simulations.

Users can also submit the pipelines to an HPC cluster. Profiles should be created to match the cluster's scheduler; an example profile for SLURM can be found in config/slurm/config.yaml. To submit jobs via SLURM, first create a folder logs_slurm.

mkdir logs_slurm

Then, for example, to run the simulations on the cluster:

snakemake -s workflows/1src/simulation.snake --profile config/slurm -j 200 

-j specifies the maximum number of jobs that Snakemake submits to the cluster at a time.
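For reference, a minimal SLURM profile might look like the sketch below; the values are illustrative assumptions, and the repository's own config/slurm/config.yaml is authoritative.

# sketch of a Snakemake profile config.yaml; keys mirror Snakemake's long command-line options
cluster: "sbatch --cpus-per-task={threads} --output=logs_slurm/%j.out"
jobs: 200
latency-wait: 60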
