aPhyloGeo-Covid: Reproducible Phylogeographic Analysis Platform of SARS-CoV-2 Variation using Neo4j and Snakemake

📝 About the project

This platform enables users to collect and visualize phylogeographic data related to SARS-CoV-2. Additionally, aPhyloGeo-Covid offers a scalable and reproducible workflow for investigating the relationship between geographic features and the patterns of variation in different SARS-CoV-2 variants. The integrated Neo4j database consolidates a diverse range of COVID-19 pandemic-related sequences, climate data, and metadata from public databases, allowing users to filter and organize input data for phylogeographical studies efficiently.

Data

Various data sources related to SARS-CoV-2 were integrated into a Neo4j database, covering the period from January 1, 2020, to December 31, 2022.

Data on COVID-19 (coronavirus) by Our World in Data
Pango Lineages: Latest epidemiological lineages of SARS-CoV-2 by Cov-Lineages
Daily climate data for regions of sequencing from the NASA/POWER website
SARS-CoV-2 sequence data information from SARS-CoV-2 Data Hub
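
Because the integrated data only cover 1 January 2020 to 31 December 2022, a date filter that falls outside this window cannot return anything. The sketch below shows one way such a request could be restricted to the covered period. The helper name `clamp_to_coverage` and its error behaviour are illustrative assumptions, not part of the platform's code.

```python
from datetime import date
from typing import Tuple

# Coverage window stated in the Data section above.
COVERAGE_START = date(2020, 1, 1)
COVERAGE_END = date(2022, 12, 31)


def clamp_to_coverage(start: date, end: date) -> Tuple[date, date]:
    """Restrict a requested date range to the covered period (hypothetical helper)."""
    if start > end:
        raise ValueError("start date must not be after end date")
    if end < COVERAGE_START or start > COVERAGE_END:
        raise ValueError("requested range lies entirely outside the covered period")
    # Keep only the part of the request that overlaps the coverage window.
    return max(start, COVERAGE_START), min(end, COVERAGE_END)
```

A request for December 2019 through March 2020, for example, would be narrowed to start on 1 January 2020.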

Neo4j

Labels within the Neo4j database include Lineage, Protein, Nucleotide, Location, and LocationDay.

The Protein and Nucleotide labels store sequence information such as accession, length, collection date, and country of collection.
The Lineage label stores lineage development information, including the most common country, latest date, and earliest date associated with each lineage.
The LocationDay label stores climate information such as temperature, precipitation, wind speed, humidity, and sky shortwave irradiance for each location on each day.
The Location label contains basic information about hospitals, health, and the economy of each country, including GDP, median age, life expectancy, population, the proportion of people aged 65 and older, proportion of smokers, proportion of extreme poverty, diabetes prevalence, human development index, and more.
Once the input sequences have been defined, an Input node is generated and labelled accordingly in the Neo4j graph database.
The parameter values are saved as properties of the Analysis node.
The outputs are saved as properties of the Output node.
Once the analysis is completed, the user is assigned a unique output ID, which they can use to query and visualize the results in the web platform.
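
The Input, Analysis, and Output bookkeeping described above could be sketched as follows. This is only an illustration of the idea: the `id` property, the `USED_IN` and `PRODUCED` relationship types, and the function name `build_run_statements` are assumptions made for the example, and the actual schema may differ. The function builds parameterised Cypher statements without connecting to a database.

```python
import uuid
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]


def build_run_statements(
    accessions: List[str],
    parameters: Dict[str, Any],
    results: Dict[str, Any],
) -> Tuple[str, List[Statement]]:
    """Return a unique output ID and the Cypher statements recording one run.

    Hypothetical helper: property and relationship names are assumptions.
    Values in `parameters` and `results` should be primitives or lists of
    primitives, since Neo4j properties cannot hold nested maps.
    """
    output_id = uuid.uuid4().hex  # unique ID handed back to the user
    statements: List[Statement] = [
        # 1. Input node listing the selected sequences.
        (
            "CREATE (i:Input {id: $id, accessions: $accessions})",
            {"id": output_id, "accessions": list(accessions)},
        ),
        # 2. Analysis node holding the parameter values as properties.
        (
            "MATCH (i:Input {id: $id}) "
            "CREATE (i)-[:USED_IN]->(a:Analysis {id: $id}) "
            "SET a += $parameters",
            {"id": output_id, "parameters": dict(parameters)},
        ),
        # 3. Output node holding the results as properties.
        (
            "MATCH (a:Analysis {id: $id}) "
            "CREATE (a)-[:PRODUCED]->(o:Output {id: $id}) "
            "SET o += $results",
            {"id": output_id, "results": dict(results)},
        ),
    ]
    return output_id, statements
```

With the official Neo4j Python driver, each `(query, params)` pair could then be passed to `session.run(query, params)`. The returned ID plays the role of the output ID that the user later uses to look up results.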

Snakemake workflow: aPhyloGeo-pipeline

With alignment results and related environmental data as input, the Snakemake workflow will be triggered in the backend.

aPhyloGeo-pipeline is a user-friendly, scalable, reproducible, and comprehensive workflow that can explore how patterns of variation within species coincide with geographic features, such as climatic features.
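
As a rough sketch of how a backend might trigger the workflow, the helper below assembles a Snakemake command line from a core count and a set of configuration values. The `--snakefile`, `--cores`, and `--config` options are standard Snakemake flags. The function name, the default Snakefile path, and any configuration keys passed in are assumptions for illustration and are not taken from aPhyloGeo-pipeline.

```python
from typing import Dict, List


def build_snakemake_command(
    cores: int,
    config: Dict[str, str],
    snakefile: str = "Snakefile",  # assumed path, adjust to the real workflow
) -> List[str]:
    """Build the argument list for a Snakemake run (hypothetical helper)."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    cmd = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if config:
        # --config takes one or more KEY=VALUE pairs; sort for a stable order.
        cmd.append("--config")
        cmd.extend(f"{key}={value}" for key, value in sorted(config.items()))
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems when configuration values contain spaces.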

For more information about aPhyloGeo-pipeline:

GitHub of aPhyloGeo-pipeline

Wiki of aPhyloGeo-pipeline

🚀 Getting started

1. Clone this repo.

git clone https://github.com/tahiri-lab/iPhyloGeo.js.git
cd iPhyloGeo.js/DashNeo

2. Install dependencies.

2.1 If you do not have Conda installed, then use the following method to install it. If you already have Conda installed, then refer directly to the next step (2.2).

# download Miniconda3 installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

# install Conda (respond by 'yes')
bash miniconda.sh

# update Conda
conda update -y conda

2.2 Create a Conda environment named aPhyloGeo-Covid and install all the dependencies in that environment.

# create a new environment with dependencies 
conda env create -n aPhyloGeo-Covid -f environment.yaml

2.3 Activate the environment

conda activate aPhyloGeo-Covid

3. Run the workflow.

python index.py
