Bayesian LOgistic REgression
A tool for meta-analysis in GWAS using Bayesian multiple logistic regression
B-LORE is a command line tool that creates summary statistics from multiple logistic regression on GWAS data, and combines the summary statistics from multiple studies in a meta-analysis. It can also incorporate functional information about the SNPs from other external sources. Several genetic regions, or loci, are preselected for analysis with B-LORE.
- Association probability: B-LORE outputs probabilities of the input genetic loci being statistically associated with the phenotype.
- Finemapping: B-LORE also outputs the probability of each SNP being statistically associated with the phenotype.
- Leverages functional genomic data as a prior probability to improve prioritization.
- Models data with logistic regression, and is suited for case/control studies.
- Combines information over all SNPs in a locus with multiple regression.
B-LORE is written in Python and C++. To run B-LORE, you will need:
- Python version 3.4 or higher,
- the Python packages for scientific computing, NumPy and SciPy,
- a C++ compiler.
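
The Python side of these requirements can be checked before compiling. The following is a minimal sketch, not part of B-LORE: the function name `missing_requirements` is hypothetical, and the C++ compiler is not checked here.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 4)                      # minimum version stated above
REQUIRED_PACKAGES = ("numpy", "scipy")   # import names of NumPy and SciPy


def missing_requirements(version_info=None, packages=REQUIRED_PACKAGES):
    """Return a list of problems found; an empty list means nothing is missing."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d or higher is required" % MIN_PYTHON)
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        if importlib.util.find_spec(name) is None:
            problems.append("missing Python package: " + name)
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

Running the script prints one line per missing requirement and nothing if the interpreter and both packages are available.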
To use B-LORE, you have to download the repository and compile the C++ shared libraries:
git clone https://github.com/soedinglab/b-lore.git
cd b-lore
make
The Makefile uses g++ by default, which you can change depending on the compiler available on your system.
For calculating summary statistics, it uses the following file formats as input:
- Genotype files in Oxford format, for all loci of interest (e.g. Locus001.gen, Locus002.gen, etc.).
- Sample file in Oxford format (e.g. study1.sample)
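
To illustrate what a genotype file in Oxford format contains, here is a small sketch that parses one line of a `.gen` file. It is not B-LORE's own parser. It assumes the common 5-column layout (SNP ID, rsID, position, allele A, allele B) followed by three genotype probabilities (AA, AB, BB) per sample; some `.gen` variants carry an extra leading chromosome column, which this sketch does not handle. The function name `parse_gen_line` is hypothetical.

```python
def parse_gen_line(line):
    """Split one .gen line into SNP metadata and per-sample expected dosages.

    Assumed layout: SNP ID, rsID, position, allele A, allele B,
    then three genotype probabilities (AA, AB, BB) for every sample.
    """
    fields = line.split()
    snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")
    dosages = []
    for i in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[i:i + 3]
        # expected number of copies of allele B for this sample
        dosages.append(p_ab + 2.0 * p_bb)
    return {
        "snp_id": snp_id,
        "rsid": rsid,
        "pos": int(pos),
        "alleles": (allele_a, allele_b),
        "dosages": dosages,
    }


if __name__ == "__main__":
    # made-up line with three samples, for illustration only
    print(parse_gen_line("SNP1 rs123 1000 A G 1 0 0 0 1 0 0 0 1"))
```

The dosage shown here is only one common way to summarize the three probabilities; how B-LORE itself encodes genotypes is not described in this document.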
For meta-analysis, it uses the following input:
- Output files of B-LORE summary statistics.
- List of loci to be analyzed. This is a single file containing 2 columns with no header. The first column lists the names of the loci (e.g. Locus001, Locus002, etc.) and the second column is a binary flag (1 or 0) indicating whether it is a SNP locus (1) or a covariate locus (0). [Note: the summary statistics step of each study outputs this file.]
- (Optional) Functional genomics data, separately for each locus.
Each feature file contains 2 parts:
(a) a header line detailing the names of the columns in the file, and
(b) a line for each SNP detailing the information for that SNP.
The columns are tab-separated.
The annotation tracks are present from column 4 onwards.
The first 3 columns are:
- RSID: must have the same SNP identifier as in the genotype files
- CHR: chromosome number
- POS: base-pair position of the SNP.
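
The two text inputs for meta-analysis described above (the locus list and the per-locus feature file) can be read with a few lines of Python. This is a sketch for inspecting or validating your own files, not code from B-LORE. It assumes the locus list is whitespace-separated (the separator is not stated above) and keeps annotation values as strings, since their type is not specified. The function names are hypothetical.

```python
import csv
import io


def read_locus_list(handle):
    """Parse (locus name, is SNP locus) pairs from the 2-column, header-less list."""
    loci = []
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()  # assumption: columns are whitespace-separated
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2 or fields[1] not in ("0", "1"):
            raise ValueError("line %d: expected '<locus name> <0|1>'" % lineno)
        loci.append((fields[0], fields[1] == "1"))
    return loci


def read_feature_file(handle):
    """Parse a tab-separated feature file: RSID, CHR, POS, then annotation tracks."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    track_names = header[3:]  # annotation tracks start at column 4
    snps = []
    for row in reader:
        if not row:
            continue
        if len(row) != len(header):
            raise ValueError("row for %s has %d columns, expected %d"
                             % (row[0], len(row), len(header)))
        snps.append({
            "rsid": row[0],
            "chr": row[1],
            "pos": int(row[2]),
            # values are kept as strings; their type is not specified above
            "tracks": dict(zip(track_names, row[3:])),
        })
    return track_names, snps


if __name__ == "__main__":
    # made-up contents, for illustration only
    print(read_locus_list(io.StringIO("Locus001 1\nLocus002 0\n")))
    print(read_feature_file(io.StringIO("RSID\tCHR\tPOS\ttrackA\nrs1\t1\t100\t1\n")))
```

When reading real files, open them with `open(path, newline="")` so that the `csv` module handles line endings itself.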
An executable file to run B-LORE is provided as bin/blore. This can be used as follows:
blore [--help] [COMMAND] [OPTIONS]
There are 2 commands for B-LORE:
- --summary: for creating summary statistics of individual studies.
- --meta: for meta-analysis from summary statistics of multiple studies.
Each of these 2 commands takes different options, as described below.
Create summary statistics of individual studies. Valid options are:
Option | Description | Priority | Default value |
---|---|---|---|
‑‑gen filename(s) | Input genotype file(s); each locus should have a separate genotype file, and all of them should be specified here (wildcards allowed) | Required | -- |
‑‑sample filename | Input sample file | Required | -- |
‑‑pheno string | Name of the phenotype as it appears in the header of the sample file | Optional | pheno |
‑‑regoptim | If specified, the variance of the regularizer will be optimized; otherwise the regularizer will be N(0, σ²), where σ is specified by --reg | Optional | -- |
‑‑reg float | Value of the standard deviation (σ) of the regularizer | Optional | 0.01 |
‑‑pca int | Number of principal components of the genotype to be included as covariates | Optional | 0 |
‑‑cov string(s) | Name of covariate(s) as they appear in the header of the sample file; multiple covariates can be specified as space-separated strings | Optional | None |
‑‑out directory | Name of the output directory where summary statistics will be created | Optional | directory of the genotype files |
‑‑prefix string | Prefix for the summary statistics files | Optional | _summary |
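
When B-LORE is driven from a script, for example once per study, it can help to assemble the summary command as an argument list. The sketch below only uses options from the table above, typed with ordinary ASCII hyphens. The helper `build_summary_command` and the covariate names in the example are hypothetical, and the executable path assumes you run from the repository root.

```python
def build_summary_command(gen_files, sample_file, pheno=None, reg=None,
                          pca=None, covariates=(), out_dir=None, prefix=None,
                          executable="bin/blore"):
    """Return the argument list for one `blore --summary` run (nothing is executed)."""
    if not gen_files:
        raise ValueError("at least one genotype file is required")
    cmd = [executable, "--summary", "--gen"] + list(gen_files)
    cmd += ["--sample", sample_file]
    # optional arguments are only added when set, so B-LORE's defaults apply
    if pheno is not None:
        cmd += ["--pheno", pheno]
    if reg is not None:
        cmd += ["--reg", str(reg)]
    if pca is not None:
        cmd += ["--pca", str(pca)]
    if covariates:
        cmd += ["--cov"] + list(covariates)
    if out_dir is not None:
        cmd += ["--out", out_dir]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


if __name__ == "__main__":
    # file names follow the examples in this document; "age" and "sex" are made up
    print(" ".join(build_summary_command(
        ["Locus001.gen", "Locus002.gen"], "study1.sample",
        pca=2, covariates=["age", "sex"])))
```

The returned list can be passed to `subprocess.check_call`. Because no shell is involved in that case, wildcards should be expanded in Python first, for instance with `glob.glob`.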
Perform meta-analysis from summary statistics of multiple studies. Valid options are:
Option | Description | Priority | Default value |
---|---|---|---|
‑‑input filename | Input file containing list of loci to be analyzed together | Required | -- |
‑‑statdir filename(s) | Input directory of B-LORE summary statistics | Required | -- |
‑‑feature filename(s) | Input file(s) for genomic feature tracks | Optional | -- |
‑‑params floats | Initial values of the hyperparameters, requires 4 space-separated floats corresponding to β_π, μ, σ and σ_bg | Optional | 0.01 0.0 0.01 0.01 |
‑‑muvar | If specified, μ will be optimized, otherwise it will be fixed to the initial value (default 0) | Optional | -- |
‑‑zmax int | Maximum number of causal SNPs allowed | Optional | 2 |
‑‑out directory | Name of the output directory where result files will be created | Optional | current directory |
‑‑prefix string | Prefix for the meta-analysis output files | Optional | _meta |
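
The meta-analysis command can be assembled in the same way. The sketch below additionally checks that the hyperparameter argument has exactly 4 values, as the table above requires. The helper `build_meta_command` and the file and directory names in the example are hypothetical; only options listed in the table are used.

```python
def build_meta_command(locus_list, stat_dirs, feature_files=(), params=None,
                       muvar=False, zmax=None, out_dir=None, prefix=None,
                       executable="bin/blore"):
    """Return the argument list for one `blore --meta` run (nothing is executed)."""
    if not stat_dirs:
        raise ValueError("at least one summary statistics directory is required")
    cmd = [executable, "--meta", "--input", locus_list, "--statdir"] + list(stat_dirs)
    if feature_files:
        cmd += ["--feature"] + list(feature_files)
    if params is not None:
        # the table above asks for exactly 4 floats as initial hyperparameters
        if len(params) != 4:
            raise ValueError("--params needs exactly 4 values")
        cmd += ["--params"] + [repr(float(p)) for p in params]
    if muvar:
        cmd.append("--muvar")
    if zmax is not None:
        cmd += ["--zmax", str(int(zmax))]
    if out_dir is not None:
        cmd += ["--out", out_dir]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    return cmd


if __name__ == "__main__":
    # "loci.txt", "study1" and "study2" are made-up names for illustration
    print(" ".join(build_meta_command(
        "loci.txt", ["study1", "study2"],
        params=(0.01, 0.0, 0.01, 0.01), zmax=2)))
```

Leaving an optional argument unset omits the flag entirely, so the default values from the table apply.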
- Clone the repository
cd example
tar -zxvf input.tar.gz
This will create an example input folder, with genotypes at 20 loci for 3 populations, a sample file for each population and ENCODE data for the 20 loci.
View commands.sh in your favorite editor to see the commands, and execute ./commands.sh to run B-LORE on the 3 populations to generate summary statistics, followed by a meta-analysis.
- Saikat Banerjee, Lingyao Zeng, Heribert Schunkert and Johannes Soeding (2017). Bayesian multiple logistic regression for case-control GWAS. bioRxiv.
B-LORE is released under the GNU General Public License version 3. See LICENSE for more details. Copyright Johannes Soeding and Saikat Banerjee.