Bayesian multiple logistic regression for GWAS meta-analysis


Bayesian LOgistic REgression

A tool for meta-analysis in GWAS using Bayesian multiple logistic regression

Description

B-LORE is a command line tool that creates summary statistics from multiple logistic regression on GWAS data, and combines the summary statistics from multiple studies in a meta-analysis. It can also incorporate functional information about the SNPs from external sources. The genetic regions (loci) to be analyzed with B-LORE must be preselected.

Key features

  1. Association probability: B-LORE outputs probabilities of the input genetic loci being statistically associated with the phenotype.
  2. Finemapping: B-LORE also outputs the probability of each SNP being statistically associated with the phenotype.
  3. Leverages functional genomic data as a prior probability to improve prioritization.
  4. Models data with logistic regression, and is suited for case/control studies.
  5. Combines information over all SNPs in a locus with multiple regression.

Installation

B-LORE is written in Python and C++. To run B-LORE, you will need

  • Python version 3.4 or higher,
  • the Python packages for scientific computing, NumPy and SciPy,
  • a C++ compiler.

To use B-LORE, you have to download the repository and compile the C++ shared libraries:

git clone https://github.com/soedinglab/b-lore.git
cd b-lore
make

The Makefile uses g++ by default, which you can change depending on the compiler available on your system.

Input files

For calculating summary statistics, B-LORE uses the following file formats as input:

  1. Genotype files in Oxford format, for all loci of interest (e.g. Locus001.gen, Locus002.gen, etc.).
  2. Sample file in Oxford format (e.g. study1.sample)
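
To make the genotype input more concrete, here is a minimal sketch of reading one line of an Oxford `.gen` file. It assumes the common layout of five leading columns (SNP ID, rsID, position, allele A, allele B) followed by three genotype probabilities (AA, AB, BB) per sample. The function name `parse_gen_line` and the example values are hypothetical, and this is not B-LORE's own reader.

```python
def parse_gen_line(line):
    """Parse one whitespace-separated line of an Oxford .gen file.

    Assumed layout: SNP ID, rsID, position, allele A, allele B,
    then a triplet P(AA) P(AB) P(BB) for every sample.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("expected 5 leading columns and at least one sample")
    snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("genotype probabilities must come in triplets")
    # Expected number of copies of allele B for each sample.
    dosages = []
    for i in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[i:i + 3]
        dosages.append(p_ab + 2.0 * p_bb)
    return {
        "snp_id": snp_id,
        "rsid": rsid,
        "pos": int(pos),
        "alleles": (allele_a, allele_b),
        "dosages": dosages,
    }


# Hypothetical line with three samples: homozygous A, heterozygous, homozygous B.
record = parse_gen_line("SNP1 rs0001 1000 A G 1 0 0 0 1 0 0 0 1")
print(record["rsid"], record["dosages"])
```

The rsID parsed here is the identifier that the optional feature files must match.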

For meta-analysis, it uses the following input:

  1. Output files of the B-LORE summary statistics step.
  2. List of loci to be analyzed. This is a single file containing 2 columns with no header. The first column lists the names of the loci (e.g. Locus001, Locus002, etc.) and the second column is a binary number (1 or 0) indicating whether it is a SNP locus (1) or a covariate locus (0). [Note: the summary statistics step for each study outputs this file.]
  3. (Optional) Functional genomics data, separately for each locus. Each feature file contains 2 parts: (a) a header line detailing the names of the columns in the file, and (b) a line for each SNP detailing the information for that SNP. The columns are tab-separated. The annotation tracks are present from column 4 onwards. The first 3 columns are:
    • RSID: must have the same SNP identifier as in the genotype files
    • CHR: chromosome number
    • POS: base-pair position of the SNP.
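
The two text inputs described above can be sketched with a small reader for each. This is an illustration only: the function names, the track names `TrackA` and `TrackB`, and the example rsIDs are hypothetical. The sketch also assumes that the loci list is whitespace-separated and that annotation values are numeric, neither of which is stated above.

```python
import csv
import io


def read_loci_list(handle):
    """Read the 2-column, header-less loci list.

    Returns (name, is_snp_locus) pairs; the flag is 1 for a SNP locus
    and 0 for a covariate locus. Whitespace separation is an assumption.
    """
    loci = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2 or fields[1] not in ("0", "1"):
            raise ValueError("bad loci list line: {!r}".format(line))
        loci.append((fields[0], fields[1] == "1"))
    return loci


def read_feature_file(handle):
    """Read a tab-separated feature file with one header line.

    Columns 1 to 3 are RSID, CHR and POS; annotation tracks start at column 4.
    Track values are converted to float, which is an assumption.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 4:
        raise ValueError("expected RSID, CHR, POS and at least one track")
    track_names = header[3:]
    features = {}
    for row in reader:
        if not row:
            continue
        values = [float(x) for x in row[3:]]
        features[row[0]] = {
            "chr": row[1],
            "pos": int(row[2]),
            "tracks": dict(zip(track_names, values)),
        }
    return track_names, features


# Hypothetical file contents, for illustration only.
loci_text = "Locus001 1\nLocus002 1\n"
feature_text = (
    "RSID\tCHR\tPOS\tTrackA\tTrackB\n"
    "rs0001\t1\t1000\t1\t0\n"
    "rs0002\t1\t2500\t0\t1\n"
)

loci = read_loci_list(io.StringIO(loci_text))
tracks, features = read_feature_file(io.StringIO(feature_text))
print(loci)
print(tracks, features["rs0002"]["tracks"])
```

Keying the features by RSID makes it easy to check that every SNP identifier in a feature file also occurs in the corresponding genotype file.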

Usage

Quick start

  • Clone the repository
  • cd example
  • tar -zxvf input.tar.gz (this creates an example input folder with genotypes at 20 loci for 3 populations, a sample file for each population, and ENCODE data for the 20 loci)
  • Execute ./commands.sh to run B-LORE on the 3 populations and generate summary statistics, followed by a meta-analysis.

Command line arguments

An executable file to run B-LORE is provided as bin/blore. It can be used as follows:

blore [--help] [COMMAND] [OPTIONS]

There are 2 commands for B-LORE:

  • --summary : for creating summary statistics of individual studies.
  • --meta : for meta-analysis from summary statistics of multiple studies.

Each of these 2 commands takes different options, as described below.

blore --summary [OPTIONS]

Create summary statistics of individual studies. Valid options are:

| Option | Description | Priority | Default value |
| --- | --- | --- | --- |
| --gen filename(s) | Input genotype file(s); each locus should have its own genotype file, and all of them are specified here (wildcards allowed) | Required | -- |
| --sample filename | Input sample file | Required | -- |
| --pheno string | Name of the phenotype as it appears in the header of the sample file | Optional | pheno |
| --regoptiom | If specified, the variance of the regularizer will be optimized; otherwise the regularizer will be N(0, σ²), where σ is specified by --reg | Optional | -- |
| --reg float | Value of the standard deviation (σ) of the regularizer | Optional | 0.01 |
| --pca int | Number of principal components of the genotype to be included as covariates | Optional | 0 |
| --cov string(s) | Name of covariate(s) as they appear in the header of the sample file; multiple covariates can be specified as space-separated strings | Optional | None |
| --out directory | Name of the output directory where summary statistics will be created | Optional | directory of the genotype files |
| --prefix string | Prefix for the summary statistics files | Optional | _summary |
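
As an illustration of how these options combine, the following sketch assembles a `blore --summary` argument list from Python. Only flags from the table above are used. The helper `summary_command` and all file and directory names are hypothetical, and the sketch only prints the command instead of running it.

```python
import shlex


def summary_command(gen_files, sample_file, pheno=None, pca=None,
                    covariates=(), out_dir=None, blore="bin/blore"):
    """Build the argument list for one study's summary statistics run."""
    cmd = [blore, "--summary", "--gen"] + list(gen_files)
    cmd += ["--sample", sample_file]
    if pheno is not None:
        cmd += ["--pheno", pheno]
    if pca is not None:
        cmd += ["--pca", str(pca)]
    if covariates:
        # Multiple covariates are passed as space-separated strings.
        cmd += ["--cov"] + list(covariates)
    if out_dir is not None:
        cmd += ["--out", out_dir]
    return cmd


# Hypothetical paths; replace them with your own genotype and sample files.
cmd = summary_command(
    ["input/pop1/Locus001.gen", "input/pop1/Locus002.gen"],
    "input/pop1/study1.sample",
    pca=2,
    out_dir="pop1_summary",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. Building a list rather than a single string avoids quoting problems when file names contain spaces.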

blore --meta [OPTIONS]

Perform meta-analysis from summary statistics of multiple studies. Valid options are:

| Option | Description | Priority | Default value |
| --- | --- | --- | --- |
| --input filename | Input file containing list of loci to be analyzed together | Required | -- |
| --statdir filename(s) | Input directory of B-LORE summary statistics | Required | -- |
| --feature filename(s) | Input file(s) for genomic feature tracks | Optional | -- |
| --params floats | Initial values of the hyperparameters, requires 4 space-separated floats corresponding to β_π, μ, σ and σ_bg | Optional | 0.01 0.0 0.01 0.01 |
| --muvar | If specified, μ will be optimized, otherwise it will be fixed to the initial value (default 0) | Optional | -- |
| --zmax int | Maximum number of causal SNPs allowed | Optional | 2 |
| --out directory | Name of the output directory where result files will be created | Optional | current directory |
| --prefix string | Prefix for the meta-analysis output files | Optional | _meta |
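
A meta-analysis call can be assembled in the same way. The sketch below uses only flags from the table above and checks that `--params` receives exactly 4 values. The helper `meta_command` and all paths are hypothetical. Passing several summary directories after `--statdir` is an assumption based on the "filename(s)" notation; see commands.sh in the example folder for the invocation actually used.

```python
import shlex


def meta_command(loci_list, stat_dirs, feature_files=(), params=None,
                 muvar=False, zmax=None, out_dir=None, blore="bin/blore"):
    """Build the argument list for a meta-analysis over several studies."""
    cmd = [blore, "--meta", "--input", loci_list]
    # Assumption: one summary statistics directory per study, space-separated.
    cmd += ["--statdir"] + list(stat_dirs)
    if feature_files:
        cmd += ["--feature"] + list(feature_files)
    if params is not None:
        if len(params) != 4:
            raise ValueError("--params requires 4 floats")
        cmd += ["--params"] + [str(float(p)) for p in params]
    if muvar:
        cmd.append("--muvar")  # optimize mu instead of fixing it
    if zmax is not None:
        cmd += ["--zmax", str(int(zmax))]
    if out_dir is not None:
        cmd += ["--out", out_dir]
    return cmd


# Hypothetical paths; the loci list is written by the summary statistics step.
cmd = meta_command(
    "loci.txt",
    ["pop1_summary", "pop2_summary"],
    params=(0.01, 0.0, 0.01, 0.01),
    zmax=2,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```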

Example

  • Clone the repository
  • cd example
  • tar -zxvf input.tar.gz (this creates an example input folder with genotypes at 20 loci for 3 populations, a sample file for each population, and ENCODE data for the 20 loci)

View commands.sh in your favorite editor to see the commands, and execute ./commands.sh to run B-LORE on the 3 populations to generate summary statistics, followed by a meta-analysis.

Citation

License

B-LORE is released under the GNU General Public License version 3. See LICENSE for more details. Copyright Johannes Soeding and Saikat Banerjee.

Contact

Saikat Banerjee
